RE: XPathentity processor on CLOB field
I got this working. The errors were due to a mistake in letter case: I was using 'datasource' instead of 'dataSource' in the entity that was using XPathEntityProcessor. Hence the attribute was being ignored and the entity was inheriting the JDBC data source of the parent entity. I am pasting the complete data-config for anyone encountering the same problem.

<dataConfig>
  <dataSource name="xmldata" type="FieldReaderDataSource"/>
  <dataSource name="mbdev" driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@localhost:1521:orcl" user="orcl" password="orcl"/>
  <document name="insight">
    <entity name="input" query="select * from test" logLevel="debug" dataSource="mbdev" transformer="ClobTransformer" onError="skip">
      <field column="LOAD_DATE" name="load_date"/>
      <field column="RESPONSE_XML" name="RESPONSE_XML" clob="true"/>
      <field column="id" name="id"/>
      <entity name="catReport" dataSource="xmldata" dataField="input.RESPONSE_XML" processor="XPathEntityProcessor" forEach="/DecisionServiceRs" rootEntity="true" logLevel="debug">
        <field column="event" xpath="/DecisionServiceRs/@event"/>
        <field column="policyNumber" xpath="/DecisionServiceRs/@policyNumber"/>
      </entity>
    </entity>
  </document>
</dataConfig>
Re: Duplicate suggestions
I had the very same issue, because I had some documents with a redundant field and I was using the Infix Suggester as well. Because the Infix Suggester returns the whole field content, if you have duplicated field values across your docs, you will see duplicate suggestions.

Do you have any intermediate API in your application? If so, you can modify the API to use a Collection that prevents duplicates to hold and return the suggestions.

If you want it directly from Solr, I assume it is a bug. I think the suggestions should contain no duplicates by default (because the only information returned is the field value and not the document id). Anyway, it could be a nice parameter to get better suggestions (sending an avoidDuplicate parameter to the suggester).

Cheers

2015-06-18 10:48 GMT+01:00 jon kerling jonkerl...@yahoo.com.invalid:

Hi, I am using Solr 5.1. I'm getting duplicate suggestions when using my Solr suggester. I'm using AnalyzingInfixLookupFactory and DocumentDictionaryFactory. Can I configure it to return only distinct suggestions?
Here are details about my configuration:

from schema.xml:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester1a</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="indexPath">suggester_infix_dir1a</str>
    <str name="allTermsRequired">true</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">f1</str>
    <str name="weightField">weightField</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
    <str name="buildOnStartup">false</str>
  </lst>
  <lst name="suggester">
    <str name="name">mySuggester2a</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="indexPath">suggester_infix_dir2a</str>
    <str name="allTermsRequired">true</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">f2</str>
    <str name="weightField">weightField</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">6</str>
    <str name="suggest.dictionary">mySuggester1a</str>
    <str name="suggest.dictionary">mySuggester2a</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

from schema.xml:

<field name="f1" type="string" indexed="true" stored="true" required="false" multiValued="false"/>
<field name="f2" type="string" indexed="true" stored="true" required="false" multiValued="false"/>
<field name="weightField" type="float" indexed="true" stored="true"/>

** weightField is ignored by me, I'm not adding any values in it at all.

document example:

<doc>
  <str name="f1">2015-04-01</str>
  <str name="f2">12:06:00</str>
  <str name="f3">BOOO</str>
  <str name="f4"/>
  <str name="f5">7.52.11.212</str>
  <str name="f6">7.52.11.213</str>
  <str name="OID">52358424</str>
</doc>

After I build the suggester I'm trying to get suggestions like this:

http://localhost/solr/core1/suggest?/suggest=true&suggest.q=12

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">62</int>
  </lst>
  <lst name="suggest">
    <lst name="mySuggester2a">
      <lst name="12">
        <int name="numFound">6</int>
        <arr name="suggestions">
          <lst><str name="term">18:34:&lt;b&gt;12&lt;/b&gt;</str><long name="weight">0</long><str name="payload"/></lst>
          <lst><str name="term">18:34:&lt;b&gt;12&lt;/b&gt;</str><long name="weight">0</long><str name="payload"/></lst>
          <lst><str name="term">18:35:&lt;b&gt;12&lt;/b&gt;</str><long name="weight">0</long><str name="payload"/></lst>
          <lst><str name="term">18:35:&lt;b&gt;12&lt;/b&gt;</str><long name="weight">0</long><str name="payload"/></lst>
          <lst><str name="term">18:35:&lt;b&gt;12&lt;/b&gt;</str><long name="weight">0</long><str name="payload"/></lst>
          <lst><str name="term">&lt;b&gt;12&lt;/b&gt;:06:02</str><long name="weight">0</long><str name="payload"/></lst>
        </arr>
      </lst>
    </lst>
    <lst name="mySuggester1a">
      <lst name="12">
        <int name="numFound">0</int>
        <arr name="suggestions"/>
      </lst>
    </lst>
  </lst>
</response>

I would like to get this kind of suggester response (no duplicates):

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">62</int>
  </lst>
  <lst name="suggest">
    <lst name="mySuggester2a">
      <lst
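The reply in this thread suggests removing duplicates in an intermediate API by using a Collection that prevents duplicates. A minimal sketch of that idea in plain Java follows. The class and method names (SuggestionDeduper, dedupe) are hypothetical, and the code assumes the suggestion terms have already been extracted from the Solr response into a list of strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical helper for an application layer sitting between Solr and the UI. */
final class SuggestionDeduper {

    /** Drops repeated terms while keeping the order in which Solr returned them. */
    static List<String> dedupe(List<String> terms) {
        // LinkedHashSet rejects duplicates and preserves insertion order.
        return new ArrayList<>(new LinkedHashSet<>(terms));
    }

    public static void main(String[] args) {
        // Terms as they appear in the response quoted above (highlight markup included).
        List<String> raw = Arrays.asList(
                "18:34:<b>12</b>", "18:34:<b>12</b>",
                "18:35:<b>12</b>", "18:35:<b>12</b>", "18:35:<b>12</b>",
                "<b>12</b>:06:02");
        System.out.println(dedupe(raw));
    }
}
```

Because duplicates are removed after Solr has applied suggest.count, the client may end up with fewer suggestions than requested; asking Solr for more than is needed and trimming afterwards is one way to compensate.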
Solr 4.10.4: Could not create instance of 'SolrInputDocument'
Our web site is created using PaperThin's CommonSpot CMS in a ColdFusion 10 and Windows Server 2008 R2 environment, using Apache Solr 4.10.4 instead of CF Solr. We create collections through the CMS interface and they do appear in both the CMS and the Solr dashboard when created. However, when we try indexing our collections through the CMS interface, our CMS error logs show the entry 'Could not create instance of 'SolrInputDocument'' for each member of the collection. This is not a fatal error, as the indexing appears to cycle through all members, but each member errors out with log entries for each member. I've Googled this error message without success. What might this error message indicate please?? Paul
Help: Problem in customized token filter
Hi,

I created a token concat filter to concatenate all the tokens from the token stream. It creates the concatenated token as expected. But when I post an XML file containing more than 30,000 documents, only the first document has data in that field.

Schema:

<field name="titlex" type="text" indexed="true" stored="false" required="false" omitNorms="false" multiValued="false"/>

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true" tokenSeparator=""/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="stemmed_synonyms_text_prime_ex_index.txt" ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_text_prime_search.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>
  </analyzer>
</fieldType>

Please help me. The code for the filter is as follows, please take a look.

Here is a picture of what the filter is doing: http://i.imgur.com/THCsYtG.png?1

The code of the concat filter is:

package com.xyz.analysis.concat;

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

public class ConcatenateWordsFilter extends TokenFilter {

    private CharTermAttribute charTermAttribute = addAttribute(CharTermAttribute.class);
    private OffsetAttribute offsetAttribute = addAttribute(OffsetAttribute.class);
    PositionIncrementAttribute posIncr = addAttribute(PositionIncrementAttribute.class);
    TypeAttribute typeAtrr = addAttribute(TypeAttribute.class);

    private StringBuilder stringBuilder = new StringBuilder();
    private boolean exhausted = false;

    /**
     * Creates a new ConcatenateWordsFilter
     * @param input TokenStream that will be filtered
     */
    public ConcatenateWordsFilter(TokenStream input) {
        super(input);
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public final boolean incrementToken() throws IOException {
        while (!exhausted && input.incrementToken()) {
            char terms[] = charTermAttribute.buffer();
            int termLength = charTermAttribute.length();
            if (typeAtrr.type().equals("<ALPHANUM>")) {
                stringBuilder.append(terms, 0, termLength);
            }
            charTermAttribute.copyBuffer(terms, 0, termLength);
            return true;
        }
        if (!exhausted) {
            exhausted = true;
            String sb = stringBuilder.toString();
            System.err.println("The Data got is " + sb);
            int sbLength = sb.length();
            //posIncr.setPositionIncrement(0);
            charTermAttribute.copyBuffer(sb.toCharArray(), 0, sbLength);
            offsetAttribute.setOffset(offsetAttribute.startOffset(), offsetAttribute.startOffset() + sbLength);
            stringBuilder.setLength(0);
            //typeAtrr.setType("CONCATENATED");
            return true;
        }
        return false;
    }
}

With Regards
Aman Tandon
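One likely explanation, not confirmed anywhere in this thread: Lucene reuses analysis components across documents and calls reset() on the stream before each one, and the filter above never sets exhausted back to false. The sketch below reproduces that pattern in plain Java without the Lucene API. ConcatFilterSketch, its reset(List) method and the clearStateOnReset flag are hypothetical stand-ins used only to show the effect of stale state on a reused object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Plain-Java stand-in for a reused token filter; this is NOT the Lucene API. */
final class ConcatFilterSketch {
    private final StringBuilder buffer = new StringBuilder();
    private final boolean clearStateOnReset;
    private Iterator<String> input = Collections.emptyIterator();
    private boolean exhausted = false;

    ConcatFilterSketch(boolean clearStateOnReset) {
        this.clearStateOnReset = clearStateOnReset;
    }

    /** Called once before each new document, mimicking reset() on a reused stream. */
    void reset(List<String> tokens) {
        input = tokens.iterator();
        if (clearStateOnReset) {
            exhausted = false;      // forget that the previous document finished
            buffer.setLength(0);    // drop any leftover text
        }
    }

    /** Same control flow as incrementToken() above: pass tokens through, then emit the concatenation once. */
    String next() {
        if (!exhausted && input.hasNext()) {
            String token = input.next();
            buffer.append(token);
            return token;
        }
        if (!exhausted) {
            exhausted = true;
            String all = buffer.toString();
            buffer.setLength(0);
            return all;
        }
        return null; // end of stream
    }

    /** Runs one document through the (reused) filter and collects the emitted tokens. */
    static List<String> analyze(ConcatFilterSketch filter, String... tokens) {
        filter.reset(Arrays.asList(tokens));
        List<String> out = new ArrayList<>();
        for (String t = filter.next(); t != null; t = filter.next()) {
            out.add(t);
        }
        return out;
    }
}
```

With clearStateOnReset set to false, the first document yields its tokens plus the concatenation and every later document yields nothing, which matches the symptom described. In the real filter the corresponding change would presumably be an override of reset() that calls super.reset(), sets exhausted to false and clears stringBuilder.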
RE: XPathentity processor on CLOB field
This is the error cause reported. I also see that it has been reported earlier (http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201103.mbox/%3cd0f0d26c-3ac0-4982-9e2b-09dc96937...@535consulting.com%3E) but could not find a solution.

I am nesting the FieldReaderDataSource within the Entity definition that has a CLOB field. With this it fails only after transforming the clob. If I do not nest, I get this error when the FieldReaderDataSource is initialized, thus failing even before the SQL is executed. In either case, the error is happening at the same place.

Caused by: java.sql.SQLException: SQL statement to execute cannot be empty or null
    at oracle.jdbc.driver.SQLStateMapping.newSQLException(SQLStateMapping.java:70)
    at oracle.jdbc.driver.DatabaseError.newSQLException(DatabaseError.java:112)
    at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:173)
    at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:229)
    at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:403)
    at oracle.jdbc.driver.OracleSql.initialize(OracleSql.java:110)
    at oracle.jdbc.driver.OracleStatement.executeInternal(OracleStatement.java:1761)
    at oracle.jdbc.driver.OracleStatement.execute(OracleStatement.java:1739)
    at oracle.jdbc.driver.OracleStatementWrapper.execute(OracleStatementWrapper.java:298)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:314)
    ... 14 more

Pattabi Meenakshisundaram

-----Original Message-----
From: Pattabiraman, Meenakshisundaram [mailto:pattabiraman.meenakshisunda...@aig.com]
Sent: Wednesday, June 17, 2015 9:33 PM
To: 'solr-user@lucene.apache.org'
Subject: XPathentity processor on CLOB field

My requirement is to read the XML from a CLOB field and parse it to get the entity. The data config is as shown below. I am trying to map two fields 'event' and 'policyNumber' for the entity 'catreport'.

<dataSource name="mbdev" driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@localhost:1521:orcl" user="xyz" password="xyz"/>
<document name="insight">
  <entity name="input" query="select * from test" logLevel="debug" datasource="mbdev" transformer="ClobTransformer, script:toDate">
    <field column="LOAD_DATE" name="load_date"/>
    <field column="RESPONSE_XML" name="RESPONSE_XML" clob="true"/>
    <dataSource name="xmldata" type="FieldReaderDataSource"/>
    <entity name="catReport" dataSource="xmldata" dataField="input.RESPONSE_XML" processor="XPathEntityProcessor" forEach="/*:DecisionServiceRs" rootEntity="true" logLevel="debug">
      <field column="event" xpath="/dec:DecisionServiceRs/@event"/>

I am getting this error:

Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: null Processing Document # 1
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:70)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:321)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:278)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:53)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:283)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:224)

I see that the Clob is getting converted to String correctly, and the log has this entry where the XML is printed:

Exception while processing: input document : SolrInputDocument(fields: [RESPONSE_XML=dec:Deci

I do not know why the error is thrown at Jdbc when the Clob is converted to a string and passed to the FieldReader, and I do not know how to make this work.

Thanks
Pattabi
Re: Suggester for text array
Hi Advait,

First of all, I suggest you study Solr a little bit [1], because your requirements are actually really simple:

1) You can simply use more than one suggest dictionary if you care to keep the suggestions separated (keeping track of whether a term comes from the name or from the category). If you don't care to keep them separated, simply use a copy field to copy both fields into one.

2) Solr has supported multi-valued fields since the beginning. I really suggest you split by comma in your indexer application, providing Solr with the multiple values already separated, because they are multiple values of the category field (so it is not the analysis chain's responsibility to split them).

Cheers

[1] https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide

--
Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England
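Point 2 above (splitting the comma separated categories in the indexer application) could look roughly like the following plain-Java sketch. CategorySplitter and splitCategories are hypothetical names, and the sketch assumes category names themselves never contain a comma.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical indexer-side helper; not part of Solr or SolrJ. */
final class CategorySplitter {

    /** Turns "Hair Care Products, Personal Care Products" into separate, trimmed values. */
    static List<String> splitCategories(String raw) {
        List<String> values = new ArrayList<>();
        if (raw == null) {
            return values;
        }
        for (String part : raw.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {   // skip empty pieces such as a trailing comma
                values.add(trimmed);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(splitCategories("Hair Care Products, Personal Care Products"));
    }
}
```

Each returned value would then be added separately to a multiValued field (for example with repeated addField calls on the SolrJ input document; the field name is up to the schema), so that "Personal Care Products" becomes a suggestion on its own.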
Suggester for text array
Hi, We run an ecommerce company and would like to use SOLR for our product database searches. We have products along with the categories that they belong to. In case the product belongs to more than 1 category, we have a comma separated field of categories. How do we do auto complete on - 1. Multiple fields - product name, category 2. On categories which are not first in the list in the case of the comma separated values E.g. If a product belongs to Hair Care Products, Personal Care Products how do we ensure that the suggester will even suggest if someone starts typing in Personal Care. Also, how do we show only Personal Care in the auto complete and not as Hair Care Products, Personal Care Products. Thanks, Advait
Re: Solr 5.2.1 on Solaris
On 6/18/2015 8:05 AM, Bence Vass wrote:
Is there any documentation on how to start Solr 5.2.1 on Solaris (Solaris 10)? The script (solr start) doesn't work out of the box, is anyone running Solr 5.x on Solaris?

I think the biggest problem on Solaris will be the options used on the ps command. The ps usage in the solr script appears to be formulated for the version of ps found on Linux and other free UNIX-like operating systems, and I know from experience that those options don't work on Solaris.

The solr script also uses lsof, which I don't think is normally installed on Solaris. I'm not sure whether lsof is actually required, or if the script will work without it.

I won't have time right away, but I will be able to look into this at some point in the next few days and come up with a patch to make the script work on Solaris. If anybody else has the time and skill to do so immediately, feel free to step in.

Thanks,
Shawn
Re: Help: Problem in customized token filter
Please help, what wrong I am doing here. Please guide me.

With Regards
Aman Tandon
Solr 5.2.1 on Solaris
Hello, Is there any documentation on how to start Solr 5.2.1 on Solaris (Solaris 10)? The script (solr start) doesn't work out of the box. Is anyone running Solr 5.x on Solaris? - Thanks
Re: Error when submitting PDF to Solr w/text fields using SolrJ
We would like more information, but the first thing I notice is that it hardly makes sense to use a string type for file content.

Can you give more details about the exception? Have you debugged a little bit? What does the Solr input document look like before it is sent to Solr? Furthermore, please give us the full stack trace. The message you posted is almost useless without all the details...

Benedetti Alessandro
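As background to the remark about the string type: a string field indexes the whole value as one term, and Lucene limits a single indexed term to 32766 bytes of UTF-8. Whether that limit is what triggers the error reported in this thread is an assumption, since no stack trace was posted. The sketch below only shows how extracted text could be checked against that limit before indexing; TermLengthCheck and fitsInSingleTerm are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical pre-indexing check; not part of Solr, SolrJ or Tika. */
final class TermLengthCheck {

    /** Maximum size of one indexed term in Lucene, in UTF-8 bytes. */
    static final int MAX_TERM_BYTES = 32766;

    /** Returns true if the value is small enough to be indexed as a single term. */
    static boolean fitsInSingleTerm(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length <= MAX_TERM_BYTES;
    }

    public static void main(String[] args) {
        StringBuilder large = new StringBuilder();
        for (int i = 0; i < 40000; i++) {
            large.append('a');          // stands in for the full text of a long PDF
        }
        System.out.println(fitsInSingleTerm("short title"));
        System.out.println(fitsInSingleTerm(large.toString()));
    }
}
```

If the extracted text of the failing PDFs turns out to exceed the limit, switching the field to a tokenized type such as the text_general type already used in the schema would be the usual direction to explore.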
Solr Logging
Hi, I want to log Solr search queries/response times and Solr indexing activity separately, in different sets of log files. Is there any convenient framework/way to do it? Thanks Bharath -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Logging-tp4212730.html Sent from the Solr - User mailing list archive at Nabble.com.
Managed schema and schema.xml file
Hi everyone,

I just upgraded from 5.1.0 to 5.2.1 and noticed a behavior change which I consider a bug. In my solrconfig.xml, I have the following:

<!-- <schemaFactory class="ClassicIndexSchemaFactory"/> -->
<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">my-schema.xml</str>
</schemaFactory>

In 5.1.0 (and maybe prior versions?), when I enable managed schema per the above, the existing schema.xml file is left as-is, a copy of it is created as schema.xml.bak, and a new one is created based on the name I gave it, my-schema.xml. With 5.2.1, schema.xml is renamed to schema.xml.bak and my-schema.xml is created (i.e., schema.xml is deleted).

Is this expected behavior or is this a bug? I see it as a bug because if I revert the change I made in my solrconfig.xml back to classic (i.e., no longer managed schema):

<schemaFactory class="ClassicIndexSchemaFactory"/>

Solr will not restart because it cannot find schema.xml.

Thanks
Steve
Error when submitting PDF to Solr w/text fields using SolrJ
Hello,

I'm using Solr to pull information from a database and a file system simultaneously. The database houses the file path of the file in the file system. It pulls all of those just fine. In fact, it combines the metadata from the database and the metadata from the file system great. The problem occurs when I try to index the text.

The error does not occur at the point when it tries to add the field text to the document. The error occurs when I try to submit that document to Solr. It gives me this error:

org.apache.solr.common.SolrException: Exception writing document id /some/filepath to the index; possible analysis error.

This is how the field is defined in the schema:

<field name="text" type="string" indexed="true" stored="false" required="false" multiValued="true"/>

and this is the code I use to add it to the document:

File file = new File(filepath);
ContentHandler textHandler = new BodyContentHandler();
Metadata metadata = new Metadata();
ParseContext context = new ParseContext();
InputStream input = new FileInputStream(file);
try {
    autoParser.parse(input, textHandler, metadata, context);
} catch (Exception e) {
    // prints out error message
    continue;
}
if (textHandler != null) {
    doc.addField("text", textHandler.toString());
}
try {
    server.add(doc);
} catch (Exception ex) {
    // log message
    continue;
}

I think it has something to do with how the field is defined in the schema, but I don't know. All the files that get error messages are PDFs, if that helps. There are .doc files in the file system but they don't error out.

-- View this message in context: http://lucene.472066.n3.nabble.com/Error-when-submitting-PDF-to-Solr-w-text-fields-using-SolrJ-tp4212704.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Managed schema and schema.xml file
On 6/18/2015 8:10 AM, Steven White wrote:
In 5.1.0 (and maybe prior ver.?) when I enable managed schema per the above, the existing schema.xml file is left as-is, a copy of it is created as schema.xml.bak and a new one is created based on the name I gave it my-schema.xml. With 5.2.1 schema.xml is renamed to schema.xml.bak and my-schema.xml is created (i.e.: schema.xml is deleted). Is this an expected behavior or is this a bug? I see it as a bug because if I revert the change I made in my solrconfig.xml back to (i.e.: not managed schema any more): <schemaFactory class="ClassicIndexSchemaFactory"/> Solr will not restart because it cannot find schema.xml

As I understand it, the managed schema system will complain if it sees a file named schema.xml. Having both the managed schema file and schema.xml is confusing, so if the classic file exists, it's an error.

Because of that, if you switch your config from managed to classic schema, you must also create the schema.xml file (or rename the managed version). Neither factory is aware of the other, so there's no automated way to handle that.

Thanks,
Shawn
Re: Dedupe in a SolrCloud
Thanks :) exactly what I was looking for... as I only need to create the signature once, this works perfectly for me :)

Cheers,
Markus

Sent from my iPhone

On 17.06.2015, at 20:32, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

Comments inline:

On Wed, Jun 17, 2015 at 3:18 PM, Markus.Mirsberger markus.mirsber...@gmx.de wrote:

Hi, I am trying to use the dedupe feature to detect and mark near-duplicate content in my collections. I don't want to prevent duplicate content. I would like to detect it and keep it for further processing. That's why I'm using an extra field and not the document's unique field. Here is how I added it to the solrconfig.xml:

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">fill_signature</str>
  </lst>
</requestHandler>

<updateRequestProcessorChain name="fill_signature" processor="signature">
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<updateProcessor class="solr.processor.SignatureUpdateProcessorFactory" name="signature">
  <bool name="enabled">true</bool>
  <str name="signatureField">signature</str>
  <bool name="overwriteDupes">false</bool>
  <str name="fields">content</str>
  <str name="signatureClass">solr.processor.TextProfileSignature</str>
  <str name="quantRate">.2</str>
  <str name="minTokenLen">3</str>
</updateProcessor>

When I initially add the documents to the cloud, everything works as expected: the documents are added and the signature is created and added. Perfect :) The problem occurs when I want to update an existing document. In that case the update.chain=fill_signature parameter will of course be set too, and I get a bad request error. I found this Solr issue: https://issues.apache.org/jira/browse/SOLR-3473 Is that the problem I am running into?

You haven't pasted the complete error response, so I am guessing a bit here. It is possible that you are running into the same problem, i.e. the signature is being calculated again and the signature field, not being multi-valued, causes an error.

Is it somehow possible to add parameters or set a specific update handler when I'm adding documents to the cloud using SolrJ?

Yes, any custom parameter can be added to a SolrJ request. There is a setParam(String param, String value) method available in AbstractUpdateRequest which can be used to set a custom update.chain for each SolrJ request.

In that case I could either set the update.chain manually and remove it from the request handler, or write a second request handler which I only use if I want to set the signature field. I know I can do that manually when I'm using e.g. curl, but is it also possible with SolrJ? :)

Thanks,
Markus

--
Regards,
Shalin Shekhar Mangar.
Re: Error when submitting PDF to Solr w/text fields using SolrJ
Using Solr 5.1.0. This is the schema file:

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
  <field name="_version_" type="long" indexed="true" stored="true"/>
  <field name="_root_" type="string" indexed="true" stored="false"/>
  <field name="id" type="string" indexed="true" stored="true" required="false" multiValued="false" />
  <field name="filepath" type="string" indexed="true" stored="true" required="false" multiValued="false" />
  <field name="title" type="string" indexed="true" stored="true" required="false" multiValued="false" />
  <field name="author" type="string" indexed="true" stored="true" required="false" multiValued="false" />
  <field name="text" type="string" indexed="true" stored="false" required="false" multiValued="true" />
  <field name="key" type="string" indexed="true" stored="false" required="false" multiValued="false" />

  <dynamicField name="*_name" type="text_general" multiValued="false" indexed="true" stored="true" />
  <dynamicField name="*_i" type="int" indexed="true" stored="true"/>
  <dynamicField name="*_is" type="int" indexed="true" stored="true" multiValued="true"/>
  <dynamicField name="*_s" type="string" indexed="true" stored="true" />
  <dynamicField name="*_ss" type="string" indexed="true" stored="true" multiValued="true"/>
  <dynamicField name="*_l" type="long" indexed="true" stored="true"/>
  <dynamicField name="*_ls" type="long" indexed="true" stored="true" multiValued="true"/>
  <dynamicField name="*_t" type="text_general" indexed="true" stored="true"/>
  <dynamicField name="*_txt" type="text_general" indexed="true" stored="true" multiValued="true"/>
  <dynamicField name="*_en" type="text_en" indexed="true" stored="true" multiValued="true"/>
  <dynamicField name="*_b" type="boolean" indexed="true" stored="true"/>
  <dynamicField name="*_bs" type="boolean" indexed="true" stored="true" multiValued="true"/>
  <dynamicField name="*_f" type="float" indexed="true" stored="true"/>
  <dynamicField name="*_fs" type="float" indexed="true" stored="true" multiValued="true"/>
  <dynamicField name="*_d" type="double" indexed="true" stored="true"/>
  <dynamicField name="*_ds" type="double" indexed="true" stored="true" multiValued="true"/>
  <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false" />
  <dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
  <dynamicField name="*_dts" type="date" indexed="true" stored="true" multiValued="true"/>
  <dynamicField name="*_p" type="location" indexed="true" stored="true"/>
  <dynamicField name="*_ti" type="tint" indexed="true" stored="true"/>
  <dynamicField name="*_tl" type="tlong" indexed="true" stored="true"/>
  <dynamicField name="*_tf" type="tfloat" indexed="true" stored="true"/>
  <dynamicField name="*_td" type="tdouble" indexed="true" stored="true"/>
  <dynamicField name="*_tdt" type="tdate" indexed="true" stored="true"/>
  <dynamicField name="*_c" type="currency" indexed="true" stored="true"/>
  <dynamicField name="ignored_*" type="ignored" multiValued="true"/>
  <dynamicField name="attr_*" type="text_general" indexed="true" stored="true" multiValued="true"/>
  <dynamicField name="random_*" type="random" />

  <uniqueKey>filepath</uniqueKey>

  <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
  <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
  <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
  <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
  <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
  <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
  <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
  <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" positionIncrementGap="0"/>
  <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0"/>
  <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/>
  <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
  <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
  <fieldType name="binary" class="solr.BinaryField"/>
  <fieldType name="random" class="solr.RandomSortField" indexed="true" />

  <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    </analyzer>
  </fieldType>

  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
Re: Collections API and adding new boxes
See particularly the ADDREPLICA command and the node parameter. You might not even need the node parameter, since when you add a replica Solr does its best to put the new replica on an underutilized node.

Best,
Erick

On Thu, Jun 18, 2015 at 2:58 PM, Shawn Heisey apa...@elyograg.org wrote:

On 6/18/2015 3:23 PM, Jim.Musil wrote:

Let's say I have a zookeeper ensemble with several Solr nodes connected to it. I've created a collection successfully and all is well. What happens when I want to add another solr node? I've tried spinning one up and connecting it to zookeeper, but the new node doesn't join the collection. What's the expected next step? This is Solr 5.1.

The new node will be part of the cloud as soon as it starts, but until you take action with the Collections API, it will not have any indexes on it. SolrCloud does not automatically create replicas except in a very specific set of circumstances that I do not think are very common.

You'll need to either create a new collection or take steps to modify your current collection(s) so that one or more shard replicas are located on the new node.

https://cwiki.apache.org/confluence/display/solr/Collections+API

Thanks,
Shawn
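For readers who want to see what the ADDREPLICA call mentioned above looks like, here is a minimal sketch that only builds the request URL. It assumes the Collections API is reachable under /admin/collections on the Solr base URL and uses the action, collection, shard and node parameters; the class AddReplicaUrl, the host, and the collection, shard and node names are made up for illustration, so check them against your own cluster.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper (not part of Solr): builds a Collections API ADDREPLICA URL.
class AddReplicaUrl {

    // node may be null, in which case Solr is left to choose where the replica goes.
    static String build(String solrBase, String collection, String shard, String node) {
        StringBuilder url = new StringBuilder(solrBase)
            .append("/admin/collections?action=ADDREPLICA")
            .append("&collection=").append(enc(collection))
            .append("&shard=").append(enc(shard));
        if (node != null) {
            url.append("&node=").append(enc(node));
        }
        return url.toString();
    }

    // URL-encode a single parameter value.
    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

Calling AddReplicaUrl.build("http://localhost:8983/solr", "mycollection", "shard1", null) gives a URL without the node parameter, which matches Erick's point that the parameter is optional. The resulting URL can be requested with curl or any HTTP client.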
Re: Error when submitting PDF to Solr w/text fields using SolrJ
The stack trace is what gets returned to the client, right? It's often much more informative to see the Solr log output; the error message is often much more helpful there. By the time the exception bubbles up through the various layers, vital information is sometimes not returned to the client in the error message.

One precaution I would take, since you've changed the schema, is to _completely_ remove the index:

1. Shut down Solr.
2. rm -rf coreX/data
3. Restart Solr.
4. Try it again.

Lucene doesn't really care at all whether a field gets indexed one way in one document and another way in the next document, and occasionally having fields indexed different ways (string and text) in different documents at the same time confuses things.

Best,
Erick

On Thu, Jun 18, 2015 at 10:31 AM, Paden rumsey...@gmail.com wrote:

Just rolling out a little bit more information as it is coming. I changed the field type in the schema to text_general and that didn't change a thing.

Another thing is that it's consistently submitting/not submitting the same documents. I will run over it one time and it won't index a set of documents. When I clear the index and run the program again, it submits/doesn't submit the same documents.

And it will index certain PDFs; it just won't index others. Which is weird, because I printed the strings that are submitted to Solr and the ones that get submitted are really similar to the ones that aren't submitted. I can't post the actual strings for sensitivity reasons.

--
View this message in context: http://lucene.472066.n3.nabble.com/Error-when-submitting-PDF-to-Solr-w-text-fields-using-SolrJ-tp4212704p4212757.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.10.4: Could not create instance of 'SolrInputDocument'
No clue whatsoever; you haven't provided nearly enough detail. I rather doubt that many people on this list really understand the interactions of that technology stack; I certainly don't.

I'd ask on the ColdFusion list, as they're (apparently) the ones who've integrated a Solr connector of sorts. What evidence do you have that using a stock Solr is even possible? For all I know, the Solr provided with CF has some kind of customizations (maybe a plugin?) that is required.

Best,
Erick

On Thu, Jun 18, 2015 at 5:22 AM, Paul Revere pere...@mail.iad.gov wrote:

Our web site is created using PaperThin's CommonSpot CMS in a ColdFusion 10 and Windows Server 2008 R2 environment, using Apache Solr 4.10.4 instead of CF Solr. We create collections through the CMS interface and they do appear in both the CMS and the Solr dashboard when created.

However, when we try indexing our collections through the CMS interface, our CMS error logs show the entry 'Could not create instance of 'SolrInputDocument'' for each member of the collection. This is not a fatal error, as the indexing appears to cycle through all members, but each member errors out, with a log entry for each one.

I've Googled this error message without success. What might this error message indicate, please?

Paul
Re: Help: Problem in customized token filter
Hi Steve, you never set exhausted to false, and when the filter got reused, *it incorrectly carried state from the previous document.* Thanks for replying, but I am not able to understand this. With Regards Aman Tandon On Fri, Jun 19, 2015 at 10:25 AM, Steve Rowe sar...@gmail.com wrote: Hi Aman, The admin UI screenshot you linked to is from an older version of Solr - what version are you using? Lots of extraneous angle brackets and asterisks got into your email and made for a bunch of cleanup work before I could read or edit it. In the future, please put your code somewhere people can easily read it and copy/paste it into an editor: into a github gist or on a paste service, etc. Looks to me like your use of “exhausted” is unnecessary, and is likely the cause of the problem you saw (only one document getting processed): you never set exhausted to false, and when the filter got reused, it incorrectly carried state from the previous document. Here’s a simpler version that’s hopefully more correct and more efficient (2 fewer copies from the StringBuilder to the final token). Note: I didn’t test it: https://gist.github.com/sarowe/9b9a52b683869ced3a17 Steve www.lucidworks.com On Jun 18, 2015, at 11:33 AM, Aman Tandon amantandon...@gmail.com wrote: Please help, what wrong I am doing here. please guide me. With Regards Aman Tandon On Thu, Jun 18, 2015 at 4:51 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, I created a *token concat filter* to concat all the tokens from token stream. It creates the concatenated token as expected. But when I am posting the xml containing more than 30,000 documents, then only first document is having the data of that field. 
Schema:

<field name="titlex" type="text" indexed="true" stored="false" required="false" omitNorms="false" multiValued="false" />

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true" tokenSeparator=""/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="stemmed_synonyms_text_prime_ex_index.txt" ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_text_prime_search.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>
  </analyzer>
</fieldType>

Please help me. The code for the filter is as follows, please take a look.

Here is the picture of what the filter is doing: http://i.imgur.com/THCsYtG.png?1

The code of the concat filter is:

package com.xyz.analysis.concat;

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

public class ConcatenateWordsFilter extends TokenFilter {

  private CharTermAttribute charTermAttribute = addAttribute(CharTermAttribute.class);
  private OffsetAttribute offsetAttribute = addAttribute(OffsetAttribute.class);
  PositionIncrementAttribute posIncr = addAttribute(PositionIncrementAttribute.class);
  TypeAttribute typeAtrr = addAttribute(TypeAttribute.class);

  private StringBuilder stringBuilder = new StringBuilder();
  private boolean exhausted = false;

  /**
   * Creates a new ConcatenateWordsFilter
   * @param input TokenStream that will be filtered
   */
  public ConcatenateWordsFilter(TokenStream input) {
    super(input);
Re: Help: Problem in customized token filter
Aman, My version won’t produce anything at all, since incrementToken() always returns false… I updated the gist (at the same URL) to fix the problem by returning true from incrementToken() once and then false until reset() is called. It also handles the case when the concatenated token is zero length by not emitting a token. Steve www.lucidworks.com On Jun 19, 2015, at 12:55 AM, Steve Rowe sar...@gmail.com wrote: Hi Aman, The admin UI screenshot you linked to is from an older version of Solr - what version are you using? Lots of extraneous angle brackets and asterisks got into your email and made for a bunch of cleanup work before I could read or edit it. In the future, please put your code somewhere people can easily read it and copy/paste it into an editor: into a github gist or on a paste service, etc. Looks to me like your use of “exhausted” is unnecessary, and is likely the cause of the problem you saw (only one document getting processed): you never set exhausted to false, and when the filter got reused, it incorrectly carried state from the previous document. Here’s a simpler version that’s hopefully more correct and more efficient (2 fewer copies from the StringBuilder to the final token). Note: I didn’t test it: https://gist.github.com/sarowe/9b9a52b683869ced3a17 Steve www.lucidworks.com On Jun 18, 2015, at 11:33 AM, Aman Tandon amantandon...@gmail.com wrote: Please help, what wrong I am doing here. please guide me. With Regards Aman Tandon On Thu, Jun 18, 2015 at 4:51 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, I created a *token concat filter* to concat all the tokens from token stream. It creates the concatenated token as expected. But when I am posting the xml containing more than 30,000 documents, then only first document is having the data of that field. 
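A minimal sketch of the pattern Steve describes above: emit the concatenated token once, return false until the filter is reset, and emit nothing when the concatenation is empty. This is plain Java that only imitates the reuse cycle, not the Lucene API and not the code in Steve's gist. ConcatOnceSketch, reset(List), term() and runDocs are hypothetical names; a real Lucene filter would instead override the no-argument reset(), call super.reset(), and write the result into its CharTermAttribute.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for a reusable token filter; NOT the Lucene TokenFilter API.
class ConcatOnceSketch {
    private Iterator<String> input = Collections.emptyIterator();
    private boolean emitted = false; // per-document state
    private String current;          // plays the role of the term attribute

    // Stands in for the analyzer handing the same filter instance a new document.
    void reset(List<String> tokens) {
        input = tokens.iterator();
        emitted = false; // clearing per-document state is the step that must not be skipped
    }

    // Returns true at most once per document.
    boolean incrementToken() {
        if (emitted) {
            return false;
        }
        StringBuilder sb = new StringBuilder();
        while (input.hasNext()) {
            sb.append(input.next());
        }
        emitted = true;
        if (sb.length() == 0) {
            return false; // zero-length concatenation: emit no token
        }
        current = sb.toString();
        return true;
    }

    String term() {
        return current;
    }

    // Pushes several documents through ONE filter instance, as Solr would.
    static List<String> runDocs(String[]... docs) {
        ConcatOnceSketch filter = new ConcatOnceSketch();
        List<String> results = new ArrayList<>();
        for (String[] doc : docs) {
            filter.reset(Arrays.asList(doc));
            StringBuilder out = new StringBuilder();
            while (filter.incrementToken()) {
                out.append('[').append(filter.term()).append(']');
            }
            results.add(out.toString());
        }
        return results;
    }
}
```

If the line that clears emitted in reset is removed, every document after the first produces an empty result, which mirrors the symptom reported in this thread (only the first document had data in the field).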
Auto-suggest in Solr
I'm implementing an auto-suggest feature in Solr, and I'd like to achieve the following:

For example, if the user enters "mp3", Solr might suggest "mp3 player", "mp3 nano" and "mp3 music". When the user enters "mp3 p", the suggestion should narrow down to "mp3 player".

Currently, when I type "mp3 p", the suggester is returning words that start with the letter "p" only, and I'm getting results like "plan", "production", etc. It does not take the "mp3" token into consideration.

I'm using Solr 5.1 and below is my configuration.

In solrconfig.xml:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="lookupImpl">FreeTextLookupFactory</str>
    <str name="indexPath">suggester_freetext_dir</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">Suggestion</str>
    <str name="weightField">Project</str>
    <str name="suggestFreeTextAnalyzerFieldType">suggestType</str>
    <int name="ngrams">5</int>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>

In schema.xml:

<fieldType name="suggestType" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" " />
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="6" outputUnigrams="false"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" " />
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="6" outputUnigrams="true"/>
  </analyzer>
</fieldType>

Is there anything that I configured wrongly?

Regards,
Edwin
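To make the shingle settings in the configuration above easier to picture, here is a rough sketch of the word n-grams a shingle filter produces for a given minimum size, maximum size and outputUnigrams setting. It is a simplified model in plain Java, not Lucene's ShingleFilter: it ignores positions and offsets and assumes whitespace-separated input. ShingleSketch and shingles are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of shingle generation; NOT Lucene's ShingleFilter.
class ShingleSketch {

    static List<String> shingles(String text, int minSize, int maxSize, boolean outputUnigrams) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return out;
        }
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start++) {
            if (outputUnigrams) {
                out.add(words[start]);
            }
            // Grow the shingle one word at a time, keeping sizes within [minSize, maxSize].
            StringBuilder sb = new StringBuilder(words[start]);
            for (int size = 2; size <= maxSize && start + size <= words.length; size++) {
                sb.append(' ').append(words[start + size - 1]);
                if (size >= minSize) {
                    out.add(sb.toString());
                }
            }
        }
        return out;
    }
}
```

With the index-side settings (2, 6, false), "mp3 player nano" yields "mp3 player", "mp3 player nano" and "player nano". With the query-side settings (2, 6, true), the input "mp3 p" yields "mp3", "mp3 p" and "p".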
Re: Help: Problem in customized token filter
Hi Aman, The admin UI screenshot you linked to is from an older version of Solr - what version are you using? Lots of extraneous angle brackets and asterisks got into your email and made for a bunch of cleanup work before I could read or edit it. In the future, please put your code somewhere people can easily read it and copy/paste it into an editor: into a github gist or on a paste service, etc. Looks to me like your use of “exhausted” is unnecessary, and is likely the cause of the problem you saw (only one document getting processed): you never set exhausted to false, and when the filter got reused, it incorrectly carried state from the previous document. Here’s a simpler version that’s hopefully more correct and more efficient (2 fewer copies from the StringBuilder to the final token). Note: I didn’t test it: https://gist.github.com/sarowe/9b9a52b683869ced3a17 Steve www.lucidworks.com On Jun 18, 2015, at 11:33 AM, Aman Tandon amantandon...@gmail.com wrote: Please help, what wrong I am doing here. please guide me. With Regards Aman Tandon On Thu, Jun 18, 2015 at 4:51 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, I created a *token concat filter* to concat all the tokens from token stream. It creates the concatenated token as expected. But when I am posting the xml containing more than 30,000 documents, then only first document is having the data of that field. 
Schema:

<field name="titlex" type="text" indexed="true" stored="false" required="false" omitNorms="false" multiValued="false" />

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true" tokenSeparator=""/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="stemmed_synonyms_text_prime_ex_index.txt" ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_text_prime_search.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>
  </analyzer>
</fieldType>

Please help me. The code for the filter is as follows, please take a look.

Here is the picture of what the filter is doing: http://i.imgur.com/THCsYtG.png?1

The code of the concat filter is:

package com.xyz.analysis.concat;

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

public class ConcatenateWordsFilter extends TokenFilter {

  private CharTermAttribute charTermAttribute = addAttribute(CharTermAttribute.class);
  private OffsetAttribute offsetAttribute = addAttribute(OffsetAttribute.class);
  PositionIncrementAttribute posIncr = addAttribute(PositionIncrementAttribute.class);
  TypeAttribute typeAtrr = addAttribute(TypeAttribute.class);

  private StringBuilder stringBuilder = new StringBuilder();
  private boolean exhausted = false;

  /**
   * Creates a new ConcatenateWordsFilter
   * @param input TokenStream that will be filtered
   */
  public ConcatenateWordsFilter(TokenStream input) {
    super(input);
  }

  /**
   * {@inheritDoc}
   */
  @Override
  public final boolean incrementToken() throws IOException {
    while (!exhausted && input.incrementToken()) {
      char terms[] = charTermAttribute.buffer();
      int termLength = charTermAttribute.length();
      if (typeAtrr.type().equals("<ALPHANUM>")) {
        stringBuilder.append(terms, 0,
How to append new data to index in Solr?
Hello,

I'm a Solr user with a question. I want to append new data to an existing index. Does Solr support appending new data to an index?

Thanks for any reply.

Best wishes,
Jason
Re: Help: Problem in customized token filter
Yes I just saw. With Regards Aman Tandon On Fri, Jun 19, 2015 at 10:39 AM, Steve Rowe sar...@gmail.com wrote: Aman, My version won’t produce anything at all, since incrementToken() always returns false… I updated the gist (at the same URL) to fix the problem by returning true from incrementToken() once and then false until reset() is called. It also handles the case when the concatenated token is zero length by not emitting a token. Steve www.lucidworks.com On Jun 19, 2015, at 12:55 AM, Steve Rowe sar...@gmail.com wrote: Hi Aman, The admin UI screenshot you linked to is from an older version of Solr - what version are you using? Lots of extraneous angle brackets and asterisks got into your email and made for a bunch of cleanup work before I could read or edit it. In the future, please put your code somewhere people can easily read it and copy/paste it into an editor: into a github gist or on a paste service, etc. Looks to me like your use of “exhausted” is unnecessary, and is likely the cause of the problem you saw (only one document getting processed): you never set exhausted to false, and when the filter got reused, it incorrectly carried state from the previous document. Here’s a simpler version that’s hopefully more correct and more efficient (2 fewer copies from the StringBuilder to the final token). Note: I didn’t test it: https://gist.github.com/sarowe/9b9a52b683869ced3a17 Steve www.lucidworks.com On Jun 18, 2015, at 11:33 AM, Aman Tandon amantandon...@gmail.com wrote: Please help, what wrong I am doing here. please guide me. With Regards Aman Tandon On Thu, Jun 18, 2015 at 4:51 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, I created a *token concat filter* to concat all the tokens from token stream. It creates the concatenated token as expected. But when I am posting the xml containing more than 30,000 documents, then only first document is having the data of that field. 
Re: Help: Problem in customized token filter
Aman,

Solr uses the same token filter instances over and over, calling reset() before sending each document through. Your code sets “exhausted” to true and then never sets it back to false, so the next time the token filter instance is used, its “exhausted” value is still true, and no input stream tokens are concatenated ever again.

Does that make sense?

Steve
www.lucidworks.com

On Jun 19, 2015, at 1:10 AM, Aman Tandon amantandon...@gmail.com wrote:

Hi Steve,

you never set exhausted to false, and when the filter got reused, *it incorrectly carried state from the previous document.*

Thanks for replying, but I am not able to understand this.

With Regards
Aman Tandon

On Fri, Jun 19, 2015 at 10:25 AM, Steve Rowe sar...@gmail.com wrote:

Hi Aman,

The admin UI screenshot you linked to is from an older version of Solr - what version are you using?

Lots of extraneous angle brackets and asterisks got into your email and made for a bunch of cleanup work before I could read or edit it. In the future, please put your code somewhere people can easily read it and copy/paste it into an editor: into a github gist or on a paste service, etc.

Looks to me like your use of “exhausted” is unnecessary, and is likely the cause of the problem you saw (only one document getting processed): you never set exhausted to false, and when the filter got reused, it incorrectly carried state from the previous document.

Here’s a simpler version that’s hopefully more correct and more efficient (2 fewer copies from the StringBuilder to the final token). Note: I didn’t test it: https://gist.github.com/sarowe/9b9a52b683869ced3a17

Steve
www.lucidworks.com

On Jun 18, 2015, at 11:33 AM, Aman Tandon amantandon...@gmail.com wrote:

Please help, what wrong I am doing here. please guide me.

With Regards
Aman Tandon

On Thu, Jun 18, 2015 at 4:51 PM, Aman Tandon amantandon...@gmail.com wrote:

Hi, I created a *token concat filter* to concat all the tokens from token stream.
It creates the concatenated token as expected. But when I post XML containing more than 30,000 documents, only the first document has data in that field.

Schema:

<field name="titlex" type="text" indexed="true" stored="false" required="false" omitNorms="false" multiValued="false" />

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true" tokenSeparator=""/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="stemmed_synonyms_text_prime_ex_index.txt" ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_text_prime_search.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>
  </analyzer>
</fieldType>

Please help me. The code for the filter is as follows; please take a look.
Here is a picture of what the filter is doing: http://i.imgur.com/THCsYtG.png?1

The code of the concat filter is:

package com.xyz.analysis.concat;

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

public class ConcatenateWordsFilter extends TokenFilter {
  private CharTermAttribute charTermAttribute = addAttribute(CharTermAttribute.class);
  private OffsetAttribute offsetAttribute = addAttribute(OffsetAttribute.class);
  PositionIncrementAttribute posIncr =
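The reuse problem Steve describes in this thread can be shown with a small stand-alone Java simulation. This is only a sketch under stated assumptions: it does not use Lucene's TokenFilter API, and ConcatFilterSketch, reset(List) and analyzeAll are hypothetical names invented for illustration. One filter instance processes several documents; the flag resetClearsState switches between the reported behaviour (state carried over) and the fix (state cleared on reset).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Stand-alone simulation of a reusable concatenating filter.
// NOT Lucene's TokenFilter API; all names here are hypothetical.
class ConcatFilterSketch {
    private Iterator<String> input = new ArrayList<String>().iterator();
    private boolean exhausted = false;       // per-document state
    private final boolean resetClearsState;  // false reproduces the reported bug

    ConcatFilterSketch(boolean resetClearsState) {
        this.resetClearsState = resetClearsState;
    }

    // Called before each new document, mirroring how one instance is reused.
    void reset(List<String> tokens) {
        input = tokens.iterator();
        if (resetClearsState) {
            exhausted = false; // the fix: drop state left over from the previous document
        }
    }

    // Returns the concatenated token once per document, then null.
    String next() {
        if (exhausted) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        while (input.hasNext()) {
            sb.append(input.next());
        }
        exhausted = true;
        return sb.length() == 0 ? null : sb.toString();
    }

    // Runs every document through ONE filter instance and collects the output.
    static List<String> analyzeAll(boolean resetClearsState, List<List<String>> docs) {
        ConcatFilterSketch filter = new ConcatFilterSketch(resetClearsState);
        List<String> out = new ArrayList<>();
        for (List<String> doc : docs) {
            filter.reset(doc);
            out.add(filter.next());
        }
        return out;
    }
}
```

With resetClearsState set to false, only the first document yields a concatenated token and every later document yields null, which matches the symptom of only the first document having data in the field.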
Re: How to do a Data sharding for data in a database table
You've repeated your original statement. Shawn's observation is that 10M docs is a very small corpus by Solr standards. You either have very demanding document/search combinations or you have a poorly tuned Solr installation. On reasonable hardware I expect 25-50M documents to have sub-second response time. So what we're trying to do is be sure this isn't an XY problem, from Hossman's apache page: Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341 So again, how would you characterize your documents? How many fields? What do queries look like? How much physical memory on the machine? How much memory have you allocated to the JVM? You might review: http://wiki.apache.org/solr/UsingMailingLists Best, Erick On Thu, Jun 18, 2015 at 3:23 PM, wwang525 wwang...@gmail.com wrote: The query without load is still under 1 second. But under load, response time can be much longer due to the queued up query. We would like to shard the data to something like 6 M / shard, which will still give a under 1 second response time under load. What are some best practice to shard the data? for example, we could shard the data by date range, but that is pretty dynamic, and we could shard data by some other properties, but if the data is not evenly distributed, you may not be able shard it anymore. -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765p4212803.html Sent from the Solr - User mailing list archive at Nabble.com.
How to do a Data sharding for data in a database table
Hi, We would probably like to shard the data, since the response time for demanding queries at 10M records is reaching 1 second in a single-request scenario. I have not done any data sharding before. What are some recommended ways to shard data? For example, maybe by a criterion with a list of specific values? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: MappingCharFilterFactory and start and end offsets
Hi Dmitry,

It's weird that start and end offsets are the same - what do you see for the start/end of '$', i.e. if you take out MCFF? (I think it should be start:5, end:6.)

As far as offsets "respecting the remapped token", are you asking for offsets to be set as if 'dollarsign' were part of the original text? If so, there is no setting that would do that - the intent is for offsets to map to the *original* text. You can work around this by performing the substitution prior to Solr analysis, e.g. in an update processor like RegexReplaceProcessorFactory.

Steve
www.lucidworks.com

On Jun 18, 2015, at 3:07 AM, Dmitry Kan solrexp...@gmail.com wrote:

Hi, It looks like MappingCharFilter sets start and end offset to the same value. Can this be affected by some setting? For a string: test $ test2 and mapping $ => dollarsign (we insert extra space to separate $ into its own token) we get: http://snag.gy/eJT1H.jpg Ideally, we would like to have start and end offset respecting the remapped token. Can this be achieved with settings?

-- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info
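To make the two offset conventions concrete, here is a minimal Java sketch that uses plain string operations instead of Solr's analysis chain. PreSubstitutionSketch and its methods are hypothetical names. It shows where '$' sits in the original text, and where 'dollarsign' lands if the substitution is performed before analysis, as in the workaround suggested above.

```java
// Illustration only: plain string arithmetic, not Solr's MappingCharFilter.
class PreSubstitutionSketch {
    // Replaces "$" before the text would be handed to analysis.
    static String substitute(String text) {
        return text.replace("$", "dollarsign");
    }

    // Returns {start, end} of the first occurrence of token, or null if absent.
    static int[] offsetsOf(String text, String token) {
        int start = text.indexOf(token);
        if (start < 0) {
            return null;
        }
        return new int[] { start, start + token.length() };
    }

    public static void main(String[] args) {
        String original = "test $ test2";
        int[] inOriginal = offsetsOf(original, "$");
        int[] inSubstituted = offsetsOf(substitute(original), "dollarsign");
        System.out.println(inOriginal[0] + "-" + inOriginal[1]);       // prints 5-6
        System.out.println(inSubstituted[0] + "-" + inSubstituted[1]); // prints 5-15
    }
}
```

The first pair agrees with the start:5, end:6 guess for '$' in the original text; the second pair is only obtainable when the replaced string is itself the text being indexed.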
[ANN] Solr in Action book release (Solr 4.7)
Sent from my iPhone
Re: Error when submitting PDF to Solr w/text fields using SolrJ
Just rolling out a little more information as it comes in. I changed the field type in the schema to text_general and that didn't change a thing. Another thing is that it consistently submits or fails to submit the same documents. I will run over it one time and it won't index a set of documents. When I clear the index and run the program again, it submits and fails to submit the same documents. And it will index certain PDFs; it just won't index others. That is weird, because I printed the strings that are submitted to Solr, and the ones that get indexed are really similar to the ones that don't. I can't post the actual strings for sensitivity reasons. -- View this message in context: http://lucene.472066.n3.nabble.com/Error-when-submitting-PDF-to-Solr-w-text-fields-using-SolrJ-tp4212704p4212757.html Sent from the Solr - User mailing list archive at Nabble.com.
Collections API and adding new boxes
Hi, Let's say I have a zookeeper ensemble with several Solr nodes connected to it. I've created a collection successfully and all is well. What happens when I want to add another solr node? I've tried spinning one up and connecting it to zookeeper, but the new node doesn't join the collection. What's the expected next step? This is Solr 5.1. Thanks! Jim Musil
Re: How to do a Data sharding for data in a database table
The query without load is still under 1 second. But under load, response time can be much longer because queries queue up. We would like to shard the data to something like 6M documents per shard, which should still give an under-1-second response time under load. What are some best practices for sharding the data? For example, we could shard the data by date range, but that is pretty dynamic; we could shard by some other properties, but if the data is not evenly distributed, we may not be able to shard it any further. -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765p4212803.html Sent from the Solr - User mailing list archive at Nabble.com.
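On the worry about uneven distribution: one common alternative to sharding by date range or by a property value is to assign each document to a shard by hashing its unique key. The Java sketch below only illustrates that idea; HashShardSketch and shardFor are hypothetical names, String.hashCode is used purely for simplicity, and this is not a description of how Solr itself routes documents.

```java
// Illustration only: hash-based shard assignment, not Solr's routing code.
class HashShardSketch {
    // Maps a document id to a shard index in the range [0, numShards).
    static int shardFor(String id, int numShards) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(id.hashCode(), numShards);
    }

    public static void main(String[] args) {
        int numShards = 2;
        int[] counts = new int[numShards];
        for (int i = 0; i < 10_000; i++) {
            counts[shardFor("doc-" + i, numShards)]++;
        }
        for (int s = 0; s < numShards; s++) {
            System.out.println("shard " + s + ": " + counts[s] + " docs");
        }
    }
}
```

Because the assignment depends only on the id, it stays stable as data for new dates arrives, at the cost that a date-restricted query can no longer be sent to a single shard.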
Re: How to do a Data sharding for data in a database table
10M doesn't sound too demanding. How complex are your queries? How complex is your data - like number of fields and size, like very large documents? Are you sure you have enough RAM to fully cache your index? Are your queries compute-bound or I/O bound? If I/O-bound, get more RAM. If compute-bound, sharding may help, but have to examine query complexity first. -- Jack Krupansky On Thu, Jun 18, 2015 at 2:05 PM, wwang525 wwang...@gmail.com wrote: Hi, We probably would like to shard the data since the response time for demanding queries at 10M records is getting 1 second in a single request scenario. I have not done any data sharding before. What are some recommended way to do data sharding. For example, may be by a criteria with a list of specific values? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Collections API and adding new boxes
On 6/18/2015 3:23 PM, Jim.Musil wrote:

Let's say I have a zookeeper ensemble with several Solr nodes connected to it. I've created a collection successfully and all is well. What happens when I want to add another solr node? I've tried spinning one up and connecting it to zookeeper, but the new node doesn't join the collection. What's the expected next step? This is Solr 5.1.

The new node will be part of the cloud as soon as it starts, but until you take action with the Collections API, it will not have any indexes on it. SolrCloud does not automatically create replicas except in a very specific set of circumstances that I do not think are very common.

You'll need to either create a new collection or take steps to modify your current collection(s) so that one or more shard replicas are located on the new node.

https://cwiki.apache.org/confluence/display/solr/Collections+API

Thanks,
Shawn
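One way to place a shard replica on the new node is the Collections API ADDREPLICA action. The Java sketch below only builds the request URL; it does not send it. The base URL, collection name, shard name and node name in the usage note are placeholder assumptions, and AddReplicaUrlSketch is a hypothetical helper.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Builds a Collections API ADDREPLICA request URL. All argument values are
// supplied by the caller; nothing here is specific to a real cluster.
class AddReplicaUrlSketch {
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    static String addReplicaUrl(String solrBaseUrl, String collection, String shard, String node) {
        return solrBaseUrl + "/admin/collections?action=ADDREPLICA"
                + "&collection=" + enc(collection)
                + "&shard=" + enc(shard)
                + "&node=" + enc(node);
    }
}
```

For example, addReplicaUrl("http://localhost:8983/solr", "mycollection", "shard1", "newhost:8983_solr") gives a URL that can be requested with curl or a browser; the node value should be the new node's name as the cluster reports it.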
Re: How to create concatenated token
Hi Erick, In that issue you forwarded to me, they want to make one token from all tokens received from token stream but in my case I want to keep the tokens same and create and extra new token which is concat of all the tokens. I'd guess, is the case here. I mean do you really want to concatenate 50 tokens? We are applying it on *title field* of product so max length can be 10 I guess and that too will be in rare case. With Regards Aman Tandon On Wed, Jun 17, 2015 at 7:16 PM, Erick Erickson erickerick...@gmail.com wrote: If you used the JIRA I linked, vote for it, add any improvements etc. Anyone can attach a patch to a JIRA, you just have to create a login. That said, this may be too rare a use-case to deal with. I just thought of shingling which I should have suggested before that will work for concatenating small numbers of tokens which, I'd guess, is the case here. I mean do you really want to concatenate 50 tokens? Best, Erick On Wed, Jun 17, 2015 at 12:07 AM, Aman Tandon amantandon...@gmail.com wrote: Dear Erick, e.g. Solr training *Porter:-* solr train Position 1 2 *Concatenated :-* solr train solrtrain Position 1 2 I did implemented the filter as per my requirement. Thank you so much for your help and guidance. So how could I contribute it to the solr. With Regards Aman Tandon On Wed, Jun 17, 2015 at 10:14 AM, Aman Tandon amantandon...@gmail.com wrote: Hi Erick, Thank you so much, it will be helpful for me to learn how to save the state of token. I has no idea of how to save state of previous tokens due to this it was difficult to generate a concatenated token in the last. So is there anything should I read to learn more about it. With Regards Aman Tandon On Wed, Jun 17, 2015 at 9:20 AM, Erick Erickson erickerick...@gmail.com wrote: I really question the premise, but have a look at: https://issues.apache.org/jira/browse/SOLR-7193 Note that this is not committed and I haven't reviewed it so I don't have anything to say about that. 
And you'd have to implement it as a custom Filter. Best, Erick On Tue, Jun 16, 2015 at 5:55 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, Any guesses, how could I achieve this behaviour. With Regards Aman Tandon On Tue, Jun 16, 2015 at 8:15 PM, Aman Tandon amantandon...@gmail.com wrote: e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training) typo error e.g. Intent for solr training: fq=id:(234 456 545) title:(solr training) With Regards Aman Tandon On Tue, Jun 16, 2015 at 8:13 PM, Aman Tandon amantandon...@gmail.com wrote: We has some business logic to search the user query in user intent or finding the exact matching products. e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training) As we can see it is phrase query so it will took more time than the single stemmed token query. There are also 5-7 words phrase query. So we want to reduce the search time by implementing this feature. With Regards Aman Tandon On Tue, Jun 16, 2015 at 6:42 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: Can I ask you why you need to concatenate the tokens ? Maybe we can find a better solution to concat all the tokens in one single big token . I find it difficult to understand the reasons behind tokenising, token filtering and then un-tokenizing again :) It would be great if you explain a little bit better what you would like to do ! Cheers 2015-06-16 13:26 GMT+01:00 Aman Tandon amantandon...@gmail.com: Hi, I have a requirement to create the concatenated token of all the tokens created from the last item of my analyzer chain. *Suppose my analyzer chain is :* * tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.WordDelimiterFilterFactory catenateAll=1 splitOnNumerics=1 preserveOriginal=1/filter class=solr.EdgeNGramFilterFactory minGramSize=2 maxGramSize=15 side=front /filter class=solr.PorterStemmerFilterFactory/* I want to create a concatenated token plugin to add at concatenated token along with the last token. e.g. 
Solr training *Porter:-* solr train Position 1 2 *Concatenated :-* solr train solrtrain Position 1 2 Please help me out. How to create custom filter for this requirement. With Regards Aman Tandon -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti
MappingCharFilterFactory and start and end offsets
Hi, It looks like MappingCharFilter sets start and end offset to the same value. Can this be affected by some setting? For a string: test $ test2 and mapping $ => dollarsign (we insert extra space to separate $ into its own token) we get: http://snag.gy/eJT1H.jpg Ideally, we would like to have start and end offset respecting the remapped token. Can this be achieved with settings? -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info
Contribute the Customized Phonetic Filter to Apache Solr
Hi, We created a new phonetic filter. It is working great on our products. Most of our suppliers are Indian, and it helps us return the right results, e.g. 1) rikshaw still finds the suppliers of rickshaw 2) telefone still finds the suppliers of telephone. We also analyzed our search satisfaction feedback: it improved by 13 percentage points (from 54% to 67%) just after implementing the filter. We would like to contribute it to Solr. How can I do that? With Regards Aman Tandon
Extended Dismax Query Parser with AND as default operator
Hello,

I have a question about the extended dismax query parser. If the default operator is changed to AND (q.op=AND), the search results seem to be incorrect. I will explain with some examples. For this test I use Solr v5.1 and the tika core from the example directory.

== Preparation ==

Add the following lines to the schema.xml file:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>

Change the field text to stored="true".
Remove the multiValued attribute from the title and text fields (we don't need multiValued fields in our test).

Add test data (use curl or fiddler):
Url: http://localhost:8983/solr/tika/update/json?commit=true
Header: Content-type: application/json

[
  {"id":"1", "title":"green", "author":"Jon", "text":"blue"},
  {"id":"2", "title":"green", "author":"Jon Jessie", "text":"red"},
  {"id":"3", "title":"yellow", "author":"Jessie", "text":"blue"},
  {"id":"4", "title":"green", "author":"Jessie", "text":"blue"},
  {"id":"5", "title":"blue", "author":"Jon", "text":"yellow"},
  {"id":"6", "title":"red", "author":"Jon", "text":"green"}
]

== Test ==

The following parameters are always set:
default operator is AND: q.op=AND
use the extended dismax query parser: defType=edismax
set the default query fields to title and text: qf=title text
sort: id asc

=== #1 test ===
q=red green
response: { "numFound":2, "start":0, "docs":[
  {"id":"2","title":"green","author":"Jon Jessie","text":"red"},
  {"id":"6","title":"red","author":"Jon","text":"green"}] }
parsedquery_toString: +(((text:green | title:green) (text:red | title:red))~2)
This test works as expected.

=== #2 test ===
We use a group.
q=(red green)
Same response as test one.
parsedquery_toString: +(((text:green | title:green) (text:red | title:red))~2)
This test works as expected.

=== #3 test ===
q=green red author:Jessie
response: { "numFound":1, "start":0, "docs":[
  {"id":"2","title":"green","author":"Jon Jessie","text":"red"}] }
parsedquery_toString: +(((text:green | title:green) (text:red | title:red) author:jessie)~3)
This test works as expected.

=== #4 test ===
q=(green red) author:Jessie
response: { "numFound":2, "start":0, "docs":[
  {"id":"2","title":"green","author":"Jon Jessie","text":"red"},
  {"id":"4","title":"green","author":"Jessie","text":"blue"}] }
parsedquery_toString: +((((text:green | title:green) (text:red | title:red)) author:jessie)~2)
The same result as in the third test was expected. Why is AND not used for the query group?

=== #5 test ===
q=(+green +red) author:Jessie
response: { "numFound":4, "start":0, "docs":[
  {"id":"2","title":"green","author":"Jon Jessie","text":"red"},
  {"id":"3","title":"yellow","author":"Jessie","text":"blue"},
  {"id":"4","title":"green","author":"Jessie","text":"blue"},
  {"id":"6","title":"red","author":"Jon","text":"green"}] }
parsedquery_toString: +((+(text:green | title:green) +(text:red | title:red)) author:jessie)
Now AND is used for the group, but the author clause is combined with OR. Why?

=== #6 test ===
q=(+green +red) +author:Jessie
response: { "numFound":3, "start":0, "docs":[
  {"id":"2","title":"green","author":"Jon Jessie","text":"red"},
  {"id":"3","title":"yellow","author":"Jessie","text":"blue"},
  {"id":"4","title":"green","author":"Jessie","text":"blue"}] }
parsedquery_toString: +((+(text:green | title:green) +(text:red | title:red)) +author:jessie)
Still not the expected result.

=== #7 test ===
q=+(+green +red) +author:Jessie
response: { "numFound":1, "start":0, "docs":[
  {"id":"2","title":"green","author":"Jon Jessie","text":"red"}] }
parsedquery_toString: +(+(+(text:green | title:green) +(text:red | title:red)) +author:jessie)
Now the result is OK. But if all operators must be given explicitly, then q.op=AND is useless.

=== #8 test ===
q=green author:(Jon Jessie)
Four results were found; one was expected. The query must be changed to '+green +author:(+Jon +Jessie)' to get the expected result.

Is this a bug in the extended dismax parser, or what is the reason for not consistently applying q.op=AND to the query expression?

Kind regards
Dirk Buchhorn
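Tests #7 and #8 show that the expected results come back once every clause and every group carries an explicit '+'. As a client-side stopgap, those explicit queries can be generated rather than typed by hand. The Java sketch below does only that string building; ExplicitAndSketch and its methods are hypothetical names, it performs no escaping, and it says nothing about why the parser behaves this way.

```java
// Builds query strings in which every clause is explicitly required.
// Pure string building; no escaping of special query characters is done.
class ExplicitAndSketch {
    // Prefixes each clause with '+' and joins them with single spaces.
    static String allRequired(String... clauses) {
        StringBuilder sb = new StringBuilder();
        for (String clause : clauses) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('+').append(clause);
        }
        return sb.toString();
    }

    // Wraps required clauses in parentheses so they can be used as one clause.
    static String group(String... clauses) {
        return "(" + allRequired(clauses) + ")";
    }

    public static void main(String[] args) {
        // The query from test #7.
        System.out.println(allRequired(group("green", "red"), "author:Jessie"));
        // The corrected query from test #8.
        System.out.println(allRequired("green", "author:" + group("Jon", "Jessie")));
    }
}
```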
Re: Contribute the Customized Phonetic Filter to Apache Solr
Hi Aman, https://wiki.apache.org/solr/HowToContribute HTH On Thu, Jun 18, 2015 at 12:11 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, We created the new phonetic filter, It is working great on our products, mostly of our suppliers are Indian, it is quite helpful for us to provide the exact result e.g. 1) rikshaw, still able to find the suppliers of rickshaw 2) telefone, still able to find the suppliers of telephone We also analyzed our search satisfaction feedback, it improved by 13% (54% - 67%) just after implementing the same. And we want to contribute the same to solr, So how could I do it. With Regards Aman Tandon
facet query is not working
http://localhost:8983/solr/col/select?q=*:*&sfield=geolocation&pt=26.697,83.1876&facet.query={!frange%20l=0%20u=50}geodist()&facet.query={!frange%20l=50.001%20u=100}geodist()&wt=json

I am not getting facet results.

schema:
<field name="geolocation" type="location" indexed="true" stored="true"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>
Re: facet query is not working
isn't facet=true necessary?

On Thu, Jun 18, 2015 at 12:03 PM, Midas A test.mi...@gmail.com wrote:

http://localhost:8983/solr/col/select?q=*:*&sfield=geolocation&pt=26.697,83.1876&facet.query={!frange%20l=0%20u=50}geodist()&facet.query={!frange%20l=50.001%20u=100}geodist()&wt=json

I am not getting facet results.

schema:
<field name="geolocation" type="location" indexed="true" stored="true"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>

-- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Duplicate suggestions
Hi, I am using Solr 5.1. I'm getting duplicate suggestions from my Solr suggester. I'm using AnalyzingInfixLookupFactory and DocumentDictionaryFactory. Can I configure it to return only distinct suggestions? Here are the details of my configuration.

from solrconfig.xml:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester1a</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="indexPath">suggester_infix_dir1a</str>
    <str name="allTermsRequired">true</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">f1</str>
    <str name="weightField">weightField</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
    <str name="buildOnStartup">false</str>
  </lst>
  <lst name="suggester">
    <str name="name">mySuggester2a</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="indexPath">suggester_infix_dir2a</str>
    <str name="allTermsRequired">true</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">f2</str>
    <str name="weightField">weightField</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">6</str>
    <str name="suggest.dictionary">mySuggester1a</str>
    <str name="suggest.dictionary">mySuggester2a</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

from schema.xml:

<field name="f1" type="string" indexed="true" stored="true" required="false" multiValued="false" />
<field name="f2" type="string" indexed="true" stored="true" required="false" multiValued="false" />
<field name="weightField" type="float" indexed="true" stored="true"/>

** I ignore weightField; I'm not adding any values to it at all.

document example:

<doc>
  <str name="f1">2015-04-01</str>
  <str name="f2">12:06:00</str>
  <str name="f3">BOOO</str>
  <str name="f4"/>
  <str name="f5">7.52.11.212</str>
  <str name="f6">7.52.11.213</str>
  <str name="OID">52358424</str>
</doc>

After I build the suggester, I try to get suggestions like this:

http://localhost/solr/core1/suggest?/suggest=true&suggest.q=12

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">62</int>
  </lst>
  <lst name="suggest">
    <lst name="mySuggester2a">
      <lst name="12">
        <int name="numFound">6</int>
        <arr name="suggestions">
          <lst> <str name="term">18:34:&lt;b&gt;12&lt;/b&gt;</str> <long name="weight">0</long> <str name="payload" /> </lst>
          <lst> <str name="term">18:34:&lt;b&gt;12&lt;/b&gt;</str> <long name="weight">0</long> <str name="payload" /> </lst>
          <lst> <str name="term">18:35:&lt;b&gt;12&lt;/b&gt;</str> <long name="weight">0</long> <str name="payload" /> </lst>
          <lst> <str name="term">18:35:&lt;b&gt;12&lt;/b&gt;</str> <long name="weight">0</long> <str name="payload" /> </lst>
          <lst> <str name="term">18:35:&lt;b&gt;12&lt;/b&gt;</str> <long name="weight">0</long> <str name="payload" /> </lst>
          <lst> <str name="term">&lt;b&gt;12&lt;/b&gt;:06:02</str> <long name="weight">0</long> <str name="payload" /> </lst>
        </arr>
      </lst>
    </lst>
    <lst name="mySuggester1a">
      <lst name="12">
        <int name="numFound">0</int>
        <arr name="suggestions" />
      </lst>
    </lst>
  </lst>
</response>

I would like to get this kind of suggester response (no duplicates):

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">62</int>
  </lst>
  <lst name="suggest">
    <lst name="mySuggester2a">
      <lst name="12">
        <int name="numFound">3</int>
        <arr name="suggestions">
          <lst> <str name="term">18:34:&lt;b&gt;12&lt;/b&gt;</str> <long name="weight">0</long> <str name="payload" /> </lst>
          <lst> <str name="term">18:35:&lt;b&gt;12&lt;/b&gt;</str> <long name="weight">0</long> <str name="payload" /> </lst>
          <lst> <str name="term">&lt;b&gt;12&lt;/b&gt;:06:02</str> <long name="weight">0</long> <str name="payload" /> </lst>
        </arr>
      </lst>
    </lst>
    <lst name="mySuggester1a">
      <lst name="12">
        <int name="numFound">0</int>
        <arr name="suggestions" />
      </lst>
    </lst>
  </lst>
</response>

Thank you.
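Whether the suggester can be configured to suppress duplicates is the open question in this message. Independently of that, the duplicates can be removed on the client after the response has been parsed. The Java sketch below is such a client-side workaround, not a Solr feature; SuggestionDedupSketch is a hypothetical helper, and it assumes the term strings have already been extracted from the response.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Client-side workaround: drop repeated suggestion terms, keep first-seen order.
class SuggestionDedupSketch {
    static List<String> distinctTerms(List<String> terms) {
        // LinkedHashSet removes duplicates while preserving insertion order.
        return new ArrayList<>(new LinkedHashSet<>(terms));
    }
}
```

Since deduplication shrinks the list, the request may need a suggest.count larger than the number of distinct suggestions actually wanted.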
Re: facet query is not working
If he has not put any appends or invariants in the request handler, facet=true is mandatory to activate the facets. I haven't tried those specific facet queries. I hope the problem was not simply that he didn't activate faceting ...

2015-06-18 10:35 GMT+01:00 Mikhail Khludnev mkhlud...@griddynamics.com:

isn't facet=true necessary?

On Thu, Jun 18, 2015 at 12:03 PM, Midas A test.mi...@gmail.com wrote:

http://localhost:8983/solr/col/select?q=*:*&sfield=geolocation&pt=26.697,83.1876&facet.query={!frange%20l=0%20u=50}geodist()&facet.query={!frange%20l=50.001%20u=100}geodist()&wt=json

I am not getting facet results.

schema:
<field name="geolocation" type="location" indexed="true" stored="true"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>

-- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com

-- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
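For reference, the request from this thread with faceting switched on can be assembled as in the Java sketch below. The host, core name and parameter values are taken from the original question; FacetUrlSketch is a hypothetical helper that only builds and URL-encodes the query string, so it has not been run against a Solr instance.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

// Builds the select URL from the thread with facet=true added.
class FacetUrlSketch {
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // A list of pairs is used because facet.query appears more than once.
    static String buildUrl(String base, List<String[]> params) {
        StringBuilder sb = new StringBuilder(base);
        char separator = '?';
        for (String[] p : params) {
            sb.append(separator).append(enc(p[0])).append('=').append(enc(p[1]));
            separator = '&';
        }
        return sb.toString();
    }

    static String geoFacetUrl() {
        List<String[]> params = new ArrayList<>();
        params.add(new String[] {"q", "*:*"});
        params.add(new String[] {"sfield", "geolocation"});
        params.add(new String[] {"pt", "26.697,83.1876"});
        params.add(new String[] {"facet", "true"}); // the parameter missing from the question
        params.add(new String[] {"facet.query", "{!frange l=0 u=50}geodist()"});
        params.add(new String[] {"facet.query", "{!frange l=50.001 u=100}geodist()"});
        params.add(new String[] {"wt", "json"});
        return buildUrl("http://localhost:8983/solr/col/select", params);
    }
}
```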