Re: Search for the single hash "#" character never returns results
Fantastic, thanks! Yes, I completely overlooked that case; separating the analysers worked a treat. I had also posted on Stack Overflow, but the mailing list proved to be superior!

Many thanks,
Daniel

On 27 October 2011 13:09, Erick Erickson wrote:
> Take a look at your admin/analysis page and put your tokens in for both
> index and query times. What I think you'll see is that the # is being
> stripped at query time due to the first PatternReplaceFilterFactory.
>
> You probably want to split your analyzers into an index-time and query-time
> pair and do the appropriate replacements to keep # at query time.
>
> Best
> Erick
>
> On Tue, Oct 25, 2011 at 12:27 PM, Daniel Bradley wrote:
> > When running a search such as:
> >   field_name:#
> >   field_name:"#"
> >   field_name:"\#"
> >
> > where there is a record with the value of exactly "#", Solr returns 0
> > rows.
> >
> > The workaround we are having to use is a range query on the
> > field such as:
> >   field_name:[# TO #]
> > and this returns the correct documents.
> >
> > Use case details:
> > We have a field that indexes a text field and calculates a "letter
> > group". This keeps only the first significant character from a value
> > (number or letter), and if it is a number then simply stores "#", as we
> > want all numbered items grouped together.
> >
> > I'm also aware that we could fix this by using a specific number
> > instead of the hash character; however, I thought I'd raise this to see
> > if there is a wider issue. I've listed some specific details below.
> >
> > Thanks for your time,
> >
> > Daniel Bradley
> >
> > Field definition:
> >   sortMissingLast="true" omitNorms="true">
> >   pattern="^([a-zA-Z0-9]).*" group="1"/>
> >   pattern="([^a-z0-9])" replacement="" replace="all"/>
> >   pattern="([0-9])" replacement="#" replace="all"/>
> >
> > Server information:
> > Solr Specification Version: 3.2.0
> > Solr Implementation Version: 3.2.0 1129474 - rmuir - 2011-05-30 23:07:15
> > Lucene Specification Version: 3.2.0
> > Lucene Implementation Version: 3.2.0 1129474 - 2011-05-30 23:08:57
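To see why a single shared analyzer loses the "#", here is a rough Python emulation of the chain suggested by the surviving field-definition fragments (keep a leading letter or digit, remove anything outside [a-z0-9], map digits to "#"). It is not Solr code: the lowercasing step, the function names, and the hypothetical query-time chain that lets "#" through are all assumptions made for illustration. In this emulation the query "#" produces no term at all, consistent with Erick's diagnosis that "#" is stripped at query time.

```python
import re

def shared_chain(value):
    """Rough emulation of one analyzer used at BOTH index and query time (not Solr code)."""
    # pattern="^([a-zA-Z0-9]).*" group="1": keep only a leading letter or digit
    m = re.match(r"^([a-zA-Z0-9])", value)
    if not m:
        return []  # nothing emitted, so there is no term to look up
    token = m.group(1).lower()               # assumed lowercasing step
    token = re.sub(r"[^a-z0-9]", "", token)  # pattern="([^a-z0-9])" replacement=""
    token = re.sub(r"[0-9]", "#", token)     # pattern="([0-9])" replacement="#"
    return [token]

def query_chain(value):
    """Hypothetical query-time analyzer that lets "#" through unchanged."""
    token = value.strip().lower()[:1]
    token = re.sub(r"[^a-z0-9#]", "", token)  # '#' added to the allowed set
    token = re.sub(r"[0-9]", "#", token)      # a typed digit still maps to '#'
    return [token] if token else []

# Terms produced at index time for two sample values
indexed = set(shared_chain("42 Things")) | set(shared_chain("Apples"))
print(sorted(indexed))
print(any(t in indexed for t in shared_chain("#")))  # shared analyzer at query time
print(any(t in indexed for t in query_chain("#")))   # split query analyzer
```

The range-query workaround field_name:[# TO #] is consistent with this picture, presumably because range endpoints were not run through the text analyzer in this Solr version, so "#" was looked up as-is (an assumption, not something confirmed in the thread).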
Search for the single hash "#" character never returns results
When running a search such as:
  field_name:#
  field_name:"#"
  field_name:"\#"

where there is a record with the value of exactly "#", Solr returns 0 rows.

The workaround we are having to use is a range query on the field such as:
  field_name:[# TO #]
and this returns the correct documents.

Use case details:
We have a field that indexes a text field and calculates a "letter group". This keeps only the first significant character from a value (number or letter), and if it is a number then simply stores "#", as we want all numbered items grouped together.

I'm also aware that we could fix this by using a specific number instead of the hash character; however, I thought I'd raise this to see if there is a wider issue. I've listed some specific details below.

Thanks for your time,

Daniel Bradley

Field definition:

Server information:
Solr Specification Version: 3.2.0
Solr Implementation Version: 3.2.0 1129474 - rmuir - 2011-05-30 23:07:15
Lucene Specification Version: 3.2.0
Lucene Implementation Version: 3.2.0 1129474 - 2011-05-30 23:08:57
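A minimal Python sketch of the "letter group" calculation described above, done on the application side. It assumes "first significant character" means the first ASCII letter or digit in the value and that letters are lowercased; letter_group and group_titles are hypothetical helpers, not part of Solr.

```python
from collections import defaultdict

def letter_group(value):
    """Hypothetical helper: return the letter group for a value.

    Assumes "first significant character" = first ASCII letter or digit;
    letters are lowercased and every digit maps to "#".
    """
    for ch in value:
        if ch.isascii() and ch.isalnum():
            return "#" if ch.isdigit() else ch.lower()
    return None  # the value contains no letter or digit

def group_titles(titles):
    """Bucket titles by letter group, so all numbered items share "#"."""
    groups = defaultdict(list)
    for title in titles:
        groups[letter_group(title)].append(title)
    return dict(groups)

print(group_titles(["101 Dalmatians", "2012", "Alien", "'Allo 'Allo!"]))
```

Note that this sketch skips leading punctuation, whereas the pattern "^([a-zA-Z0-9]).*" from the field definition only matches when the value starts with a letter or digit.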
RE: Oracle incomplete DataImport results
After investigating the log files, I found that the DataImporter was throwing an error from the Oracle DB driver:

  java.sql.SQLException: ORA-22835: Buffer too small for CLOB to CHAR or BLOB to RAW conversion (actual: 2890, maximum: 2000)

In short, the problem was with the 551st item: a related item had a text field of type CLOB that was too long, so the TO_NCHAR function we used to convert the type failed.

FIX: Used the Oracle function dbms_lob.substr("FIELD_NAME", MAX_LENGTH, 1) to trim the string (this also applies an implicit conversion).

-----Original Message-----
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: 22 September 2009 19:27
To: solr-user@lucene.apache.org
Subject: Re: Oracle incomplete DataImport results

On Tue, Sep 22, 2009 at 10:53 PM, Daniel Bradley <daniel.brad...@adfero.co.uk> wrote:

> I appear to be getting only a small number of items imported into Solr
> when doing a full-import against an Oracle data provider. The query I'm
> running is roughly similar to:
>
>   SELECT "ID", dbms_lob.substr("Text", 4000, 1) "Text", "Date",
>   "LastModified", "Type", "Created", "Available", "Parent", "Title"
>   from "TheTableName" where "Available" < CURRENT_DATE
>   and "Available" > add_months(current_date, -1)
>
> This retrieves the last month's items from the database. (The
> dbms_lob.substr function is used to stop Solr from simply indexing the
> object name, as Text is of the Oracle CLOB type.) When running this in
> Oracle SQL Developer approximately 5600 rows are returned; however,
> a full import only imports approximately 550 items.
>
> There's no visible memory use and no exceptions suggesting any problems
> with lack of memory. Is there any limit on the number of items you
> can import in a single request? Any other thoughts on this problem would
> be much appreciated.
>

What is the uniqueKey in schema.xml? Is it possible that many of those 5600 rows share the same value for Solr's uniqueKey field?
There are no limits on the number of items you can import. The number of documents created should correspond to the number of rows returned by the root-level entity's query (assuming the uniqueKey for each of those documents is actually unique).

--
Regards,
Shalin Shekhar Mangar.

This message has been scanned for viruses by Websense Hosted Email Security - On Behalf of Adfero Ltd

DISCLAIMER: This email (including any attachments) is subject to copyright, and the information in it is confidential. Use of this email or of any information in it other than by the addressee is unauthorised and unlawful. Whilst reasonable efforts are made to ensure that any attachments are virus-free, it is the recipient's sole responsibility to scan all attachments for viruses. All calls and emails to and from this company may be monitored and recorded for legitimate purposes relating to this company's business. Any opinions expressed in this email (or in any attachments) are those of the author and do not necessarily represent the opinions of Adfero Ltd or of any other group company.
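Shalin's uniqueKey question can be illustrated with a toy Python model in which documents are stored in a dict keyed by the uniqueKey value, so a later row with the same key replaces the earlier document, as adding a document with an existing uniqueKey does in Solr. index_rows and the sample rows are hypothetical.

```python
def index_rows(rows, unique_key="ID"):
    """Toy model of an index keyed by uniqueKey.

    A row whose key already exists replaces the earlier document,
    so duplicate keys collapse many rows into fewer documents.
    """
    index = {}
    for row in rows:
        index[row[unique_key]] = row  # overwrite on duplicate key
    return index

# Hypothetical sample: 10 rows but only 3 distinct ID values.
rows = [{"ID": i % 3, "Title": f"item {i}"} for i in range(10)]
docs = index_rows(rows)
print(len(rows), "rows ->", len(docs), "documents")
print(docs[0]["Title"])  # the last row with ID 0 wins
```

In this thread the actual cause turned out to be the ORA-22835 error rather than duplicate keys, but comparing COUNT(*) with COUNT(DISTINCT "ID") on the source query is a quick way to rule duplicates out.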
Oracle incomplete DataImport results
I appear to be getting only a small number of items imported into Solr when doing a full-import against an Oracle data provider. The query I'm running is roughly similar to:

  SELECT "ID", dbms_lob.substr("Text", 4000, 1) "Text", "Date", "LastModified", "Type", "Created", "Available", "Parent", "Title"
  from "TheTableName"
  where "Available" < CURRENT_DATE and "Available" > add_months(current_date, -1)

This retrieves the last month's items from the database. (The dbms_lob.substr function is used to stop Solr from simply indexing the object name, as Text is of the Oracle CLOB type.) When running this in Oracle SQL Developer approximately 5600 rows are returned; however, a full import only imports approximately 550 items.

There's no visible memory use and no exceptions suggesting any problems with lack of memory. Is there any limit on the number of items you can import in a single request? Any other thoughts on this problem would be much appreciated.

Thanks

Other Information:

Running the command:
  http://xxx.xxx.xxx.xxx:8080/solr/dataimport?command=full-import
Produces the output:
  0 0 data-config.xml full-import idle 0:5:43.58 559 4726 557 0 2009-09-22 16:58:46 This response format is experimental. It is likely to change in the future.

Running the command:
  http://xxx.xxx.xxx.xxx:8080/solr/dataimport?command=full-import&debug=on&verbose=true
Produces the following output (dots added where content is not relevant):
  0 40906 data-config.xml full-import debug ... SELECT "ID", dbms_lob.substr("Text", 4000, 1) "Text", "Date", "LastModified", "Type", "Created", "Available", "Parent", "Title" from "TheTableName" where "Available" < CURRENT_DATE and "Available" > add_months(current_date, -1) 0:0:7.766 --- row #1- 2009-08-22T16:04:04Z java.math.BigDecimal:0 ... 2009-08-22T16:04:04Z 2009-08-22T16:04:04Z java.math.BigDecimal:235 java.math.BigDecimal:1320541 2009-08-22T16:04:58Z ...
- SELECT CONCAT(CONCAT(CONCAT(CONCAT(CONCAT(CONCAT("Level1",' '),"Level2"), ' '), "Level3"), ' '), "Level4") "Levels", TO_NCHAR("TheCategories"."Value") "Value" FROM "TheCategories" WHERE "TheCategories".ID='1320541' 0:0:5.485 --- row #1- 12 235 1848 - ... ... idle Configuration Re-loaded sucessfully 11 93 0 2009-09-22 16:47:28 0:0:39.47 This response format is experimental. It is likely to change in the future.
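The failure this thread eventually traced (ORA-22835 from TO_NCHAR on a CLOB, reporting "actual: 2890, maximum: 2000") and the dbms_lob.substr fix can be mimicked in Python. This is a toy model only: to_nchar and lob_substr are hypothetical stand-ins, the 2000 limit is simply the figure from the error message, and real Oracle details (byte versus character limits, NULL results for invalid arguments) are not modelled.

```python
# Limit taken from the ORA-22835 message quoted in this thread ("maximum: 2000").
CONVERSION_LIMIT = 2000

def to_nchar(clob_text: str) -> str:
    """Hypothetical stand-in for TO_NCHAR(clob): fails when the value is too long."""
    if len(clob_text) > CONVERSION_LIMIT:
        raise ValueError(
            f"ORA-22835: buffer too small (actual: {len(clob_text)}, "
            f"maximum: {CONVERSION_LIMIT})"
        )
    return clob_text

def lob_substr(clob_text: str, amount: int, offset: int = 1) -> str:
    """Hypothetical stand-in for dbms_lob.substr(lob, amount, offset).

    offset is 1-based, as in Oracle; at most `amount` characters are returned.
    """
    start = offset - 1
    return clob_text[start:start + amount]

long_text = "x" * 2890  # uses the "actual" figure from the error message

try:
    to_nchar(long_text)
except ValueError as err:
    print(err)  # the untrimmed value is rejected

trimmed = lob_substr(long_text, CONVERSION_LIMIT, 1)
print(len(to_nchar(trimmed)))  # the trimmed value converts
```

Trimming before conversion is what makes the row importable; the MAX_LENGTH in the fix above would need to be no larger than whatever limit the conversion enforces.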