Re: Stemming for Finnish language
Have you tried the lucene-hunspell plugin? I haven't tested it, but it seems promising if it works in 1.4.1. http://rcmuir.wordpress.com/2010/03/02/minority-language-support-for-lucene-and-solr/

Matti

2011/1/21 Laura Virtala laura.virt...@eficode.fi:
> On 01/21/2011 11:26 AM, Laura Virtala wrote:
>> Hello,
>>
>> I cannot find any examples of how to configure FinnishLightStemFilterFactory (I understood that SnowballPorterFilterFactory doesn't work correctly for Finnish). I tried the following in my schema.xml, but I got org.apache.solr.common.SolrException: Error loading class 'solr.FinnishLightStemFilterFactory'
>>
>> ...
>> <filter class="solr.LowerCaseFilterFactory"/>
>> <filter class="solr.FinnishLightStemFilterFactory"/>
>> ...
>>
>> Are there any parameters or additional steps required in order to use this component?
>>
>> Br, Laura
>
> Hi,
>
> I just noticed that FinnishLightStemFilterFactory is not in the Solr version I'm using (1.4.1). Is there any workaround to get Finnish stemming to work correctly with version 1.4.1?
>
> Br, Laura
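For what it's worth, FinnishLightStemFilterFactory only appears in later Solr releases (3.1 onward), so it cannot be loaded in 1.4.1. Laura mentions the Snowball Finnish stemmer misbehaves for her, but if its behavior is tolerable, a 1.4.1 schema can fall back to it. A rough, untested sketch (the field type name text_fi is made up here):

```xml
<fieldType name="text_fi" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Snowball's Finnish stemmer ships with Solr 1.4.1; it is more
         aggressive than the light stemmer but may be good enough -->
    <filter class="solr.SnowballPorterFilterFactory" language="Finnish"/>
  </analyzer>
</fieldType>
```

The hunspell route above is the other option, using a Finnish dictionary with the plugin from the linked blog post.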
Re: solr wildcard queries and analyzers
I'm a little busy right now, but I'm going to try to find a suitable parser; if none is found, then I think the only solution is to write a new one.

2011/1/13 Jayendra Patil jayendra.patil@gmail.com:
> Had the same issues with international characters and wildcard searches.
> One workaround we implemented was to index the field both with and without the ASCIIFoldingFilterFactory. You would have the original field and one with ASCII equivalents to be used during searching. Wildcard searches with either ASCII-folded or international terms would then match one of the two. Also, lowercase the search terms if you are using a lowercase filter during indexing.
> Regards, Jayendra
>
> On Wed, Jan 12, 2011 at 7:46 AM, Kári Hreinsson k...@gagnavarslan.is wrote:
>> Have you made any progress? Since the AnalyzingQueryParser doesn't inherit from QParserPlugin, Solr doesn't want to use it, but I guess we could implement a similar parser that does inherit from QParserPlugin? Switching parsers seems to be what is needed. Has really no one solved this before?
>> - Kári
>>
>> ----- Original Message -----
>> From: Matti Oinas matti.oi...@gmail.com
>> To: solr-user@lucene.apache.org
>> Sent: Tuesday, 11 January, 2011 12:47:52 PM
>> Subject: Re: solr wildcard queries and analyzers
>>
>> This might be the solution. http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html
>>
>> 2011/1/11 Matti Oinas matti.oi...@gmail.com:
>>> Sorry, the message was not meant to be sent here. We are struggling with the same problem here.
>>>
>>> 2011/1/11 Matti Oinas matti.oi...@gmail.com:
>>>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers
>>>> "On wildcard and fuzzy searches, no text analysis is performed on the search word."
>>>>
>>>> 2011/1/11 Kári Hreinsson k...@gagnavarslan.is:
>>>>> Hi,
>>>>>
>>>>> I am having a problem with the fact that no text analysis is performed on wildcard queries. I have the following field type (a bit simplified):
>>>>>
>>>>> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>>>>>   <analyzer>
>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>     <filter class="solr.TrimFilterFactory"/>
>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>     <filter class="solr.ASCIIFoldingFilterFactory"/>
>>>>>   </analyzer>
>>>>> </fieldType>
>>>>>
>>>>> My problem has to do with Icelandic characters. When I index a document with a text field including the word "sjálfsögðu", it gets indexed as "sjalfsogdu" (because the ASCIIFoldingFilterFactory replaces the Icelandic characters with their ASCII equivalents). Then, when I search (without a wildcard) for "sjálfsögðu" or "sjalfsogdu", I get that document as a result. This is convenient since it enables people to search without accented characters and still get the results they want (e.g. if they are working on computers with English keyboards).
>>>>>
>>>>> However, this all falls apart with wildcard searches: the search string isn't passed through the filters, so even if I search for "sjálf*" I don't get any results, because the index doesn't contain the original words (I do get results for "sjalf*"). I know people have had a similar problem with the case sensitivity of wildcard queries, and most often the solution seems to be to lowercase the string before passing it on to Solr, which is not exactly optimal (yet simple in that case). The Icelandic characters complicate things a bit, and applying the same solution (doing the lowercasing and character mapping) in my application seems like unnecessary duplication of code that is already part of Solr, not to mention a complication of my application and a possible maintenance burden down the road.
>>>>>
>>>>> Is there any way around this? How are people solving it? Is there a way to apply the filters to wildcard queries? I guess removing the ASCIIFoldingFilterFactory is the simplest solution, but the normalization the filter does is often very useful. I hope I'm not overlooking some obvious explanation. :/
>>>>>
>>>>> Thanks in advance,
>>>>> Kári Hreinsson
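Jayendra's dual-field workaround could be sketched in schema.xml roughly as follows (untested; the field names text_folded and text_orig and the type name text_raw are made up for illustration):

```xml
<!-- folded field: accents removed at index time by ASCIIFoldingFilterFactory -->
<field name="text_folded" type="text" indexed="true" stored="true"/>
<!-- unfolded copy: same analyzer chain minus ASCIIFoldingFilterFactory,
     so the original Icelandic characters survive in the index -->
<field name="text_orig" type="text_raw" indexed="true" stored="false"/>
<copyField source="text_folded" dest="text_orig"/>
```

A wildcard query with accented input (sjálf*) can then be run against text_orig, and pre-folded input (sjalf*) against text_folded, at the cost of roughly doubling the indexed size of the field.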
Re: Problem with DIH delta-import delete.
The problem was an incorrect pk definition in data-config.xml:

<entity name="blog" pk="id" ...>
  <field column="uuid" name="uuid" template="blog-${blog.id}"/>
  <field column="id" name="blog_id"/>

The pk attribute needs to be the same as Solr's uniqueKey field, so in my case changing the pk value from id to uuid solved the problem.

2010/12/7 Matti Oinas matti.oi...@gmail.com:
> Thanks Koji. The problem seems to be that the template transformer is not used when the delete is performed.
>
> ...
> Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
> INFO: Completed ModifiedRowKey for Entity: entry rows obtained : 0
> Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
> INFO: Completed DeletedRowKey for Entity: entry rows obtained : 1223
> Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
> INFO: Completed parentDeltaQuery for Entity: entry
> Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder deleteAll
> INFO: Deleting stale documents
> Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
> INFO: Deleting document: 787
> Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
> INFO: Deleting document: 786
> ...
>
> There are entries with ids 787 and 786 in the database, and those are marked as deleted. The query returns the right number of deleted documents and the right rows from the database, but the delete fails because Solr uses the plain numeric id when deleting the document. The same happens with blogs.
>
> Matti
>
> 2010/12/4 Koji Sekiguchi k...@r.email.ne.jp:
>> (10/11/17 20:18), Matti Oinas wrote:
>>> Solr does not delete documents from the index although delta-import says it has deleted n documents. I'm using version 1.4.1. The schema looks like:
>>>
>>> <fields>
>>>   <field name="uuid" type="string" indexed="true" stored="true" required="true"/>
>>>   <field name="type" type="int" indexed="true" stored="true" required="true"/>
>>>   <field name="blog_id" type="int" indexed="true" stored="true"/>
>>>   <field name="entry_id" type="int" indexed="false" stored="true"/>
>>>   <field name="content" type="textgen" indexed="true" stored="true"/>
>>> </fields>
>>> <uniqueKey>uuid</uniqueKey>
>>>
>>> Relevant fields from the database tables (blogs and entries both have):
>>>
>>> Field: id        Type: int(11)              Null: NO   Key: PRI  Default: NULL  Extra: auto_increment
>>> Field: modified  Type: datetime             Null: YES  Key:      Default: NULL  Extra:
>>> Field: status    Type: tinyint(1) unsigned  Null: YES  Key:      Default: NULL  Extra:
>>>
>>> <?xml version="1.0" encoding="UTF-8" ?>
>>> <dataConfig>
>>>   <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" .../>
>>>   <document>
>>>     <entity name="blog" pk="id"
>>>             query="SELECT id,description,1 as type FROM blogs WHERE status=2"
>>>             deltaImportQuery="SELECT id,description,1 as type FROM blogs WHERE status=2 AND id='${dataimporter.delta.id}'"
>>>             deltaQuery="SELECT id FROM blogs WHERE '${dataimporter.last_index_time}' &lt; modified AND status=2"
>>>             deletedPkQuery="SELECT id FROM blogs WHERE '${dataimporter.last_index_time}' &lt;= modified AND status=3"
>>>             transformer="TemplateTransformer">
>>>       <field column="uuid" name="uuid" template="blog-${blog.id}"/>
>>>       <field column="id" name="blog_id"/>
>>>       <field column="description" name="content"/>
>>>       <field column="type" name="type"/>
>>>     </entity>
>>>     <entity name="entry" pk="id"
>>>             query="SELECT f.id as id,f.content,f.blog_id,2 as type FROM entries f,blogs b WHERE f.blog_id=b.id AND b.status=2"
>>>             deltaImportQuery="SELECT f.id as id,f.content,f.blog_id,2 as type FROM entries f,blogs b WHERE f.blog_id=b.id AND f.id='${dataimporter.delta.id}'"
>>>             deltaQuery="SELECT f.id as id FROM entries f JOIN blogs b ON b.id=f.blog_id WHERE '${dataimporter.last_index_time}' &lt; b.modified AND b.status=2"
>>>             deletedPkQuery="SELECT f.id as id FROM entries f JOIN blogs b ON b.id=f.blog_id WHERE b.status!=2 AND '${dataimporter.last_index_time}' &lt; b.modified"
>>>             transformer="HTMLStripTransformer,TemplateTransformer">
>>>       <field column="uuid" name="uuid" template="entry-${entry.id}"/>
>>>       <field column="id" name="entry_id"/>
>>>       <field column="blog_id" name="blog_id"/>
>>>       <field column="content" name="content" stripHTML="true"/>
>>>       <field column="type" name="type"/>
>>>     </entity>
>>>   </document>
>>> </dataConfig>
>>>
>>> Full import and delta import work without problems
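Applying the fix stated at the top of this message (pk must match the schema's uniqueKey, here uuid), the corrected entities would look something like the sketch below. The elided attributes (...) are unchanged from the quoted config; this is untested:

```xml
<!-- pk now matches the uniqueKey field (uuid), so DIH deletes resolve
     against the templated uuid instead of the raw numeric database id -->
<entity name="blog" pk="uuid" ... transformer="TemplateTransformer">
  <field column="uuid" name="uuid" template="blog-${blog.id}"/>
  ...
</entity>
<entity name="entry" pk="uuid" ... transformer="HTMLStripTransformer,TemplateTransformer">
  <field column="uuid" name="uuid" template="entry-${entry.id}"/>
  ...
</entity>
```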
Re: solr wildcard queries and analyzers
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers
"On wildcard and fuzzy searches, no text analysis is performed on the search word."

2011/1/11 Kári Hreinsson k...@gagnavarslan.is:
> Hi,
>
> I am having a problem with the fact that no text analysis is performed on wildcard queries. [...]
>
> Thanks in advance,
> Kári Hreinsson
Re: solr wildcard queries and analyzers
Sorry, the message was not meant to be sent here. We are struggling with the same problem here.

2011/1/11 Matti Oinas matti.oi...@gmail.com:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers
> "On wildcard and fuzzy searches, no text analysis is performed on the search word."
>
> 2011/1/11 Kári Hreinsson k...@gagnavarslan.is:
>> Hi,
>>
>> I am having a problem with the fact that no text analysis is performed on wildcard queries. [...]
>>
>> Thanks in advance,
>> Kári Hreinsson
Re: solr wildcard queries and analyzers
This might be the solution. http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html

2011/1/11 Matti Oinas matti.oi...@gmail.com:
> Sorry, the message was not meant to be sent here. We are struggling with the same problem here.
>
> 2011/1/11 Matti Oinas matti.oi...@gmail.com:
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers
>> "On wildcard and fuzzy searches, no text analysis is performed on the search word."
>>
>> 2011/1/11 Kári Hreinsson k...@gagnavarslan.is:
>>> Hi,
>>>
>>> I am having a problem with the fact that no text analysis is performed on wildcard queries. [...]
>>>
>>> Thanks in advance,
>>> Kári Hreinsson
Re: DataImportHanlder - Multiple entities will step into each other
Concat doesn't work as expected. Doing SELECT concat('blog-',id) as uuid instead of using the template transformer, the uuid in the index ends up as something like

<str name="uuid">[...@d760bb</str>

instead of

<str name="uuid">blog-1</str>

I haven't tested whether DIH can perform the delete when using concat, but at the least you cannot delete by uuid from anywhere else when using concat.

2011/1/5 Ephraim Ofir ephra...@icq.com:
> You could get around that by doing the concatenation at the SQL level; that way deletes would work as well.
> Ephraim Ofir
>
> -----Original Message-----
> From: Matti Oinas [mailto:matti.oi...@gmail.com]
> Sent: Tuesday, January 04, 2011 3:57 PM
> To: solr-user@lucene.apache.org
> Subject: Re: DataImportHanlder - Multiple entities will step into each other
>
> I managed to do that by using TemplateTransformer:
>
> <document>
>   <entity name="company" ... transformer="TemplateTransformer">
>     <field column="id" name="id" template="company-${company.id}"/>
>     ...
>   </entity>
>   <entity name="item" ... transformer="TemplateTransformer">
>     <field column="id" name="id" template="item-${item.id}"/>
>     ...
>   </entity>
> </document>
>
> The only problem is that delta import fails to perform deletes on the index. It seems that the TemplateTransformer is not used when performing the delete, so delete by id doesn't work.
>
> 2011/1/4 yu shen shenyu...@gmail.com:
>> Hi All,
>>
>> I have a dataimporthandler config file as below. It contains multiple entities:
>>
>> <dataConfig>
>>   <dataSource name="jdbc" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:1521/changan?useUnicode=true&amp;characterEncoding=utf8&amp;autoReconnect=true" ... />
>>   <document>
>>     <entity name="item" dataSource="jdbc" pk="id" query="..." />
>>     <entity name="company" dataSource="jdbc" pk="id" query="..." />
>>   </document>
>> </dataConfig>
>>
>> All data are from a database. The problem is that item, company, and the other entities all have the field 'id', with values running from 1 to n. In this case, item/company etc. will step on each other. Is there a way to prevent this from happening, such as designating different entities to different partitions? One way I can think of is to separate the entities into different instances, which is not an ideal solution IMO. Would someone point me to a reference and also give some instructions?
Re: DataImportHanlder - Multiple entities will step into each other
Forgot to mention that delete works fine with TemplateTransformer when you are using it to create unique values for the uniqueKey field in Solr, and that same field is defined as pk in the data config.

schema.xml:

<field name="uuid" type="string" indexed="true" stored="true" required="true"/>
..
<uniqueKey>uuid</uniqueKey>

data-config.xml:

<entity name="blog" pk="uuid" ...>
  <field column="uuid" name="uuid" template="blog-${blog.blog_id}"/>
..
<entity name="entry" pk="uuid" ...>
  <field column="uuid" name="uuid" template="entry-${entry.entry_id}"/>

DIH performs the delete by taking the value from the field defined as pk in data-config and deleting the document whose uniqueKey field in the schema matches that value. So if the uniqueKey and pk fields are different, DIH will probably fail to delete anything, or it will delete something that is not supposed to be deleted.

2011/1/7 Matti Oinas matti.oi...@gmail.com:
> Concat doesn't work as expected. [...]
Re: DataImportHanlder - Multiple entities will step into each other
I managed to do that by using TemplateTransformer:

<document>
  <entity name="company" ... transformer="TemplateTransformer">
    <field column="id" name="id" template="company-${company.id}"/>
    ...
  </entity>
  <entity name="item" ... transformer="TemplateTransformer">
    <field column="id" name="id" template="item-${item.id}"/>
    ...
  </entity>
</document>

The only problem is that delta import fails to perform deletes on the index. It seems that the TemplateTransformer is not used when performing the delete, so delete by id doesn't work.

2011/1/4 yu shen shenyu...@gmail.com:
> Hi All,
>
> I have a dataimporthandler config file as below. It contains multiple entities: [...]
>
> Would someone point me to a reference and also give some instructions?
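Putting the two findings in this thread together (TemplateTransformer to prefix the ids, plus pk aligned with the schema's uniqueKey so delta-import deletes also work), a combined config might look like this sketch. It is untested, and it assumes the schema's uniqueKey field is named id; the elided attributes (...) stand for the queries from the original config:

```xml
<document>
  <entity name="company" dataSource="jdbc" pk="id" query="..."
          transformer="TemplateTransformer">
    <!-- the prefix keeps company-1 and item-1 from colliding -->
    <field column="id" name="id" template="company-${company.id}"/>
  </entity>
  <entity name="item" dataSource="jdbc" pk="id" query="..."
          transformer="TemplateTransformer">
    <field column="id" name="id" template="item-${item.id}"/>
  </entity>
</document>
```

Note that here the templated column and the pk are the same field, which is the configuration reported to delete correctly elsewhere in this thread.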
Re: Problem with DIH delta-import delete.
Thanks Koji. The problem seems to be that the template transformer is not used when the delete is performed.

...
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed ModifiedRowKey for Entity: entry rows obtained : 0
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: entry rows obtained : 1223
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed parentDeltaQuery for Entity: entry
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder deleteAll
INFO: Deleting stale documents
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 787
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 786
...

There are entries with ids 787 and 786 in the database, and those are marked as deleted. The query returns the right number of deleted documents and the right rows from the database, but the delete fails because Solr uses the plain numeric id when deleting the document. The same happens with blogs.

Matti

2010/12/4 Koji Sekiguchi k...@r.email.ne.jp:
> (10/11/17 20:18), Matti Oinas wrote:
>> Solr does not delete documents from the index although delta-import says it has deleted n documents. I'm using version 1.4.1. [...]
Problem with DIH delta-import delete.
Solr does not delete documents from the index although delta-import says it has deleted n documents. I'm using version 1.4.1.

The schema looks like:

<fields>
  <field name="uuid" type="string" indexed="true" stored="true" required="true"/>
  <field name="type" type="int" indexed="true" stored="true" required="true"/>
  <field name="blog_id" type="int" indexed="true" stored="true"/>
  <field name="entry_id" type="int" indexed="false" stored="true"/>
  <field name="content" type="textgen" indexed="true" stored="true"/>
</fields>
<uniqueKey>uuid</uniqueKey>

Relevant fields from the database tables (blogs and entries both have):

Field: id        Type: int(11)              Null: NO   Key: PRI  Default: NULL  Extra: auto_increment
Field: modified  Type: datetime             Null: YES  Key:      Default: NULL  Extra:
Field: status    Type: tinyint(1) unsigned  Null: YES  Key:      Default: NULL  Extra:

<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" .../>
  <document>
    <entity name="blog" pk="id"
            query="SELECT id,description,1 as type FROM blogs WHERE status=2"
            deltaImportQuery="SELECT id,description,1 as type FROM blogs WHERE status=2 AND id='${dataimporter.delta.id}'"
            deltaQuery="SELECT id FROM blogs WHERE '${dataimporter.last_index_time}' &lt; modified AND status=2"
            deletedPkQuery="SELECT id FROM blogs WHERE '${dataimporter.last_index_time}' &lt;= modified AND status=3"
            transformer="TemplateTransformer">
      <field column="uuid" name="uuid" template="blog-${blog.id}"/>
      <field column="id" name="blog_id"/>
      <field column="description" name="content"/>
      <field column="type" name="type"/>
    </entity>
    <entity name="entry" pk="id"
            query="SELECT f.id as id,f.content,f.blog_id,2 as type FROM entries f,blogs b WHERE f.blog_id=b.id AND b.status=2"
            deltaImportQuery="SELECT f.id as id,f.content,f.blog_id,2 as type FROM entries f,blogs b WHERE f.blog_id=b.id AND f.id='${dataimporter.delta.id}'"
            deltaQuery="SELECT f.id as id FROM entries f JOIN blogs b ON b.id=f.blog_id WHERE '${dataimporter.last_index_time}' &lt; b.modified AND b.status=2"
            deletedPkQuery="SELECT f.id as id FROM entries f JOIN blogs b ON b.id=f.blog_id WHERE b.status!=2 AND '${dataimporter.last_index_time}' &lt; b.modified"
            transformer="HTMLStripTransformer,TemplateTransformer">
      <field column="uuid" name="uuid" template="entry-${entry.id}"/>
      <field column="id" name="entry_id"/>
      <field column="blog_id" name="blog_id"/>
      <field column="content" name="content" stripHTML="true"/>
      <field column="type" name="type"/>
    </entity>
  </document>
</dataConfig>

Full import and delta import work without problems when it comes to adding new documents to the index, but when a blog is deleted (status is set to 3 in the database), the Solr report after a delta import is something like "Indexing completed. Added/Updated: 0 documents. Deleted 81 documents." The problem is that the documents are still found in the Solr index.

1. UPDATE blogs SET modified=NOW(),status=3 WHERE id=26;
2. delta-import =>
   <str name="">Indexing completed. Added/Updated: 0 documents. Deleted 81 documents.</str>
   <str name="Committed">2010-11-17 13:00:50</str>
   <str name="Optimized">2010-11-17 13:00:50</str>

So Solr says it has deleted documents, and that the index was also optimized and committed after the operation.

3. Search: blog_id:26 still returns 1 document with type 1 (blog) and 80 documents with type 2 (entry).
Re: simple dismax with OR
Define an mm (Minimum 'Should' Match) value for dismax. The default is 100%, so every clause must match. http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29

2010/11/15 Jakub Godawa jakub.god...@gmail.com:
> Hi! I have a dismax that is searching through two fields:
>
> <requestHandler name="en" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="defType">dismax</str>
>     <str name="qf">name_en^1.0 answer_en^1.5</str>
>   </lst>
> </requestHandler>
>
> Now I have a document that has "Various appliances can be installed here" in the answer_en field, indexed with the English analyzer. When I query "installation" I get that doc as a result, which is OK. When I query "How to install something?" I get nothing, which is bad, because a match is highlighted on the analysis page. I've read that dismax doesn't honor q.op (the default query operator). How should I set up my dismax to handle that?
>
> Cheers,
> Jakub Godawa
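Concretely, adding mm to the handler defaults quoted above gives OR-like behavior. A sketch (field names taken from the quoted config; the value 1 means any single clause is enough to match):

```xml
<requestHandler name="en" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">name_en^1.0 answer_en^1.5</str>
    <!-- require only one clause to match (pure OR semantics);
         values like 75% or 2&lt;75% give graduated requirements instead -->
    <str name="mm">1</str>
  </lst>
</requestHandler>
```

With mm set this way, "How to install something?" matches as long as any analyzed term (e.g. the stem of "install") is found, instead of requiring every term to match.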