Re: Stemming for Finnish language

2011-01-23 Thread Matti Oinas
Have you tried the lucene-hunspell plugin? I haven't tested it, but it seems
promising, if it works with 1.4.1.

http://rcmuir.wordpress.com/2010/03/02/minority-language-support-for-lucene-and-solr/
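
For reference, a rough sketch of how a hunspell analyzer chain is usually
wired up. The factory and attribute names below follow the
HunspellStemFilterFactory that later shipped with Solr, and the fi_FI.dic /
fi_FI.aff dictionary paths are assumptions, so check the plugin's own docs:

<fieldType name="text_fi" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- assumed class/attribute names; the .dic/.aff files are the
         OpenOffice Finnish hunspell dictionaries -->
    <filter class="solr.HunspellStemFilterFactory"
            dictionary="fi_FI.dic" affix="fi_FI.aff" ignoreCase="true"/>
  </analyzer>
</fieldType>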

Matti

2011/1/21 Laura Virtala laura.virt...@eficode.fi:
 On 01/21/2011 11:26 AM, Laura Virtala wrote:

 Hello,

 I cannot find any examples of how to configure FinnishLightStemFilterFactory
 (I understood that the SnowballPorterFilterFactory for the Finnish language
 doesn't work correctly). I tried the following in my schema.xml, but I got
 org.apache.solr.common.SolrException: Error loading class
 'solr.FinnishLightStemFilterFactory'

 ...
 <filter class="solr.LowerCaseFilterFactory"/>
 <filter class="solr.FinnishLightStemFilterFactory"/>
 ...

 Are there any parameters or additional steps required in order to use
 this component?

 Br,
 Laura

 Hi,
 I just noticed that FinnishLightStemFilterFactory is not in the Solr
 version that I'm using (1.4.1). Is there any workaround to get Finnish
 stemming to work correctly with version 1.4.1?

 Br,
 Laura



Re: solr wildcard queries and analyzers

2011-01-12 Thread Matti Oinas
I'm a little busy right now, but I'm going to try to find a suitable
parser; if none is found, I think the only solution is to write a new one.

2011/1/13 Jayendra Patil jayendra.patil@gmail.com:
 Had the same issues with international characters and wildcard searches.

 One workaround we implemented was to index the field both with and without
 the ASCIIFoldingFilterFactory. You would have the original field and one
 with the English equivalents, to be used during searching.

 Wildcard searches with English-equivalent or international terms would then
 match either of those. Also, lowercase the search terms if you are using a
 lowercase filter during indexing.
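
 A minimal schema.xml sketch of that approach (the type and field names are
 illustrative, not from the original setup):

 <fieldType name="text_plain" class="solr.TextField">
   <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>
 <fieldType name="text_folded" class="solr.TextField">
   <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <!-- only difference: fold accented characters to their ASCII equivalents -->
     <filter class="solr.ASCIIFoldingFilterFactory"/>
   </analyzer>
 </fieldType>

 <!-- original terms stay searchable in text_orig, folded ones in text_folded -->
 <field name="text_orig" type="text_plain" indexed="true" stored="true"/>
 <field name="text_folded" type="text_folded" indexed="true" stored="false"/>
 <copyField source="text_orig" dest="text_folded"/>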

 Regards,
 Jayendra

 On Wed, Jan 12, 2011 at 7:46 AM, Kári Hreinsson k...@gagnavarslan.is wrote:

 Have you made any progress?  Since the AnalyzingQueryParser doesn't inherit
 from QParserPlugin, Solr doesn't want to use it, but I guess we could
 implement a similar parser that does inherit from QParserPlugin? A sketch
 of that idea follows.

 Switching parsers seems to be what is needed. Has really no one solved this
 before?
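
 A rough, untested sketch of such a wrapper (the package and class names are
 illustrative; it assumes Solr 1.4's QParserPlugin API and Lucene's contrib
 AnalyzingQueryParser on the classpath):

 package com.example;

 import org.apache.lucene.queryParser.ParseException;
 import org.apache.lucene.queryParser.analyzing.AnalyzingQueryParser;
 import org.apache.lucene.search.Query;
 import org.apache.lucene.util.Version;
 import org.apache.solr.common.params.SolrParams;
 import org.apache.solr.common.util.NamedList;
 import org.apache.solr.request.SolrQueryRequest;
 import org.apache.solr.search.QParser;
 import org.apache.solr.search.QParserPlugin;

 public class AnalyzingQParserPlugin extends QParserPlugin {
   public void init(NamedList args) {}

   public QParser createParser(String qstr, SolrParams localParams,
                               SolrParams params, SolrQueryRequest req) {
     return new QParser(qstr, localParams, params, req) {
       public Query parse() throws ParseException {
         // Delegate to AnalyzingQueryParser so wildcard/prefix terms are
         // run through the field's query-time analyzer before matching.
         String field = getReq().getSchema().getDefaultSearchFieldName();
         AnalyzingQueryParser parser = new AnalyzingQueryParser(
             Version.LUCENE_29, field, getReq().getSchema().getQueryAnalyzer());
         return parser.parse(getString());
       }
     };
   }
 }

 It would then be registered in solrconfig.xml with something like
 <queryParser name="analyzing" class="com.example.AnalyzingQParserPlugin"/>
 and selected per request with defType=analyzing.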

 - Kári

 - Original Message -
 From: Matti Oinas matti.oi...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, 11 January, 2011 12:47:52 PM
 Subject: Re: solr wildcard queries and analyzers

 This might be the solution.


 http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html

 2011/1/11 Matti Oinas matti.oi...@gmail.com:
  Sorry, the message was not meant to be sent here. We are struggling
  with the same problem here.
 
  2011/1/11 Matti Oinas matti.oi...@gmail.com:
  http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers
 
  On wildcard and fuzzy searches, no text analysis is performed on the
  search word.
 
  2011/1/11 Kári Hreinsson k...@gagnavarslan.is:
  Hi,
 
 I am having a problem with the fact that no text analysis is performed
 on wildcard queries.  I have the following field type (a bit simplified):
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.TrimFilterFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.ASCIIFoldingFilterFactory" />
      </analyzer>
    </fieldType>
 
 My problem has to do with Icelandic characters: when I index a document
 with a text field including the word "sjálfsögðu", it gets indexed as
 "sjalfsogdu" (because of the ASCIIFoldingFilterFactory, which replaces the
 Icelandic characters with their English equivalents).  Then, when I search
 (without a wildcard) for "sjálfsögðu" or "sjalfsogdu", I get that document
 as a result.  This is convenient since it enables people to search without
 using accented characters and yet get the results they want (e.g. if they
 are working on computers with English keyboards).

 However, this all falls apart when using wildcard searches: the search
 string isn't passed through the filters, and even if I search for "sjálf*"
 I don't get any results because the index doesn't contain the original
 words (I get a result if I search for "sjalf*").  I know people have been
 having a similar problem with the case sensitivity of wildcard queries, and
 most often the solution seems to be to lowercase the string before passing
 it on to Solr, which is not exactly an optimal solution (yet a simple one
 in that case).  The Icelandic characters complicate things a bit, and
 applying the same solution (doing the lowercasing and character mapping) in
 my application seems like unnecessary duplication of code already part of
 Solr, not to mention complication of my application and possible
 maintenance down the road.
 
  Is there any way around this?  How are people solving this?  Is there a
 way to apply the filters to wildcard queries?  I guess removing the
 ASCIIFoldingFilterFactory is the simplest solution but this
 normalization (of the text done by the filter) is often very useful.
 
  I hope I'm not overlooking some obvious explanation. :/
 
  Thanks in advance,
  Kári Hreinsson
 
 
 




Re: Problem with DIH delta-import delete.

2011-01-11 Thread Matti Oinas
The problem was an incorrect pk definition in data-config.xml:

<entity name="blog"
        pk="id"
        ...>
  <field column="uuid" name="uuid" template="blog-${blog.id}" />
  <field column="id" name="blog_id" />

The pk attribute needs to be the same as Solr's uniqueKey field, so in my
case changing the pk value from id to uuid solved the problem.
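
In other words, the working definition (other attributes unchanged):

<entity name="blog"
        pk="uuid"
        ...>
  <field column="uuid" name="uuid" template="blog-${blog.id}" />
  <field column="id" name="blog_id" />
</entity>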


2010/12/7 Matti Oinas matti.oi...@gmail.com:
 Thanks Koji.

 The problem seems to be that the TemplateTransformer is not used when a
 delete is performed.

 ...
 Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder
 collectDelta
 INFO: Completed ModifiedRowKey for Entity: entry rows obtained : 0
 Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder
 collectDelta
 INFO: Completed DeletedRowKey for Entity: entry rows obtained : 1223
 Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder
 collectDelta
 INFO: Completed parentDeltaQuery for Entity: entry
 Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder deleteAll
 INFO: Deleting stale documents
 Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
 INFO: Deleting document: 787
 Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
 INFO: Deleting document: 786
 ...

 There are entries with ids 787 and 786 in the database, and those are
 marked as deleted. The query returns the right number of deleted documents
 and the right rows from the database, but the delete fails because Solr is
 using the plain numeric id when deleting the document. The same happens
 with blogs.

 Matti


 2010/12/4 Koji Sekiguchi k...@r.email.ne.jp:
 (10/11/17 20:18), Matti Oinas wrote:

 Solr does not delete documents from the index although delta-import says
 it has deleted n documents from the index. I'm using version 1.4.1.

 The schema looks like

  <fields>
     <field name="uuid" type="string" indexed="true" stored="true"
            required="true" />
     <field name="type" type="int" indexed="true" stored="true"
            required="true" />
     <field name="blog_id" type="int" indexed="true" stored="true" />
     <field name="entry_id" type="int" indexed="false" stored="true" />
     <field name="content" type="textgen" indexed="true" stored="true" />
  </fields>
  <uniqueKey>uuid</uniqueKey>


 Relevant fields from database tables:

 TABLE: blogs and entries both have

   Field: id
    Type: int(11)
    Null: NO
     Key: PRI
 Default: NULL
   Extra: auto_increment
 
   Field: modified
    Type: datetime
    Null: YES
     Key:
 Default: NULL
   Extra:
 
   Field: status
    Type: tinyint(1) unsigned
    Null: YES
     Key:
 Default: NULL
   Extra:


 <?xml version="1.0" encoding="UTF-8" ?>
 <dataConfig>
     <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" ... />
     <document>
         <entity name="blog"
                 pk="id"
                 query="SELECT id,description,1 as type FROM blogs WHERE status=2"
                 deltaImportQuery="SELECT id,description,1 as type FROM blogs
                                   WHERE status=2 AND id='${dataimporter.delta.id}'"
                 deltaQuery="SELECT id FROM blogs WHERE
                             '${dataimporter.last_index_time}' &lt; modified AND status=2"
                 deletedPkQuery="SELECT id FROM blogs WHERE
                                 '${dataimporter.last_index_time}' &lt;= modified AND status=3"
                 transformer="TemplateTransformer">
             <field column="uuid" name="uuid" template="blog-${blog.id}" />
             <field column="id" name="blog_id" />
             <field column="description" name="content" />
             <field column="type" name="type" />
         </entity>
         <entity name="entry"
                 pk="id"
                 query="SELECT f.id as id,f.content,f.blog_id,2 as type FROM
                        entries f,blogs b WHERE f.blog_id=b.id AND b.status=2"
                 deltaImportQuery="SELECT f.id as id,f.content,f.blog_id,2 as type
                                   FROM entries f,blogs b WHERE f.blog_id=b.id AND
                                   f.id='${dataimporter.delta.id}'"
                 deltaQuery="SELECT f.id as id FROM entries f JOIN blogs b ON
                             b.id=f.blog_id WHERE '${dataimporter.last_index_time}'
                             &lt; b.modified AND b.status=2"
                 deletedPkQuery="SELECT f.id as id FROM entries f JOIN blogs b ON
                                 b.id=f.blog_id WHERE b.status!=2 AND
                                 '${dataimporter.last_index_time}' &lt; b.modified"
                 transformer="HTMLStripTransformer,TemplateTransformer">
             <field column="uuid" name="uuid" template="entry-${entry.id}" />
             <field column="id" name="entry_id" />
             <field column="blog_id" name="blog_id" />
             <field column="content" name="content" stripHTML="true" />
             <field column="type" name="type" />
         </entity>
     </document>
 </dataConfig>

 Full import and delta import work without problems

Re: solr wildcard queries and analyzers

2011-01-11 Thread Matti Oinas
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers

On wildcard and fuzzy searches, no text analysis is performed on the
search word.

2011/1/11 Kári Hreinsson k...@gagnavarslan.is:
 Hi,

 I am having a problem with the fact that no text analysis is performed on
 wildcard queries.  I have the following field type (a bit simplified):
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.TrimFilterFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.ASCIIFoldingFilterFactory" />
      </analyzer>
    </fieldType>

 My problem has to do with Icelandic characters: when I index a document
 with a text field including the word "sjálfsögðu", it gets indexed as
 "sjalfsogdu" (because of the ASCIIFoldingFilterFactory, which replaces the
 Icelandic characters with their English equivalents).  Then, when I search
 (without a wildcard) for "sjálfsögðu" or "sjalfsogdu", I get that document
 as a result.  This is convenient since it enables people to search without
 using accented characters and yet get the results they want (e.g. if they
 are working on computers with English keyboards).

 However, this all falls apart when using wildcard searches: the search
 string isn't passed through the filters, and even if I search for "sjálf*"
 I don't get any results because the index doesn't contain the original
 words (I get a result if I search for "sjalf*").  I know people have been
 having a similar problem with the case sensitivity of wildcard queries, and
 most often the solution seems to be to lowercase the string before passing
 it on to Solr, which is not exactly an optimal solution (yet a simple one
 in that case).  The Icelandic characters complicate things a bit, and
 applying the same solution (doing the lowercasing and character mapping) in
 my application seems like unnecessary duplication of code already part of
 Solr, not to mention complication of my application and possible
 maintenance down the road.

 Is there any way around this?  How are people solving this?  Is there a way 
 to apply the filters to wildcard queries?  I guess removing the 
 ASCIIFoldingFilterFactory is the simplest solution but this normalization 
 (of the text done by the filter) is often very useful.

 I hope I'm not overlooking some obvious explanation. :/

 Thanks in advance,
 Kári Hreinsson



Re: solr wildcard queries and analyzers

2011-01-11 Thread Matti Oinas
Sorry, the message was not meant to be sent here. We are struggling
with the same problem here.

2011/1/11 Matti Oinas matti.oi...@gmail.com:
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers

 On wildcard and fuzzy searches, no text analysis is performed on the
 search word.

 2011/1/11 Kári Hreinsson k...@gagnavarslan.is:
 Hi,

 I am having a problem with the fact that no text analysis is performed on
 wildcard queries.  I have the following field type (a bit simplified):
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.TrimFilterFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.ASCIIFoldingFilterFactory" />
      </analyzer>
    </fieldType>

 My problem has to do with Icelandic characters: when I index a document
 with a text field including the word "sjálfsögðu", it gets indexed as
 "sjalfsogdu" (because of the ASCIIFoldingFilterFactory, which replaces the
 Icelandic characters with their English equivalents).  Then, when I search
 (without a wildcard) for "sjálfsögðu" or "sjalfsogdu", I get that document
 as a result.  This is convenient since it enables people to search without
 using accented characters and yet get the results they want (e.g. if they
 are working on computers with English keyboards).

 However, this all falls apart when using wildcard searches: the search
 string isn't passed through the filters, and even if I search for "sjálf*"
 I don't get any results because the index doesn't contain the original
 words (I get a result if I search for "sjalf*").  I know people have been
 having a similar problem with the case sensitivity of wildcard queries, and
 most often the solution seems to be to lowercase the string before passing
 it on to Solr, which is not exactly an optimal solution (yet a simple one
 in that case).  The Icelandic characters complicate things a bit, and
 applying the same solution (doing the lowercasing and character mapping) in
 my application seems like unnecessary duplication of code already part of
 Solr, not to mention complication of my application and possible
 maintenance down the road.

 Is there any way around this?  How are people solving this?  Is there a way 
 to apply the filters to wildcard queries?  I guess removing the 
 ASCIIFoldingFilterFactory is the simplest solution but this 
 normalization (of the text done by the filter) is often very useful.

 I hope I'm not overlooking some obvious explanation. :/

 Thanks in advance,
 Kári Hreinsson




Re: solr wildcard queries and analyzers

2011-01-11 Thread Matti Oinas
This might be the solution.

http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html

2011/1/11 Matti Oinas matti.oi...@gmail.com:
 Sorry, the message was not meant to be sent here. We are struggling
 with the same problem here.

 2011/1/11 Matti Oinas matti.oi...@gmail.com:
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers

 On wildcard and fuzzy searches, no text analysis is performed on the
 search word.

 2011/1/11 Kári Hreinsson k...@gagnavarslan.is:
 Hi,

 I am having a problem with the fact that no text analysis is performed on
 wildcard queries.  I have the following field type (a bit simplified):
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.TrimFilterFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.ASCIIFoldingFilterFactory" />
      </analyzer>
    </fieldType>

 My problem has to do with Icelandic characters: when I index a document
 with a text field including the word "sjálfsögðu", it gets indexed as
 "sjalfsogdu" (because of the ASCIIFoldingFilterFactory, which replaces the
 Icelandic characters with their English equivalents).  Then, when I search
 (without a wildcard) for "sjálfsögðu" or "sjalfsogdu", I get that document
 as a result.  This is convenient since it enables people to search without
 using accented characters and yet get the results they want (e.g. if they
 are working on computers with English keyboards).

 However, this all falls apart when using wildcard searches: the search
 string isn't passed through the filters, and even if I search for "sjálf*"
 I don't get any results because the index doesn't contain the original
 words (I get a result if I search for "sjalf*").  I know people have been
 having a similar problem with the case sensitivity of wildcard queries, and
 most often the solution seems to be to lowercase the string before passing
 it on to Solr, which is not exactly an optimal solution (yet a simple one
 in that case).  The Icelandic characters complicate things a bit, and
 applying the same solution (doing the lowercasing and character mapping) in
 my application seems like unnecessary duplication of code already part of
 Solr, not to mention complication of my application and possible
 maintenance down the road.

 Is there any way around this?  How are people solving this?  Is there a way 
 to apply the filters to wildcard queries?  I guess removing the 
 ASCIIFoldingFilterFactory is the simplest solution but this 
 normalization (of the text done by the filter) is often very useful.

 I hope I'm not overlooking some obvious explanation. :/

 Thanks in advance,
 Kári Hreinsson





Re: DataImportHandler - Multiple entities will step into each other

2011-01-06 Thread Matti Oinas
Concat doesn't work as expected.

Doing SELECT concat('blog-',id) as uuid instead of using the template
transformer, the uuid in the index would be something like

<str name="uuid">[...@d760bb</str>

instead of

<str name="uuid">blog-1</str>

I haven't tested whether DIH can perform the delete when using concat, but
at least you cannot delete by uuid from anywhere else when using concat.
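
The [...@d760bb value is a Java byte[] rendered by toString(); MySQL's
Connector/J can return CONCAT results as binary. If you do want the SQL
route, casting the result usually fixes that (untested here):

-- force MySQL to hand back a character string instead of raw bytes
SELECT CAST(CONCAT('blog-', id) AS CHAR) AS uuid, ... FROM blogs ...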

2011/1/5 Ephraim Ofir ephra...@icq.com:
 You could get around that by doing the concatenation at the SQL level;
 that way deletes would work as well.

 Ephraim Ofir

 -Original Message-
 From: Matti Oinas [mailto:matti.oi...@gmail.com]
 Sent: Tuesday, January 04, 2011 3:57 PM
 To: solr-user@lucene.apache.org
 Subject: Re: DataImportHandler - Multiple entities will step into each other

 I managed to do that by using TemplateTransformer

 <document>
   <entity name="company" ... transformer="TemplateTransformer">
     <field column="id" name="id" template="company-${company.id}" />
     ...
   </entity>
   <entity name="item" ... transformer="TemplateTransformer">
     <field column="id" name="id" template="item-${item.id}" />
     ...
   </entity>
 </document>

 The only problem is that delta-import fails to perform deletes on the
 index. It seems that the TemplateTransformer is not used when performing
 a delete, so delete by id doesn't work.



 2011/1/4 yu shen shenyu...@gmail.com:
 Hi All,

 I have a DataImportHandler config file as below. It contains multiple
 entities:
 <dataConfig>
     <dataSource name="jdbc" driver="com.mysql.jdbc.Driver"
         url="jdbc:mysql://localhost:1521/changan?useUnicode=true&amp;characterEncoding=utf8&amp;autoReconnect=true..."
     />
     <document>
         <entity name="item" dataSource="jdbc" pk="id" query="..." />
         <entity name="company" dataSource="jdbc" pk="id" query="..." />
         ...
     </document>
 </dataConfig>

 All data are from a database. The problem is that item/company and the
 other entities all have the field 'id', with values starting from 1 to n.
 In this case, item/company etc. will step into each other.
 Is there a way to prevent this from happening, such as designating
 different entities to different partitions?

 One way I can think of is to separate the different entities into
 different instances, which is not an ideal solution IMO.

 Would someone point me to a reference? And also give some instructions?




Re: DataImportHandler - Multiple entities will step into each other

2011-01-06 Thread Matti Oinas
Forgot to mention that delete works fine with the TemplateTransformer when
you are using it to create unique values for the uniqueKey field in Solr
and that same field is defined as pk in the data config.


schema.xml:

<field name="uuid" type="string" indexed="true" stored="true" required="true" />
..
<uniqueKey>uuid</uniqueKey>


data-config.xml:

<entity name="blog" pk="uuid" ...>
  <field column="uuid" name="uuid" template="blog-${blog.blog_id}" />
..
<entity name="entry" pk="uuid" ...>
  <field column="uuid" name="uuid" template="entry-${entry.entry_id}" />


DIH performs a delete by taking the value of the field defined as pk in
data-config and trying to delete the document from the index using that
value to match the field defined as uniqueKey in the schema. So if the
uniqueKey and pk fields are different, DIH will probably fail to delete
anything, or it will delete something that is not supposed to be deleted.


2011/1/7 Matti Oinas matti.oi...@gmail.com:
 Concat doesn't work as expected.

 Doing SELECT concat('blog-',id) as uuid instead of using the template
 transformer, the uuid in the index would be something like

 <str name="uuid">[...@d760bb</str>

 instead of

 <str name="uuid">blog-1</str>

 I haven't tested whether DIH can perform the delete when using concat, but
 at least you cannot delete by uuid from anywhere else when using concat.

 2011/1/5 Ephraim Ofir ephra...@icq.com:
 You could get around that by doing the concatenation at the SQL level;
 that way deletes would work as well.

 Ephraim Ofir

 -Original Message-
 From: Matti Oinas [mailto:matti.oi...@gmail.com]
 Sent: Tuesday, January 04, 2011 3:57 PM
 To: solr-user@lucene.apache.org
 Subject: Re: DataImportHandler - Multiple entities will step into each other

 I managed to do that by using TemplateTransformer

 <document>
   <entity name="company" ... transformer="TemplateTransformer">
     <field column="id" name="id" template="company-${company.id}" />
     ...
   </entity>
   <entity name="item" ... transformer="TemplateTransformer">
     <field column="id" name="id" template="item-${item.id}" />
     ...
   </entity>
 </document>

 The only problem is that delta-import fails to perform deletes on the
 index. It seems that the TemplateTransformer is not used when performing
 a delete, so delete by id doesn't work.



 2011/1/4 yu shen shenyu...@gmail.com:
 Hi All,

 I have a DataImportHandler config file as below. It contains multiple
 entities:
 <dataConfig>
     <dataSource name="jdbc" driver="com.mysql.jdbc.Driver"
         url="jdbc:mysql://localhost:1521/changan?useUnicode=true&amp;characterEncoding=utf8&amp;autoReconnect=true..."
     />
     <document>
         <entity name="item" dataSource="jdbc" pk="id" query="..." />
         <entity name="company" dataSource="jdbc" pk="id" query="..." />
         ...
     </document>
 </dataConfig>

 All data are from a database. The problem is that item/company and the
 other entities all have the field 'id', with values starting from 1 to n.
 In this case, item/company etc. will step into each other.
 Is there a way to prevent this from happening, such as designating
 different entities to different partitions?

 One way I can think of is to separate the different entities into
 different instances, which is not an ideal solution IMO.

 Would someone point me to a reference? And also give some instructions?





Re: DataImportHandler - Multiple entities will step into each other

2011-01-04 Thread Matti Oinas
I managed to do that by using TemplateTransformer

<document>
  <entity name="company" ... transformer="TemplateTransformer">
    <field column="id" name="id" template="company-${company.id}" />
    ...
  </entity>
  <entity name="item" ... transformer="TemplateTransformer">
    <field column="id" name="id" template="item-${item.id}" />
    ...
  </entity>
</document>

The only problem is that delta-import fails to perform deletes on the
index. It seems that the TemplateTransformer is not used when performing
a delete, so delete by id doesn't work.



2011/1/4 yu shen shenyu...@gmail.com:
 Hi All,

 I have a DataImportHandler config file as below. It contains multiple
 entities:
 <dataConfig>
     <dataSource name="jdbc" driver="com.mysql.jdbc.Driver"
         url="jdbc:mysql://localhost:1521/changan?useUnicode=true&amp;characterEncoding=utf8&amp;autoReconnect=true..."
     />
     <document>
         <entity name="item" dataSource="jdbc" pk="id" query="..." />
         <entity name="company" dataSource="jdbc" pk="id" query="..." />
         ...
     </document>
 </dataConfig>

 All data are from a database. The problem is that item/company and the
 other entities all have the field 'id', with values starting from 1 to n.
 In this case, item/company etc. will step into each other.
 Is there a way to prevent this from happening, such as designating
 different entities to different partitions?

 One way I can think of is to separate the different entities into
 different instances, which is not an ideal solution IMO.

 Would someone point me to a reference? And also give some instructions?



Re: Problem with DIH delta-import delete.

2010-12-06 Thread Matti Oinas
Thanks Koji.

The problem seems to be that the TemplateTransformer is not used when a
delete is performed.

...
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Completed ModifiedRowKey for Entity: entry rows obtained : 0
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Completed DeletedRowKey for Entity: entry rows obtained : 1223
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Completed parentDeltaQuery for Entity: entry
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder deleteAll
INFO: Deleting stale documents
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 787
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 786
...

There are entries with ids 787 and 786 in the database, and those are
marked as deleted. The query returns the right number of deleted documents
and the right rows from the database, but the delete fails because Solr is
using the plain numeric id when deleting the document. The same happens
with blogs.

Matti


2010/12/4 Koji Sekiguchi k...@r.email.ne.jp:
 (10/11/17 20:18), Matti Oinas wrote:

 Solr does not delete documents from the index although delta-import says
 it has deleted n documents from the index. I'm using version 1.4.1.

 The schema looks like

  <fields>
     <field name="uuid" type="string" indexed="true" stored="true"
            required="true" />
     <field name="type" type="int" indexed="true" stored="true"
            required="true" />
     <field name="blog_id" type="int" indexed="true" stored="true" />
     <field name="entry_id" type="int" indexed="false" stored="true" />
     <field name="content" type="textgen" indexed="true" stored="true" />
  </fields>
  <uniqueKey>uuid</uniqueKey>


 Relevant fields from database tables:

 TABLE: blogs and entries both have

   Field: id
    Type: int(11)
    Null: NO
     Key: PRI
 Default: NULL
   Extra: auto_increment
 
   Field: modified
    Type: datetime
    Null: YES
     Key:
 Default: NULL
   Extra:
 
   Field: status
    Type: tinyint(1) unsigned
    Null: YES
     Key:
 Default: NULL
   Extra:


 <?xml version="1.0" encoding="UTF-8" ?>
 <dataConfig>
     <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" ... />
     <document>
         <entity name="blog"
                 pk="id"
                 query="SELECT id,description,1 as type FROM blogs WHERE status=2"
                 deltaImportQuery="SELECT id,description,1 as type FROM blogs
                                   WHERE status=2 AND id='${dataimporter.delta.id}'"
                 deltaQuery="SELECT id FROM blogs WHERE
                             '${dataimporter.last_index_time}' &lt; modified AND status=2"
                 deletedPkQuery="SELECT id FROM blogs WHERE
                                 '${dataimporter.last_index_time}' &lt;= modified AND status=3"
                 transformer="TemplateTransformer">
             <field column="uuid" name="uuid" template="blog-${blog.id}" />
             <field column="id" name="blog_id" />
             <field column="description" name="content" />
             <field column="type" name="type" />
         </entity>
         <entity name="entry"
                 pk="id"
                 query="SELECT f.id as id,f.content,f.blog_id,2 as type FROM
                        entries f,blogs b WHERE f.blog_id=b.id AND b.status=2"
                 deltaImportQuery="SELECT f.id as id,f.content,f.blog_id,2 as type
                                   FROM entries f,blogs b WHERE f.blog_id=b.id AND
                                   f.id='${dataimporter.delta.id}'"
                 deltaQuery="SELECT f.id as id FROM entries f JOIN blogs b ON
                             b.id=f.blog_id WHERE '${dataimporter.last_index_time}'
                             &lt; b.modified AND b.status=2"
                 deletedPkQuery="SELECT f.id as id FROM entries f JOIN blogs b ON
                                 b.id=f.blog_id WHERE b.status!=2 AND
                                 '${dataimporter.last_index_time}' &lt; b.modified"
                 transformer="HTMLStripTransformer,TemplateTransformer">
             <field column="uuid" name="uuid" template="entry-${entry.id}" />
             <field column="id" name="entry_id" />
             <field column="blog_id" name="blog_id" />
             <field column="content" name="content" stripHTML="true" />
             <field column="type" name="type" />
         </entity>
     </document>
 </dataConfig>

 Full import and delta import work without problems when it comes to
 adding new documents to the index, but when a blog is deleted (status is
 set to 3 in the database), the Solr report after delta-import is something
 like "Indexing completed. Added/Updated: 0 documents. Deleted 81
 documents." The problem is that the documents are still found in the Solr
 index.

 1. UPDATE blogs SET modified=NOW(),status=3 WHERE id=26;

 2. delta-import =>

 <str name="">
 Indexing completed. Added/Updated: 0 documents. Deleted 81 documents.
 </str>
 <str name

Problem with DIH delta-import delete.

2010-11-17 Thread Matti Oinas
Solr does not delete documents from the index although delta-import says
it has deleted n documents from the index. I'm using version 1.4.1.

The schema looks like

 <fields>
    <field name="uuid" type="string" indexed="true" stored="true"
           required="true" />
    <field name="type" type="int" indexed="true" stored="true"
           required="true" />
    <field name="blog_id" type="int" indexed="true" stored="true" />
    <field name="entry_id" type="int" indexed="false" stored="true" />
    <field name="content" type="textgen" indexed="true" stored="true" />
 </fields>
 <uniqueKey>uuid</uniqueKey>


Relevant fields from database tables:

TABLE: blogs and entries both have

  Field: id
   Type: int(11)
   Null: NO
Key: PRI
Default: NULL
  Extra: auto_increment

  Field: modified
   Type: datetime
   Null: YES
Key:
Default: NULL
  Extra:

  Field: status
   Type: tinyint(1) unsigned
   Null: YES
Key:
Default: NULL
  Extra:


<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
    <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" ... />
    <document>
        <entity name="blog"
                pk="id"
                query="SELECT id,description,1 as type FROM blogs WHERE status=2"
                deltaImportQuery="SELECT id,description,1 as type FROM blogs
                                  WHERE status=2 AND id='${dataimporter.delta.id}'"
                deltaQuery="SELECT id FROM blogs WHERE
                            '${dataimporter.last_index_time}' &lt; modified AND status=2"
                deletedPkQuery="SELECT id FROM blogs WHERE
                                '${dataimporter.last_index_time}' &lt;= modified AND status=3"
                transformer="TemplateTransformer">
            <field column="uuid" name="uuid" template="blog-${blog.id}" />
            <field column="id" name="blog_id" />
            <field column="description" name="content" />
            <field column="type" name="type" />
        </entity>
        <entity name="entry"
                pk="id"
                query="SELECT f.id as id,f.content,f.blog_id,2 as type FROM
                       entries f,blogs b WHERE f.blog_id=b.id AND b.status=2"
                deltaImportQuery="SELECT f.id as id,f.content,f.blog_id,2 as type
                                  FROM entries f,blogs b WHERE f.blog_id=b.id AND
                                  f.id='${dataimporter.delta.id}'"
                deltaQuery="SELECT f.id as id FROM entries f JOIN blogs b ON
                            b.id=f.blog_id WHERE '${dataimporter.last_index_time}'
                            &lt; b.modified AND b.status=2"
                deletedPkQuery="SELECT f.id as id FROM entries f JOIN blogs b ON
                                b.id=f.blog_id WHERE b.status!=2 AND
                                '${dataimporter.last_index_time}' &lt; b.modified"
                transformer="HTMLStripTransformer,TemplateTransformer">
            <field column="uuid" name="uuid" template="entry-${entry.id}" />
            <field column="id" name="entry_id" />
            <field column="blog_id" name="blog_id" />
            <field column="content" name="content" stripHTML="true" />
            <field column="type" name="type" />
        </entity>
    </document>
</dataConfig>

Full import and delta import work without problems when it comes to
adding new documents to the index, but when a blog is deleted (status is
set to 3 in the database), the Solr report after delta-import is something
like "Indexing completed. Added/Updated: 0 documents. Deleted 81
documents." The problem is that the documents are still found in the Solr
index.

1. UPDATE blogs SET modified=NOW(),status=3 WHERE id=26;

2. delta-import =>

<str name="">
Indexing completed. Added/Updated: 0 documents. Deleted 81 documents.
</str>
<str name="Committed">2010-11-17 13:00:50</str>
<str name="Optimized">2010-11-17 13:00:50</str>

So Solr says it has deleted the documents and that the index was also
optimized and committed after the operation.

3. Search: blog_id:26 still returns 1 document with type 1 (blog) and
80 documents with type 2 (entry).


Re: simple dismax with OR

2010-11-15 Thread Matti Oinas
Define the mm (Minimum 'Should' Match) parameter for dismax. The default is
100%, so every clause must match.

http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29
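
For example, to make a single matching clause sufficient (plain OR
behavior), something like this in the handler defaults should work:

<str name="mm">1</str>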

2010/11/15 Jakub Godawa jakub.god...@gmail.com:
 Hi! I have a dismax handler that searches through two fields.

 <requestHandler name="en" class="solr.SearchHandler">
   <lst name="defaults">
     <str name="defType">dismax</str>
     <str name="qf">
       name_en^1.0 answe_en^1.5
     </str>
   </lst>
 </requestHandler>

 Now I have a document that has "Various appliances can be installed
 here" in the answe_en field, indexed with the English analyzer.
 When I query "installation" I get that doc as a result, which is OK.
 When I query "How to install something?" I get nothing, which is bad,
 because there is a match highlighted on the analysis page.

 I've read that dismax doesn't read q.op (the default query operator).
 How should I set up my dismax to handle that?

 Cheers,
 Jakub Godawa.