Re: Is this DIH entity forEach expression OK? ... yes
Hello, I am having bother with forEach. I have XML source documents containing many embedded images within mediaBlock elements. Each image has an associated caption. I want to implement a separate image search function which searches the captions and brings back the associated image.

<entity name="x"
        dataSource="myfilereader"
        processor="XPathEntityProcessor"
        url="${jc.fileAbsolutePath}"
        stream="false"
        forEach="/record | /record/mediaBlock">
  <field column="vurl"       xpath="/record/mediaBlock/mediaObject/@vurl" />
  <field column="imgCaption" xpath="/record/mediaBlock/caption" />
</entity>

Is it OK to have an xpath expression within forEach which is a child of another of the forEach xpath expressions? Yes. It works fine; duplicate uniqueKeys were making it appear otherwise. -- === Fergus McMenemie Email:fer...@twig.me.uk Techmore Ltd Phone:(UK) 07721 376021 Unix/Mac/Intranets Analyst Programmer ===
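For anyone following along, a minimal source document of the shape implied above (element names beyond record, mediaBlock, mediaObject, and caption are illustrative) shows why the nested forEach works: /record matches once and /record/mediaBlock matches once per image, so the entity emits one parent document plus one document per image/caption pair.

```xml
<record>
  <title>Some article</title>
  <mediaBlock>
    <mediaObject vurl="images/fig1.jpg"/>
    <caption>First figure caption</caption>
  </mediaBlock>
  <mediaBlock>
    <mediaObject vurl="images/fig2.jpg"/>
    <caption>Second figure caption</caption>
  </mediaBlock>
</record>
```

With forEach="/record | /record/mediaBlock" this record yields three Solr documents, which is why distinct uniqueKeys per document matter.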
Re: spellcheck.onlyMorePopular
Grant Ingersoll wrote: I believe the reason is b/c when onlyMP is false, if the word itself is already in the index, it short circuits out. When onlyMP is true, it checks to see if there are more frequently occurring variations. This would mean that onlyMorePopular=false isn't useful at all. If the word is in the index it would not find less frequent words, and if it is not in the index, onlyMorePopular=false isn't useful either, since there are no less popular words. So if you are right this is a bug, isn't it? Thanks, Marcus
abbreviation problem
hi, all. For an abbreviation, for example 'US', how can I get results containing 'United States' in Solr or Lucene? In Solr, the synonyms filter seems only to handle one-word to one-word mappings, but in abbreviation queries words should be expanded. Does anybody have a good solution for this? --steven.li
commit error which kill my dataimport.properties file
Hi, Last night I've got an error during the importation and I don't get what it means, and it even killed my dataimport.properties (empty file), so nothing was written in this file and then the delta-import started to import from the very start, I guess. Thanks a lot for your help, I wish you guys a lovely day. Here is the error:

2009/02/12 23:45:01 commit request to Solr at http://books.com:8180/solr/books/update failed:
2009/02/12 23:45:01 Apache Tomcat/5.5 error report: HTTP Status 500 - No space left on device

java.io.IOException: No space left on device
        at java.io.RandomAccessFile.writeBytes(Native Method)
        at java.io.RandomAccessFile.write(RandomAccessFile.java:466)
        at org.apache.lucene.store.FSDirectory$FSIndexOutput.flushBuffer(FSDirectory.java:679)
        at org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:96)
        at org.apache.lucene.store.BufferedIndexOutput.flush(BufferedIndexOutput.java:85)
        at org.apache.lucene.store.BufferedIndexOutput.close(BufferedIndexOutput.java:109)
        at org.apache.lucene.store.FSDirectory$FSIndexOutput.close(FSDirectory.java:686)
        at org.apache.lucene.index.FieldsWriter.close(FieldsWriter.java:145)
        at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:83)
        at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
        at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
        at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:373)
        at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:562)
        at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3803)
        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3712)
        at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1752)
        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1716)
        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1687)
        at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:214)
        at org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:172)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:341)
        at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:78)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:168)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1313)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:874)
        at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
        at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
        at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
        at java.lang.Thread.run(Thread.java:619)
Re: commit error which kill my dataimport.properties file
It's actually the space, sorry. But yes, my snapshots look huge, around 3G every 20mn, so should I clean them up more often, like every 4 hours??

sunnyfr wrote: Hi, Last night I've got an error during the importation and I don't get what it means, and it even killed my dataimport.properties (empty file), so nothing was written in this file and then the delta-import started to import from the very start, I guess. Thanks a lot for your help, I wish you guys a lovely day. Here is the error: 2009/02/12 23:45:01 commit request to Solr at http://books.com:8180/solr/books/update failed: HTTP Status 500 - No space left on device. java.io.IOException: No space left on device [snip]
Re: spellcheck.onlyMorePopular
Shalin Shekhar Mangar wrote: The end goal is to give spelling suggestions. Even if it gave less frequently occurring spelling suggestions, what would you do with it? To give you an example: We have an index for computer games. One title is gran turismo. The word gran is less frequent in the index than grand. So if someone searches for grand turismo there will be no suggestion gran. And to come back to my last question: There seems to be no case in which onlyMorePopular=false makes sense (provided Grant's assumption is correct). Do you see one? Thanks, Marcus
Re: several snapshot ...
Hi Hoss, Thanks a lot for your answer, it's very clear. Thanks hossman wrote: : I would like to understand what a snapshot really is. It's obviously a hard link to : the files. : But does it just contain the last update?? the nature of lucene indexes is that files are never modified -- only created, or deleted. this makes rsyncing very efficient when updates have been made to an index, because only new files exist, and only those new files need to be synced. : My problem is ... I've a cronjob to commit and auto start snapshooter every : 5mn, which works properly. : And on my slaves I've a cronjob every 5 minutes to snapshoot. But I don't get : why it doesn't take every snapshot file ... maybe bad synchronisation, or : it's too long to install? in your case, it looks like two things are happening... note that the snapinstaller command run at 16:35:36 doesn't seem to finish until 16:41:37 (361 seconds later) and it encountered an error that it couldn't connect to your solr port (do the scripts have the correct host:port configuration?) the second thing to notice is that every time snapinstaller runs after that, it says the most current snapshot is snapshot.20090205162502 ... which means either snappuller isn't running often enough, or snapshooter isn't producing snapshots as frequently as you think it is. FYI: typically people cron snappuller; snapinstaller together as a single crontab entry ...
http://wiki.apache.org/solr/CollectionDistribution

: 2009/02/05 16:35:36 started by root
: 2009/02/05 16:35:36 command: /data/solr/books/bin/snapinstaller
: 2009/02/05 16:35:37 installing snapshot /data/solr/books/data/snapshot.20090205162502
: 2009/02/05 16:35:38 notifing Solr to open a new Searcher
: 2009/02/05 16:40:04 started by root
: 2009/02/05 16:40:04 command: /data/solr/books/bin/snapinstaller
: 2009/02/05 16:40:04 latest snapshot /data/solr/books/data/snapshot.20090205162502 already installed
: 2009/02/05 16:40:04 ended (elapsed time: 0 sec)
: 2009/02/05 16:41:37 failed to connect to Solr server
: 2009/02/05 16:41:37 snapshot installed but Solr server has not open a new Searcher
: 2009/02/05 16:41:37 failed (elapsed time: 361 sec)
: 2009/02/05 16:54:40 started by root
: 2009/02/05 16:54:40 started by root
: 2009/02/05 16:54:40 command: /data/solr/books/bin/snapinstaller
: 2009/02/05 16:54:40 command: /data/solr/books/bin/snapinstaller
: 2009/02/05 16:54:40 latest snapshot /data/solr/books/data/snapshot.20090205162502 already installed
: 2009/02/05 16:54:40 latest snapshot /data/solr/books/data/snapshot.20090205162502 already installed
: 2009/02/05 16:54:40 ended (elapsed time: 0 sec)
: 2009/02/05 16:54:40 ended (elapsed time: 0 sec)

-Hoss -- View this message in context: http://www.nabble.com/several-snapshot-...-tp21855239p21992862.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: abbreviation problem
李学健 wrote: hi, all. For an abbreviation, for example 'US', how can I get results containing 'United States' in Solr or Lucene? In Solr, the synonyms filter seems only to handle one-word to one-word mappings, but in abbreviation queries words should be expanded.

SynonymFilter should support one word to phrase (two words or more), phrase to one word, and phrase to phrase. For example:

US, United States
or
US => United States
or
United States => US

Cheers, Koji
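Concretely, a sketch of what this could look like in practice (the field type name is illustrative; the tokenizer and filter classes are the standard Solr factories, and expand=true is what turns 'US' into both tokens at index time):

```xml
<!-- synonyms.txt, one rule per line, e.g.:
       US, United States
     or the explicit mapping form:
       US => United States
-->
<fieldType name="text_syn" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```

With expand=true the comma form maps every synonym in the group to all the others, so a document containing "United States" also matches a query for US.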
Problem using DIH templatetransformer to create uniqueKey
Hello, TemplateTransformer behaves rather ungracefully if one of the replacement fields is missing. I am parsing a single XML document into multiple separate Solr documents. It turns out that none of the source document's fields can be used to create a uniqueKey alone. I need to combine two, using TemplateTransformer as follows:

<entity name="x"
        dataSource="myfilereader"
        processor="XPathEntityProcessor"
        url="${jc.fileAbsolutePath}"
        rootEntity="true"
        stream="false"
        forEach="/record | /record/mediaBlock"
        transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer">
  <field column="fileAbsolutePath" template="${jc.fileAbsolutePath}" />
  <field column="fileWebPath" regex="${dataimporter.request.installdir}(.*)" replaceWith="/ford$1" sourceColName="fileAbsolutePath"/>
  <field column="id" template="${jc.fileAbsolutePath}${x.vurl}" />
  <field column="vurl" xpath="/record/mediaBlock/mediaObject/@vurl" />
</entity>

The trouble is that vurl is only defined as a child of /record/mediaBlock, so my attempt to create id, the uniqueKey, fails for the parent document /record. I am hacking around with TemplateTransformer.java to sort this, but was wondering if there was a good reason for this behavior. Regards. -- === Fergus McMenemie Email:fer...@twig.me.uk Techmore Ltd Phone:(UK) 07721 376021 Unix/Mac/Intranets Analyst Programmer ===
Re: spellcheck.onlyMorePopular
On Fri, Feb 13, 2009 at 2:51 PM, Marcus Stratmann stratm...@gmx.de wrote: Shalin Shekhar Mangar wrote: The end goal is to give spelling suggestions. Even if it gave less frequently occurring spelling suggestions, what would you do with it? To give you an example: We have an index for computer games. One title is gran turismo. The word gran is less frequent in the index than grand. So if someone searches for grand turismo there will be no suggestion gran. Unless I'm misunderstanding something, you need phrase suggestions and not individual suggestions. I mean that you need suggestions for gran turismo and not gran and turismo separately. Did you try using KeywordTokenizer for this spell check field? And to come back to my last question: There seems to be no case in which onlyMorePopular=false makes sense (provided Grant's assumption is correct). Do you see one? Here's a use-case -- you provide a mis-spelled word and you want the closest suggestion by edit distance (frequency does not matter). -- Regards, Shalin Shekhar Mangar.
Re: Problem using DIH templatetransformer to create uniqueKey
The intent was to not make a partial string if some of the variables are missing. Probably we can enhance TemplateTransformer by using an extra attribute on the field:

<field column="id" template="${jc.fileAbsolutePath}${x.vurl}" ignoreMissingVariables="true"/>

Then it can just resolve with whatever is available. On Fri, Feb 13, 2009 at 3:17 PM, Fergus McMenemie fer...@twig.me.uk wrote: Hello, TemplateTransformer behaves rather ungracefully if one of the replacement fields is missing. [snip] -- --Noble Paul
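A quick sketch of the proposed semantics, in plain Python (a hypothetical helper, not DIH code): under the current behavior a template referencing a missing variable yields nothing, while an ignore-missing flag would resolve with whatever is available and leave missing variables empty.

```python
import re

def fill_template(template, row, ignore_missing=False):
    """Resolve ${var} placeholders from row. What happens on a missing
    variable depends on ignore_missing (sketch of the proposal above)."""
    missing = []

    def repl(match):
        name = match.group(1)
        if name in row:
            return str(row[name])
        missing.append(name)
        return ""  # resolve with whatever is available

    result = re.sub(r"\$\{([^}]+)\}", repl, template)
    if missing and not ignore_missing:
        return None  # current behavior: refuse to emit a partial string
    return result

row = {"jc.fileAbsolutePath": "/data/doc1.xml"}
tmpl = "${jc.fileAbsolutePath}${x.vurl}"
assert fill_template(tmpl, row) is None
assert fill_template(tmpl, row, ignore_missing=True) == "/data/doc1.xml"
```

For the uniqueKey case in this thread, the ignore-missing behavior is exactly what lets the parent /record document get an id even though ${x.vurl} only exists for mediaBlock rows.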
facet count on partial results
Hi Solr, I pass a rather large number of OR clauses to Solr, ending up with lots and lots of results. However, only the results above a certain score threshold are interesting for me, thus I'd like to get facet counts only for the results within the threshold. How can I do that? karl
Re: spellcheck.onlyMorePopular
Shalin Shekhar Mangar wrote: And to come back to my last question: There seems to be no case in which onlyMorePopular=false makes sense (provided Grant's assumption is correct). Do you see one? Here's a use-case -- you provide a mis-spelled word and you want the closest suggestion by edit distance (frequency does not matter). Hm, when I try searching for grand using onlyMorePopular=false I do not get any results. Same when trying gran. It seems that there will be no results at all when using onlyMorePopular=false. Without onlyMorePopular there are suggestions for both terms, so there are suggestions close enough to the original word(s). Have you tested your example case? Anyway, if you look at it from the user's point of view: The wiki says spellcheck.onlyMorePopular -- Only return suggestions that result in more hits for the query than the existing query. This implies that with onlyMorePopular=false I will get even results with fewer hits. So when I'm checking grand I would expect to get the suggestion gran, which is less frequent in the index. But it seems this is not the case. But even if just the documentation is wrong or unclear: 1) I could not find a case in which onlyMorePopular=false works at all. 2) It would be nice if one could get suggestions with lower frequency than the checked word (which is, to me, what onlyMorePopular=false implies). Thanks, Marcus
Re: spellcheck.onlyMorePopular
On Fri, Feb 13, 2009 at 5:05 PM, Marcus Stratmann stratm...@gmx.de wrote: Hm, when I try searching for grand using onlyMorePopular=false I do not get any results. Same when trying gran. It seems that there will be no results at all when using onlyMorePopular=false. When onlyMorePopular is false and the word you searched exists in the index, it is returned as-is. Therefore if gran and grand are both present in the index, they will be returned as is. Without onlyMorePopular there are suggestions for both terms, so there are suggestions close enough to the original word(s). Have you tested your example case? I am confused by this. Did you mean With onlyMorePopular=true there are suggestions for both terms? Anyway, if you look at it from the user's point of view: The wiki says spellcheck.onlyMorePopular -- Only return suggestions that result in more hits for the query than the existing query. This implies that if onlyMorePopular=false I will get even results with less hits. So when I'm checking grand I would expect to get the suggestion gran which is less frequent in the index. But it seems this is not the case. If onlyMorePopular=true, then the algorithm finds tokens which have greater frequency than the searched term. Among these terms, the one which is closest (by edit distance) is returned. I think I now understand the source of the confusion. onlyMorePopular=true is a special behavior which uses *only* those tokens which have higher frequency than the searched term. onlyMorePopular=false just switches off this special behavior. It does *not* limit suggestions to tokens which have lesser frequency than the searched term. In fact, onlyMorePopular=false does not use frequency of tokens at all. We should document this clearly to avoid such confusions in the future. 2) It would be nice if one could get suggestion with lower frequency than the checked word (which is, to me, what onlyMorePopular=false implies). We could enhance spell checker to do that. 
But can you please explain your use-case for limiting suggestions to tokens which have lesser frequency? The goal of spell checker is to give suggestions of wrongly spelled words. It was neither designed nor intended to give any other sort of query suggestions. -- Regards, Shalin Shekhar Mangar.
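Shalin's description of the two modes can be sketched in a few lines of Python. This is only an illustration of the selection logic as described in this thread, not the actual Lucene SpellChecker code; the similarity helper, the accuracy default, and the toy index are assumptions.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # stand-in for Lucene's edit-distance-based score, in [0, 1]
    return SequenceMatcher(None, a, b).ratio()

def suggest(term, index_freqs, only_more_popular, accuracy=0.6, count=5):
    """index_freqs maps token -> frequency in the index."""
    candidates = index_freqs.keys()
    if only_more_popular:
        # special behavior: consider only tokens strictly more frequent
        # than the searched term
        base = index_freqs.get(term, 0)
        candidates = [t for t in candidates if index_freqs[t] > base]
    # otherwise frequency is ignored entirely: rank by closeness alone
    scored = [(similarity(term, t), t) for t in candidates if t != term]
    within = [(s, t) for s, t in scored if s >= accuracy]
    return [t for _, t in sorted(within, reverse=True)[:count]]

index = {"gran": 10, "grand": 17, "grain": 2}
assert suggest("gran", index, only_more_popular=True) == ["grand"]
assert suggest("gran", index, only_more_popular=False) == ["grand", "grain"]
```

Note that onlyMorePopular=false here does not mean "less popular only"; it simply drops the frequency filter, matching the explanation above.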
Trouble with solr IndexbasedSpellChecker and FilebasedSpellChecker
Hi folks, I'm using solr 1.3. Here is the relevant section from my solrconfig.xml:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- <str name="queryAnalyzerFieldType">textSpell</str> -->
  <lst name="spellchecker">
    <str name="name">default</str>
    <!-- <str name="classname">solr.IndexBasedSpellChecker</str> -->
    <!-- <str name="field">DESC</str> -->
    <!-- <str name="spellcheckIndexDir">./spellchecker</str> -->
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="sourceLocation">/tmp/dct.txt</str>
    <str name="spellcheckIndexDir">./filespellchecker</str>
    <str name="accuracy">0.7</str>
  </lst>
</searchComponent>

Neither the IndexBasedSpellChecker nor the FileBasedSpellChecker works for me. What happens is that after I add the spellcheck component to solrconfig.xml and subsequently restart my Resin (3.16) server, my data index gets removed.

Before:
47M  webapps/index/solr/data/index
47M  webapps/index/solr/data/

After:
12K  webapps/index/solr/data/index
12K  webapps/index/solr/data/filespellchecker
28K  webapps/index/solr/data/

Can anyone clue me in to why this is happening? Thanks a lot! Kartik
Get # of docs pending commit
Hi, Is there a way to retrieve the # of documents which are pending commit (when using autocommit)? Thanks, Jacob -- +1 510 277-0891 (o) +91 33 7458 (m) web: http://pajamadesign.com Skype: pajamadesign Yahoo: jacobsingh AIM: jacobsingh gTalk: jacobsi...@gmail.com
Re: Problem using DIH templatetransformer to create uniqueKey
Hello, TemplateTransformer behaves rather ungracefully if one of the replacement fields is missing.

Looking at TemplateString.java I see that, left to itself, fillTokens would replace a missing variable with "". It is an extra check in TemplateTransformer that is throwing the warning and stopping the row from being returned. Commenting out the check seems to solve my problem. Having done this, an undefined replacement string in TemplateTransformer is replaced with "". However, a neater fix would probably involve making use of the default value which can be assigned to a field in schema.xml.

[snip -- config quoted in full in the original message above]

-- === Fergus McMenemie Email:fer...@twig.me.uk Techmore Ltd Phone:(UK) 07721 376021 Unix/Mac/Intranets Analyst Programmer ===
Re: Get # of docs pending commit
Jacob, Regardless of whether you are using autocommit or manual commit, look at Admin > Statistics > Update Handlers > docsPending. Koji Jacob Singh wrote: Hi, Is there a way to retrieve the # of documents which are pending commit (when using autocommit)? Thanks, Jacob
delete snapshot??
root 26834 16.2 0.0 19412 824 ? S 16:05 0:08 rsync -Wa --delete rsync://##.##.##.##:18180/solr/snapshot.20090213160051/ /data/solr/books/data/snapshot.20090213160051-wip

Hi, obviously it can't delete them because the address is bad. It shouldn't be:
rsync://##.##.##.##:18180/solr/snapshot.20090213160051/
but:
rsync://##.##.##.##:18180/solr/books/snapshot.20090213160051/
Where should I change this? I checked my script.conf on the slave server but it seems good. Because files can be very big, my server is getting full within a few hours. So is snapcleaner actually not necessary on the master? What about the slave? Thanks a lot, Sunny
Re: spellcheck.onlyMorePopular
Shalin Shekhar Mangar wrote: If onlyMorePopular=true, then the algorithm finds tokens which have greater frequency than the searched term. Among these terms, the one which is closest (by edit distance) is returned. Okay, this is a bit weird, but I think I've got it now. Let me try to explain it using my example. When I search for gran (frequency 10) I get the suggestion grand (frequency 17) when using onlyMorePopular=true. When I use onlyMorePopular=false there are no suggestions at all. This is because there are some (rare) terms which are closer to gran than grand, but all of them are not considered because their frequency is below 10. Is that correct? But then, why isn't grand promoted to first place and returned as a valid suggestion? I think I now understand the source of the confusion. onlyMorePopular=true is a special behavior which uses *only* those tokens which have higher frequency than the searched term. onlyMorePopular=false just switches off this special behavior. It does *not* limit suggestions to tokens which have lesser frequency than the searched term. In fact, onlyMorePopular=false does not use frequency of tokens at all. We should document this clearly to avoid such confusions in the future. I'm still missing the two parameters accuracy and spellcheck.count. Let me try to explain how I (now) think the algorithm works:

1) Take all terms from the index as a basic set.
2) If onlyMorePopular=true, remove all terms from the basic set which have a frequency below the frequency of the search term.
3) Sort the basic set with respect to distance to the search term and keep the spellcheck.count terms with the smallest distance which are within accuracy.
4) Remove terms which have a lower frequency than the search term in the case onlyMorePopular=false.
5) Return the remaining terms as suggestions.

Point 3 would explain why I do not get any suggestions for gran having onlyMorePopular=false. Nevertheless I think this is a bug, since point 3 should take the frequency into account as well and promote suggestions with high enough frequency when suggestions with low frequency are deleted. But this is just my assumption on how the algorithm works, which explains why there are no suggestions using onlyMorePopular=false. Maybe I am wrong, but somewhere in the process grand is deleted from the result set. 2) It would be nice if one could get suggestions with lower frequency than the checked word (which is, to me, what onlyMorePopular=false implies). We could enhance spell checker to do that. But can you please explain your use-case for limiting suggestions to tokens which have lesser frequency? An example would be the mentioned grand turismo (note that in the example above I was searching for gran whereas now I am searching for grand). gran would not be returned as a suggestion because grand is more frequent in the index. And yes, I know, returning a suggestion in this case will only be useful if there is more than one word in the search term. You proposed to use KeywordTokenizer for this case, but a) I (again) was not able to find any documentation for this and b) we are working on a different solution for this case using stored search queries. If you are interested, it works like this: for every word in the query, get some spell checking suggestions. Combine these and find out if any of these combinations has been searched for (successfully) before. Propose the one with the highest (search) frequency. It looks promising so far, but the gran turismo example won't work, since there are too many grands in the index. Thanks, Marcus
Re: Get # of docs pending commit
Hi Koji, Thanks, but I'm trying to get it via a web service, not via the admin interface. Best, Jacob On Fri, Feb 13, 2009 at 8:20 PM, Koji Sekiguchi k...@r.email.ne.jp wrote: Jacob, Regardless of whether you are using autocommit or manual commit, look at Admin > statistics > Update Handlers > status > docsPending. Koji Jacob Singh wrote: Hi, Is there a way to retrieve the # of documents which are pending commit (when using autocommit)? Thanks, Jacob -- +1 510 277-0891 (o) +91 33 7458 (m) web: http://pajamadesign.com Skype: pajamadesign Yahoo: jacobsingh AIM: jacobsingh gTalk: jacobsi...@gmail.com
Re: Get # of docs pending commit
Jacob, the output of stats.jsp is XML which you can consume in your program. It is transformed to HTML using XSL. On Fri, Feb 13, 2009 at 9:09 PM, Jacob Singh jacobsi...@gmail.com wrote: Hi Koji, Thanks, but I'm trying to get it via a web service, not via the admin interface. Best, Jacob On Fri, Feb 13, 2009 at 8:20 PM, Koji Sekiguchi k...@r.email.ne.jp wrote: Jacob, Regardless of whether you are using autocommit or manual commit, look at Admin > statistics > Update Handlers > status > docsPending. Koji Jacob Singh wrote: Hi, Is there a way to retrieve the # of documents which are pending commit (when using autocommit)? Thanks, Jacob -- +1 510 277-0891 (o) +91 33 7458 (m) web: http://pajamadesign.com Skype: pajamadesign Yahoo: jacobsingh AIM: jacobsingh gTalk: jacobsi...@gmail.com -- Regards, Shalin Shekhar Mangar.
Re: Get # of docs pending commit
*Jacob Singh feels dumb* Thanks! On Fri, Feb 13, 2009 at 9:14 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: Jacob, the output of stats.jsp is XML which you can consume in your program. It is transformed to HTML using XSL. On Fri, Feb 13, 2009 at 9:09 PM, Jacob Singh jacobsi...@gmail.com wrote: Hi Koji, Thanks, but I'm trying to get it via a web service, not via the admin interface. Best, Jacob On Fri, Feb 13, 2009 at 8:20 PM, Koji Sekiguchi k...@r.email.ne.jp wrote: Jacob, Regardless of whether you are using autocommit or manual commit, look at Admin > statistics > Update Handlers > status > docsPending. Koji Jacob Singh wrote: Hi, Is there a way to retrieve the # of documents which are pending commit (when using autocommit)? Thanks, Jacob -- +1 510 277-0891 (o) +91 33 7458 (m) web: http://pajamadesign.com Skype: pajamadesign Yahoo: jacobsingh AIM: jacobsingh gTalk: jacobsi...@gmail.com -- Regards, Shalin Shekhar Mangar. -- +1 510 277-0891 (o) +91 33 7458 (m) web: http://pajamadesign.com Skype: pajamadesign Yahoo: jacobsingh AIM: jacobsingh gTalk: jacobsi...@gmail.com
Re: spellcheck.onlyMorePopular
Fuzzy search should match grand turismo to gran turismo without using spelling suggestions. At Netflix, the first hit for the query grand turismo is the movie Gran Torino, and we use fuzzy matching with Solr. wunder On 2/13/09 3:35 AM, Marcus Stratmann stratm...@gmx.de wrote: Shalin Shekhar Mangar wrote: And to come back to my last question: There seems to be no case in which onlyMorePopular=false makes sense (provided Grant's assumption is correct). Do you see one? Here's a use-case -- you provide a mis-spelled word and you want the closest suggestion by edit distance (frequency does not matter). Hm, when I try searching for grand using onlyMorePopular=false I do not get any results. Same when trying gran. It seems that there will be no results at all when using onlyMorePopular=false. Without onlyMorePopular there are suggestions for both terms, so there are suggestions close enough to the original word(s). Have you tested your example case? Anyway, if you look at it from the user's point of view: The wiki says spellcheck.onlyMorePopular -- Only return suggestions that result in more hits for the query than the existing query. This implies that with onlyMorePopular=false I will get even results with fewer hits. So when I'm checking grand I would expect to get the suggestion gran, which is less frequent in the index. But it seems this is not the case. But even if just the documentation is wrong or unclear: 1) I could not find a case in which onlyMorePopular=false works at all. 2) It would be nice if one could get suggestions with lower frequency than the checked word (which is, to me, what onlyMorePopular=false implies). Thanks, Marcus
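As an aside, the fuzzy matching Walter mentions boils down to edit distance. A toy illustration (plain Python, not Solr's actual implementation) of why grand is a near-match for gran:

```python
def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

print(edit_distance("gran", "grand"))  # 1: a single insertion
```

In Solr query syntax a fuzzy term is written with a trailing tilde (e.g. gran~), and Lucene performs the equivalent distance computation internally.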
Re: Problem using DIH templatetransformer to create uniqueKey
Paul, Following up your usenet suggestion: <field column="id" template="${jc.fileAbsolutePath}${x.vurl}" ignoreMissingVariables="true"/> and to add more to what I was thinking... if the field is undefined in the input document, but the schema.xml does allow a default value, then TemplateTransformer can use the default value. If there is no default value defined in schema.xml then it can fail as at present. This would allow "" or any other value to be fed into TemplateTransformer, and still enable avoidance of the partial strings you referred to. Regards Fergus. Hello, TemplateTransformer behaves rather ungracefully if one of the replacement fields is missing. Looking at TemplateString.java I see that, left to itself, fillTokens would replace a missing variable with "" (an empty string). It is an extra check in TemplateTransformer that is throwing the warning and stopping the row being returned. Commenting out the check seems to solve my problem. Having done this, an undefined replacement string in TemplateTransformer is replaced with "". However a neater fix would probably involve making use of the default value which can be assigned to a field in schema.xml. I am parsing a single XML document into multiple separate solr documents. It turns out that none of the source document's fields can be used to create a uniqueKey alone.
I need to combine two, using template transformer as follows:

<entity name="x" dataSource="myfilereader" processor="XPathEntityProcessor"
        url="${jc.fileAbsolutePath}" rootEntity="true" stream="false"
        forEach="/record | /record/mediaBlock"
        transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer">
  <field column="fileAbsolutePath" template="${jc.fileAbsolutePath}" />
  <field column="fileWebPath" regex="${dataimporter.request.installdir}(.*)" replaceWith="/ford$1" sourceColName="fileAbsolutePath"/>
  <field column="id" template="${jc.fileAbsolutePath}${x.vurl}" />
  <field column="vurl" xpath="/record/mediaBlock/mediaObject/@vurl" />
</entity>

The trouble is that vurl is only defined as a child of /record/mediaBlock, so my attempt to create id, the uniqueKey, fails for the parent document /record. I am hacking around with TemplateTransformer.java to sort this but was wondering if there was a good reason for this behavior. -- === Fergus McMenemie Email:fer...@twig.me.uk Techmore Ltd Phone:(UK) 07721 376021 Unix/Mac/Intranets Analyst Programmer ===
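For what it's worth, the graceful-fallback behaviour discussed in this thread can be sketched outside DIH. The helper below is a hypothetical stand-in (plain Python, not TemplateTransformer itself) showing ${...} resolution that substitutes a supplied default, or an empty string, instead of dropping the whole row:

```python
import re

def fill_tokens(template, row, defaults=None):
    """Replace ${name} tokens from row; fall back to defaults, then ''."""
    defaults = defaults or {}
    def resolve(match):
        name = match.group(1)
        if name in row and row[name] is not None:
            return str(row[name])
        return str(defaults.get(name, ""))  # graceful, not fatal
    return re.sub(r"\$\{([^}]+)\}", resolve, template)

row = {"jc.fileAbsolutePath": "/data/ford/doc1.xml"}  # no x.vurl on /record rows
print(fill_tokens("${jc.fileAbsolutePath}${x.vurl}", row))
# -> /data/ford/doc1.xml  (the parent document still gets a usable id)
```

With this behaviour the /record rows would get an id built from the file path alone, while /record/mediaBlock rows would get path plus vurl.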
Re: spellcheck.onlyMorePopular
On Fri, Feb 13, 2009 at 8:46 PM, Marcus Stratmann stratm...@gmx.de wrote: Okay, this is a bit weird, but I think I got it now. Let me try to explain it using my example. When I search for gran (frequency 10) I get the suggestion grand (frequency 17) when using onlyMorePopular=true. When I use onlyMorePopular=false there are no suggestions at all. This is because there are some (rare) terms which are closer to gran than grand, but none of them are considered, because their frequency is below 10. Is that correct? No. Think of onlyMorePopular as a toggle between whether to consider frequency or not. When you say onlyMorePopular=true, higher frequency terms are considered. When you say onlyMorePopular=false, frequency plays no role at all and gran is returned because, according to the spell checker, it exists in the index and is therefore a correctly spelled term. I'm still missing the two parameters accuracy and spellcheck.count. Let me try to explain how I (now) think the algorithm works: 1) Take all terms from the index as a basic set. 2) If onlyMorePopular=true, remove all terms from the basic set which have a frequency below the frequency of the search term. 3) Sort the basic set with respect to distance to the search term and keep the spellcheck.count terms with the smallest distance which are within accuracy. 4) If onlyMorePopular=false, remove terms which have a lower frequency than the search term. 5) Return the remaining terms as suggestions. Point 3 would explain why I do not get any suggestions for gran having onlyMorePopular=false. Nevertheless I think this is a bug since point 3 should take the frequency into account as well and promote suggestions with high enough frequency when suggestions with low frequency are discarded. But this is just my assumption on how the algorithm works which explains why there are no suggestions using onlyMorePopular=false. Maybe I am wrong, but somewhere in the process grand is deleted from the result set.
Point #4 is incorrect. As I said earlier, when onlyMorePopular=false, frequency information is not used and there is no filtering of tokens with respect to frequency. The implementation is a bit more complicated:
1. Read all tokens from the specified field in the solr index.
2. Create n-grams of the terms read in #1 and index them into a separate Lucene index (the spellcheck index).
3. When asked for suggestions, create n-grams of the query terms, search the spellcheck index and collect the top (by Lucene score) 10*spellcheck.count results.
4. If onlyMorePopular=true, determine the frequency of each result in the solr index and remove terms which have lesser frequency.
5. Compute the edit distance between each result and the query token.
6. Return the top spellcheck.count results (sorted best-first by edit distance) which are greater than the specified accuracy.
An example would be the mentioned grand turismo (note that in the example above I was searching for gran whereas now I am searching for grand). gran would not be returned as a suggestion because grand is more frequent in the index. And yes, I know, returning a suggestion in this case will only be useful if there is more than one word in the search term. You proposed to use KeywordTokenizer for this case but a) I (again) was not able to find any documentation for this and b) we are working on a different solution for this case using stored search queries. If you are interested, it works like this: For every word in the query get some spell checking suggestions. Combine these and find out if any of these combinations has been searched for (successfully) before. Propose the one with the highest (search) frequency. Looks promising so far, but the gran turismo example won't work, since there are too many grands in the index. Your primary use-case is not spellcheck at all, but this might work with some hacking. Fuzzy queries may be a better solution, as Walter said. Storing all successful search queries may be hard to scale.
-- Regards, Shalin Shekhar Mangar.
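To make the six steps above concrete, here is a toy sketch (Python, with a plain dict standing in for the index and difflib's similarity ratio standing in for both the n-gram score and edit distance ranking; the real code is Lucene's SpellChecker, so treat every detail here as an approximation):

```python
import difflib

def ngrams(term, n=2):
    """Character n-grams, the retrieval key the spellcheck index is built on."""
    return {term[i:i + n] for i in range(len(term) - n + 1)}

def suggest(query, index_freq, count=5, accuracy=0.5, only_more_popular=False):
    """index_freq maps term -> frequency in the main index."""
    # Short-circuit: with onlyMorePopular=false a term already in the index
    # is considered correctly spelled, so no suggestions come back.
    if not only_more_popular and query in index_freq:
        return []
    q_grams = ngrams(query)
    # Steps 1-3: retrieve candidates sharing n-grams with the query
    # (a stand-in for searching the separate spellcheck index).
    candidates = [t for t in index_freq if t != query and ngrams(t) & q_grams]
    # Step 4: onlyMorePopular=true keeps only terms more frequent than the query.
    if only_more_popular:
        qf = index_freq.get(query, 0)
        candidates = [t for t in candidates if index_freq[t] > qf]
    # Steps 5-6: rank by string similarity (proxy for edit distance) and
    # drop anything below the accuracy threshold.
    scored = [(difflib.SequenceMatcher(None, query, t).ratio(), t) for t in candidates]
    return [t for score, t in sorted(scored, reverse=True) if score >= accuracy][:count]

index = {"gran": 10, "grand": 17, "grant": 3, "torino": 8}
print(suggest("gran", index, only_more_popular=True))  # ['grand']
print(suggest("gran", index))  # [] -- matches Marcus's observation
```

Note that it is the short-circuit, not any frequency filter, that makes onlyMorePopular=false return nothing for a term that already exists in the index.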
Re: Get # of docs pending commit
Jacob - note that the results from stats.jsp come back in XML format - which could be used programmatically from a client. Unfortunately the JSP pages don't follow the wt (writer type) parameter that standard request handlers use, but at least it's structured data and not HTML to be scraped. Erik On Feb 13, 2009, at 6:50 AM, Koji Sekiguchi wrote: Jacob, Regardless of whether you are using autocommit or manual commit, look at Admin > statistics > Update Handlers > status > docsPending. Koji Jacob Singh wrote: Hi, Is there a way to retrieve the # of documents which are pending commit (when using autocommit)? Thanks, Jacob
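For example, a client could fetch the stats page and pull out docsPending along these lines (Python sketch; the XML below is an assumed, trimmed stand-in for real stats.jsp output, so adjust the lookup to whatever your Solr version actually emits):

```python
import xml.etree.ElementTree as ET

# A trimmed, hypothetical stand-in for what /solr/admin/stats.jsp returns.
stats_xml = """
<solr>
  <solr-info>
    <UPDATEHANDLER>
      <entry>
        <name>updateHandler</name>
        <stats>
          <stat name="docsPending">42</stat>
          <stat name="commits">7</stat>
        </stats>
      </entry>
    </UPDATEHANDLER>
  </solr-info>
</solr>
"""

def docs_pending(xml_text):
    """Scan the stats XML for the docsPending counter."""
    root = ET.fromstring(xml_text)
    for stat in root.iter("stat"):
        if stat.get("name", "").strip() == "docsPending":
            return int(stat.text.strip())
    return None

print(docs_pending(stats_xml))  # 42
```

In practice you would fetch the XML over HTTP from the admin stats URL and feed the response body to the same parser.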
Re: Problem using DIH templatetransformer to create uniqueKey
What about having the template transformer support ${field:default} syntax? I'm assuming it doesn't support that currently right? The replace stuff in the config files does though. Erik On Feb 13, 2009, at 8:17 AM, Fergus McMenemie wrote: Paul, Following up your usenet sussgetion: field column=id template=${jc.fileAbsolutePath}${x.vurl} ignoreMissingVariables=true/ and to add more to what I was thinking... if the field is undefined in the input document, but the schema.xml does allow a default value, then TemplateTransformer can use the default value. If there is no default value defined in schema.xml then it can fail as at present. This would allow or any other value to be fed into TemplateTransformer, and still enable avoidance of the partial strings you referred to. Regards Fergus. Hello, templatetransformer behaves rather ungracefully if one of the replacement fields is missing. Looking at TemplateString.java I see that left to itself fillTokens would replace a missing variable with . It is an extra check in TemplateTransformer that is throwing the warning and stopping the row being returned. Commenting out the check seems to solve my problem. Having done this, an undefined replacement string in TemplateTransformer is replaced with . However a neater fix would probably involve making use of the default value which can be assigned to a row? in schema.xml. I am parsing a single XML document into multiple separate solr documents. It turns out that none of the source documents fields can be used to create a uniqueKey alone. 
I need to combine two, using template transformer as follows: entity name=x dataSource=myfilereader processor=XPathEntityProcessor url=${jc.fileAbsolutePath} rootEntity=true stream=false forEach=/record | /record/mediaBlock transformer=DateFormatTransformer,TemplateTransformer,RegexTransformer field column=fileAbsolutePath template=${jc.fileAbsolutePath} / field column=fileWebPath regex=${dataimporter.request.installdir}(.*) replaceWith=/ford$1 sourceColName=fileAbsolutePath/ field column=id template=${jc.fileAbsolutePath}${x.vurl} / field column=vurl xpath=/record/mediaBlock/mediaObject/@vurl / The trouble is that vurl is only defined as a child of /record/mediaBlock so my attempt to create id, the uniqueKey fails for the parent document /record I am hacking around with TemplateTransformer.java to sort this but was wondering if there was a good reason for this behavior. -- === Fergus McMenemie Email:fer...@twig.me.uk Techmore Ltd Phone:(UK) 07721 376021 Unix/Mac/Intranets Analyst Programmer ===
Wildcard query case problem
Hey guys, I'm getting problems making a wildcard query in the form nameSort:Arlin*. If I do such a query, I get 0 results, but when I do nameSort:arlin* I get 310 results from my index. Are wildcard queries case sensitive? This is the searched field config:

<fieldType name="string_lc" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.ISOLatin1AccentFilterFactory" />
  </analyzer>
</fieldType>

-- Alexander Ramos Jardim
Re: Wildcard query case problem
Are you using the same analyzer to query and index? zayhen wrote: Hey guys, I getting problems making wildcard query in the form nameSort:Arlin*. If I do such a query, I get 0 results, but when I do nameSort:arlin* I get 310 results from my index. Are wildcard queries case sensitive? This is the searched field config. fieldType name=string_lc class=solr.TextField analyzer tokenizer class=solr.KeywordTokenizerFactory / filter class=solr.LowerCaseFilterFactory / filter class=solr.TrimFilterFactory / filter class=solr.ISOLatin1AccentFilterFactory / /analyzer /fieldType -- Alexander Ramos Jardim - RPG da Ilha -- View this message in context: http://www.nabble.com/Wildcard-query-case-problem-tp22000692p22001259.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Wildcard query case problem
From a post in the archives: Wildcard searches are case-sensitive in Solr. I faced the same issue and handled converting the query string to lower case in my code itself. The filters and analyzers are not applicable for wildcard queries. The searchable mail archive is wonderful G. Best Erick On Fri, Feb 13, 2009 at 12:36 PM, Marc Sturlese marc.sturl...@gmail.comwrote: Are you using the same analyzer to queue and index? zayhen wrote: Hey guys, I getting problems making wildcard query in the form nameSort:Arlin*. If I do such a query, I get 0 results, but when I do nameSort:arlin* I get 310 results from my index. Are wildcard queries case sensitive? This is the searched field config. fieldType name=string_lc class=solr.TextField analyzer tokenizer class=solr.KeywordTokenizerFactory / filter class=solr.LowerCaseFilterFactory / filter class=solr.TrimFilterFactory / filter class=solr.ISOLatin1AccentFilterFactory / /analyzer /fieldType -- Alexander Ramos Jardim - RPG da Ilha -- View this message in context: http://www.nabble.com/Wildcard-query-case-problem-tp22000692p22001259.html Sent from the Solr - User mailing list archive at Nabble.com.
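In other words, the client has to do the analyzer's work itself for wildcard terms. A hypothetical helper (Python sketch) mirroring the lowercasing and accent stripping that the string_lc field type from this thread applies at index time:

```python
import unicodedata

def normalize_wildcard(term):
    """Mirror the index-time LowerCaseFilter + ISOLatin1AccentFilter for a
    wildcard term, since Solr applies no analysis to wildcard queries."""
    lowered = term.lower()  # '*' and '?' pass through unchanged
    # Roughly what ISOLatin1AccentFilterFactory does: decompose, drop accents.
    decomposed = unicodedata.normalize("NFKD", lowered)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

print(normalize_wildcard("Arlin*"))  # arlin*
print(normalize_wildcard("Ramón*"))  # ramon*
```

The normalized term is then what goes into the query string, e.g. nameSort:arlin*.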
Re: Problem using DIH templatetransformer to create uniqueKey
Hmmm. Just gave that a go! No luck. But how many layers of defaults do we need? Rgds Fergus What about having the template transformer support ${field:default} syntax? I'm assuming it doesn't support that currently right? The replace stuff in the config files does though. Erik On Feb 13, 2009, at 8:17 AM, Fergus McMenemie wrote: Paul, Following up your usenet suggestion: field column=id template=${jc.fileAbsolutePath}${x.vurl} ignoreMissingVariables=true/ and to add more to what I was thinking... if the field is undefined in the input document, but the schema.xml does allow a default value, then TemplateTransformer can use the default value. If there is no default value defined in schema.xml then it can fail as at present. This would allow or any other value to be fed into TemplateTransformer, and still enable avoidance of the partial strings you referred to. Regards Fergus. Hello, templatetransformer behaves rather ungracefully if one of the replacement fields is missing. Looking at TemplateString.java I see that left to itself fillTokens would replace a missing variable with . It is an extra check in TemplateTransformer that is throwing the warning and stopping the row being returned. Commenting out the check seems to solve my problem. Having done this, an undefined replacement string in TemplateTransformer is replaced with . However a neater fix would probably involve making use of the default value which can be assigned to a row? in schema.xml. I am parsing a single XML document into multiple separate solr documents. It turns out that none of the source documents fields can be used to create a uniqueKey alone.
I need to combine two, using template transformer as follows: entity name=x dataSource=myfilereader processor=XPathEntityProcessor url=${jc.fileAbsolutePath} rootEntity=true stream=false forEach=/record | /record/mediaBlock transformer=DateFormatTransformer,TemplateTransformer,RegexTransformer field column=fileAbsolutePath template=${jc.fileAbsolutePath} / field column=fileWebPath regex=${dataimporter.request.installdir}(.*) replaceWith=/ford$1 sourceColName=fileAbsolutePath/ field column=id template=${jc.fileAbsolutePath}${x.vurl} / field column=vurl xpath=/record/mediaBlock/mediaObject/@vurl / The trouble is that vurl is only defined as a child of /record/mediaBlock so my attempt to create id, the uniqueKey fails for the parent document /record I am hacking around with TemplateTransformer.java to sort this but was wondering if there was a good reason for this behavior. -- === Fergus McMenemie Email:fer...@twig.me.uk Techmore Ltd Phone:(UK) 07721 376021 Unix/Mac/Intranets Analyst Programmer === -- === Fergus McMenemie Email:fer...@twig.me.uk Techmore Ltd Phone:(UK) 07721 376021 Unix/Mac/Intranets Analyst Programmer ===
Re: delete snapshot??
The --delete option of the rsync command deletes extraneous files from the destination directory. It does not delete Solr snapshots. To do that you can use the snapcleaner on the master and/or slave. Bill On Fri, Feb 13, 2009 at 10:15 AM, sunnyfr johanna...@gmail.com wrote: root 26834 16.2 0.0 19412 824 ?S16:05 0:08 rsync -Wa --delete rsync://##.##.##.##:18180/solr/snapshot.20090213160051/ /data/solr/books/data/snapshot.20090213160051-wip Hi, obviously it can't delete them because the address is bad; it shouldn't be rsync://##.##.##.##:18180/solr/snapshot.20090213160051/ but rsync://##.##.##.##:18180/solr/books/snapshot.20090213160051/ Where should I change this? I checked my script.conf on the slave server but it seems good. Because files can be very big, my server is getting full within a few hours. So actually snapcleaner is not necessary on the master? What about the slave? Thanks a lot, Sunny -- View this message in context: http://www.nabble.com/delete-snapshot---tp21998333p21998333.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Wildcard query case problem
Thanks for pointing this out to me Erick. 2009/2/13 Erick Erickson erickerick...@gmail.com From a post in the archives: Wildcard searches are case-sensitive in Solr. I faced the same issue and handled converting the query string to lower case in my code itself. The filters and analyzers are not applicable for wildcard queries. The searchable mail archive is wonderful G. Best Erick On Fri, Feb 13, 2009 at 12:36 PM, Marc Sturlese marc.sturl...@gmail.com wrote: Are you using the same analyzer to queue and index? zayhen wrote: Hey guys, I getting problems making wildcard query in the form nameSort:Arlin*. If I do such a query, I get 0 results, but when I do nameSort:arlin* I get 310 results from my index. Are wildcard queries case sensitive? This is the searched field config. fieldType name=string_lc class=solr.TextField analyzer tokenizer class=solr.KeywordTokenizerFactory / filter class=solr.LowerCaseFilterFactory / filter class=solr.TrimFilterFactory / filter class=solr.ISOLatin1AccentFilterFactory / /analyzer /fieldType -- Alexander Ramos Jardim - RPG da Ilha -- View this message in context: http://www.nabble.com/Wildcard-query-case-problem-tp22000692p22001259.html Sent from the Solr - User mailing list archive at Nabble.com. -- Alexander Ramos Jardim
Re: SolrJ API and XMLResponseParser
Hi Noble, According to the wiki, the following should work: server.setParser(new XMLResponseParser()); However, I don't see that method. The only place I see that method even being declared is in the SolrRequest class but then wiring that up with the SolrServer and getting results wasn't overly obvious to me. Thanks Amit On Fri, Feb 13, 2009 at 12:31 AM, Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com wrote: On Fri, Feb 13, 2009 at 1:16 PM, Amit Nithian anith...@gmail.com wrote: I am using SolrJ from trunk and according to http://wiki.apache.org/solr/Solrj you should be able to set the response parser in the SolrServer interface layer; however, I am unable to do so and I need the XML response support for querying and adding documents to a Solr 1.2 instance. Also, I have changed the XML response for a particular query handler and hence may need to alter the response parser to accommodate these changes. setting a response parser should work , how do you know it does not work? I know that the API is experimental but has trunk's version of the SolrJ API changed with respect to SolrJ wiki? No. the trunk should still work w/ Solr 1.2 Thanks Amit -- --Noble Paul
Re: Problem using DIH templatetransformer to create uniqueKey
On Fri, Feb 13, 2009 at 10:17 AM, Fergus McMenemie fer...@twig.me.uk wrote: Paul, Following up your usenet sussgetion: field column=id template=${jc.fileAbsolutePath}${x.vurl} ignoreMissingVariables=true/ and to add more to what I was thinking... if the field is undefined in the input document, but the schema.xml does allow a default value, then TemplateTransformer can use the default value. If there is no default value defined in schema.xml it is not really useful. Solr would automatically fill up with default values. then it can fail as at present. This would allow or any other value to be fed into TemplateTransformer, and still enable avoidance of the partial strings you referred to. Regards Fergus. Hello, templatetransformer behaves rather ungracefully if one of the replacement fields is missing. Looking at TemplateString.java I see that left to itself fillTokens would replace a missing variable with . It is an extra check in TemplateTransformer that is throwing the warning and stopping the row being returned. Commenting out the check seems to solve my problem. Having done this, an undefined replacement string in TemplateTransformer is replaced with . However a neater fix would probably involve making use of the default value which can be assigned to a row? in schema.xml. I am parsing a single XML document into multiple separate solr documents. It turns out that none of the source documents fields can be used to create a uniqueKey alone. 
I need to combine two, using template transformer as follows: entity name=x dataSource=myfilereader processor=XPathEntityProcessor url=${jc.fileAbsolutePath} rootEntity=true stream=false forEach=/record | /record/mediaBlock transformer=DateFormatTransformer,TemplateTransformer,RegexTransformer field column=fileAbsolutePathtemplate=${jc.fileAbsolutePath} / field column=fileWebPath regex=${dataimporter.request.installdir}(.*) replaceWith=/ford$1 sourceColName=fileAbsolutePath/ field column=id template=${jc.fileAbsolutePath}${x.vurl} / field column=vurl xpath=/record/mediaBlock/mediaObject/@vurl / The trouble is that vurl is only defined as a child of /record/mediaBlock so my attempt to create id, the uniqueKey fails for the parent document /record I am hacking around with TemplateTransformer.java to sort this but was wondering if there was a good reason for this behavior. -- === Fergus McMenemie Email:fer...@twig.me.uk Techmore Ltd Phone:(UK) 07721 376021 Unix/Mac/Intranets Analyst Programmer === -- --Noble Paul
Re: Problem using DIH templatetransformer to create uniqueKey
On Fri, Feb 13, 2009 at 11:04 AM, Erik Hatcher e...@ehatchersolutions.com wrote: What about having the template transformer support ${field:default} syntax? This is the only use-case for this; it can be easily achieved with a custom Transformer. I'm assuming it doesn't support that currently right? The replace stuff in the config files does though. Erik On Feb 13, 2009, at 8:17 AM, Fergus McMenemie wrote: Paul, Following up your usenet suggestion: field column=id template=${jc.fileAbsolutePath}${x.vurl} ignoreMissingVariables=true/ and to add more to what I was thinking... if the field is undefined in the input document, but the schema.xml does allow a default value, then TemplateTransformer can use the default value. If there is no default value defined in schema.xml then it can fail as at present. This would allow or any other value to be fed into TemplateTransformer, and still enable avoidance of the partial strings you referred to. Regards Fergus. Hello, templatetransformer behaves rather ungracefully if one of the replacement fields is missing. Looking at TemplateString.java I see that left to itself fillTokens would replace a missing variable with . It is an extra check in TemplateTransformer that is throwing the warning and stopping the row being returned. Commenting out the check seems to solve my problem. Having done this, an undefined replacement string in TemplateTransformer is replaced with . However a neater fix would probably involve making use of the default value which can be assigned to a row? in schema.xml. I am parsing a single XML document into multiple separate solr documents. It turns out that none of the source documents fields can be used to create a uniqueKey alone.
I need to combine two, using template transformer as follows: entity name=x dataSource=myfilereader processor=XPathEntityProcessor url=${jc.fileAbsolutePath} rootEntity=true stream=false forEach=/record | /record/mediaBlock transformer=DateFormatTransformer,TemplateTransformer,RegexTransformer field column=fileAbsolutePathtemplate=${jc.fileAbsolutePath} / field column=fileWebPath regex=${dataimporter.request.installdir}(.*) replaceWith=/ford$1 sourceColName=fileAbsolutePath/ field column=id template=${jc.fileAbsolutePath}${x.vurl} / field column=vurl xpath=/record/mediaBlock/mediaObject/@vurl / The trouble is that vurl is only defined as a child of /record/mediaBlock so my attempt to create id, the uniqueKey fails for the parent document /record I am hacking around with TemplateTransformer.java to sort this but was wondering if there was a good reason for this behavior. -- === Fergus McMenemie Email:fer...@twig.me.uk Techmore Ltd Phone:(UK) 07721 376021 Unix/Mac/Intranets Analyst Programmer === -- --Noble Paul
Re: SolrJ API and XMLResponseParser
On Fri, Feb 13, 2009 at 9:18 PM, Amit Nithian anith...@gmail.com wrote: Hi Noble, According to the wiki, the following should work: server.setParser(new XMLResponseParser()); I guess it may be a typo. Please refer to the javadocs for CommonsHttpSolrServer. However, I don't see that method. The only place I see that method even being declared is in the SolrRequest class but then wiring that up with the SolrServer and getting results wasn't overly obvious to me. If a parser is set at the Request level, that takes precedence. Thanks Amit On Fri, Feb 13, 2009 at 12:31 AM, Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com wrote: On Fri, Feb 13, 2009 at 1:16 PM, Amit Nithian anith...@gmail.com wrote: I am using SolrJ from trunk and according to http://wiki.apache.org/solr/Solrj you should be able to set the response parser in the SolrServer interface layer; however, I am unable to do so and I need the XML response support for querying and adding documents to a Solr 1.2 instance. Also, I have changed the XML response for a particular query handler and hence may need to alter the response parser to accommodate these changes. Setting a response parser should work; how do you know it does not work? I know that the API is experimental but has trunk's version of the SolrJ API changed with respect to the SolrJ wiki? No. the trunk should still work w/ Solr 1.2 Thanks Amit -- --Noble Paul -- --Noble Paul