Re: Using the ids parameter
Hi,

Actually we ran into the same issue with the ids parameter, in a Solr front with a shards architecture (the exception is thrown in the Solr front). Were you able to solve it by using the key:value syntax or some other way?

BTW, there was a related issue: https://issues.apache.org/jira/browse/SOLR-1477 but it's marked as Won't Fix. Does anyone know why that is, or if this is planned to be resolved?

Dmitry

On Tue, Mar 20, 2012 at 11:53 PM, Jamie Johnson jej2...@gmail.com wrote:
> We're running into an issue where we are trying to use the ids= parameter to return a set of documents given their ids. This seems to work intermittently when running in SolrCloud. The first question I have: is this something that we should be using, or should we instead be doing a query with key:value? The stack trace that I am getting right now is included below; any thoughts would be appreciated.
>
> Mar 20, 2012 5:36:38 PM org.apache.solr.core.SolrCore execute
> INFO: [slice1_shard1] webapp=/solr path=/select params={hl.fragsize=1&ids=4f14cc9b-f669-4d6f-85ae-b22fad143492,urn:uuid:020335a7-1476-43d6-8f91-241bce1e7696,urn:uuid:352473eb-af56-4f6f-94d5-c0096dcb08d4} status=500 QTime=32
> Mar 20, 2012 5:36:38 PM org.apache.solr.common.SolrException log
> SEVERE: null:java.lang.NullPointerException
>         at org.apache.solr.handler.component.ShardFieldSortedHitQueue$1.compare(ShardDoc.java:232)
>         at org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:159)
>         at org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:101)
>         at org.apache.lucene.util.PriorityQueue.upHeap(PriorityQueue.java:231)
>         at org.apache.lucene.util.PriorityQueue.add(PriorityQueue.java:140)
>         at org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:156)
>         at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:839)
>         at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:630)
>         at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:609)
>         at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:332)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1539)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:406)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:255)
>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>         at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>         at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>         at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>         at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>         at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>         at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>         at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>         at org.mortbay.jetty.Server.handle(Server.java:326)
>         at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>         at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>         at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>         at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>         at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>         at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
>         at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Re: StreamingUpdateSolrServer - exceptions not propagated
On 3/26/2012 10:25 PM, Shawn Heisey wrote:
> The problem is that I currently have no way (that I know of so far) to detect that a problem happened. As far as my code is concerned, everything worked, so it updates my position tracking and those documents will never be inserted. I have not yet delved into the response object to see whether it can tell me anything. My code currently assumes that if no exception was thrown, it was successful. This works with CHSS. I will write some test code that tries out various error situations and see what the response contains.

I've written some test code. When doing an add with SUSS against a server that's down, no exception is thrown. It does throw one for query and deleteByQuery. When doing the add test with CHSS, an exception is thrown.

I guess I'll just have to use CHSS until this gets fixed, assuming it ever does. Would it be at all helpful to file an issue in jira, or has one already been filed? With a quick search, I could not find one.

Thanks,
Shawn
Re: possible spellcheck bug in 3.5 causing erroneous suggestions
so, does anyone have a clue what's (or might be) going wrong? or do i have to debug it myself and post a jira issue?

PS: unfortunately i cant give anyone the index for testing due to NDA.

cheers

On 22.03.2012 10:17, tom wrote:
> same
>
> On 22.03.2012 10:00, Markus Jelsma wrote:
>> Can you try spellcheck.q ?
>>
>> On Thu, 22 Mar 2012 09:57:19 +0100, tom dev.tom.men...@gmx.net wrote:
>>> hi folks,
>>>
>>> i think i found a bug in the spellchecker but am not quite sure. this is the query i send to solr:
>>>
>>> http://lh:8983/solr/CompleteIndex/select?rows=0&echoParams=all&spellcheck=true&spellcheck.onlyMorePopular=true&spellcheck.extendedResults=no&q=a+bb+ccc++
>>>
>>> and this is the result:
>>>
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <response>
>>>   <lst name="responseHeader">
>>>     <int name="status">0</int>
>>>     <int name="QTime">4</int>
>>>     <lst name="params">
>>>       <str name="echoParams">all</str>
>>>       <str name="spellcheck">true</str>
>>>       <str name="echoParams">all</str>
>>>       <str name="spellcheck.extendedResults">no</str>
>>>       <str name="q">a bb ccc </str>
>>>       <str name="rows">0</str>
>>>       <str name="spellcheck.onlyMorePopular">true</str>
>>>     </lst>
>>>   </lst>
>>>   <result name="response" numFound="43" start="0"/>
>>>   <lst name="spellcheck">
>>>     <lst name="suggestions">
>>>       <lst name="bb">
>>>         <int name="numFound">1</int>
>>>         <int name="startOffset">2</int>
>>>         <int name="endOffset">4</int>
>>>         <arr name="suggestion"><str>abb</str></arr>
>>>       </lst>
>>>       <lst name="1">
>>>         <int name="numFound">1</int>
>>>         <int name="startOffset">5</int>
>>>         <int name="endOffset">8</int>
>>>         <arr name="suggestion"><str>ccc</str></arr>
>>>       </lst>
>>>       <lst name="2">
>>>         <int name="numFound">1</int>
>>>         <int name="startOffset">5</int>
>>>         <int name="endOffset">8</int>
>>>         <arr name="suggestion"><str>ccc</str></arr>
>>>       </lst>
>>>       <lst name="">
>>>         <int name="numFound">1</int>
>>>         <int name="startOffset">10</int>
>>>         <int name="endOffset">14</int>
>>>         <arr name="suggestion"><str>dvd</str></arr>
>>>       </lst>
>>>     </lst>
>>>   </lst>
>>> </response>
>>>
>>> now, i know this is just a technical query; i did it for a test regarding suggestions and discovered the oddity just by chance - it was not what the test was about. my question is how the suggestions 1 and 2 come about. from what i understand from the wiki, the entries in spellcheck/suggestions should only be (misspelled) substrings of the user query.
>>>
>>> the setup/context is thus:
>>>
>>> - the word ccc exists 11 times in the index but 1 and 2 dont:
>>>
>>> http://lh:8983/solr/CompleteIndex/terms?terms=on&terms.fl=spell&terms.prefix=ccc&terms.mincount=0
>>>
>>> <response>
>>>   <lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
>>>   <lst name="terms"><lst name="spell"><int name="ccc">11</int></lst></lst>
>>> </response>
>>>
>>> - the analyzer for the spellchecker yields the terms as entered, i.e. a|bb|ccc|
>>>
>>> - the config is thus:
>>>
>>> <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
>>>   <str name="queryAnalyzerFieldType">textSpell</str>
>>>   <lst name="spellchecker">
>>>     <str name="name">default</str>
>>>     <str name="field">spell</str>
>>>     <str name="spellcheckIndexDir">./spellchecker</str>
>>>   </lst>
>>> </searchComponent>
>>>
>>> does anyone have a clue what's going on?
Re: Using the ids parameter
So I solved it by using key:(id1 OR ... idn).

On Tue, Mar 27, 2012 at 9:14 AM, Dmitry Kan dmitry@gmail.com wrote:
> Hi, Actually we ran into the same issue with using ids parameter, in the solr front with shards architecture (exception throws in the solr front). Were you able to solve it by using the key:value syntax or some other way? BTW, there was a related issue: https://issues.apache.org/jira/browse/SOLR-1477 but it's marked as Won't Fix, does anyone know why it is so, or if this is planned to be resolved?
> [...]

--
Regards,
Dmitry Kan
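Since the thread resolved to the key:(id1 OR ... idn) workaround, here is a minimal, hypothetical sketch of building such a query string in plain Java. The field name "id" and the helper name are assumptions, not from the thread; each id is quoted because the urn:uuid:... identifiers in the log contain ':', which the query parser would otherwise treat as a field separator.

```java
import java.util.Arrays;
import java.util.List;

public class IdsQuery {

    // Join ids into the boolean-OR workaround, e.g. id:("a" OR "b").
    static String buildIdsQuery(String field, List<String> ids) {
        StringBuilder sb = new StringBuilder(field).append(":(");
        for (int i = 0; i < ids.size(); i++) {
            if (i > 0) sb.append(" OR ");
            // quote each id so reserved characters like ':' survive parsing
            sb.append('"').append(ids.get(i)).append('"');
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList(
                "4f14cc9b-f669-4d6f-85ae-b22fad143492",
                "urn:uuid:020335a7-1476-43d6-8f91-241bce1e7696");
        // prints: id:("4f14cc9b-f669-4d6f-85ae-b22fad143492" OR "urn:uuid:020335a7-1476-43d6-8f91-241bce1e7696")
        System.out.println(buildIdsQuery("id", ids));
    }
}
```

The resulting string would go in the q (or fq) parameter in place of the ids= parameter that triggers the NPE.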
how to store file path in Solr when using TikaEntityProcessor
Hi,

I am using DIH to index the local file system. But the file path, size and lastmodified fields were not stored.

In schema.xml I defined:

<fields>
  <field name="title" type="string" indexed="true" stored="true"/>
  <field name="author" type="string" indexed="true" stored="true"/>
  <!-- <field name="text" type="text" indexed="true" stored="true"/> liang added -->
  <field name="path" type="string" indexed="true" stored="true"/>
  <field name="size" type="long" indexed="true" stored="true"/>
  <field name="lastmodified" type="date" indexed="true" stored="true"/>
</fields>

And also defined tika-data-config.xml:

<dataConfig>
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <entity name="f" dataSource="null" rootEntity="false"
            processor="FileListEntityProcessor"
            baseDir="E:/my_project/ecmkit/infotouch"
            fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)|(docx)|(ppt)"
            onError="skip" recursive="true">
      <entity name="tika-test" dataSource="bin"
              processor="TikaEntityProcessor"
              url="${f.fileAbsolutePath}" format="text" onError="skip">
        <field column="Author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <!-- <field column="text" name="text"/> -->
        <field column="fileAbsolutePath" name="path"/>
        <field column="fileSize" name="size"/>
        <field column="fileLastModified" name="lastmodified"/>
      </entity>
    </entity>
  </document>
</dataConfig>

The Solr version is 3.5. Any idea? Thanks in advance.

Liang
Re: Client-side failover with SolrJ
I rediscover the world every day, thanks for this. -- View this message in context: http://lucene.472066.n3.nabble.com/Client-side-failover-with-SolrJ-tp3858461p3860700.html Sent from the Solr - User mailing list archive at Nabble.com.
CLOSE_WAIT connections
Hi list, I have looked into the CLOSE_WAIT problem and created an issue with a patch to fix this. A search for CLOSE_WAIT shows that there are many Apache projects hit by this problem. https://issues.apache.org/jira/browse/SOLR-3280 Can someone recheck the patch (it belongs to SnapPuller) and give the OK for release? The patch is against branch_3x (3.6). Regards Bernd
Re: StreamingUpdateSolrServer - exceptions not propagated
Like I said, you have to extend the class and override the error method.

Sent from my iPhone

On Mar 27, 2012, at 2:29 AM, Shawn Heisey s...@elyograg.org wrote:
> I guess I'll just have to use CHSS until this gets fixed, assuming it ever does. Would it be at all helpful to file an issue in jira, or has one already been filed? With a quick search, I could not find one.
> [...]
Re: document inside document?
For your tagging, think about using multiValued="true" with an increment gap of, say, 100. Then your searches on this field can be phrase queries with a smaller slop, e.g. "tall woman"~90 would match, but "purse gucci"~90 would not, because purse and gucci are not within 90 tokens of each other.

As far as the metadata is concerned, this is just specifying which fields should be queried; see the qf parameter in edismax. As far as fieldType, spend some time with admin/analysis to understand the kinds of things the various tokenizers and filters do; your question is really too broad to answer. I'd start with one of the text types and iterate.

Grouping on primary key is a pretty useless thing to do; what is your use case? And you'll just have to get used to denormalizing data with Solr/Lucene, which is hard for a DB person, it just feels icky <G>.

Best
Erick

On Mon, Mar 26, 2012 at 3:00 PM, sam ” skyn...@gmail.com wrote:
> Hey, I am making an image search engine where people can tag images with various items that are themselves tagged. For example, http://example.com/abc.jpg is tagged with the following three items:
> - item1 that is tagged with: tall blond woman
> - item2 that is tagged with: yellow purse
> - item3 that is tagged with: gucci red dress
> Querying for +yellow +purse will return the example image. But, querying for +gucci +purse will not, because the image does not have an item tagged with both gucci and purse. In addition to items, each image has various metadata such as alt text, location, description, photo credit, etc. that should be available for search.
> How should I write my schema.xml? If imageUrl is primary key, do I implement my own fieldType for items, so that I can write:
>   <field name="items" type="myItemType" multiValued="true"/>
> What would myItemType look like so that solr would know the example image should not match the query +gucci +purse?
> If itemId is primary key, I can use result grouping (http://wiki.apache.org/solr/FieldCollapsing). But, I need to repeat alt text and other image metadata for each item. Or, should I create a different schema for item search and metadata search?
> Thanks. Sam.
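A hedged sketch of Erick's suggestion above (the field and type names are made up; the essential parts are multiValued and positionIncrementGap):

```xml
<!-- schema.xml: one value per tagged item; the 100-position gap keeps
     tokens from different item values at least 100 positions apart -->
<fieldType name="text_items" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="items" type="text_items" indexed="true" stored="true"
       multiValued="true"/>
```

With this, a query like items:"gucci purse"~90 can only match when both words come from the same item's tags; words from different items are more than 100 positions apart and fall outside the slop.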
Re: Solr cores issue
It might be administratively easier to have multiple webapps, but it shouldn't really matter as far as I know...

Best
Erick

On Tue, Mar 27, 2012 at 12:22 AM, Sujatha Arun suja.a...@gmail.com wrote:
> yes, I must have mis-copied, and yes, i do have the conf folder per core with schema etc... Because of this issue, we have decided to have multiple webapps with about 50 cores per webapp, instead of one single webapp with all 200 cores. Would this make better sense? What would be your suggestion?
>
> On Tue, Mar 27, 2012 at 12:07 AM, Erick Erickson erickerick...@gmail.com wrote:
>> Shouldn't be. What do your log files say? You have to treat each core as a separate index. In other words, you need to have a core#/conf with the schema matching your core#/data/index directory etc. I suspect you've simply mis-copied something.
>>
>> On Mon, Mar 26, 2012 at 8:27 AM, Sujatha Arun suja.a...@gmail.com wrote:
>>> I was migrating to cores from webapps, and I was copying a bunch of indexes from webapps to respective cores. When I restarted, the whole webapp with the cores would not start up and I was getting an index corrupted message. In this scenario, or in a scenario where there is an issue with the schema/config file for one core, will the whole webapp with all the cores fail to restart?
>>>
>>> On Mon, Mar 26, 2012 at 4:43 PM, Erick Erickson erickerick...@gmail.com wrote:
>>>> Index corruption is very rare, can you provide more details how you got into that state?
>>>>
>>>> On Sun, Mar 25, 2012 at 1:22 PM, Sujatha Arun suja.a...@gmail.com wrote:
>>>>> Hello, Suppose I have several cores in a single webapp. If the index is corrupted in one core, or the schema/solrconfig of one core is not well formed, then the entire webapp refuses to load on server restart? Why does this happen?
preventing words from being indexed in spellcheck dictionary?
hello all,

i am creating a spellcheck dictionary from the itemDescSpell field in my schema. is there a way to prevent certain words from entering the dictionary - as the dictionary is being built?

thanks for any help
mark

// snipped from solrconfig.xml
<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">itemDescSpell</str>
  <str name="buildOnOptimize">true</str>
  <str name="spellcheckIndexDir">spellchecker_mark</str>
dataImportHandler: delta query fetching data, not just ids?
It seems that delta import works in 2 steps: the first query fetches the ids of the modified entries, then the second query fetches the actual data.

<entity name="item" pk="ID"
        query="select * from item"
        deltaImportQuery="select * from item where ID='${dataimporter.delta.id}'"
        deltaQuery="select id from item where last_modified &gt; '${dataimporter.last_index_time}'">
  <entity name="feature" pk="ITEM_ID"
          query="select description as features from feature where item_id='${item.ID}'"/>
  <entity name="item_category" pk="ITEM_ID, CATEGORY_ID"
          query="select CATEGORY_ID from item_category where ITEM_ID='${item.ID}'">
    <entity name="category" pk="ID"
            query="select description as cat from category where id = '${item_category.CATEGORY_ID}'"/>
  </entity>
</entity>

I am aware that there's a workaround: http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport

But still, to clarify and make sure I have up-to-date info on how Solr works:

1. Is it possible to fetch the modified data with a single SQL query using deltaImportQuery, as in: deltaImportQuery="select * from item where last_modified &gt; '${dataimporter.last_index_time}'"?

2. If not - what's the reason delta import is implemented like it is? Why split it into two queries? I would think having a single delta query that fetches the data would be kind of an obvious design, unless there's something that calls for 2 separate queries...?
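For reference, the single-query workaround from the wiki page linked above looks roughly like this (a hedged sketch, not verified against a live DIH setup; the trick is that one full-import query doubles as the delta query when run with clean=false):

```xml
<!-- Sketch of DeltaQueryViaFullImport: run with
     command=full-import&clean=false and the same query fetches
     only the rows modified since the last index time -->
<entity name="item" pk="ID"
        query="select * from item
               where '${dataimporter.request.clean}' != 'false'
               or last_modified &gt; '${dataimporter.last_index_time}'"/>
```

Invoked as a normal full import with clean=false, the data arrives in a single round trip instead of the id-then-row pattern of deltaQuery plus deltaImportQuery.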
Re: how to store file path in Solr when using TikaEntityProcessor
> I am using DIH to index local file system. But the file path, size and lastmodified fields were not stored.
> [...]
> The Solr version is 3.5. any idea?

The implicit fields fileDir, file, fileAbsolutePath, fileSize, fileLastModified are generated by the FileListEntityProcessor. They should be defined above the TikaEntityProcessor.
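If it helps, here is a sketch of that change against the config posted above - the three implicit-field mappings moved up into the FileListEntityProcessor entity (untested, structure only):

```xml
<entity name="f" dataSource="null" rootEntity="false"
        processor="FileListEntityProcessor"
        baseDir="E:/my_project/ecmkit/infotouch"
        fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)|(docx)|(ppt)"
        onError="skip" recursive="true">
  <!-- implicit fields generated by FileListEntityProcessor,
       mapped in the outer entity rather than the Tika one -->
  <field column="fileAbsolutePath" name="path"/>
  <field column="fileSize" name="size"/>
  <field column="fileLastModified" name="lastmodified"/>
  <entity name="tika-test" dataSource="bin" processor="TikaEntityProcessor"
          url="${f.fileAbsolutePath}" format="text" onError="skip">
    <field column="Author" name="author" meta="true"/>
    <field column="title" name="title" meta="true"/>
  </entity>
</entity>
```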
Re: dataImportHandler: delta query fetching data, not just ids?
2. If not - what's the reason delta import is implemented like it is? Why split it in two queries? I would think having a single delta query that fetches the data would be kind of an obvious design unless there's something that calls for 2 separate queries...? I think this is it? https://issues.apache.org/jira/browse/SOLR-811
Re: StreamingUpdateSolrServer - exceptions not propagated
On 3/26/2012 6:43 PM, Mark Miller wrote:
> It doesn't get thrown because that logic needs to continue - you don't necessarily want one bad document to stop all the following documents from being added. So the exception is sent to that method with the idea that you can override and do what you would like. I've written sample code around stopping and throwing an exception, but I guess its not totally trivial. Other ideas for reporting errors have been thrown around in the past, but no work on it has gotten any traction.

It looks like StreamingUpdateSolrServer is not meant for situations where strict error checking is required. I think the documentation should reflect that. Would you be opposed to a javadoc update at the class level (plus a wiki addition) like the following?

"Because document inserts are handled as background tasks, exceptions and errors that occur during those operations will not be available to the calling program, but they will be logged. For example, if the Solr server is down, your program must determine this on its own. If you need strict error handling, use CommonsHttpSolrServer."

If my wording is bad, feel free to make suggestions. If I'm wrong and you do have an example of an error handling override that would do what I need, I would love to see it. From what I can tell, add requests are pushed down and handled by Runner threads, completely disconnected from the request. The response to add calls always seems to be a NOTE element saying the request is processed in a background stream, even if successful.

Thanks,
Shawn
RE: possible spellcheck bug in 3.5 causing erroneous suggestions
It might be easier to know what's going on if you provide some snippets from solrconfig.xml and schema.xml. But my guess is that in your solrconfig.xml, under the spellcheck searchComponent, either the queryAnalyzerFieldType or the fieldType (one level down) is set to a field type that is removing numbers or otherwise modifying the tokens on analysis. The reason is that your query contained ccc but it says that 1 is a misspelled word in your query. Typically you want a simple analysis chain that just tokenizes on whitespace and little else for spellchecking.

With that said, I wouldn't be surprised if this was a bug, as we've had problems in the past with words containing numbers, dashes and the like. If you become convinced you've found a bug, would you be able to write a failing unit test and post it on JIRA? See http://wiki.apache.org/solr/HowToContribute for more information.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: tom [mailto:dev.tom.men...@gmx.net]
Sent: Tuesday, March 27, 2012 2:31 AM
To: solr-user@lucene.apache.org
Subject: Re: possible spellcheck bug in 3.5 causing erroneous suggestions

so any one has a clue what's (might be) going wrong ? or do i have to debug it myself and post a jira issue?
[...]
RE: preventing words from being indexed in spellcheck dictionary?
If the list of words isn't very long, you can add a StopFilter to the analysis chain for itemDescSpell and put the words you don't want in the stop list. If you want to prevent low-occurring words from being used as corrections, use thresholdTokenFrequency in your spellcheck configuration.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: geeky2 [mailto:gee...@hotmail.com]
Sent: Tuesday, March 27, 2012 9:07 AM
To: solr-user@lucene.apache.org
Subject: preventing words from being indexed in spellcheck dictionary?

hello all, i am creating a spellcheck dictionary from the itemDescSpell field in my schema. is there a way to prevent certain words from entering the dictionary - as the dictionary is being built?
[...]
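A hedged sketch of both suggestions (the file name spellcheck_stopwords.txt and the exact analysis chain are assumptions, not from the thread):

```xml
<!-- schema.xml: field type used by itemDescSpell; the StopFilter keeps
     the listed words out of the dictionary as it is built -->
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="spellcheck_stopwords.txt"/>
  </analyzer>
</fieldType>

<!-- solrconfig.xml: drop terms occurring in fewer than 0.1% of documents -->
<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">itemDescSpell</str>
  <str name="buildOnOptimize">true</str>
  <str name="spellcheckIndexDir">spellchecker_mark</str>
  <float name="thresholdTokenFrequency">.001</float>
</lst>
```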
Re: StreamingUpdateSolrServer - exceptions not propagated
On Mar 27, 2012, at 10:51 AM, Shawn Heisey wrote:
> It looks like StreamingUpdateSolrServer is not meant for situations where strict error checking is required. I think the documentation should reflect that. Would you be opposed to a javadoc update at the class level (plus a wiki addition)?
> [...]

I'm not saying what it's meant for, I'm just saying what it is. Currently, the only thing you can do to check for errors is override that method. I understand it's still somewhat limiting - it depends on your use case how well it can work. For example, I've known people that just want to stop the update process if a doc fails, and throw an exception. You can write code to do that by extending the class and overriding handleError. You can also collect the exceptions, count the fails, read and parse any error messages, etc. It doesn't help you with an ID or anything though - unless you get unlucky/lucky and can parse it out of the error messages (if it's even in them). It might be more useful if you could set the name of an id field for it to look for and perhaps also dump to that method.

There have been previous conversations about improving error reporting for this SolrServer, but no work has ever really gotten off the ground. There may be existing JIRA issues around this topic - certainly there are previous email threads. All in all though, please, make all the suggestions and JIRA issues you want. Javadoc improvements can be submitted as patches through JIRA as well. Also, the Wiki is open to anyone to update.

- Mark Miller
lucidimagination.com
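The pattern Mark describes (override the error hook, collect exceptions) can be illustrated without SolrJ on the classpath. This is a self-contained, hypothetical stand-in with the same shape: documents are sent from a background Runner thread, so failures reach the caller only through an overridable handleError(), never as exceptions from add():

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Stand-in for StreamingUpdateSolrServer: add() queues work for a
// background thread and returns immediately, like SUSS does.
class StreamingClient {
    private final ExecutorService runner = Executors.newSingleThreadExecutor();
    final List<Throwable> errors = new CopyOnWriteArrayList<Throwable>();

    // The hook subclasses override; here we record the failure,
    // mirroring the "collect the exceptions, count the fails" idea.
    protected void handleError(Throwable t) {
        errors.add(t);
    }

    public void add(final String doc) {
        runner.submit(new Runnable() {
            public void run() {
                try {
                    send(doc);
                } catch (Exception e) {
                    handleError(e); // never propagated to the add() caller
                }
            }
        });
    }

    // Pretend HTTP call; fails for a null document.
    protected void send(String doc) throws Exception {
        if (doc == null) {
            throw new IllegalArgumentException("bad document");
        }
    }

    public void blockUntilFinished() throws InterruptedException {
        runner.shutdown();
        runner.awaitTermination(5, TimeUnit.SECONDS);
    }
}

public class SussErrorDemo {
    public static void main(String[] args) throws InterruptedException {
        StreamingClient client = new StreamingClient();
        client.add("good doc");
        client.add(null); // fails in the background; add() still returns
        client.blockUntilFinished();
        System.out.println("errors seen: " + client.errors.size());
    }
}
```

After blockUntilFinished(), the caller can inspect the collected errors and decide to retry, rewind position tracking, or abort - which is roughly what Shawn's use case needs, minus the per-document ID problem Mark mentions.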
Re: StreamingUpdateSolrServer - exceptions not propagated
https://issues.apache.org/jira/browse/SOLR-445 This JIRA reflects the slightly different case of wanting better reporting of *which* document failed in a multi-document packet; it doesn't specifically address SUSS. But it might serve to give you some ideas if you tackle this.
RE: preventing words from being indexed in spellcheck dictionary?
thank you very much for the info ;) -- View this message in context: http://lucene.472066.n3.nabble.com/preventing-words-from-being-indexed-in-spellcheck-dictionary-tp3861472p3861987.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrCloud with Tomcat and external Zookeeper, does it work?
Hi Vadim, I too am experimenting with SolrCloud and need help with setting it up using Tomcat as the java servlet container. While searching for help on this question, I found another thread in the solr-mailing-list that is helpful. In case you haven't seen this thread that I found, please search the solr-mailing-list for: SolrCloud new You can also view it at nabble using this link: http://lucene.472066.n3.nabble.com/SolrCloud-new-td1528872.html Best, Jerry M.

On Wed, Mar 21, 2012 at 5:51 AM, Vadim Kisselmann v.kisselm...@googlemail.com wrote:

Hello folks, I read the SolrCloud Wiki and Bruno Dumon's blog entry with his First Exploration of SolrCloud. The examples and a first setup with embedded Jetty and ZK WORK without problems. I tried to set up my own configuration with Tomcat and an external Zookeeper (my master ZK), but it doesn't really work. My setup: - latest Solr version from trunk - Tomcat 6 - external ZK - Target: 1 server, 1 Tomcat, 1 Solr instance, 2 collections with different config/schema. What I tried:

1. After checkout I built Solr (ant run-example); it works.

2. I sent my config/schema files to the external ZK with Jetty: java -Djetty.port=8080 -Dbootstrap_confdir=/root/solrCloud/conf/ -Dcollection.configName=conf1 -DzkHost=master-zk:2181 -jar start.jar It works, too.

3. I created my (empty, without cores) solr.xml, like Bruno: http://www.ngdata.com/site/blog/57-ng.html#disqus_thread

4. I started my Tomcat and got the first error in the UI: This interface requires that you activate the admin request handlers, add the following configuration to your solrconfig.xml:

<!-- Admin Handlers - This will register all the standard admin RequestHandlers. -->
<requestHandler name="/admin/" class="solr.admin.AdminHandlers" />

Admin request handlers are definitely activated in my solrconfig. I get this error only with the latest trunk versions, not with r1292064 from February. Sometimes it works with the new version, sometimes not, and I get this error.

5. OK, it works after a few restarts. I changed my JAVA_OPTS for Tomcat and added this: -DzkHost=master-zk:2181 Next error: The web application [/solr2] appears to have started a thread named [main-SendThread(master-zk:2181)] but has failed to stop it. This is very likely to create a memory leak. Exception in thread Thread-2 java.lang.NullPointerException at org.apache.solr.cloud.Overseer$CloudStateUpdater.amILeader(Overseer.java:179) at org.apache.solr.cloud.Overseer$CloudStateUpdater.run(Overseer.java:104) at java.lang.Thread.run(Thread.java:662) 15.03.2012 13:25:17 org.apache.catalina.loader.WebappClassLoader loadClass INFO: Illegal access: this web application instance has been stopped already. Could not load org.apache.zookeeper.server.ZooTrace. The eventual following stack trace is caused by an error thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access, and has no functional impact. java.lang.IllegalStateException at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1531) at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1491) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1196) 15.03.2012 13:25:17 org.apache.coyote.http11.Http11Protocol destroy

6. OK, assuming the first steps work, I would then create new cores and my 2 collections. My requests with the CoreAdminHandler are ok; my solr.xml looks like this:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores" zkClientTimeout="1" hostPort="8080" hostContext="solr">
    <core name="shard1_data" collection="col1" shard="shard1" instanceDir="xxx/" />
    <core name="shard2_data" collection="col2" shard="shard2" instanceDir="xx2/" />
  </cores>
</solr>

Now I get the following exception: ...couldn't find conf name for collection1... I don't have a collection1. Why this exception?

You can see, there are many exceptions and possibly configuration problems with Tomcat and an external ZK. Has anyone set up an identical configuration, and does it work? Does anyone spot mistakes in my configuration steps? Best regards Vadim
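For what it's worth, one way to make the -DzkHost setting stick for Tomcat is a setenv script rather than editing JAVA_OPTS by hand. This is a generic Tomcat convention, not taken from Vadim's setup; the paths and the Solr home value are illustrative:

```shell
# Hypothetical $CATALINA_HOME/bin/setenv.sh sketch: Tomcat sources this file on
# startup, so the system properties reach every Solr webapp it hosts.
JAVA_OPTS="$JAVA_OPTS -DzkHost=master-zk:2181"
JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/opt/solr"   # example path, adjust to your install
export JAVA_OPTS
```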
Re: First steps with Solr
I've had the same problem and my solution was to... #set($pName = #field('name')) #set($pName = $pName.trim()) Marcelo Carvalho Fernandes +55 21 8272-7970 +55 21 2205-2786 On Mon, Mar 26, 2012 at 3:24 PM, henri.gour...@laposte.net henri.gour...@laposte.net wrote: trying to play with javascript to clean-up my URL!! Context is velocity Suggestions? Thanks
Auto-complete phrase
Hello, I am working on creating an auto-complete functionality for my field merchant_name, present all over my documents. I am using version 3.4 of Solr and I am trying to take advantage of the Suggester functionality. Unfortunately, so far I haven't figured out how to make it work as I expected. My list of merchants present in my documents is (my real list is bigger than the following list, which is why I don't use a dictionary, and also because it will change often): Redoute Suisse Trois Conforama But Cult Beauty Brother Trois I expect the Suggester component to match words or parts of them and return the phrases where those words or parts of them have been matched. For example, with /suggest?q=tro I would like to get this:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="tro">
        <int name="numFound">2</int>
        <int name="startOffset">0</int>
        <int name="endOffset">x</int>
        <arr name="suggestion">
          <str>Bother Trois</str>
          <str>Suisse Trois</str>
        </arr>
      </lst>
    </lst>
  </lst>
</response>

I experimented with suggestions on a field configured with the tokenizer solr.KeywordTokenizerFactory or solr.WhitespaceTokenizerFactory. In my mind I have to find a way to handle 3 cases: /suggest?q=bo -(should return) bother trois /suggest?q=tro -(should return) bother trois, suisse trois /suggest?q=bo%20tro -(should return) bother trois With the solr.KeywordTokenizerFactory I get: /suggest?q=bo - bother trois /suggest?q=tro - nothing /suggest?q=bo%20tro - nothing With the solr.WhitespaceTokenizerFactory I get: /suggest?q=bo - bother /suggest?q=troi - trois /suggest?q=bo%20tro - bother, trois Not exactly what I want ...
:( My configuration in solrconfig.xml for the suggester component:

<searchComponent class="solr.SpellCheckComponent" name="suggestMerchant">
  <lst name="spellchecker">
    <str name="name">suggestMerchant</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
    <!-- Alternatives to lookupImpl:
         org.apache.solr.spelling.suggest.fst.FSTLookup [finite state automaton]
         org.apache.solr.spelling.suggest.fst.WFSTLookupFactory [weighted finite state automaton]
         org.apache.solr.spelling.suggest.jaspell.JaspellLookup [default, jaspell-based]
         org.apache.solr.spelling.suggest.tst.TSTLookup [ternary trees] -->
    <str name="field">merchant_name_autocomplete</str> <!-- the indexed field to derive suggestions from -->
    <float name="threshold">0.0</float>
    <str name="buildOnCommit">true</str>
    <!-- <str name="sourceLocation">american-english</str> -->
  </lst>
</searchComponent>

<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest/merchant">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggestMerchant</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
    <int name="spellcheck.maxCollations">10</int>
  </lst>
  <arr name="components">
    <str>suggestMerchant</str>
  </arr>
</requestHandler>

How can I implement autocomplete with the Suggester component to get what I expect? Thanks for your help, I really appreciate it.
Re: Using the ids parameter
Yes, sorry for the delay, we now do q=key:(key1 key2...) and that works properly. On Tue, Mar 27, 2012 at 3:53 AM, Dmitry Kan dmitry@gmail.com wrote: So I solved it by using key:(id1 OR ... idn). On Tue, Mar 27, 2012 at 9:14 AM, Dmitry Kan dmitry@gmail.com wrote: Hi, Actually we ran into the same issue with using ids parameter, in the solr front with shards architecture (exception throws in the solr front). Were you able to solve it by using the key:value syntax or some other way? BTW, there was a related issue: https://issues.apache.org/jira/browse/SOLR-1477 but it's marked as Won't Fix, does anyone know why it is so, or if this is planned to be resolved? Dmitry On Tue, Mar 20, 2012 at 11:53 PM, Jamie Johnson jej2...@gmail.com wrote: We're running into an issue where we are trying to use the ids= parameter to return a set of documents given their id. This seems to work intermittently when running in SolrCloud. The first question I have is this something that we should be using or instead should we doing a query with key:? The stack trace that I am getting right now is included below, any thoughts would be appreciated. 
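Building the key:(id1 OR id2 ...) workaround query the thread settled on is just string assembly; here is a stdlib-only sketch. The field name "key" comes from the thread; quoting each id is my addition to protect characters like the ':' in urn:uuid ids (full Lucene query-syntax escaping is out of scope):

```java
import java.util.Arrays;
import java.util.List;

public class IdsQueryBuilder {
    // Join ids into the key:("id1" OR "id2" ...) form used as a substitute for
    // the ids= parameter. Ids are quoted so ':' is not parsed as a field separator.
    static String idsQuery(List<String> ids) {
        StringBuilder sb = new StringBuilder("key:(");
        for (int i = 0; i < ids.size(); i++) {
            if (i > 0) sb.append(" OR ");
            sb.append('"').append(ids.get(i)).append('"');
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        System.out.println(idsQuery(Arrays.asList("id1", "urn:uuid:1234")));
    }
}
```

Running main prints: key:("id1" OR "urn:uuid:1234")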
Mar 20, 2012 5:36:38 PM org.apache.solr.core.SolrCore execute INFO: [slice1_shard1] webapp=/solr path=/select params={hl.fragsize=1ids=4f14cc9b-f669-4d6f-85ae-b22fad143492,urn:uuid:020335a7-1476-43d6-8f91-241bce1e7696,urn:uuid:352473eb-af56-4f6f-94d5-c0096dcb08d4} status=500 QTime=32 Mar 20, 2012 5:36:38 PM org.apache.solr.common.SolrException log SEVERE: null:java.lang.NullPointerException at org.apache.solr.handler.component.ShardFieldSortedHitQueue$1.compare(ShardDoc.java:232) at org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:159) at org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:101) at org.apache.lucene.util.PriorityQueue.upHeap(PriorityQueue.java:231) at org.apache.lucene.util.PriorityQueue.add(PriorityQueue.java:140) at org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:156) at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:839) at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:630) at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:609) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:332) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1539) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:406) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:255) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at 
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) -- Regards, Dmitry Kan
Re: First steps with Solr
Note that the VelocityResponseWriter puts a tool in the context to escape various things. See the Velocity Context section here: http://wiki.apache.org/solr/VelocityResponseWriter. That'll take you to this http://velocity.apache.org/tools/releases/1.4/generic/EscapeTool.html You can do $esc.url($some_variable) to URL encode _pieces_ of a URL. You can see the use of $esc in VM_global_library.vm and some of the other templates that ship with Solr. Erik
Re: Why my highlights are wrong(one character offset)?
Does anyone know whether it is a bug or not? I use NGram in my index.

<fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="5" maxGramSize="5"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_general_2NGram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="2"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
...
<field name="sequence" type="text_general_rev" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true"/>
Re: Why my highlights are wrong(one character offset)?
Can you reproduce the problem with latest trunk?
RE: preventing words from being indexed in spellcheck dictionary?
hello, should I apply the StopFilterFactory at index time or query time? right now - per the schema below - I am applying it at BOTH index time and query time. is this correct? thank you, mark

// snipped from schema.xml
<field name="itemDescSpell" type="textSpell"/>
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" stored="false" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
RE: preventing words from being indexed in spellcheck dictionary?
Assuming you're just using this field for spellcheck and not for queries, then it doesn't matter. But the correct way to do it is to have it in both places. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: geeky2 [mailto:gee...@hotmail.com] Sent: Tuesday, March 27, 2012 3:42 PM To: solr-user@lucene.apache.org Subject: RE: preventing words from being indexed in spellcheck dictionary?
Unload(true) doesn't delele Index file when unloading a core
From what I understand, isn't the index file deletion an expected result? Thanks

public int drop(..., boolean removeIndex)  // removeIndex passed in as true
        throws Exception {
    String coreName = ...;
    Unload req = new Unload(removeIndex);
    req.setCoreName(coreName);
    SolrServer adminServer = buildAdminServer();
    ...
    // removes the reference to the solr core in solr.xml, but doesn't delete the index files
    return req.process(adminServer).getStatus();
}
Re: Why my highlights are wrong(one character offset)?
My current version is Solr 3.5. That should be the most up to date release.
Re: Why my highlights are wrong(one character offset)?
What does your sequence field look like in schema.xml, fieldType and field? And what version are you using? koji -- Query Log Visualizer for Apache Solr http://soleami.com/ (12/03/27 13:06), neosky wrote: all of my highlights have a one-character mistake in the offset; some fragments from my response. Thanks!

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">259</int>
    <lst name="params">
      <str name="explainOther"/>
      <str name="indent">on</str>
      <str name="hl.fl">sequence</str>
      <str name="wt"/>
      <str name="hl">true</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
      <str name="fl">*,score</str>
      <str name="hl.useFastVectorHighlighter">true</str>
      <str name="start">0</str>
      <str name="q">sequence:NGNFN</str>
      <str name="qt"/>
      <str name="fq"/>
    </lst>
  </lst>
  <lst name="highlighting">
    <lst name="B9SUS0"><arr name="sequence"><str>TSQSEL<em>SNGNF</em>NRRPKIELSNFDGNHPKTWIRKC</str></arr></lst>
    <lst name="Q01GW2"><arr name="sequence"><str>GENTRE<em>RNGNF</em>NSLTRERSFAELENHPPKVRRNGSEG</str></arr></lst>
    <lst name="C5L0V0"><arr name="sequence"><str>EGRYPC<em>NNGNF</em>NLTTGRCVCEKNYVHLIYEDRI</str></arr></lst>
    <lst name="C4JX93"><arr name="sequence"><str>YAEENY<em>INGNF</em>NEEPY</str></arr></lst>
    <lst name="D7CK80"><arr name="sequence"><str>KEVADD<em>CNGNF</em>NQPTGVRI</str></arr></lst>
  </lst>
</response>
RE: how to store file path in Solr when using TikaEntityProcessor
Could you please show me how to get those values inside TikaEntityProcessor? -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: March 27, 2012 22:43 To: solr-user@lucene.apache.org Subject: Re: how to store file path in Solr when using TikaEntityProcessor I am using DIH to index the local file system. But the file path, size and lastmodified fields were not stored. In schema.xml I defined:

<fields>
  <field name="title" type="string" indexed="true" stored="true"/>
  <field name="author" type="string" indexed="true" stored="true" />
  <!-- <field name="text" type="text" indexed="true" stored="true" /> liang added -->
  <field name="path" type="string" indexed="true" stored="true" />
  <field name="size" type="long" indexed="true" stored="true" />
  <field name="lastmodified" type="date" indexed="true" stored="true" />
</fields>

And also defined tika-data-config.xml:

<dataConfig>
  <dataSource name="bin" type="BinFileDataSource" />
  <document>
    <entity name="f" dataSource="null" rootEntity="false"
            processor="FileListEntityProcessor"
            baseDir="E:/my_project/ecmkit/infotouch"
            fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)|(docx)|(ppt)"
            onError="skip" recursive="true">
      <entity name="tika-test" dataSource="bin" processor="TikaEntityProcessor"
              url="${f.fileAbsolutePath}" format="text" onError="skip">
        <field column="Author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <!-- <field column="text" name="text"/> -->
        <field column="fileAbsolutePath" name="path" />
        <field column="fileSize" name="size" />
        <field column="fileLastModified" name="lastmodified" />
      </entity>
    </entity>
  </document>
</dataConfig>

The Solr version is 3.5. Any idea? The implicit fields fileDir, file, fileAbsolutePath, fileSize, fileLastModified are generated by the FileListEntityProcessor. They should be defined above the TikaEntityProcessor.
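If I read Ahmet's answer right, the fix is to move those field mappings out of the Tika entity and up to the FileListEntityProcessor entity that actually generates them. A hedged sketch of what the data-config would then look like (attributes copied from the question, not verified against a running DIH):

```xml
<entity name="f" dataSource="null" rootEntity="false"
        processor="FileListEntityProcessor"
        baseDir="E:/my_project/ecmkit/infotouch"
        fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)|(docx)|(ppt)"
        onError="skip" recursive="true">
  <!-- implicit fields produced by FileListEntityProcessor belong at this level -->
  <field column="fileAbsolutePath" name="path" />
  <field column="fileSize" name="size" />
  <field column="fileLastModified" name="lastmodified" />
  <entity name="tika-test" dataSource="bin" processor="TikaEntityProcessor"
          url="${f.fileAbsolutePath}" format="text" onError="skip">
    <field column="Author" name="author" meta="true"/>
    <field column="title" name="title" meta="true"/>
  </entity>
</entity>
```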
Solr with UIMA
I am having a hard time integrating UIMA with Solr. I have downloaded the Solr 3.5 dist and have it successfully running with nutch and tika on Windows 7 using solrcell and curl via cygwin. To begin, I copied the 6 jars from solr/contrib/uima/lib to the working /lib in solr. Next, I read the readme.txt file in solr/contrib/uima/lib and edited both my solrconfig.xml and schema.xml accordingly, to no avail. I then found this link, which seemed a bit more applicable since I didn't care to use Alchemy or OpenCalais: http://code.google.com/a/apache-extras.org/p/rondhuit-uima/?redir=1 Still - when I run a curl command that imports a pdf via solrcell, I do not get the additional UIMA fields, nor do I get anything in my logs. The test.pdf is parsed though, and I see the pdf in Solr using: curl 'http://localhost:8080/solr/update/extract?fmap.content=content&literal.id=doc1&commit=true' -F file=@test.pdf

What I added to my solrconfig.xml:

<updateRequestProcessorChain name="uima">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <lst name="runtimeParameters">
      </lst>
      <str name="analysisEngine">C:\web\solrcelluimacrawler\com\rondhuit\uima\desc\KeyphraseExtractAnnotatorDescriptor.xml</str>
      <bool name="ignoreErrors">true</bool>
      <str name="logField">id</str>
      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <arr name="fields">
          <str>content</str>
        </arr>
      </lst>
      <lst name="fieldMappings">
        <lst name="type">
          <str name="name">com.rondhuit.uima.yahoo.Keyphrase</str>
          <lst name="mapping">
            <str name="feature">keyphrase</str>
            <str name="field">UIMAname</str>
          </lst>
        </lst>
      </lst>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

I also adjusted my requestHandler:

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.processor">uima</str>
  </lst>
</requestHandler>

Finally, my added entries in my schema.xml:

<field name="UIMAname" type="string" indexed="true" stored="true" multiValued="true" required="false"/>
<dynamicField name="*_sm" type="string" indexed="true" stored="true"/>

All I am trying to do is test *any* UIMA AE in Solr, and I cannot figure out what I am doing wrong. Thank you in advance for reading this.
Re: StreamingUpdateSolrServer - exceptions not propagated
On 3/27/2012 11:14 AM, Mark Miller wrote: It might make sense to accumulate the errors in a fixed-size queue and report them either when the queue fills up or when the client commits (assuming the commit will wait for all outstanding inserts to complete or fail). This is what we do client-side when performing multi-threaded inserts. Sounds great in theory, I think, but then I haven't delved into SUSS at all ... just a suggestion, take it or leave it. Actually I wonder whether SUSS is necessary if you do the threading client-side? You might get a similar perf gain; I know we see a substantial speedup that way, because then your updates spawn multiple threads in the server anyway, don't they? - Mike
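Mike's fixed-size-queue idea can be sketched without any Solr classes. In this stdlib-only sketch the send() method is a stand-in for an HTTP add (a real client would call SolrServer.add); all class and method names here are illustrative, not SolrJ API:

```java
// Sketch of client-side threaded indexing with a bounded error queue:
// adds run on a pool, failures land in the queue, and commit() surfaces them.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BoundedErrorIndexer {
    private final BlockingQueue<Exception> errors = new ArrayBlockingQueue<>(100);
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    // stand-in for an HTTP add; fails on an "invalid" document
    protected void send(String doc) throws Exception {
        if (doc.isEmpty()) throw new Exception("empty document");
    }

    public void add(String doc) {
        pool.submit(() -> {
            try {
                send(doc);
            } catch (Exception e) {
                errors.offer(e); // offer() drops silently if the queue is full
            }
        });
    }

    /** At commit time, wait for outstanding adds and return collected errors. */
    public List<Exception> commit() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        List<Exception> out = new ArrayList<>();
        errors.drainTo(out);
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        BoundedErrorIndexer idx = new BoundedErrorIndexer();
        idx.add("doc1");
        idx.add("");      // this one fails in send()
        idx.add("doc2");
        System.out.println("failed adds: " + idx.commit().size());
    }
}
```

Running main prints: failed adds: 1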
Re: Auto-complete phrase
I am also very confused by the use case for the Suggester component. With collate on, it will try to combine random words together, not the actual phrases that are there. I get better mileage out of edge n-grams tokenized on whitespace, left to right, since that is how most people think.

However, I would like Suggester to work as follows.

Index:
Chris Smith
Tony Dawson
Chris Leaf
Daddy Golucky

Query:
1. "Chris" returns "Chris Leaf" but not both "Chris Smith" and "Chris Leaf".
2. I seem to get collated results (take the first word and combine it with the second word), so I would see things like "Smith Leaf". Very strange and not what we expect. These are formal names.

When I use n-grams I can index: C Ch Chr Chri Chris S Sm Smi Smit Smith. Thus if I search on "Smi" it will match Chris Smith and also Chris Leaf. Exactly what I want.

On Tue, Mar 27, 2012 at 11:05 AM, Rémy Loubradou r...@hipsnip.com wrote:
> Hello,
>
> I am working on creating an auto-complete functionality for the field
> merchant_name, which is present all over my documents. I am using Solr 3.4
> and trying to take advantage of the Suggester functionality.
> Unfortunately, so far I haven't figured out how to make it work as I
> expected.
>
> The list of merchants in my documents (the real list is bigger and will
> change often, which is why I don't use a dictionary file) is:
>
> Redoute
> Suisse Trois
> Conforama
> But
> Cult Beauty
> Brother Trois
>
> I expect the Suggester component to match words (or parts of them) and
> return the phrases in which they were matched. For example, with
> /suggest?q=tro, I would like to get this:
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">0</int>
>   </lst>
>   <lst name="spellcheck">
>     <lst name="suggestions">
>       <lst name="tro">
>         <int name="numFound">2</int>
>         <int name="startOffset">0</int>
>         <int name="endOffset">x</int>
>         <arr name="suggestion">
>           <str>Bother Trois</str>
>           <str>Suisse Trois</str>
>         </arr>
>       </lst>
>     </lst>
>   </lst>
> </response>
>
> I experimented with suggestions on a field configured with either
> solr.KeywordTokenizerFactory or solr.WhitespaceTokenizerFactory. In my
> mind, I have to handle three cases:
>
> /suggest?q=bo       -> (should return) bother trois
> /suggest?q=tro      -> (should return) bother trois, suisse trois
> /suggest?q=bo%20tro -> (should return) bother trois
>
> With solr.KeywordTokenizerFactory I get:
>
> /suggest?q=bo       -> bother trois
> /suggest?q=tro      -> nothing
> /suggest?q=bo%20tro -> nothing
>
> With solr.WhitespaceTokenizerFactory I get:
>
> /suggest?q=bo       -> bother
> /suggest?q=troi     -> trois
> /suggest?q=bo%20tro -> bother, trois
>
> Not exactly what I want ... :(
>
> My configuration in solrconfig.xml for the Suggester component:
>
> <searchComponent class="solr.SpellCheckComponent" name="suggestMerchant">
>   <lst name="spellchecker">
>     <str name="name">suggestMerchant</str>
>     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
>     <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
>     <!-- Alternatives to lookupImpl:
>          org.apache.solr.spelling.suggest.fst.FSTLookup [finite state automaton]
>          org.apache.solr.spelling.suggest.fst.WFSTLookupFactory [weighted finite state automaton]
>          org.apache.solr.spelling.suggest.jaspell.JaspellLookup [default, jaspell-based]
>          org.apache.solr.spelling.suggest.tst.TSTLookup [ternary trees]
>     -->
>     <!-- the indexed field to derive suggestions from -->
>     <str name="field">merchant_name_autocomplete</str>
>     <float name="threshold">0.0</float>
>     <str name="buildOnCommit">true</str>
>     <!-- <str name="sourceLocation">american-english</str> -->
>   </lst>
> </searchComponent>
>
> <requestHandler class="org.apache.solr.handler.component.SearchHandler"
>                 name="/suggest/merchant">
>   <lst name="defaults">
>     <str name="spellcheck">true</str>
>     <str name="spellcheck.dictionary">suggestMerchant</str>
>     <str name="spellcheck.onlyMorePopular">true</str>
>     <str name="spellcheck.count">10</str>
>     <str name="spellcheck.collate">true</str>
>     <int name="spellcheck.maxCollations">10</int>
>   </lst>
>   <arr name="components">
>     <str>suggestMerchant</str>
>   </arr>
> </requestHandler>
>
> How can I implement autocomplete with the Suggester component to get what
> I expect? Thanks for your help, I really appreciate it.

--
Bill Bell
billnb...@gmail.com
cell 720-256-8076
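For reference, the edge-gram approach described above ("tokenize on whitespace, left to right") can be wired up in schema.xml roughly like this. This is a sketch only; the field type name is invented for the example, and the gram sizes are arbitrary:

```xml
<!-- Illustrative schema.xml sketch of whitespace-tokenized edge n-grams;
     names and gram sizes are assumptions, not from the thread. -->
<fieldType name="text_edge" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "Smith" is indexed as S, Sm, Smi, Smit, Smith -->
    <filter class="solr.EdgeNGramFilterFactory"
            minGramSize="1" maxGramSize="25" side="front"/>
  </analyzer>
  <analyzer type="query">
    <!-- no grams at query time: the user's prefix matches the indexed grams -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="merchant_name_autocomplete" type="text_edge"
       indexed="true" stored="true"/>
```

A regular query against such a field returns whole documents, so a prefix like "Smi" brings back the full stored name (e.g. "Chris Smith") rather than collated word fragments.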
Re: dataImportHandler: delta query fetching data, not just ids?
How did it work before SOLR-811? I don't understand. Did it fetch delta data with two queries (1. get the ids, 2. get the data for each id), or did it fetch all delta data with a single query?

On Tue, Mar 27, 2012 at 5:45 PM, Ahmet Arslan iori...@yahoo.com wrote:
>> 2. If not - what's the reason delta import is implemented like it is? Why
>> split it in two queries? I would think having a single delta query that
>> fetches the data would be kind of an obvious design unless there's
>> something that calls for 2 separate queries...?
>
> I think this is it? https://issues.apache.org/jira/browse/SOLR-811
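For context, the two-query design being discussed looks roughly like the following data-config.xml fragment: deltaQuery finds the changed primary keys, and deltaImportQuery (the part added by SOLR-811) fetches the row for each of those keys. The table and column names here are made up for illustration:

```xml
<!-- Hypothetical DIH entity; "item", "id", "name", "last_modified" are
     invented names, not from the thread. -->
<entity name="item" pk="id"
        query="SELECT id, name FROM item"
        deltaQuery="SELECT id FROM item
                    WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT id, name FROM item
                          WHERE id = '${dataimporter.delta.id}'"/>
```

So the answer to the question above is the two-query form: one query for the changed ids, then one data query per id via the `${dataimporter.delta.*}` variables.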