Re: Solr Join with Dismax
Thanks Hoss! Here it is: https://issues.apache.org/jira/browse/SOLR-2972

On Wed, Dec 14, 2011 at 4:47 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:

: I have been doing more tracing in the code. And I think that I understand a
: bit more. The problem does not seem to be dismax+join, but
: dismax+join+fromIndex.

Correct. join+dismax works fine, as I already demonstrated...

: Note: even with that hardcoded lucene bug, you can still override the
: default by using var dereferencing to point at another param with its own
: localparams specifying the type...
:
:   qf=text name
:   q={!join from=manu_id_s to=id v=$qq}
:   qq={!dismax}ipod

...the problem you are referring to now has nothing to do with dismax, and is
specifically a bug in how the query is parsed when fromIndex is used (which I
thought I had already mentioned in this thread, but I see you found it
independently)...

https://issues.apache.org/jira/browse/SOLR-2824

Did you file a Jira about defaulting to "lucene" instead of null, so we can
make the defType local param syntax work? I haven't seen it in my email, but
it's really an unrelated problem, so it should be tracked separately.

-Hoss

--
Pascal Dimassimo
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
Re: Solr Join with Dismax
Thanks Hoss!

But unfortunately, the dismax parameters (like qf) are not passed over to the
fromIndex. In fact, even if var dereferencing makes dismax the fromQueryParser,
the query that is passed to the JoinQuery object contains nothing to indicate
that it should use dismax.

The following code is from the method createParser in JoinQParserPlugin.java:

  // With var dereferencing, this makes the fromQueryParser be dismax
  QParser fromQueryParser = subQuery(v, "lucene");
  // But after the call to getQuery, there is no indication that dismax should be used
  Query fromQuery = fromQueryParser.getQuery();
  JoinQuery jq = new JoinQuery(fromField, toField, fromIndex, fromQuery);

So I guess that, as it is right now, dismax can't really be used with joins.

On Fri, Dec 9, 2011 at 3:20 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:

: Is there a specific reason why it is hard-coded to use the lucene
: QParser? I was looking at JoinQParserPlugin.java and here it is in
: createParser:
:
:   QParser fromQueryParser = subQuery(v, "lucene");
:
: I could pass another param named fromQueryParser and use it instead of
: lucene. But again, is there a reason why I should not do that?

It's definitely a bug, but we don't need a new local param: that hardcoded
"lucene" should just be replaced with null, so that the defType local param
will be checked (just like it can be in the BoostQParser)...

  qf=text name
  q={!join from=manu_id_s to=id defType=dismax}ipod

Note: even with that hardcoded lucene bug, you can still override the default
by using var dereferencing to point at another param with its own localparams
specifying the type...

  qf=text name
  q={!join from=manu_id_s to=id v=$qq}
  qq={!dismax}ipod

-Hoss

--
Pascal Dimassimo
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
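[Editor's note: a minimal sketch, against the 3.x-era QParserPlugin API, of what
Hoss's suggestion would look like in JoinQParserPlugin. This is my reconstruction,
not the committed fix; the only change from the snippet quoted above is passing
null instead of the hardcoded "lucene", so that {!join ... defType=dismax} picks
the dismax parser. JoinQuery is the package-private class shown in the snippet.]

  import org.apache.lucene.queryParser.ParseException;
  import org.apache.lucene.search.Query;
  import org.apache.solr.common.params.SolrParams;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.search.QParser;

  // Inside JoinQParserPlugin (Solr 3.x):
  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      public Query parse() throws ParseException {
        String fromField = getParam("from");
        String toField = getParam("to");
        String fromIndex = getParam("fromIndex");
        String v = localParams.get("v");

        // null instead of "lucene": subQuery() then falls back to the
        // defType local param, e.g. {!join from=... to=... defType=dismax}
        QParser fromQueryParser = subQuery(v, null);
        Query fromQuery = fromQueryParser.getQuery();

        return new JoinQuery(fromField, toField, fromIndex, fromQuery);
      }
    };
  }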
Re: Solr Join with Dismax
Hi,

I have been doing more tracing in the code, and I think that I understand a bit
more. The problem does not seem to be dismax+join, but dismax+join+fromIndex.

When doing this joined dismax query on the same index:

  http://localhost:8080/solr/gutenberg/select?q={!join+from=id+to=id+v=$qq}&qq={!dismax+qf='body%20tag^2'}solr

the query returned by the method fromQueryParser.getQuery looks like this:

  +(body:solr | tag:solr^2.0)

But when doing the same query across another core:

  http://localhost:8080/solr/test/select/?q={!join+fromIndex=gutenberg+from=id+to=id+v=$qq}&qq={!dismax+qf='body%20tag^2'}solr

the query is:

  +(body:solr)

We see that the second field defined in the qf param is not added to the query.
Tracing deeper shows that this happens because the tag field does not exist in
the test core, hence it is not added. This can be seen in SolrPluginUtils.java
in the method getFieldQuery: all the fields that are not part of the current
index won't be added to the query.

So the conclusion does not seem to be that dismax can't be used with joins, but
that it can't be used with another core that does not have the same fields as
the one where the initial query is made.

I just noticed SOLR-2824. So it is really a bug. I'll take the time to look at
the patch attached to this ticket.

On Wed, Dec 14, 2011 at 2:55 PM, Pascal Dimassimo <pascal.dimass...@sematext.com> wrote:

Thanks Hoss!

But unfortunately, the dismax parameters (like qf) are not passed over to the
fromIndex. In fact, even if var dereferencing makes dismax the fromQueryParser,
the query that is passed to the JoinQuery object contains nothing to indicate
that it should use dismax.

The following code is from the method createParser in JoinQParserPlugin.java:

  // With var dereferencing, this makes the fromQueryParser be dismax
  QParser fromQueryParser = subQuery(v, "lucene");
  // But after the call to getQuery, there is no indication that dismax should be used
  Query fromQuery = fromQueryParser.getQuery();
  JoinQuery jq = new JoinQuery(fromField, toField, fromIndex, fromQuery);

So I guess that, as it is right now, dismax can't really be used with joins.

On Fri, Dec 9, 2011 at 3:20 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:

: Is there a specific reason why it is hard-coded to use the lucene
: QParser? I was looking at JoinQParserPlugin.java and here it is in
: createParser:
:
:   QParser fromQueryParser = subQuery(v, "lucene");
:
: I could pass another param named fromQueryParser and use it instead of
: lucene. But again, is there a reason why I should not do that?

It's definitely a bug, but we don't need a new local param: that hardcoded
"lucene" should just be replaced with null, so that the defType local param
will be checked (just like it can be in the BoostQParser)...

  qf=text name
  q={!join from=manu_id_s to=id defType=dismax}ipod

Note: even with that hardcoded lucene bug, you can still override the default
by using var dereferencing to point at another param with its own localparams
specifying the type...

  qf=text name
  q={!join from=manu_id_s to=id v=$qq}
  qq={!dismax}ipod

-Hoss

--
Pascal Dimassimo
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

--
Pascal Dimassimo
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
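[Editor's note: the idea behind SOLR-2824 is to resolve the fromIndex core and
parse/execute the from-query against that core's own request context, so that
its schema, and therefore every qf field, is the one consulted. The following is
my own hedged sketch of that idea using 3.x-era class names, not the patch
attached to the ticket.]

  // Hypothetical sketch only, not the SOLR-2824 patch.
  CoreContainer container = req.getCore().getCoreDescriptor().getCoreContainer();
  SolrCore fromCore = container.getCore(fromIndex);      // e.g. "gutenberg"
  try {
    LocalSolrQueryRequest fromReq = new LocalSolrQueryRequest(fromCore, params);
    try {
      // Parsed against the gutenberg schema, so both "body" and "tag" survive
      QParser fromParser = QParser.getParser(v, "dismax", fromReq);
      Query fromQuery = fromParser.getQuery();
      // ... run fromQuery against fromCore's searcher and join the ids back
    } finally {
      fromReq.close();
    }
  } finally {
    fromCore.close();   // release the reference taken by getCore()
  }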
Re: Solr Join with Dismax
Hi,

Is there a specific reason why it is hard-coded to use the lucene QParser? I
was looking at JoinQParserPlugin.java and here it is in createParser:

  QParser fromQueryParser = subQuery(v, "lucene");

I could pass another param named fromQueryParser and use it instead of
"lucene". But again, is there a reason why I should not do that? If it is ok, I
could submit a patch.

Thanks.

On Tue, Dec 6, 2011 at 1:20 PM, Pascal Dimassimo <pascal.dimass...@sematext.com> wrote:

Hi,

I was trying Solr Join across 2 cores on the same Solr installation. For example:

  /solr/index1/select?q={!join fromIndex=index2 from=tag to=tag}restaurant

My understanding is that the restaurant query will be executed on index2 and
the results of this query will be joined with the documents of index1 by
matching the tag field.

According to my tests, it looks like the restaurant query will always be parsed
using the Lucene QParser. I did not find a way to use another QParser, like
Dismax. Am I right, or is there a way?

Thanks!

--
Pascal Dimassimo
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

--
Pascal Dimassimo
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
Solr Join with Dismax
Hi,

I was trying Solr Join across 2 cores on the same Solr installation. For example:

  /solr/index1/select?q={!join fromIndex=index2 from=tag to=tag}restaurant

My understanding is that the restaurant query will be executed on index2 and
the results of this query will be joined with the documents of index1 by
matching the tag field.

According to my tests, it looks like the restaurant query will always be parsed
using the Lucene QParser. I did not find a way to use another QParser, like
Dismax. Am I right, or is there a way?

Thanks!

--
Pascal Dimassimo
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
Re: Solr Join with Dismax
Hi, Thanks for this! But your partner-tmo request handler is probably configured with your ing-content index, no? In my case, I'd like to execute a dismax query on the fromIndex. On Tue, Dec 6, 2011 at 2:57 PM, Jeff Schmidt j...@535consulting.com wrote: Hi Pascal: I have an issue similar to yours, but also need to facet the joined documents... I've been playing with various things. There's not much documentation I can find. Looking at http://wiki.apache.org/solr/Join, in the fourth example you can see the join being relegated to a filter query: http://localhost:8983/solr/select?q=ipodfl=*,scoresort=score+descfq={!join+from=id+to=manu_id_s}compName_s:Belkin So, I figured if you can do that, why not specify the query handler (qt). When I issue this query for my application: http://localhost:8091/solr/ing-content/select/?qt=partner-tmofq=type:nodeq=brca1fq={!join+from=conceptId+to=id+fromIndex=partner-tmo}*:*debugQuery=truerows=5fl=id,n_type,n_name My configured edismax based request handler is named partner-tmo, and with debugQuery=true I can see the query being handled by that handler: str name=parsedquery_toString+(n_pathway_namePartial:brca1^4.25 | n_pathway_name:brca1^8.5 | n_macromolecule_id:brca1^9.0 | n_m_s_macc:brca1^6.0 | n_go_id:brca1^6.0 | n_go_term:brca1^4.0 | n_cellreg_regulates:brca1 | n_acc_id_sp:brca1^9.5 | n_m_s_mseq:brca1^6.0 | n_namePartial:brca1^5.0 | n_synonymPartial:brca1^4.85 | n_neighborof_process:brca1^2.0 | n_acc_id_rs_mrna:brca1^9.5 | n_tissue_typePartial:brca1^4.0 | n_c_iupac:brca1 | n_member_name:brca1^9.7 | n_c_cas_number:brca1^2.0 | n_c_pubchem_cid:brca1^4.0 | n_protein_family:brca1^7.0 | n_cellreg_regulated_by:brca1 | n_go_termPartial:brca1 | n_function:brca1^7.0 | n_ref_author:brca1 | n_name:brca1^9.9 | n_m_s_mirbase_family_name:brca1^4.0 | n_protein_familyPartial:brca1^3.5 | n_type:brca1^2.0 | p_name:brca1^8.0 | n_m_s_mname:brca1^6.0 | n_c_systematic:brca1^2.0 | n_ref_source_id:brca1^4.0 | n_macromolecule_name:brca1^9.8 | n_c_formula:brca1^4.0 | n_memberof_name:brca1^9.7 | n_neighborof_name:brca1^6.0 | p_class:brca1^0.1 | n_cellreg_diseasePartial:brca1^4.5 | n_m_m_sacc:brca1^6.0 | n_ref_title:brca1^1.1 | n_m_acc:brca1^9.5 | n_acc_id_ug:brca1^9.5 | n_cellreg_binds:brca1 | n_synonym:brca1^9.7 | n_acc_id:brca1^9.5 | n_macromolecule_namePartial:brca1^4.9 | n_macromolecule_summary:brca^0.6 | n_m_s_mirbase_comments:brca1^0.6 | p_description:brca^7.0 | n_m_m_sname:brca1^6.0 | p_nameExact:brca1^10.0 | n_m_s_mirbase_family_acc:brca1^8.0 | n_tissue_type:brca1^7.0 | n_eg_id:brca1^9.5 | n_cellreg_disease:brca1^9.0 | n_typePartial:brca1^3.25 | p_classExact:brca1^1.5 | n_m_rna_target_name:brca1^3.0 | n_m_m_sseq:brca1^6.0 | n_acc_id_rs_prot:brca1^9.5 | n_m_seq:brca1^9.0 | n_cellreg_role_in_cellPartial:brca1^3.75 | n_memberof_namePartial:brca1^4.85 | n_cellreg_role_in_cell:brca1^7.5)~0.1 ()/str I know, that's a lot fields to be searching. :) Anyway, I'm still working on figuring out the join results. 
It is doing something according to the debug output:

  <lst name="join">
    <lst name="{!join from=conceptId to=id fromIndex=partner-tmo}*:*">
      <long name="time">737</long>
      <int name="fromSetSize">1593981</int>
      <int name="toSetSize">63021</int>
      <int name="fromTermCount">63021</int>
      <long name="fromTermTotalDf">63021</long>
      <int name="fromTermDirectCount">62351</int>
      <int name="fromTermHits">63021</int>
      <long name="fromTermHitsTotalDf">63021</long>
      <int name="toTermHits">63021</int>
      <long name="toTermHitsTotalDf">63021</long>
      <int name="toTermDirectCount">62871</int>
      <int name="smallSetsDeferred">150</int>
      <long name="toSetDocsAdded">63021</long>
    </lst>
  </lst>

I'm not sure how much this helps you, but it looks like you can combine join
with [e]dismax.

Cheers,

Jeff

On Dec 6, 2011, at 11:20 AM, Pascal Dimassimo wrote:

Hi,

I was trying Solr Join across 2 cores on the same Solr installation. For example:

  /solr/index1/select?q={!join fromIndex=index2 from=tag to=tag}restaurant

My understanding is that the restaurant query will be executed on index2 and
the results of this query will be joined with the documents of index1 by
matching the tag field.

According to my tests, it looks like the restaurant query will always be parsed
using the Lucene QParser. I did not find a way to use another QParser, like
Dismax. Am I right, or is there a way?

Thanks!

--
Pascal Dimassimo
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

--
Jeff Schmidt
535 Consulting
j...@535consulting.com
http://www.535consulting.com
(650) 423-1068

--
Pascal Dimassimo
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
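[Editor's note: to make Jeff's recipe concrete, here is a small SolrJ sketch of
my own, assuming the 3.x client API and the handler/core names from his message:
a dismax-style request handler answers the main query while the join is relegated
to a filter query. Exception handling is omitted.]

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  CommonsHttpSolrServer server =
      new CommonsHttpSolrServer("http://localhost:8091/solr/ing-content");

  SolrQuery query = new SolrQuery("brca1");
  query.setQueryType("partner-tmo");   // the edismax request handler (qt)
  query.addFilterQuery("type:node");
  query.addFilterQuery("{!join from=conceptId to=id fromIndex=partner-tmo}*:*");
  query.setRows(5);
  query.setFields("id", "n_type", "n_name");

  QueryResponse rsp = server.query(query);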
Re: Documents disappearing
Hi,

hossman wrote:

: We index using 4 processes that read from a queue of documents. Each process
: send one document at a time to the /update handler.

Hmmm... then you should have a message from the LogUpdateProcessorFactory for
every individual add command that was received ... did you crunch those to see
if anything odd popped up (ie: duplicated IDs)?

What did the "start commit" log messages look like?

(FWIW: I have no hunches as to what caused that behavior, I'm just scrounging
for more data.)

A quick check did show me a couple of duplicates, but if I understand
correctly, even if two different processes send the same document, the last one
should update the previous. If I send the same document 10 times, in the end,
it should only be in my index once, no?

The start commit message is always:

  start commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)

hossman wrote:

: Yes, I double checked that no delete occur. Since that indexation, I
: re-index the same set of documents twice and we always end up with 7725
: documents, but it did not show that ~1 documents count that we saw the
: first time. But the difference between the first indexation and the others
: was that the first time, the indexation last a couple of hours because the
: documents were not always accessible in our document queue. The others

Hmmm... what exactly does your indexing code do when the documents aren't
available? ... and what happens if you forcibly commit in the middle of
reindexing (to see some of those counts again)?

If no document is available, the threads are sleeping. If a commit is sent
manually during the re-indexation, it just commits what has been sent to the
index so far.

I will redo the test with the same documents and in the same conditions as in
our first indexation to see if the counts will be the same again.

Again, thanks a lot for your help.

--
View this message in context: http://old.nabble.com/Documents-disappearing-tp27659047p27794641.html
Sent from the Solr - User mailing list archive at Nabble.com.
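[Editor's note: a small SolrJ sketch, with made-up field values rather than code
from this setup, illustrating the point about re-sent documents: re-adding a
document whose uniqueKey already exists overwrites the previous copy, so numDocs
stays at 1 after the commit, while maxDoc keeps counting the deleted copies until
segments are merged. With the SignatureUpdateProcessor from this thread and
overwriteDupes=true, the same holds for documents whose title/text signature
collides.]

  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  CommonsHttpSolrServer server =
      new CommonsHttpSolrServer("http://localhost:8983/solr/myindex");

  for (int i = 0; i < 10; i++) {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");      // same uniqueKey every time
    doc.addField("title", "same title");
    doc.addField("text", "same body");
    server.add(doc);                  // each add overwrites the previous one
  }
  server.commit();                    // numDocs: 1, maxDoc: up to 10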
Re: SpellCheck configuration
Did you try the setting 'onlyMorePopular'?

http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.onlyMorePopular

André Maldonado wrote:

Hi all.

I'm configuring spell checking in my index. Everything is working, but I want
to get the best suggestion based on number of occurrences, and not in the way
Solr defines it. So, let me give an example:

Query: apartamemy

Suggestions:

  <arr name="suggestion">
    <str>apartamemto</str>
    <str>apartameto</str>
    <str>apartameno</str>
    <str>apartament</str>
    <str>apartamen</str>
    <str>apartamentyo</str>
    <str>apartamento</str>
    <str>apartamente</str>
    <str>apartametos</str>
    <str>apartametno</str>
  </arr>

The best suggestion, by Solr, is "apartamemto", but the really best suggestion
(based on number of occurrences) is "apartamento". How can I change this?

Thanks

"Then those who were in the boat came and worshiped him, saying: Truly you are
the Son of God." (Matthew 14:33)

--
View this message in context: http://old.nabble.com/SpellCheck-configuration-tp27743634p27747165.html
Sent from the Solr - User mailing list archive at Nabble.com.
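[Editor's note: an illustrative request only, assuming the spellcheck component
is wired into the /select handler. Adding the parameter restricts suggestions to
terms that occur more often in the index than the misspelled query term:

  /solr/select?q=apartamemy&spellcheck=true&spellcheck.onlyMorePopular=true&spellcheck.count=10
]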
Re: Auto suggestion
By 'AutoSuggestion', are you referring to the spellcheck handler? If so, you
have to rebuild your spellcheck index using the 'build' parameter after you add
new data. You can also configure the spellcheck module to rebuild the index
automatically after a commit or an optimize.

http://wiki.apache.org/solr/SpellCheckComponent

Suram wrote:

Hi,

AutoSuggestion is not found for newly indexed data. How can I configure that?
Can anyone help me?

Thanks in advance

--
View this message in context: http://old.nabble.com/Auto-suggestion-tp27718858p27747260.html
Sent from the Solr - User mailing list archive at Nabble.com.
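[Editor's note: for example (illustrative only, assuming the spellcheck component
is attached to the default search handler), a one-off rebuild can be triggered
with the build parameter, while the buildOnCommit / buildOnOptimize options in
the spellchecker's solrconfig.xml entry handle the automatic case:

  /solr/select?q=*:*&spellcheck=true&spellcheck.build=true
]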
Re: Documents disappearing
Hoss,

Thanks for your answers. You are absolutely right, I should have provided you
with more details.

We index using 4 processes that read from a queue of documents. Each process
sends one document at a time to the /update handler.

Yes, I double checked that no delete occurred. Since that indexation, I
re-indexed the same set of documents twice and we always end up with 7725
documents, but it did not show that ~1 documents count that we saw the first
time. But the difference between the first indexation and the others was that
the first time, the indexation lasted a couple of hours because the documents
were not always accessible in our document queue. The other times, the
documents were all available, so it took around 20 minutes to re-index all
documents. So there was no time for an auto-commit to happen during the other
indexations, so the log never shows the newSearcher warming query that I use as
a document count.

About the newSearcher warming query, it is a typo in the config. It should have
been 'qt'. Thanks for this one!

In my schema.xml, I have defined the id and signature fields like this:

  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="signature" type="string" indexed="true" stored="true"/>
  ...
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>fulltext</defaultSearchField>

And here is our solrconfig.xml:

  <?xml version="1.0" encoding="UTF-8" ?>
  <config>
    <abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>

    <indexDefaults>
      <useCompoundFile>false</useCompoundFile>
      <mergeFactor>10</mergeFactor>
      <ramBufferSizeMB>32</ramBufferSizeMB>
      <maxMergeDocs>2147483647</maxMergeDocs>
      <maxFieldLength>1</maxFieldLength>
      <writeLockTimeout>1000</writeLockTimeout>
      <commitLockTimeout>1</commitLockTimeout>
      <lockType>single</lockType>
    </indexDefaults>

    <mainIndex>
      <useCompoundFile>false</useCompoundFile>
      <ramBufferSizeMB>32</ramBufferSizeMB>
      <mergeFactor>10</mergeFactor>
      <maxMergeDocs>2147483647</maxMergeDocs>
      <maxFieldLength>1</maxFieldLength>
      <unlockOnStartup>false</unlockOnStartup>
    </mainIndex>

    <updateHandler class="solr.DirectUpdateHandler2">
      <!-- Perform a <commit/> automatically under certain conditions:
           maxDocs - number of updates since last commit is greater than this
           maxTime - oldest uncommitted update (in ms) is this long ago -->
      <autoCommit>
        <maxDocs>1</maxDocs>
        <maxTime>180</maxTime>
      </autoCommit>
    </updateHandler>

    <query>
      <maxBooleanClauses>1024</maxBooleanClauses>
      <filterCache class="solr.FastLRUCache" size="1048576" initialSize="4096" autowarmCount="1024"/>
      <queryResultCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="128"/>
      <documentCache class="solr.FastLRUCache" size="1048576" initialSize="512" autowarmCount="0"/>
      <enableLazyFieldLoading>true</enableLazyFieldLoading>
      <queryResultWindowSize>50</queryResultWindowSize>
      <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
      <HashDocSet maxSize="3000" loadFactor="0.75"/>

      <listener event="newSearcher" class="solr.QuerySenderListener">
        <arr name="queries">
          <lst> <str name="q">*:*</str> <str name="sort">original_date desc</str> </lst>
          <lst> <str name="q">*:*</str> <str name="wt">dismax</str> </lst>
          <lst>
            <str name="q">*:*</str>
            <str name="facet">true</str>
            <str name="facet.field">source</str>
            <str name="facet.field">author</str>
            <str name="facet.field">type</str>
            <str name="facet.field">site</str>
          </lst>
        </arr>
      </listener>

      <listener event="firstSearcher" class="solr.QuerySenderListener">
        <arr name="queries">
          <lst> <str name="q">*:*</str> <str name="sort">original_date desc</str> </lst>
          <lst> <str name="q">*:*</str> <str name="wt">dismax</str> </lst>
          <lst>
            <str name="q">*:*</str>
            <str name="facet">true</str>
            <str name="facet.field">source</str>
            <str name="facet.field">author</str>
            <str name="facet.field">type</str>
            <str name="facet.field">site</str>
          </lst>
        </arr>
      </listener>

      <useColdSearcher>false</useColdSearcher>
      <maxWarmingSearchers>2</maxWarmingSearchers>
    </query>

    <requestDispatcher handleSelect="true">
      <requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="2048" />
      <httpCaching lastModifiedFrom="openTime" etagSeed="Solr"> </httpCaching>
    </requestDispatcher>

    <requestHandler name="standard" class="solr.SearchHandler" default="true">
      <!-- default values for query parameters -->
Re: Multicore Example
Are you sure that you don't have any Java processes that are still running? Did
you change the port, or are you still using 8983?

Lee Smith-6 wrote:

Hey All,

Trying to dip my feet into multicore and hoping someone can advise why the
example is not working.

Basically I have been working with the example single core fine, so I have
stopped the server and restarted with the new command line for multicore, ie:

  java -Dsolr.solr.home=multicore -jar start.jar

When it launches I get this error:

  2010-02-19 11:13:39.740::WARN: EXCEPTION
  java.net.BindException: Address already in use
    at java.net.PlainSocketImpl.socketBind(Native Method)
    at etc

Any ideas what this can be, because I have stopped the first one?

Thank you if you can advise.

--
View this message in context: http://old.nabble.com/Multicore-Example-tp27659052p27659102.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: Documents disappearing
Using LukeRequestHandler, I see:

  <int name="numDocs">7725</int>
  <int name="maxDoc">28099</int>
  <int name="numTerms">758826</int>
  <long name="version">1266355690710</long>
  <bool name="optimized">false</bool>
  <bool name="current">true</bool>
  <bool name="hasDeletions">true</bool>
  <str name="directory">
    org.apache.lucene.store.NIOFSDirectory:org.apache.lucene.store.NIOFSDirectory@/opt/solr/myindex/data/index
  </str>

I will copy the index to my local machine so I can open it with Luke. Should I
look for something specific?

Thanks!

ANKITBHATNAGAR wrote:

Try inspecting your index with Luke.

Ankit

-----Original Message-----
From: Pascal Dimassimo [mailto:thesuper...@hotmail.com]
Sent: Friday, February 19, 2010 2:22 PM
To: solr-user@lucene.apache.org
Subject: Documents disappearing

Hi,

I have encountered a situation that I can't explain. We are indexing documents
that are often duplicates, so we activated deduplication like this:

  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <bool name="overwriteDupes">true</bool>
    <str name="signatureField">signature</str>
    <str name="fields">title,text</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>

What I can't explain is that when I look at the documents count in the log, I
see documents disappearing.

  11:24:23 INFO - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=0 status=0 QTime=0
  14:04:24 INFO - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=4065 status=0 QTime=10
  14:17:07 INFO - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=6499 status=0 QTime=42
  14:25:42 INFO - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=7629 status=0 QTime=1
  14:47:12 INFO - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=10140 status=0 QTime=12
  15:17:22 INFO - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=10861 status=0 QTime=13
  15:47:31 INFO - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=9852 status=0 QTime=19
  16:17:42 INFO - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=8112 status=0 QTime=13
  16:38:17 INFO - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=10
  16:39:10 INFO - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=1
  16:47:40 INFO - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=46
  16:51:24 INFO - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=74
  17:02:13 INFO - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=102
  17:17:41 INFO - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=8

11:24 was the time at which Solr was started that day. Around 13:30, we started
the indexation. At some point during the indexation, I noticed that a batch of
documents was resent (i.e., documents with the same id field were sent again to
the index). And according to the log, NO delete was sent to Solr.

I understand that if I send duplicates (either documents with the same id or
with the same signature), the count of documents should stay the same. But how
can we explain that it is lowering? What are the possible causes of this
behavior?

Thanks!

--
View this message in context: http://old.nabble.com/Documents-disappearing-tp27659047p27659047.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
View this message in context: http://old.nabble.com/Documents-disappearing-tp27659047p27660077.html
Sent from the Solr - User mailing list archive at Nabble.com.
KStem download
Hi,

I want to try KStem. I'm following the instructions on this page:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem

... but the download link doesn't work. Does anyone know the new location to
download KStem?

--
View this message in context: http://www.nabble.com/KStem-download-tp24375856p24375856.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: solrj : probleme with utf-8 content
Hi,

I have that problem too. But I notice that it only happens if I send my data
via solrj. If I send it via the solr-ruby gem, everything is fine
(http://wiki.apache.org/solr/solr-ruby).

Here is my jruby script:

---
require 'rubygems'
require 'solr'
require 'rexml/document'

include Java

def send_via_solrj(text, url)
  doc = org.apache.solr.common.SolrInputDocument.new
  doc.addField('id', '1')
  doc.addField('text', text)
  server = org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.new(url)
  server.add(doc);
  server.commit();
end

def send_via_gem(text, url)
  solr_doc = Solr::Document.new
  solr_doc['id'] = '2'
  solr_doc['text'] = text
  options = { :autocommit => :on }
  conn = Solr::Connection.new(url, options)
  conn.add(solr_doc)
end

host = 'localhost'
port = ''
path = '/solr/core0'
url = "http://#{host}:#{port}#{path}"

text = "eaiou with circumflexes: êâîôû"

send_via_solrj(text, url)
send_via_gem(text, url)

puts "done!"
---

If I watch the http messages with tcpmon, I see that the data sent via solrj is
encoded in cp1252, while the data sent via the gem is utf-8.

Does anyone have an idea of how we can configure solrj to send in utf-8?

Thanks in advance.

Walid ABDELKABIR wrote:

When executing this code, I got in my index the field "includes" with this
value: ? ? ? :

---
String content = "eaiou with circumflexes: êâîôû";
SolrInputDocument doc = new SolrInputDocument();
doc.addField( "id", 123, 1.0f );
doc.addField( "includes", content, 1.0f );
server.add( doc );
---

but this code works fine:

---
String addContent = "<add><doc boost=\"1.0\">"
  + "<field name=\"id\">123</field><field name=\"includes\">eaiou with circumflexes:âîôû</field>"
  + "</doc></add>";
DirectXmlRequest up = new DirectXmlRequest( "/update", addContent );
server.request( up );
---

thanks for help

--
View this message in context: http://www.nabble.com/solrj-%3A-probleme-with-utf-8-content-tp22577377p22620317.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: solrj : probleme with utf-8 content
Yes, now it works fine with the trunk sources. Thanks!

Noble Paul നോബിള് नोब्ळ् wrote:

SOLR-973 seems to have caused the problem.

On Fri, Mar 20, 2009 at 11:01 PM, Ryan McKinley <ryan...@gmail.com> wrote:

Do you know if your java file is encoded with utf-8? Sometimes it will be
encoded as something different and that can cause funny problems.

On Mar 18, 2009, at 7:46 AM, Walid ABDELKABIR wrote:

When executing this code, I got in my index the field "includes" with this
value: ? ? ? :

---
String content = "eaiou with circumflexes: êâîôû";
SolrInputDocument doc = new SolrInputDocument();
doc.addField( "id", 123, 1.0f );
doc.addField( "includes", content, 1.0f );
server.add( doc );
---

but this code works fine:

---
String addContent = "<add><doc boost=\"1.0\">"
  + "<field name=\"id\">123</field><field name=\"includes\">eaiou with circumflexes:âîôû</field>"
  + "</doc></add>";
DirectXmlRequest up = new DirectXmlRequest( "/update", addContent );
server.request( up );
---

thanks for help

--
--Noble Paul

--
View this message in context: http://www.nabble.com/solrj-%3A-probleme-with-utf-8-content-tp22577377p22627715.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Custom handler that forwards a request to another core
My problem was that the XMLResponseWriter was using the searcher of the
original request to get the matching documents (in the method writeDocList of
the class XMLWriter). Since the DocList contains ids from the index of the
second core, they were not valid in the index of the core receiving the
request.

To circumvent the problem, I implemented a custom response writer. This is not
a problem for my project, since I have to return a custom format.

Pascal Dimassimo wrote:

Hi,

I'm writing a custom handler that forwards a request to a handler of another
core. The custom handler is defined in core0, and the core I try to send the
request to is core2, which has a mlt handler.

Here is the code of my custom handler (extends RequestHandlerBase and
implements SolrCoreAware):

  public void inform(SolrCore core) {
    this.core = core;
    this.cores = core.getCoreDescriptor().getCoreContainer();
    this.multiCoreHandler = cores.getMultiCoreHandler();
  }

  public void handleRequestBody(SolrQueryRequest request, SolrQueryResponse response) throws Exception {
    SolrCore coreToRequest = cores.getCore("core2");

    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("q", "Lucene");
    params.set("mlt.fl", "body");
    params.set("debugQuery", "true");

    request = new LocalSolrQueryRequest(coreToRequest, params);

    SolrRequestHandler mlt = coreToRequest.getRequestHandler("/mlt");
    coreToRequest.execute(mlt, request, response);

    coreToRequest.close();
  }

I'm calling this handler from firefox with this url (the path of my custom
handler is /nlt):

  http://localhost:8080/solr/core0/nlt

With my debugger, I can see, after the execute() method is executed, this line
in the log:

  13-Mar-2009 4:25:59 PM org.apache.solr.core.SolrCore execute
  INFO: [core2] webapp=/solr path=/nlt params={} webapp=null path=null params={q=Lucene&mlt.fl=body&debugQuery=true} status=0 QTime=125

Which seems logical: core2 is executing the request (though I'm wondering how
core2 knows about the /nlt path).

After, I let the debugger resume the program and I see those lines:

  13-Mar-2009 4:25:59 PM org.apache.solr.core.SolrCore execute
  INFO: [core0] webapp=/solr path=/nlt params={} webapp=null path=null params={q=Lucene&mlt.fl=body&debugQuery=true} status=0 QTime=125 status=0 QTime=141
  13-Mar-2009 4:25:59 PM org.apache.solr.common.SolrException log
  SEVERE: java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.lucene.index.MultiSegmentReader.document(MultiSegmentReader.java:259)
    at org.apache.lucene.index.IndexReader.document(IndexReader.java:632)
    at org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:371)
    at org.apache.solr.request.XMLWriter$3.writeDocs(XMLWriter.java:479)

It looks like core0 is also trying to handle the request. With the debugger, I
discovered that the code is trying to access a document with an id from the
index of core2 within the index of core0, which fails
(SolrIndexSearcher.java:371).

Any idea why there seem to be two cores that try to handle the request?

--
View this message in context: http://www.nabble.com/Custom-handler-that-forwards-a-request-to-another-core-tp22501470p22541459.html
Sent from the Solr - User mailing list archive at Nabble.com.
Custom handler that forwards a request to another core
Hi,

I'm writing a custom handler that forwards a request to a handler of another
core. The custom handler is defined in core0, and the core I try to send the
request to is core2, which has a mlt handler.

Here is the code of my custom handler (extends RequestHandlerBase and
implements SolrCoreAware):

  public void inform(SolrCore core) {
    this.core = core;
    this.cores = core.getCoreDescriptor().getCoreContainer();
    this.multiCoreHandler = cores.getMultiCoreHandler();
  }

  public void handleRequestBody(SolrQueryRequest request, SolrQueryResponse response) throws Exception {
    SolrCore coreToRequest = cores.getCore("core2");

    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("q", "Lucene");
    params.set("mlt.fl", "body");
    params.set("debugQuery", "true");

    request = new LocalSolrQueryRequest(coreToRequest, params);

    SolrRequestHandler mlt = coreToRequest.getRequestHandler("/mlt");
    coreToRequest.execute(mlt, request, response);

    coreToRequest.close();
  }

I'm calling this handler from firefox with this url (the path of my custom
handler is /nlt):

  http://localhost:8080/solr/core0/nlt

With my debugger, I can see, after the execute() method is executed, this line
in the log:

  13-Mar-2009 4:25:59 PM org.apache.solr.core.SolrCore execute
  INFO: [core2] webapp=/solr path=/nlt params={} webapp=null path=null params={q=Lucene&mlt.fl=body&debugQuery=true} status=0 QTime=125

Which seems logical: core2 is executing the request (though I'm wondering how
core2 knows about the /nlt path).

After, I let the debugger resume the program and I see those lines:

  13-Mar-2009 4:25:59 PM org.apache.solr.core.SolrCore execute
  INFO: [core0] webapp=/solr path=/nlt params={} webapp=null path=null params={q=Lucene&mlt.fl=body&debugQuery=true} status=0 QTime=125 status=0 QTime=141
  13-Mar-2009 4:25:59 PM org.apache.solr.common.SolrException log
  SEVERE: java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.lucene.index.MultiSegmentReader.document(MultiSegmentReader.java:259)
    at org.apache.lucene.index.IndexReader.document(IndexReader.java:632)
    at org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:371)
    at org.apache.solr.request.XMLWriter$3.writeDocs(XMLWriter.java:479)

It looks like core0 is also trying to handle the request. With the debugger, I
discovered that the code is trying to access a document with an id from the
index of core2 within the index of core0, which fails
(SolrIndexSearcher.java:371).

Any idea why there seem to be two cores that try to handle the request?

--
View this message in context: http://www.nabble.com/Custom-handler-that-forwards-a-request-to-another-core-tp22501470p22501470.html
Sent from the Solr - User mailing list archive at Nabble.com.
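[Editor's note: a hedged variant of the same forwarding code, my own sketch
against the 1.3/1.4-era API. It does not address the XMLWriter issue described
in the follow-up; it only releases the borrowed core and closes the local
request even when the target handler throws. The "cores" field is assumed to be
set in inform(), as in the handler above.]

  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
    SolrCore coreToRequest = cores.getCore("core2");   // increments core2's reference count
    LocalSolrQueryRequest coreReq = null;
    try {
      ModifiableSolrParams params = new ModifiableSolrParams();
      params.set("q", "Lucene");
      params.set("mlt.fl", "body");
      params.set("debugQuery", "true");

      coreReq = new LocalSolrQueryRequest(coreToRequest, params);
      SolrRequestHandler mlt = coreToRequest.getRequestHandler("/mlt");
      coreToRequest.execute(mlt, coreReq, rsp);
    } finally {
      if (coreReq != null) coreReq.close();
      coreToRequest.close();                           // releases the reference count
    }
  }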
Programmatic access to other handlers
Hi,

I've designed a front handler that will send requests to other handlers and
return an aggregated response. Inside this handler, I call other handlers like
this (inside the method handleRequestBody):

  SolrCore core = req.getCore();
  SolrRequestHandler mlt = core.getRequestHandler("/mlt");

  ModifiableSolrParams params = new ModifiableSolrParams(req.getParams());
  params.set("mlt.fl", "nFullText");
  req.setParams(params);

  mlt.handleRequest(req, rsp);

First question: is this the recommended way to call another handler?

Second question: how could I call a handler of another core?

--
View this message in context: http://www.nabble.com/Programmatic-access-to-other-handlers-tp22477731p22477731.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Programmatic access to other handlers
I found this code to access another core from my custom request handler:

  CoreContainer.Initializer initializer = new CoreContainer.Initializer();
  CoreContainer cores = initializer.initialize();
  SolrCore otherCore = cores.getCore("otherCore");

It seems to work with some little testing. But is it a recommended approach?

Pascal Dimassimo wrote:

Hi,

I've designed a front handler that will send requests to other handlers and
return an aggregated response. Inside this handler, I call other handlers like
this (inside the method handleRequestBody):

  SolrCore core = req.getCore();
  SolrRequestHandler mlt = core.getRequestHandler("/mlt");

  ModifiableSolrParams params = new ModifiableSolrParams(req.getParams());
  params.set("mlt.fl", "nFullText");
  req.setParams(params);

  mlt.handleRequest(req, rsp);

First question: is this the recommended way to call another handler?

Second question: how could I call a handler of another core?

--
View this message in context: http://www.nabble.com/Programmatic-access-to-other-handlers-tp22477731p22483357.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Programmatic access to other handlers
Thanks ryantxu for your answer.

I implemented the interface and it returns me the current core. But how is it
different from doing request.getCore() from handleRequestBody()? And I don't
see how this can give me access to other cores.

I think that what I need is to get access to an instance of CoreContainer, so I
can call getCore(name) and getAdminCore() to manage the different cores. So I'm
wondering if this is a good way to get that instance:

  CoreContainer.Initializer initializer = new CoreContainer.Initializer();
  CoreContainer cores = initializer.initialize();

ryantxu wrote:

If you are doing this in a RequestHandler, implement SolrCoreAware and you will
get a callback with the Core.

http://wiki.apache.org/solr/SolrPlugins#head-8b3ac1fc3584fe1e822924b98af23d72b02ab134

On Mar 12, 2009, at 3:04 PM, Pascal Dimassimo wrote:

I found this code to access another core from my custom request handler:

  CoreContainer.Initializer initializer = new CoreContainer.Initializer();
  CoreContainer cores = initializer.initialize();
  SolrCore otherCore = cores.getCore("otherCore");

It seems to work with some little testing. But is it a recommended approach?

Pascal Dimassimo wrote:

Hi,

I've designed a front handler that will send requests to other handlers and
return an aggregated response. Inside this handler, I call other handlers like
this (inside the method handleRequestBody):

  SolrCore core = req.getCore();
  SolrRequestHandler mlt = core.getRequestHandler("/mlt");

  ModifiableSolrParams params = new ModifiableSolrParams(req.getParams());
  params.set("mlt.fl", "nFullText");
  req.setParams(params);

  mlt.handleRequest(req, rsp);

First question: is this the recommended way to call another handler?

Second question: how could I call a handler of another core?

--
View this message in context: http://www.nabble.com/Programmatic-access-to-other-handlers-tp22477731p22483357.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
View this message in context: http://www.nabble.com/Programmatic-access-to-other-handlers-tp22477731p22486235.html
Sent from the Solr - User mailing list archive at Nabble.com.
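[Editor's note: the handler in the "custom handler" thread above already shows
the answer to this question: once inform(SolrCore) is called, the existing
CoreContainer can be reached through the core, so there is no need to build a
second container with CoreContainer.Initializer. A hedged sketch follows,
assuming the 1.3/1.4-era API and a hypothetical handler and core name.]

  import org.apache.solr.core.CoreContainer;
  import org.apache.solr.core.SolrCore;
  import org.apache.solr.handler.RequestHandlerBase;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.request.SolrQueryResponse;
  import org.apache.solr.util.plugin.SolrCoreAware;

  public class CrossCoreHandler extends RequestHandlerBase implements SolrCoreAware {
    private CoreContainer container;

    public void inform(SolrCore core) {
      // the container that already holds every core of this Solr instance
      container = core.getCoreDescriptor().getCoreContainer();
    }

    @Override
    public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
      SolrCore other = container.getCore("otherCore");   // hypothetical core name
      try {
        // build a LocalSolrQueryRequest against "other" and execute one of its
        // handlers here, as in the "custom handler" thread above
      } finally {
        other.close();   // release the reference taken by getCore()
      }
    }

    @Override
    public String getDescription() { return "cross-core forwarding handler (sketch)"; }
    @Override
    public String getSourceId() { return ""; }
    @Override
    public String getSource() { return ""; }
    @Override
    public String getVersion() { return ""; }
  }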
Re: Question about etag
I finally found the reason for this behavior. I realized that if I waited a
couple of minutes, Firefox would send the If-None-Match header, which Solr
answered with a 304 code.

What happens is that Firefox keeps a disk cache. If a response contains the
Last-Modified header, even if there is an ETag header, Firefox computes an
expiration date, which was about 5 minutes for my request. And during that
period, the request was served from the cache. You can see the expiration date
by looking at about:cache in Firefox. The rules to compute the expiration time
depending on the headers are described here:

https://developer.mozilla.org/En/HTTP_Caching_FAQ

I realize that this was a Firefox issue. Sorry to have disrupted this list.

Pascal Dimassimo wrote:

Sorry, the xml of the solrconfig.xml was lost. It is:

  <httpCaching lastModifiedFrom="openTime" etagSeed="Solr"> </httpCaching>

Hi guys,

I'm having trouble understanding the behavior of Firefox and the ETag.

After cleaning the cache, I send this request from Firefox:

  GET /solr/select/?q=television HTTP/1.1
  Host: localhost:8088
  User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.6) Gecko/2009011913 Firefox/3.0.6 (.NET CLR 3.5.30729)
  Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
  Accept-Language: en-us,en;q=0.5
  Accept-Encoding: gzip,deflate
  Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
  Keep-Alive: 300
  Connection: keep-alive
  Cookie: JSESSIONID=AA71D602A701BB6287C60083DD6879CD

Which Solr responds to with:

  HTTP/1.1 200 OK
  Last-Modified: Thu, 19 Feb 2009 19:57:14 GMT
  ETag: NmViOTJkMjc1ODgwMDAwMFNvbHI=
  Content-Type: text/xml; charset=utf-8
  Transfer-Encoding: chunked
  Server: Jetty(6.1.3)

  (#data following#)

So far so good. But then, I press F5 to refresh the page. Now, if I understand
correctly the way the ETag works, Firefox should send the request with an
If-None-Match header along with the ETag, and then the server should return a
304 Not Modified code. But what happens is that Firefox just doesn't send
anything. In the Firebug window, I only see 0 requests. Just to make sure, I
tested with tcpmon and nothing is sent by Firefox.

Is this making sense? Am I missing something?

My solrconfig.xml has this config:

Thanks!

--
View this message in context: http://old.nabble.com/Question-about-etag-tp22125449p22167528.html
Sent from the Solr - User mailing list archive at Nabble.com.
Question about etag
Hi guys,

I'm having trouble understanding the behavior of Firefox and the ETag.

After cleaning the cache, I send this request from Firefox:

  GET /solr/select/?q=television HTTP/1.1
  Host: localhost:8088
  User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.6) Gecko/2009011913 Firefox/3.0.6 (.NET CLR 3.5.30729)
  Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
  Accept-Language: en-us,en;q=0.5
  Accept-Encoding: gzip,deflate
  Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
  Keep-Alive: 300
  Connection: keep-alive
  Cookie: JSESSIONID=AA71D602A701BB6287C60083DD6879CD

Which Solr responds to with:

  HTTP/1.1 200 OK
  Last-Modified: Thu, 19 Feb 2009 19:57:14 GMT
  ETag: NmViOTJkMjc1ODgwMDAwMFNvbHI=
  Content-Type: text/xml; charset=utf-8
  Transfer-Encoding: chunked
  Server: Jetty(6.1.3)

  (#data following#)

So far so good. But then, I press F5 to refresh the page. Now, if I understand
correctly the way the ETag works, Firefox should send the request with an
If-None-Match header along with the ETag, and then the server should return a
304 Not Modified code. But what happens is that Firefox just doesn't send
anything. In the Firebug window, I only see 0 requests. Just to make sure, I
tested with tcpmon and nothing is sent by Firefox.

Is this making sense? Am I missing something?

My solrconfig.xml has this config:

Thanks!

_
The new Windows Live Messenger. You don't want to miss this.
http://www.microsoft.com/windows/windowslive/products/messenger.aspx
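[Editor's note: the 304 behaviour itself is easy to verify outside the browser.
Below is a hedged Java sketch of my own, using plain java.net with the URL from
the message above and assuming the httpCaching settings shown in this thread:
it replays the ETag from a first response in an If-None-Match header.]

  import java.net.HttpURLConnection;
  import java.net.URL;

  public class EtagCheck {
    public static void main(String[] args) throws Exception {
      URL url = new URL("http://localhost:8088/solr/select/?q=television");

      // First request: remember the ETag that Solr returns
      HttpURLConnection first = (HttpURLConnection) url.openConnection();
      String etag = first.getHeaderField("ETag");
      first.disconnect();

      // Second request: send the ETag back; a 304 is expected if nothing was
      // committed to the index in between
      HttpURLConnection second = (HttpURLConnection) url.openConnection();
      second.setRequestProperty("If-None-Match", etag);
      System.out.println("status: " + second.getResponseCode());
      second.disconnect();
    }
  }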
Re: Question about etag
Sorry, the xml of the solrconfig.xml was lost. It is:

  <httpCaching lastModifiedFrom="openTime" etagSeed="Solr"> </httpCaching>

Hi guys,

I'm having trouble understanding the behavior of Firefox and the ETag.

After cleaning the cache, I send this request from Firefox:

  GET /solr/select/?q=television HTTP/1.1
  Host: localhost:8088
  User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.6) Gecko/2009011913 Firefox/3.0.6 (.NET CLR 3.5.30729)
  Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
  Accept-Language: en-us,en;q=0.5
  Accept-Encoding: gzip,deflate
  Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
  Keep-Alive: 300
  Connection: keep-alive
  Cookie: JSESSIONID=AA71D602A701BB6287C60083DD6879CD

Which Solr responds to with:

  HTTP/1.1 200 OK
  Last-Modified: Thu, 19 Feb 2009 19:57:14 GMT
  ETag: NmViOTJkMjc1ODgwMDAwMFNvbHI=
  Content-Type: text/xml; charset=utf-8
  Transfer-Encoding: chunked
  Server: Jetty(6.1.3)

  (#data following#)

So far so good. But then, I press F5 to refresh the page. Now, if I understand
correctly the way the ETag works, Firefox should send the request with an
If-None-Match header along with the ETag, and then the server should return a
304 Not Modified code. But what happens is that Firefox just doesn't send
anything. In the Firebug window, I only see 0 requests. Just to make sure, I
tested with tcpmon and nothing is sent by Firefox.

Is this making sense? Am I missing something?

My solrconfig.xml has this config:

Thanks!

--
View this message in context: http://www.nabble.com/Question-about-etag-tp22125449p22127322.html
Sent from the Solr - User mailing list archive at Nabble.com.