Solr 4.2 update/extract adding unknown field, can we change field type from string to text
Hi, while indexing documents with unknown fields, Solr adds the unknown fields to the schema, but it always guesses them as string type. Is it possible to specify a default field type for unknown fields, such as text, so that they get tokenized? Also, can we specify other properties by default, like indexed/stored/multiValued? PS: I am using Solr 4.2. Thanks a lot. Jai
Re: SolrCloud - Path must not end with / character
The issue is resolved. I had given all the paths inside Tomcat as relative paths (solr home, solr war). That was what was creating the problem. On Mon, Sep 2, 2013 at 2:19 PM, Prasi S prasi1...@gmail.com wrote: Does this have anything to do with Tomcat? I cannot go back, as we have already fixed on Tomcat. Any suggestions please? The same setup, if I copy and run it on a different machine, works fine. I am not sure what is missing. Is it because of some system parameter getting set? On Fri, Aug 30, 2013 at 9:11 PM, Jared Griffith jgriff...@picsauditing.com wrote: I was getting the same errors when trying to implement SolrCloud with Tomcat. I eventually gave up until something came out of this thread. This all works if you just ditch Tomcat and go with the native Jetty server. On Fri, Aug 30, 2013 at 6:28 AM, Prasi S prasi1...@gmail.com wrote: Also, this fails with the default Solr 4.4 downloaded configuration too. On Fri, Aug 30, 2013 at 4:19 PM, Prasi S prasi1...@gmail.com wrote: Below is the script I run:

  START /MAX F:\SolrCloud\zookeeper\zk-server-1\zookeeper-3.4.5\bin\zkServer.cmd
  START /MAX F:\solrcloud\zookeeper java -classpath .;solr-lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost localhost:2182 -confdir solr-conf -confname solrconf1
  START /MAX F:\solrcloud\zookeeper java -classpath .;solr-lib/* org.apache.solr.cloud.ZkCLI -cmd linkconfig -zkhost 127.0.0.1:2182 -collection firstcollection -confname solrconf1 -solrhome ../tomcat1/solr1
  START /MAX F:\solrcloud\zookeeper java -classpath .;solr-lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost localhost:2182 -confdir solr-conf -confname solrconf2
  START /MAX F:\solrcloud\zookeeper java -classpath .;solr-lib/* org.apache.solr.cloud.ZkCLI -cmd linkconfig -zkhost 127.0.0.1:2182 -collection seccollection -confname solrconf2 -solrhome ../tomcat1/solr1
  START /MAX F:\solrcloud\tomcat1\bin\startup.bat
  START /MAX F:\solrcloud\tomcat2\bin\startup.bat

On Fri, Aug 30, 2013 at 4:07 PM, Prasi S prasi1...@gmail.com wrote: I am still clueless about where the issue could be. There is not much information in the Solr logs. I had a running version of the cloud on another server. I copied the same to this server, started ZooKeeper, then ran the below commands:

  java -classpath .;solr-lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost localhost:2181 -confdir solr-conf -confname solrconfindex
  java -classpath .;solr-lib/* org.apache.solr.cloud.ZkCLI -cmd linkconfig -zkhost 127.0.0.1:2181 -collection colindexer -confname solrconfindex -solrhome ../tomcat1/solr1

After this, when I started Tomcat, the first Tomcat starts fine. When the second Tomcat is started, I get the above exception and it stops. Then the first Tomcat also shows the same exception. On Thu, Aug 29, 2013 at 7:18 PM, Mark Miller markrmil...@gmail.com wrote: Yeah, you see this when the core could not be created. Check the logs to see if you can find something more useful. I ran into this again the other day - it's something we should fix. You see the same thing in the UI when a core cannot be created: it gives you no hint about the problem and is confusing. - Mark On Aug 29, 2013, at 5:23 AM, sathish_ix skandhasw...@inautix.co.in wrote: Hi, Check that the configuration files uploaded into ZooKeeper are valid and that there are no errors in the uploaded config files. I think due to this error, the Solr core will not be created.
Thanks, Sathish -- Jared Griffith Linux Administrator, PICS Auditing, LLC P: (949) 936-4574 C: (909) 653-7814 http://www.picsauditing.com 17701 Cowan #140 | Irvine, CA | 92614 Join PICS on LinkedIn and Twitter! https://twitter.com/PICSAuditingLLC
Problem with Synonyms
Hello, this is my first time writing to this mailing list, so hello everyone. I am having issues with synonyms. I added the synonym filter to one of my field types:

  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="15"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

I also added some synonyms to my synonyms.txt, which is in the conf folder of my core. However, when I look into the analyzer, the content won't be replaced, and I can't figure out where the problem lies. Can anybody help here? I'm totally stuck...
Update field properties via Schema Rest API ?
Hello, I'm pretty new to Solr, as a PHP developer. I'm still reading the tutorials for getting started with Solr, adding and indexing data. I'm still using the example/start.jar, as I haven't yet succeeded in configuring a true (production-ready) Solr instance. But that doesn't matter. As I can't deal with Java, Tomcat, etc., I just want to do as much as possible with the REST API, so that I don't have to edit any files. I need to add and edit fields frequently, so I use the Schema REST API. However, the wiki http://wiki.apache.org/solr/SchemaRESTAPI explains how to add fields, but not how to update or delete them. Can you help me? I really need to control (e.g. update) the properties of my fields (indexed, stored, multiValued, etc.) via the REST API, without having to edit any file each time I need an update. Thanks, Ben
solr cloud and DIH, indexation runs only on one shard.
Hello again, I am still trying to index with Solr Cloud and DIH. I can index, but it seems that indexation is done on only one shard. (My goal was to parallelize it to go faster.) This is my conf: I have 2 Tomcat instances, one with ZooKeeper embedded in Solr 4.4.0 started and 1 shard (port 8080), the other with the second shard (port 9180). In my admin interface, I see 2 shards, and each one is leader. When I launch the DIH, documents are indexed, but only shard1 is working.

  http://localhost:8080/solr-0.4.0-pfd/noticesBIBcollection/dataimportMNb?command=full-import&entity=noticebib&optimize=true&indent=true&clean=true&commit=true&verbose=false&debug=false&wt=json&rows=1000

In my first shard, I see messages coming from my indexation process:

  DEBUG 2013-09-03 11:48:57,801 Thread-12 org.apache.solr.handler.dataimport.URLDataSource (92) - Accessing URL: file:/X:/3/7/002/37002118.xml
  DEBUG 2013-09-03 11:48:57,832 Thread-12 org.apache.solr.handler.dataimport.URLDataSource (92) - Accessing URL: file:/X:/3/7/002/37002120.xml
  DEBUG 2013-09-03 11:48:57,966 Thread-12 org.apache.solr.handler.dataimport.LogTransformer (58) - Notice fichier: 3/7/002/37002120.xml
  DEBUG 2013-09-03 11:48:57,966 Thread-12 fr.bnf.solr.BnfDateTransformer (696) - NN=37002120

In the second instance, I just have this kind of log, as if it were receiving notifications of new updates from ZooKeeper:

  INFO 2013-09-03 11:48:57,323 http-9180-7 org.apache.solr.update.processor.LogUpdateProcessor (198) - [noticesBIB] webapp=/solr-0.4.0-pfd path=/update params={distrib.from=http://172.20.48.237:8080/solr-0.4.0-pfd/noticesBIB/update&update.distrib=TOLEADER&wt=javabin&version=2} {add=[37001748 (1445149264874307584), 37001757 (1445149264879550464), 37001764 (1445149264883744768), 37001786 (1445149264887939072), 37001817 (1445149264891084800), 37001819 (1445149264896327680), 37001837 (1445149264900521984), 37001861 (1445149264903667712), 37001869 (1445149264907862016), 37001963 (1445149264912056320)]} 0 41

I supposed there was a confusion between core names and the collection name, and I tried to change the name of the collection, but it solved nothing. When I come to the DIH interfaces, on shard1 I see indexation processing, and on shard2 no information is available. Is there something special to do to distribute the indexation process? Should I run ZooKeeper on both instances (even if it's not mandatory)? ... Regards Jerome Annual closure of the François-Mitterrand and Richelieu sites from 2 to 15 September 2013. Before printing, think of the environment.
Re: Solr 4.2 update/extract adding unknown field, can we change field type from string to text
You can use the dynamic fields feature of Solr to map unknown field names to types. For example, a dynamic field named *_s, i.e. any field name ending with _s, can be mapped to string, and so on. In your case, if your field names do not follow a set pattern, then you can even specify a dynamic field as * and map it to a text type. See https://cwiki.apache.org/confluence/display/solr/Dynamic+Fields On Tue, Sep 3, 2013 at 12:00 PM, Jai jai4l...@gmail.com wrote: Hi, while indexing documents with unknown fields, Solr adds the unknown fields to the schema ... -- Regards, Shalin Shekhar Mangar.
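A minimal schema.xml sketch of what this looks like (the names are illustrative, and it assumes a tokenized type such as text_general already exists in the schema):

  <!-- any field name ending in _s is indexed as a string -->
  <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
  <!-- catch-all: any otherwise-unknown field is indexed as tokenized text -->
  <dynamicField name="*" type="text_general" indexed="true" stored="true" multiValued="true"/>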
Re: solr cloud and DIH, indexation runs only on one shard.
DataImportHandler does not parallelize indexing at all. It is a single-threaded indexer which runs on a single node. However, the documents themselves are routed to the correct shard by SolrCloud. Therefore, what you are observing on your servers is normal. If you want to parallelize indexing then you can either: a) Use SolrJ or an external client and write the indexing code yourself, or b) Set up DIH in such a way that each shard indexes a disjoint subset of the data. This way, you can fire DIH full-import on multiple shards/nodes simultaneously (see the sketch below). One way of achieving (b) is by using request parameters to substitute placeholders in your DIH configuration. See http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters On Tue, Sep 3, 2013 at 3:25 PM, jerome.dup...@bnf.fr wrote: Hello again, I am still trying to index with Solr Cloud and DIH ... -- Regards, Shalin Shekhar Mangar.
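A sketch of approach (b) for a JDBC entity; the totalShards/shardIndex parameter names are hypothetical, and the WHERE clause must be adapted to your data:

  <!-- db-data-config.xml: each node indexes a disjoint slice of the table -->
  <entity name="noticebib"
          query="select * from notices
                 where MOD(id, ${dataimporter.request.totalShards}) = ${dataimporter.request.shardIndex}"/>

Each node is then invoked with its own slice, e.g. /dataimport?command=full-import&totalShards=2&shardIndex=0 on one node and shardIndex=1 on the other.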
Re: Problem with Synonyms
Solr has a nice analysis page. You can use it to get insight into what is happening after each filter is applied at index/search time. Regards Pravesh
Re: solr cloud and DIH, indexation runs only on one shard.
Hi jerome.dupont, please check what the update handler is in your solrconfig.xml:

  <updateRequestProcessorChain name="sample">
    <processor class="solr.LogUpdateProcessorFactory"/>
    <!-- by default, it is solr.NoOpDistributingUpdateProcessorFactory -->
    <processor class="solr.NoOpDistributingUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

  <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">db-data-config.xml</str>
      <str name="update.chain">sample</str>
    </lst>
  </requestHandler>

2013/9/3 jerome.dup...@bnf.fr: Hello again, I am still trying to index with Solr Cloud and DIH ...
Re: Update field properties via Schema Rest API ?
The Schema REST API is a new feature and supports only adding fields (and that, too, only since Solr 4.4). It doesn't support modifying fields yet. On Tue, Sep 3, 2013 at 2:39 PM, bengates benga...@aliceadsl.fr wrote: Hello, I'm pretty new to Solr, as a PHP developer ... -- Regards, Shalin Shekhar Mangar.
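For reference, adding a field through the 4.4 API looks roughly like this (a sketch, assuming the core uses the managed schema, i.e. ManagedIndexSchemaFactory with mutable=true in solrconfig.xml; the core, field, and type names are illustrative):

  curl -X PUT http://localhost:8983/solr/collection1/schema/fields/screen_size \
       -H 'Content-type: application/json' \
       -d '{"type":"tint", "indexed":true, "stored":true}'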
Starting Solr in Tomcat with specifying ZK host(s)
Hi, I've set up a ZK instance and also deployed Solr in Tomcat 7 on a different instance in Amazon EC2. Afterwards I tried starting Tomcat specifying the ZK host IP, like so:

  sudo service tomcat7 start -DzkHost=<zk ip>:2181 -DnumShards=3 -Dcollection.configName=myconf -Dbootstrap_confdir=/usr/share/solr/example/solr/collection1/conf

Solr loads fine, but is not in the cloud. Any idea what I am doing wrong here?
Re: Update field properties via Schema Rest API ?
Hello, Thanks for your quick reply. This is what I feared. Do you know if this is planned for Solr 4.5 or Solr 5.0? I didn't see anything about it in the roadmap. Thank you, Ben
Re: Problem with Synonyms
On 03.09.2013 12:11, pravesh wrote: Solr has a nice analysis page. You can use it to get insight into what is happening after each filter is applied at index/search time. Regards Pravesh Yeah, that's the thing. It applies the synonym filter, but nothing really happens. Is there a way to see if the SF loads my synonym file?
Re: db-data-config.xml ?
Did you find any other exceptions in the logs? When I pasted the script section of your data config into my test setup, I got an error saying that there is an unclosed string literal in line 6. On Tue, Sep 3, 2013 at 12:23 AM, Kunzman, Doug dkunz...@usgs.gov wrote: Hi - I'm new to Solr and am trying to combine a script: and a RegexTransformer in a db-data-config.xml that is used to ingest data into Solr. Can anyone be of any help? There is definitely a comma between my script:add and RegexTransformer entries. Any help would be appreciated. My db-data-config.xml looks like this:

  <dataConfig>
    <dataSource type="JdbcDataSource" driver="org.postgresql.Driver"
                url="jdbc:postgresql://localhost:/test?netTimeoutForStreamingResults=24000"
                autoReconnect="true" user="postgres" password="" batchSize="10"
                responseBuffering="adaptive"/>
    <script><![CDATA[
      function add(row){
        var latlon_s = row.get('longitude')+','+row.get('latitude');
        var provider = row.get('provider');
        var pointPath = '/'+row.get('longitude')+','+row.get('latitude')+'/'+row.get('basis_of_record');
        if ('NatureServe'.equalsIgnoreCase(provider) || 'USDA PLANTS'.equalsIgnoreCase(provider)) {
          pointPath += '/centroid';
        }
        row.put('latlon_s', latlon_s);
        row.put('pointPath_s', pointPath);
        var provider_id = row.get('provider_id');
        var resource_id = row.get('resource_id_s');
        var hierarchy = row.get('hierarchy_string');
        row.put('hierarchy_homonym_string', '-' + hierarchy + '-');
        row.put('BISONResourceID', '/' + provider_id + '/' + resource_id + '/');
        return row;
      }
    ]]></script>
    <document name="itis_to_portal.occurence">
      <!-- <entity name="occurrence" pk="id" transformer="script:add"
           query="select id, scientific_name, latitude, longitude, year, basis_of_record, provider_id, resource_id_s, occurrence_date, tsns, parent_tsn, hierarchy_string, collector, ambiguous, statecomputedfips, countycomputedfips from itis_to_portal.solr"
           transformer="RegexTransformer"> -->
      <entity name="occurrence" pk="id"
              query="select id, scientific_name, latitude, longitude, year, basis_of_record, provider_id, resource_id_s, occurrence_date, tsns, parent_tsn, hierarchy_string, collector, ambiguous, statecomputedfips, countycomputedfips from itis_to_portal.solr"
              transformer="RegexTransformer,script:add"

and at runtime import I'm getting the following error message: SEVERE: Full Import failed: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Could not invoke method :addRegexTransformer Thanks, Doug -- Regards, Shalin Shekhar Mangar.
Re: Apostrophes in fields
In my case, the fields with an apostrophe are not returned in the results. When I search for dev, it shows me the following results: dev, dev's, devendra. But when I search for dev' (dev with the apostrophe only), nothing comes out as a result. What could be the workaround? Thanks Devendra
Re: phonetic search
Hmmm, seems like it should work. First thing I'd try is using the admin interface and looking at the analysis page to see how the input is tokenized both at index and search time; that's sometimes surprising. Second, again using the browser, attach debug=query to the URL. That will echo back what the query actually parsed to. Combined with the analysis page, this last bit of information is often enough to figure it out. If that doesn't show it, please cut/paste the results back here. You can do the same from your SolrJ program, but the admin UI is usually faster. Best Erick On Mon, Sep 2, 2013 at 3:25 PM, Sergio Stateri stat...@gmail.com wrote: Thanks Erick, I'm trying to search English texts now. I put in a field type like this:

  <fieldtype name="myPhonetic" stored="false" indexed="true" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
    </analyzer>
  </fieldtype>
  ...
  <field name="descricaoRoteiroPhonetic" type="myPhonetic" indexed="true" required="true" stored="true"/>

Then I'm trying to find CITY, like this:

  SolrQuery query = new SolrQuery();
  query.setQuery("(descricaoRoteiroPhonetic:CITY)");
  QueryResponse rsp = server.query(query);

But there are no results, and I have a lot of documents with CITY in the descricaoRoteiroPhonetic field. Do you know what I'm doing wrong? Thanks a lot. Sergio Stateri Junior. 2013/9/2 Erick Erickson erickerick...@gmail.com What you need to do is include one of the phonetic filters in your analysis chain, see: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PhoneticFilterFactory All you've done with the stemmer is make things like (sorry, English examples are all I can do) running, runner, etc. be indexed and searched as run; that's not phonetic processing. There are several variants; each uses a different algorithm, see the link above. Not sure what to tweak for handling Brazilian Portuguese though... Best Erick On Mon, Sep 2, 2013 at 1:41 PM, Sergio Stateri stat...@gmail.com wrote: Please, how can I do a phonetic search in Solr with the Portuguese (Brazilian) language? I tried including this field type:

  <fieldType name="brazilianPhonetic" class="solr.TextField" sortMissingLast="true" omitNorms="true">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.BrazilianStemFilterFactory"/>
    </analyzer>
  </fieldType>
  ...
  <field name="descricaoRoteiroPhonetic" type="brazilianPhonetic" multiValued="true" indexed="true" required="true" stored="true"/>

But this didn't work. I have no idea how to do a phonetic search. I'm using Solr 4. Thanks in advance, -- Sergio Stateri Jr. stat...@gmail.com -- Sergio Stateri Jr. stat...@gmail.com
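For example, the debug suggestion amounts to a browser request like this (a sketch, assuming the default core name):

  http://localhost:8983/solr/collection1/select?q=descricaoRoteiroPhonetic:CITY&debug=query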
Re: Update field properties via Schema Rest API ?
Is editing a text file really all that onerous? You can edit the schema.xml file with any editor you're comfortable with and issue the core RELOAD command in the interim. Best Erick On Tue, Sep 3, 2013 at 6:20 AM, bengates benga...@aliceadsl.fr wrote: Hello, Thanks for your quick reply. This is what I feared ...
Re: Problem with Synonyms
Please explain exactly what "but nothing really happens" means. Do you mean that you see the SF in the analysis page but there are no substitutions? Or you don't get search results? Or??? You have to reload the core after making changes at minimum; you can restart the Solr instance if you're paranoid. And you have to re-index for changes in the index part of the analysis chain to take effect. Best Erick On Tue, Sep 3, 2013 at 6:33 AM, Christian Loock c...@vkf-renzel.de wrote: Yeah, that's the thing. It applies the synonym filter, but nothing really happens ...
Re: Problem with Synonyms
The SF part is in the analysis page, but nothing is substituted. I reloaded, removed and re-added the core, re-indexed... nothing worked :( I wonder if the SF actually uses the correct file for synonyms. I have it lying in the conf folder of the core. Is that correct? On 03.09.2013 13:32, Erick Erickson wrote: Please explain exactly what "but nothing really happens" means ...
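One more thing worth double-checking is the format of the file itself. A sketch of valid synonyms.txt entries (the words are illustrative):

  # comma-separated group: with expand=true, each term matches all of them
  tv, television, televisions
  # explicit mapping: terms on the left are replaced by the term on the right
  i-pod, i pod => ipod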
Memory usage during aggregation - SolrCloud with very large numbers of facet terms.
Hi, We have a large, sharded SolrCloud index of 300 million documents which we use to explore our web archives. We want to facet on fields that have very large numbers of distinct values, e.g. host names and domain names of pages and links. Thus, overall, we expect to have millions of distinct terms for those fields. We also want to sort on other fields (e.g. date of harvest). We have experimented with various RAM and facet configurations, and are currently finding facet.method=enum + minDf to be more stable than fc. We currently have eight shards, and although the queries are slow, we are finding individual shards to be fairly reliable with a few GB of RAM (about 5GB per shard right now). This seems to be consistent with guidelines for estimating RAM usage (e.g. http://stackoverflow.com/questions/4499630/solr-faceted-navigation-on-large-index). However, the Solr instance we direct our client query to is consuming significantly more RAM (10GB) and is still failing after a few queries when it runs out of heap space. This is presumably due to the role it plays, aggregating the results from each shard. Is there any way we can estimate the amount of RAM that server will need? Alternatively, given our dataset, should we be pursuing a different approach? Should we re-index with the facet partition size set to something smaller (e.g. 10,000 rather than Integer.MAX_VALUE)? Should we be using facet.method=fc and buying more RAM? Best wishes, Andy Jackson -- Dr Andrew N Jackson Web Archiving Technical Lead The British Library Tel: 01937 546602 Mobile: 07765 897948 Web: http://www.webarchive.org.uk/ Twitter: @UKWebArchive
Re: Measuring SOLR performance
Hi Roman, Thanks, the --additionalSolrParams was just what I wanted, and it works fine. BTW, if you have some special bug-tracking forum for the tool, I'm happy to submit questions / bug reports there. Otherwise, this email list is OK (for me at least). One other thing I noticed in the err logs was a series of messages of this sort upon generating the perf test report. Seems to be jmeter-related (the err messages disappear if an extra lib dir is present under the ext directory).

  java.lang.Throwable: Could not access /home/dmitry/projects/lab/solrjmeter7/solrjmeter/jmeter/lib/ext/lib
    at kg.apc.cmd.UniversalRunner.buildUpdatedClassPath(UniversalRunner.java:109)
    at kg.apc.cmd.UniversalRunner.<clinit>(UniversalRunner.java:55)
    at kg.apc.cmd.UniversalRunner.buildUpdatedClassPath(UniversalRunner.java:109)
    at kg.apc.cmd.UniversalRunner.<clinit>(UniversalRunner.java:55)
    at kg.apc.cmd.UniversalRunner.buildUpdatedClassPath(UniversalRunner.java:109)
    at kg.apc.cmd.UniversalRunner.<clinit>(UniversalRunner.java:55)

On Tue, Sep 3, 2013 at 2:50 AM, Roman Chyla roman.ch...@gmail.com wrote: Hi Dmitry, If it is something you want to pass with every request (which is my use case), you can pass it as additional Solr params, e.g.

  python solrjmeter --additionalSolrParams="fq=other_field:bar+facet=true+facet.field=facet_field_name"

The string should be URL-encoded. If it is something that changes with every request, you should modify the jmeter test. If you open/load it with the jmeter GUI, in the HTTP request processor you can define other additional fields to pass with the request. These values can come from the CSV file; you'll see an example of how to use that when you open the test definition file. Cheers, roman On Mon, Sep 2, 2013 at 3:12 PM, Dmitry Kan solrexp...@gmail.com wrote: Hi Erick, Agreed, this is perfectly fine to mix them in Solr. But my question is about the solrjmeter input query format. I just couldn't find a suitable example on solrjmeter's github. Dmitry On Mon, Sep 2, 2013 at 5:40 PM, Erick Erickson erickerick...@gmail.com wrote: Filter and facet queries can be freely intermixed, it's not a problem. What problem are you seeing when you try this? Best, Erick On Mon, Sep 2, 2013 at 7:46 AM, Dmitry Kan solrexp...@gmail.com wrote: Hi Roman, What's the format for running the facet+filter queries? Would something like this work: field:foo =50 fq=other_field:bar facet=true facet.field=facet_field_name Thanks, Dmitry On Fri, Aug 23, 2013 at 2:34 PM, Dmitry Kan solrexp...@gmail.com wrote: Hi Roman, With adminPath=/admin or adminPath=/admin/cores, no. Interestingly enough, though, I can access http://localhost:8983/solr/statements/admin/system But I can access http://localhost:8983/solr/admin/cores only with adminPath=/admin/cores (which suggests that this is the right value to be used for cores), and not with adminPath=/admin. Bottom line, these core configurations are not self-evident. Dmitry On Fri, Aug 23, 2013 at 4:18 AM, Roman Chyla roman.ch...@gmail.com wrote: Hi Dmitry, So it seems solrjmeter should not assume the adminPath - and perhaps it needs to be passed as an argument. When you set the adminPath, are you able to access localhost:8983/solr/statements/admin/cores ? roman On Wed, Aug 21, 2013 at 7:36 AM, Dmitry Kan solrexp...@gmail.com wrote: Hi Roman, I have noticed a difference with different solr.xml config contents. It is probably legit, but I thought to let you know (tests run on a fresh checkout as of today). As mentioned before, I have two cores configured in solr.xml.
If the file is:

[code]
<solr persistent="false">
  <!-- adminPath: RequestHandler path to manage cores.
       If 'null' (or absent), cores will not be manageable via request handler -->
  <cores adminPath="/admin/cores" host="${host:}" hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}">
    <core name="metadata" instanceDir="metadata"/>
    <core name="statements" instanceDir="statements"/>
  </cores>
</solr>
[/code]

then the instruction

  python solrjmeter.py -a -x ./jmx/SolrQueryTest.jmx -q ./queries/demo/demo.queries -s localhost -p 8983 -a --durationInSecs 60 -R cms -t /solr/statements -e statements -U 100

works just fine. If however the solr.xml has adminPath set to /admin, solrjmeter produces an error:

  [error] **ERROR** File solrjmeter.py, line 1386, in module main(sys.argv) File solrjmeter.py, line 1278,
Re: Update field properties via Schema Rest API ?
Hello Erick, Thank you for your reply. Unfortunately, yes it is. I work with a company that has a catalog with many new attributes every day, and sometimes the existing ones change. For instance, one attribute may live with the unit for months (e.g. screen_size = 32 cm) and one day my provider changes it to an integer (screen_size = 32), making it easier to create ranges. Besides, I want my business users to be able to add and edit new features on their products, and my PHP middle-end app should just communicate with Solr without me. We currently work with a solution that works that way (users do everything, the middle-app controls the users' inputs and deals with the back-end), and dealing with the open-source Solr is really hard if that essential feature isn't provided... :( That's why I was very happy when Solr 4.4 introduced adding fields via the REST API, which works very well, but I was disappointed that a new field can't be edited afterwards (just indexed and stored true/false would be amazing) without me having to edit a file. So I really hope this part of the API will soon be completed :) Best regards, Ben
RE: Memory usage during aggregation - SolrCloud with very large numbers of facet terms.
However, the Solr instance we direct our client query to is consuming significantly more RAM (10GB) and is still failing after a few queries when it runs out of heap space. This is presumably due to the role it plays, aggregating the results from each shard. That seems quite odd... What facet parameters are you using in the query? I could imagine memory issues if you're using facet.limit=-1, or some very large number. -Michael
SolrCloud - shard containing an invalid host:port
Hi, I have set up SolrCloud with Tomcat. I use Solr 4.1. I have ZooKeeper running on 192.168.1.10, a Tomcat running solr_myidx on 192.168.1.10 on port 8080, and a Tomcat running solr_myidx on 192.168.1.11 on port 8080. My solr.xml is like this:

  <?xml version="1.0" encoding="UTF-8" ?>
  <solr persistent="true" collection.configName="myidx">
    <cores adminPath="/admin/cores" defaultCoreName="collection1" hostPort="8080" hostContext="solr_myidx" zkClientTimeout="2">
      <core name="collection1" instanceDir="./"/>
    </cores>
  </solr>

I have Tomcat starting with: -Dbootstrap_conf=true -DzkHost=192.168.1.10:2181 Both Tomcats start up all good, but when I go to the Cloud tab in the Solr admin, I see the following:

  collection1 -- shard1 --
    192.168.1.10:8983/solr
    192.168.1.11:8080/solr_ugc
    192.168.1.10:8080/solr_ugc

I don't know what 192.168.1.10:8983/solr is doing there. Do you know how I can remove it? It's causing the following error when I try to query the index: SEVERE: Error while trying to recover. core=collection1:org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://192.168.10.206:8983/solr Thanks, Marc
Re: Starting Solr in Tomcat with specifying ZK host(s)
When I try to deploy using Jetty, everything works fine, and the Solr instance gets into the cloud:

  sudo java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkHost=<zk ip>:2181 -DnumShards=3 -jar start.jar
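That difference is a strong hint at the cause: `service tomcat7 start` does not pass trailing -D arguments through to the JVM the way `java -jar start.jar` does, so they usually have to go into JAVA_OPTS instead. A sketch, assuming a Debian/Ubuntu tomcat7 package (the paths and ZooKeeper address are illustrative):

  # /usr/share/tomcat7/bin/setenv.sh (or JAVA_OPTS in /etc/default/tomcat7)
  JAVA_OPTS="$JAVA_OPTS -DzkHost=<zk ip>:2181 -DnumShards=3 \
    -Dcollection.configName=myconf \
    -Dbootstrap_confdir=/usr/share/solr/example/solr/collection1/conf"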
RE: Memory usage during aggregation - SolrCloud with very large numbers of facet terms.
The default facet.limit is 10, but it's set to 50 for most of the facets. I've included the query parameters below. In case it makes any difference, there are quite a lot of facet fields with large numbers of terms, and the queries are being generated by the Sarnia Drupal module. Thanks, Andy

---
<lst name="params">
  <str name="f.links_public_suffixes.facet.limit">50</str>
  <str name="f.postcode_district.facet.limit">50</str>
  <str name="facet">true</str>
  <str name="sort">wayback_date asc</str>
  <str name="f.content_type_served.facet.limit">50</str>
  <str name="f.sentiment.facet.limit">50</str>
  <str name="facet.limit">10</str>
  <str name="f.content_encoding.facet.limit">50</str>
  <str name="f.last_modified_year.facet.limit">50</str>
  <str name="f.links_hosts.facet.limit">50</str>
  <str name="facet.method">enum</str>
  <str name="f.author.facet.limit">50</str>
  <str name="fl">*,score</str>
  <str name="f.content_type_full.facet.limit">50</str>
  <str name="f.content_type.facet.limit">50</str>
  <str name="f.content_type_ext.facet.limit">50</str>
  <arr name="facet.field">
    <str>sentiment</str>
    <str>private_suffix</str>
    <str>public_suffix</str>
    <str>postcode_district</str>
    <str>content_type_ext</str>
    <str>content_type_full</str>
    <str>{!ex=content_type_norm}content_type_norm</str>
    <str>content_type_served</str>
    <str>content_type</str>
    <str>content_language</str>
    <str>author</str>
    <str>content_encoding</str>
    <str>content_ffb</str>
    <str>{!ex=crawl_year}crawl_year</str>
    <str>domain</str>
    <str>links_public_suffixes</str>
    <str>links_private_suffixes</str>
    <str>links_hosts</str>
    <str>generator</str>
    <str>last_modified_year</str>
  </arr>
  <str name="qt">standard</str>
  <str name="facet.enum.cache.minDf">500</str>
  <str name="facet.missing">false</str>
  <str name="f.crawl_year.facet.limit">50</str>
  <str name="facet.mincount">1</str>
  <str name="f.content_language.facet.limit">50</str>
  <str name="json.nl">map</str>
  <str name="wt">xml</str>
  <str name="f.private_suffix.facet.limit">50</str>
  <str name="rows">20</str>
  <str name="f.content_ffb.facet.limit">50</str>
  <str name="f.generator.facet.limit">50</str>
  <str name="f.links_private_suffixes.facet.limit">50</str>
  <str name="f.domain.facet.limit">50</str>
  <str name="facet.sort">count</str>
  <str name="start">0</str>
  <str name="q">*:*</str>
  <str name="f.public_suffix.facet.limit">50</str>
  <str name="f.content_type_norm.facet.limit">50</str>
</lst>
---

-----Original Message----- From: Michael Ryan [mailto:mr...@moreover.com] Sent: 03 September 2013 13:41 To: solr-user@lucene.apache.org Subject: RE: Memory usage during aggregation - SolrCloud with very large numbers of facet terms. That seems quite odd... What facet parameters are you using in the query? ...
Re: Measuring SOLR performance
Hi Dmitry, Thanks for the feedback. Yes, it is indeed a jmeter issue (or rather, an issue with the plugin we use to generate charts). You may want to use the github issue tracker for whatever comes next: https://github.com/romanchyla/solrjmeter/issues Cheers, roman On Tue, Sep 3, 2013 at 7:54 AM, Dmitry Kan solrexp...@gmail.com wrote: Hi Roman, Thanks, the --additionalSolrParams was just what I wanted, and it works fine ...
Solr 4.3: Recovering from Too many values for UnInvertedField faceting on field
We are harvesting and indexing bibliographic data, thus having many distinct author names in our index. While testing Solr 4 I believe I had pushed a single core to 100 million records (91GB of data) and everything was working fine and fast. After adding a little more to the index, the following started to happen:

  17328668 [searcherExecutor-4-thread-1] WARN org.apache.solr.core.SolrCore – Approaching too many values for UnInvertedField faceting on field 'author_exact' : bucket size=16726546
  17328701 [searcherExecutor-4-thread-1] INFO org.apache.solr.core.SolrCore – UnInverted multi-valued field {field=author_exact,memSize=336715415,tindexSize=5001903,time=31595,phase1=31465,nTerms=12048027,bigTerms=0,termInstances=57751332,uses=0}
  18103757 [searcherExecutor-4-thread-1] ERROR org.apache.solr.core.SolrCore – org.apache.solr.common.SolrException: Too many values for UnInvertedField faceting on field author_exact
    at org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:181)
    at org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:664)

I can see that we reached a limit on bucket size. Is there a way to adjust this? The index also seems to have exploded in size (217GB). Thinking that I had reached a limit for what a single core could handle in terms of facets, I deleted records in the index, but even now at 1/3 (32 million) it still fails with the above error. I have optimised with expungeDeletes=true. The index is somewhat larger (76GB) than I would have expected. While we can still use the index and get facets back using the enum method on that field, I would still like a way to fix the index if possible. Any suggestions? cheers, :-Dennis
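Since the enum method still works on that field, one interim workaround (a sketch; the handler name is illustrative) is to pin the facet method per field in the request handler defaults, so the UnInvertedField structure is never built for it:

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="f.author_exact.facet.method">enum</str>
    </lst>
  </requestHandler>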
Re: solr cloud and DIH, indexation runs only on one shard.
It works! I've done what you said: _ In my request to get the list of documents, I added a where clause filtering the select that gets the documents to index: where noticebib.numnoticebib LIKE '%${dataimporter.request.suffixeNotice}' _ And I called my DIH on each shard with the parameter suffixeNotice=2 or suffixeNotice=1. Each shard indexed its part at the same time (more or less 1000 docs each). When I execute a select on the collection, I get more or less 2000 documents. Now my goal is to merge indexes, but that's another story. Another possibility would have been to play with the rows and start parameters, but that supposes 2 things: _ knowing the number of documents, and _ adding an order by clause to make sure the subsets of documents are disjoint (and even in that case, I'm not completely sure, because the source database can change). Thanks very much!! Jerôme Annual closure of the François-Mitterrand and Richelieu sites from 2 to 15 September 2013. Before printing, think of the environment.
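For the record, the two full-import calls then look roughly like this, reusing the handler path from the earlier message (a sketch):

  http://localhost:8080/solr-0.4.0-pfd/noticesBIBcollection/dataimportMNb?command=full-import&suffixeNotice=1
  http://localhost:9180/solr-0.4.0-pfd/noticesBIBcollection/dataimportMNb?command=full-import&suffixeNotice=2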
Re: dataimporter tika doesn't extract certain div
I don't know much about Tika, but in the example data-config.xml that you posted, the xpath attribute on the field "text" won't work, because the xpath attribute is used only by an XPathEntityProcessor. On Thu, Aug 29, 2013 at 10:20 PM, Andreas Owen a...@conx.ch wrote: I want Tika to only index the content in <div id="content">...</div> for the field text. Unfortunately it's indexing the whole page. Can't xpath do this? data-config.xml:

  <dataConfig>
    <dataSource type="BinFileDataSource" name="data"/>
    <dataSource type="BinURLDataSource" name="dataUrl"/>
    <dataSource type="URLDataSource" name="main"/>
    <document>
      <entity name="rec" processor="XPathEntityProcessor"
              url="http://127.0.0.1/tkb/internet/docImportUrl.xml"
              forEach="/docs/doc" dataSource="main"> <!-- transformer="script:GenerateId" -->
        <field column="title" xpath="//title"/>
        <field column="id" xpath="//id"/>
        <field column="file" xpath="//file"/>
        <field column="path" xpath="//path"/>
        <field column="url" xpath="//url"/>
        <field column="Author" xpath="//author"/>
        <entity name="tika" processor="TikaEntityProcessor"
                url="${rec.path}${rec.file}" dataSource="dataUrl"
                onError="skip" htmlMapper="identity" format="html">
          <field column="text" xpath="//div[@id='content']"/>
        </entity>
      </entity>
    </document>
  </dataConfig>

-- Regards, Shalin Shekhar Mangar.
Re: Can we used CloudSolrServer for searching data
CloudSolrServer can only be used if you are actually using SolrCloud (i.e. a ZooKeeper-aware setup). If you only have a multi-core setup, then you can use LBHttpSolrServer. See http://wiki.apache.org/solr/LBHttpSolrServer On Tue, Aug 27, 2013 at 2:11 PM, Dharmendra Jaiswal dharmendra.jais...@gmail.com wrote: Hello, I am using the multi-core mechanism with Solr 4.4.0, and each core is dedicated to a particular client (each core is a collection). If we search data from SiteA, it provides search results from CoreA; if we search data from SiteB, it provides search results from CoreB; and similarly for the other clients. Right now I am using HttpSolrServer (the SolrJ API) for connecting to Solr for search. As per my understanding, it will try to connect directly to a particular Solr instance for searching, and if that node is down, searching will fail. Please let me know if my assumption is wrong. My question is whether it is possible to connect to Solr using CloudSolrServer instead of HttpSolrServer for searching, so that in case one node is down, CloudSolrServer will pick up data from another instance of Solr. Any pointer or link would be helpful, and it would be better still if someone shared an example of connecting using CloudSolrServer. Note: I am using a Windows machine for deployment of Solr, and we are indexing data from a database using DIH. Thanks, Dharmendra jaiswal -- Regards, Shalin Shekhar Mangar.
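A minimal SolrJ sketch of the load-balancing alternative (the URLs are illustrative):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.LBHttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  // rotates requests across the listed nodes and skips ones that are down;
  // the constructor throws MalformedURLException for bad URLs
  LBHttpSolrServer server = new LBHttpSolrServer(
      "http://host1:8080/solr/coreA",
      "http://host2:8080/solr/coreA");
  QueryResponse rsp = server.query(new SolrQuery("*:*"));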
Dynamic Query Analyzer
Hi, We have a need to specify a different query analyzer dynamically, depending on input parameters. We need this so that we can use different stopword lists at query time. Would anyone know how I might be able to achieve this in Solr? I'm aware of the solution of specifying different field types, each with a different query analyzer, but I'd rather not have to index the field multiple times. Many thanks Dab
Re: SolrCloud Set up
I think I have it all sorted out. There are some weird network issues here where my test setup is, so that may have been part of the overall issue. Timeouts wouldn't have fixed this issue, that's for sure. On Sat, Aug 31, 2013 at 7:17 AM, Erick Erickson erickerick...@gmail.com wrote: bq: Though I am seeing some funkiness that I wasn't seeing with Solr & Zookeeper running together Then I suspect you've set something up inconsistently, _or_ you need to extend some timeouts, because SolrCloud is being run with separate ZKs by quite a few people, so I'd be surprised if it were anything except config issues. If you _do_ uncover something in that realm other than timeouts and such, we need to know. Best, Erick On Fri, Aug 30, 2013 at 2:15 PM, Jared Griffith jgriff...@picsauditing.com wrote: That's what I was thinking. Though I am seeing some funkiness that I wasn't seeing with Solr & Zookeeper running together. On Fri, Aug 30, 2013 at 9:40 AM, Shawn Heisey s...@elyograg.org wrote: On 8/30/2013 9:43 AM, Jared Griffith wrote: One last thing. Is there any real benefit in running SolrCloud and Zookeeper separate? I am seeing some funkiness with the separation of the two, funkiness I wasn't seeing when running SolrCloud + Zookeeper together as outlined in the Wiki. For a robust install, you want zookeeper to be a separate process. It can run on the same server as Solr, but the embedded zookeeper (-DzkRun) should not be used except for dev and proof-of-concept work. The reason is simple. Zookeeper is the central coordinator for SolrCloud. In order for it to remain stable, it should not be restarted without good reason. If you are running zookeeper as part of Solr, then you will be affecting zookeeper operation anytime you restart that instance of Solr. Making changes to your Solr setup often requires that you restart Solr. This includes upgrading Solr and changing some aspects of its configuration. Some configuration aspects can be changed with just a collection reload, but others require a full application restart. Thanks, Shawn -- Jared Griffith Linux Administrator, PICS Auditing, LLC P: (949) 936-4574 C: (909) 653-7814 http://www.picsauditing.com 17701 Cowan #140 | Irvine, CA | 92614 Join PICS on LinkedIn and Twitter! https://twitter.com/PICSAuditingLLC
DIH + Solr Cloud
Hi, Quick question about data import handlers in Solr Cloud. Does anyone use more than one instance to support the DIH process? Or is the typical setup to have one box set up as only the DIH and keep this responsibility outside of the Solr Cloud environment? I'm just trying to get a picture of how this is typically deployed. Thanks! Alejandro
Re: SolrCloud - Path must not end with / character
Interesting because I was getting the issue when I was passing the full path (without the trailing / ) to Tomcat. On Mon, Sep 2, 2013 at 11:34 PM, Prasi S prasi1...@gmail.com wrote: The issue is resolved. I have given all the path inside tomcat as relative paths( solr home, solr war). That was the creating the problem. On Mon, Sep 2, 2013 at 2:19 PM, Prasi S prasi1...@gmail.com wrote: Does this have anyting to do with tomcat? I cannot go back as we already fixed with tomcat. Any suggestions pls. The same setup , if i copy and run it on a different machine, it works fine. Am not sure what is missing. Is it because of some system parameter getting set? On Fri, Aug 30, 2013 at 9:11 PM, Jared Griffith jgriff...@picsauditing.com wrote: I was getting the same errors when trying to implement SolrCloud with Tomcat. I eventually gave up until something came out of this thread. This all works if you just ditch Tomcat and go with the native Jetty server. On Fri, Aug 30, 2013 at 6:28 AM, Prasi S prasi1...@gmail.com wrote: Also, this fails with the default solr 4.4 downlaoded configuration too On Fri, Aug 30, 2013 at 4:19 PM, Prasi S prasi1...@gmail.com wrote: Below is the script i run START /MAX F:\SolrCloud\zookeeper\zk-server-1\zookeeper-3.4.5\bin\zkServer.cmd START /MAX F:\solrcloud\zookeeper java -classpath .;solr-lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost localhost:2182 -confdir solr-conf -confname solrconf1 START /MAX F:\solrcloud\zookeeper java -classpath .;solr-lib/* org.apache.solr.cloud.ZkCLI -cmd linkconfig -zkhost 127.0.0.1:2182 -collection firstcollection -confname solrconf1 -solrhome ../tomcat1/solr1 START /MAX F:\solrcloud\zookeeper java -classpath .;solr-lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost localhost:2182 -confdir solr-conf -confname solrconf2 START /MAX F:\solrcloud\zookeeper java -classpath .;solr-lib/* org.apache.solr.cloud.ZkCLI -cmd linkconfig -zkhost 127.0.0.1:2182 -collection seccollection -confname solrconf2 -solrhome ../tomcat1/solr1 START /MAX F:\solrcloud\tomcat1\bin\startup.bat START /MAX F:\solrcloud\tomcat2\bin\startup.bat On Fri, Aug 30, 2013 at 4:07 PM, Prasi S prasi1...@gmail.com wrote: Im still clueless on where the issue could be. There is no much information in the solr logs. i had a running version of cloud in another server. I have copied the same to this server, and started zookeeper, then ran teh below commands, java -classpath .;solr-lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost localhost:2181 -confdir solr-conf -confname solrconfindex java -classpath .;solr-lib/* org.apache.solr.cloud.ZkCLI -cmd linkconfig -zkhost 127.0.0.1:2181 -collection colindexer -confname solrconfindex -solrhome ../tomcat1/solr1 After this, when i started tomcat, the first tomcat starts fine. When the second tomcat is started, i get the above exception and it stops. Tehn the first tomcat also shows teh same exception. On Thu, Aug 29, 2013 at 7:18 PM, Mark Miller markrmil...@gmail.com wrote: Yeah, you see this when the core could not be created. Check the logs to see if you can find something more useful. I ran into this again the other day - it's something we should fix. You see the same thing in the UI when a core cannot be created and it gives you no hint about the problem and is confusing. - Mark On Aug 29, 2013, at 5:23 AM, sathish_ix skandhasw...@inautix.co.in wrote: Hi , Check your configuration files uploaded into zookeeper is valid and no error in config files uploaded. I think due to this error, solr core will not be created. 
Re: Solr 4.2 update/extract adding unknown field, can we change field type from string to text
Your email is vague in terms of what you are actually *doing* and what behavior you are seeing. Providing specific details like "this is my schema.xml and this is my solrconfig.xml; when I POST this file to this URL I get this result, and I would instead like to get this result" makes it much easier for other people to provide you with meaningful help... https://wiki.apache.org/solr/UsingMailingLists My best guess is that you are referring specifically to the behavior of ExtractingRequestHandler and the fields it tries to include in documents that are extracted, and how those fields are indexed -- in which case you can use the uprefix option to add a prefix to the name of all fields generated by Tika that aren't already in your schema, and you can then define a dynamicField matching that prefix to control every aspect of the resulting fields... https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika#UploadingDatawithSolrCellusingApacheTika-InputParameters -Hoss
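A minimal sketch of that uprefix + dynamicField combination, assuming the stock /update/extract handler from the example solrconfig.xml (the attr_ prefix and text_general type here are just illustrative choices):

<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="uprefix">attr_</str>
  </lst>
</requestHandler>

and in schema.xml:

<dynamicField name="attr_*" type="text_general" indexed="true" stored="true" multiValued="true"/>

With that in place, an unknown Tika-generated field lands in an attr_-prefixed field and gets analyzed as text_general instead of being guessed as a plain string.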
Re: SolrCloud Set up
Ah, thanks for the closure, it's always nice to know. I used to work with a guy who had a list of network fallacies that amounted to "you can't trust them fully". Erick On Tue, Sep 3, 2013 at 12:12 PM, Jared Griffith jgriff...@picsauditing.com wrote: I think I have it all sorted out. There are some weird network issues here where my test setup is, so that may have been part of the overall issue. Timeouts wouldn't have fixed this issue, that's for sure. On Sat, Aug 31, 2013 at 7:17 AM, Erick Erickson erickerick...@gmail.com wrote: bq: Though I am seeing some funkiness that I wasn't seeing with Solr + Zookeeper running together. Then I suspect you've set something up inconsistently, _or_ you need to extend some timeouts, because SolrCloud is being run with separate ZKs by quite a few people, so I'd be surprised if it were anything except config issues. If you _do_ uncover something in that realm other than timeouts and such, we need to know. Best, Erick On Fri, Aug 30, 2013 at 2:15 PM, Jared Griffith jgriff...@picsauditing.com wrote: That's what I was thinking. Though I am seeing some funkiness that I wasn't seeing with Solr + Zookeeper running together. On Fri, Aug 30, 2013 at 9:40 AM, Shawn Heisey s...@elyograg.org wrote: On 8/30/2013 9:43 AM, Jared Griffith wrote: One last thing. Is there any real benefit in running SolrCloud and Zookeeper separately? I am seeing some funkiness with the separation of the two, funkiness I wasn't seeing when running SolrCloud + Zookeeper together as outlined in the Wiki. For a robust install, you want zookeeper to be a separate process. It can run on the same server as Solr, but the embedded zookeeper (-DzkRun) should not be used except for dev and proof-of-concept work. The reason is simple. Zookeeper is the central coordinator for SolrCloud. In order for it to remain stable, it should not be restarted without good reason. If you are running zookeeper as part of Solr, then you will be affecting zookeeper operation anytime you restart that instance of Solr. Making changes to your Solr setup often requires that you restart Solr. This includes upgrading Solr and changing some aspects of its configuration. Some configuration aspects can be changed with just a collection reload, but others require a full application restart. Thanks, Shawn
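As a sketch of the distinction Shawn draws, using the Jetty start.jar from a stock Solr 4.x download (host names illustrative):

# embedded zookeeper -- dev and proof-of-concept only:
java -DzkRun -jar start.jar

# external zookeeper ensemble -- the robust setup:
java -DzkHost=zk1:2181,zk2:2181,zk3:2181 -jar start.jar

With the second form, restarting Solr leaves the zookeeper ensemble untouched.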
Re: distributed query result order tie break question
: like to understand how the ordering is defined so that I can compute an : integer that is sorted in the same way. For example (shard id 24) | : docid or something like that. If you want to ensure a consistent ordering, you have to index a (unique) value that you use as a secondary sort -- you can't trust that the internal docids will remain unchanged. -Hoss
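For example, if the schema's uniqueKey field is named id (use whatever your uniqueKey actually is), a request with sort=score desc,id asc breaks every score tie the same way on every shard, so the merged result order stays stable across queries.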
Re: SolrCloud Set up
Those are the Fallacies of Distributed Computing from L. Peter Deutsch. The first fallacy is "The network is reliable." http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing wunder On Sep 3, 2013, at 10:26 AM, Erick Erickson wrote: Ah, thanks for the closure, it's always nice to know. I used to work with a guy who had a list of network fallacies that amounted to "you can't trust them fully" ... -- Walter Underwood wun...@wunderwood.org
Re: Dynamic Query Analyzer
You don't need to index fields several times; you can index just one field and use the different query analyzers only to build the query. We're doing this for authors, for example - if the query language says author:einstein, the query parser knows this field should be analyzed differently (that is part of your application logic, of your query language semantics - so it can vary). The parser will change 'author' to 'nosynonym_author', which means the 'nosynonym_author' analyzer is used for the analysis phase, and after the query has been prepared, we 'simply' change the query field from 'nosynonym_author' back into 'author'. Seems complex, but it is actually easy. But it depends on what query parser you can/want to use. I use this: https://issues.apache.org/jira/browse/LUCENE-5014 roman On Tue, Sep 3, 2013 at 11:41 AM, Daniel Rosher rosh...@gmail.com wrote: Hi, We have a need to specify a different query analyzer depending on input parameters, dynamically. We need this so that we can use different stopword lists at query time. Would anyone know how I might be able to achieve this in solr? I'm aware of the solution to specify different field types, each with a different query analyzer, but I'd like not to have to index the field multiple times. Many thanks Dab
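A minimal sketch of that rename trick against plain Lucene 4.x (the field names are hypothetical, and a real Solr integration would live inside a custom QParserPlugin rather than standalone code like this):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.util.Version;

// Parse against the alias field so its synonym-free query analyzer runs:
//   Query q = rename(new QueryParser(Version.LUCENE_44, "nosynonym_author", analyzer).parse(input));
static Query rename(Query q) {
  if (q instanceof TermQuery) {
    Term t = ((TermQuery) q).getTerm();
    if ("nosynonym_author".equals(t.field())) {
      TermQuery renamed = new TermQuery(new Term("author", t.bytes()));
      renamed.setBoost(q.getBoost());   // keep any boost assigned during parsing
      return renamed;
    }
  } else if (q instanceof BooleanQuery) {
    for (BooleanClause c : (BooleanQuery) q) {
      c.setQuery(rename(c.getQuery())); // recurse into nested clauses
    }
  }
  return q;
}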
Re: Problem parsing suggest response
: 2. The items at and l are not preceded by name. You're getting back a list of items; the odd items (at, l) are strings, and the even items are more complex objects associated with those strings. : Can I interfere with the structure? You can choose how the JSON writer represents the internal structure of pairs contained inside a NamedList using the json.nl option. By default json.nl is flat (the list you are seeing), but you can also use arrarr (a list of 2-item lists) or map, which sounds like what you are looking for -- however, it's important to realize that the map option can in some situations generate the same key multiple times, depending on the internal data. This is valid JSON, but many client libraries can't handle it, or handle it in a way that users don't like -- hence it is not the default. https://wiki.apache.org/solr/SolJSON#JSON_specific_parameters -Hoss
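To illustrate, given a NamedList holding the two pairs from the question, the three renderings look roughly like this (inner objects elided):

json.nl=flat:   ["at", {...}, "l", {...}]
json.nl=arrarr: [["at", {...}], ["l", {...}]]
json.nl=map:    {"at": {...}, "l": {...}}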
Re: Dynamic Query Analyzer
Sounds like it would be better for you to preprocess the query in your application layer. Your requirements seem too open-ended to wire into Solr. But, to be sure, please elaborate on exactly what sort of variations you need in query analysis. -- Jack Krupansky -Original Message- From: Daniel Rosher Sent: Tuesday, September 03, 2013 11:41 AM To: solr-user Subject: Dynamic Query Analyzer ...
Re: SolrCloud Set up
Thankfully it's none of those, but more than likely a bad DHCP server (Windows) or client (or combo thereof) that is causing the network to freak out. I'll try adjusting the timeouts up to see if it will alleviate this. I am seeing that when I try to restart the solr instances, sometimes they seem to not join the cluster at all (nothing in the logs about issues). Even after restarting the nodes that are reporting down a couple of times, they never join the cluster again. I have 3 zookeeper instances on 3 separate physical machines, and 4 solr instances running on the same machine. On Tue, Sep 3, 2013 at 10:38 AM, Walter Underwood wun...@wunderwood.org wrote: Those are the Fallacies of Distributed Computing from L. Peter Deutsch. The first fallacy is "The network is reliable." http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing wunder
Re: SolrCloud - shard containing an invalid host:port
Was it a test instance that you created? 8983 is the default port, so possibly you started an instance before you had the ports set up properly, and it registered in zookeeper as a valid instance. You can use the Core API to UNLOAD it (if it is still running); if it isn't running anymore, I have yet to find a way to remove something from ZK. We normally end up wiping zoo_data and bouncing everything at that point; instances should re-register themselves as they start up. But that is the sledgehammer-to-crack-a-walnut approach. :) On 3 September 2013 13:55, Marc des Garets m...@ttux.net wrote: Hi, I have setup SolrCloud with tomcat. I use solr 4.1. I have zookeeper running on 192.168.1.10. A tomcat running solr_myidx on 192.168.1.10 on port 8080. A tomcat running solr_myidx on 192.168.1.11 on port 8080. My solr.xml is like this:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true" collection.configName="myidx">
  <cores adminPath="/admin/cores" defaultCoreName="collection1" hostPort="8080" hostContext="solr_myidx" zkClientTimeout="2">
    <core name="collection1" instanceDir="./" />
  </cores>
</solr>

I have tomcat starting with: -Dbootstrap_conf=true -DzkHost=192.168.1.10:2181 Both tomcats start up all good, but when I go to the Cloud tab in the solr admin, I see the following: collection1 -- shard1 -- 192.168.1.10:8983/solr 192.168.1.11:8080/solr_ugc 192.168.1.10:8080/solr_ugc I don't know what 192.168.1.10:8983/solr is doing there. Do you know how I can remove it? It's causing the following error when I try to query the index: SEVERE: Error while trying to recover. core=collection1:org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://192.168.10.206:8983/solr Thanks, Marc
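If the stray instance is still running, the Core Admin UNLOAD call would look something like this (host, port, and core name taken from the thread; point it at whichever instance registered the bad entry):

http://192.168.1.10:8983/solr/admin/cores?action=UNLOAD&core=collection1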
Re: SolrCloud Set up
Yep, that's the one, thanks... On Tue, Sep 3, 2013 at 1:38 PM, Walter Underwood wun...@wunderwood.org wrote: Those are the Fallacies of Distributed Computing from L. Peter Deutsch. The first fallacy is "The network is reliable." http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing wunder
Re: Starting Solr in Tomcat with specifying ZK host(s)
On 9/3/2013 4:13 AM, maephisto wrote: I've setup a ZK instance and also deployed Solr in Tomcat7 on a different instance in Amazon EC2. Afterwards I tried starting tomcat specifying the ZK host IP, like so: sudo service tomcat7 start -DzkHost=zk ip:2181 -DnumShards=3 -Dcollection.configName=myconf -Dbootstrap_confdir=/usr/share/solr/example/solr/collection1/conf Solr loads fine, but is not in the cloud. The tomcat init script likely does not pay attention to anything that you put on the commandline other than a command (like start/stop/status) for the service. The java command is buried in that script. It works with jetty because you are running java directly, not a script. Helping you with tomcat is outside the scope of this mailing list, but you may be able to modify the JAVA_OPTS environment variable in a file with a name like one of the following: /etc/default/tomcat7 /etc/sysconfig/tomcat7 Many init scripts for packaged software will load environment information from a central user-modifiable config file. If this information is not directly usable to you, please consult a tomcat mailing list, IRC channel, or other support avenue. Although Solr does usually work with tomcat, there is no official testing. Solr is only tested using the Jetty that is bundled with it. Side note: I hope you realize that if you're only connecting to one zookeeper instance, then SolrCloud will not function if that zookeeper instance goes down. You need three instances minimum (running on separate hardware) for robust operation, and Solr must know about all of them. Thanks, Shawn
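A sketch of what that environment change might look like (the exact file and variable name depend on how your tomcat7 package was built; the -D values are the ones from the original post):

# /etc/default/tomcat7 (or /etc/sysconfig/tomcat7)
JAVA_OPTS="$JAVA_OPTS -DzkHost=<zk ip>:2181 -DnumShards=3 \
  -Dcollection.configName=myconf \
  -Dbootstrap_confdir=/usr/share/solr/example/solr/collection1/conf"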
Re: Apostrophes in fields
On 9/3/2013 3:59 AM, devendra W wrote: in my case - the fields with apostrophe not returned in results Don't use special characters in field names. If it wouldn't work as a variable name, function name, or other identifier in a typical programming language (Java, C, Perl), then it will probably cause you problems as a field name. This basically means: 7-bit ASCII only. Starts with a letter; contains only letters, numbers, and the underscore. Most punctuation other than the underscore has a special meaning to Solr. Using extended characters (UTF-8, or those beyond 7-bit ASCII) *might* work, but it's fairly easy to screw that up and use the wrong character set, so it's better if you just don't do it. Thanks, Shawn
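A few made-up examples of that rule of thumb: author_exact, product_name, and field2 are safe; author's, product-name, and title.fr are all asking for trouble, because the apostrophe, hyphen, and dot each carry special meaning in Solr query syntax.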
Solr Cloud hangs when replicating updates
I was having problems updating SolrCloud with a large batch of records. The records are coming in bursts with lulls between updates. At first, I just tried large updates of 100,000 records at a time. Eventually, this caused Solr to hang. When hung, I can still query Solr, but I cannot do any deletes or other updates to the index. At first, my updates were going as SolrJ CSV posts. I have also tried local file updates and had similar results. I finally slowed things down to just use SolrJ's update feature, which is basically just JavaBin. I am also sending over just 100 at a time in 10 threads. Again, it eventually hung. Sometimes, Solr hangs in the first couple of chunks. Other times, it hangs right away. These are my commit settings:

<autoCommit>
  <maxTime>15000</maxTime>
  <maxDocs>5000</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>3</maxTime>
</autoSoftCommit>

I have tried quite a few variations with the same results. I also tried various JVM settings, with the same results. The only variable seems to be that reducing the cluster size from 2 to 1 is the only thing that helps. I also did a jstack trace. I did not see any explicit deadlocks, but I did see quite a few threads in WAITING or TIMED_WAITING. It is typically something like this:

java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x00074039a450 (a java.util.concurrent.Semaphore$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
    at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
    at org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
    at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
    at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
    at org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
    at org.apache.solr.update.SolrCmdDistributor.distribAdd(SolrCmdDistributor.java:139)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:474)
    at org.apache.solr.handler.loader.CSVLoaderBase.doAdd(CSVLoaderBase.java:395)
    at org.apache.solr.handler.loader.SingleThreadedCSVLoader.addDoc(CSVLoader.java:44)
    at org.apache.solr.handler.loader.CSVLoaderBase.load(CSVLoaderBase.java:364)
    at org.apache.solr.handler.loader.CSVLoader.load(CSVLoader.java:31)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:533)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)

It basically appears that Solr gets stuck while trying to acquire a semaphore that never becomes available. Anyone have any ideas? This is definitely causing major problems for us. -- *KEVIN OSBORN* LEAD SOFTWARE ENGINEER CNET Content Solutions OFFICE 949.399.8714 CELL
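For reference, a minimal sketch of the SolrJ batched-update pattern described above (the zk hosts, collection name, field names, and the Record type are all placeholders; exception handling omitted):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
server.setDefaultCollection("collection1");
List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
for (Record r : records) {              // Record/records stand in for your data source
  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("id", r.getId());
  doc.addField("name", r.getName());
  batch.add(doc);
  if (batch.size() == 100) {            // 100 docs per request, as in the post
    server.add(batch);                  // JavaBin update; commits left to autoCommit
    batch.clear();
  }
}
if (!batch.isEmpty()) server.add(batch);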
Re: Apostrophes in fields
Show us your full field type with analyzer. I suspect that the problem is that one of the index-time filters is turning dev's into devs (WDF does that), but at query time there is no filter that removes a trailing apostrophe. Use the Solr Admin UI Analysis page to see how dev's gets indexed and how dev' gets analyzed at query time. -- Jack Krupansky -Original Message- From: devendra W Sent: Tuesday, September 03, 2013 5:59 AM To: solr-user@lucene.apache.org Subject: Re: Apostrophes in fields in my case - the fields with apostrophe not returned in results When I search for -- dev it shows me the following results: dev, dev's, devendra. But when I search for -- dev' (dev with apo only), nothing comes out as a result. What could be the workaround? Thanks Devendra
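A sketch of the kind of index/query mismatch being described (the type name and filter chain are illustrative, not the poster's actual schema):

<fieldType name="text_example" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- WDF splits dev's into dev + s and can catenate to devs -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" catenateWords="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- nothing here strips a trailing apostrophe, so the query term stays dev' -->
  </analyzer>
</fieldType>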
Re: Change the score of a document based on the *value* of a multifield using dismax
If you want to alter the score in a customized way based on indexed text data on a per-value basis, then index Lucene payloads and use PayloadTermQuery. See the javadocs for PayloadTermQuery in particular and follow the references. This is a bit dated, but read this: http://searchhub.org/2009/08/05/getting-started-with-payloads/ You can get this done. Almost anything is doable if you have sufficient time and determination. ~ David - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
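A minimal sketch of the query side (Lucene 4.x; the field and term are made up, and scoring payloads also requires a Similarity whose scorePayload() reads the payload value):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.payloads.AveragePayloadFunction;
import org.apache.lucene.search.payloads.PayloadTermQuery;

// Scores each match using the average of the payloads stored on the matching postings.
PayloadTermQuery q = new PayloadTermQuery(
    new Term("tags", "premium"),
    new AveragePayloadFunction());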
Re: distributed query result order tie break question
On 09/03/2013 12:50 PM, Chris Hostetter wrote: : like to understand how the ordering is defined so that I can compute an : integer that is sorted in the same way. For example (shard id 24) | : docid or something like that. If you want to ensure a consistent ordering, you have to index a (unique) value that you use as a secondary sort -- you can't trust the internal docids will remain unchanged. Thanks, Hoss - that was the conclusion that I was coming to. It's good to have it confirmed. -Mike
Re: Solr 4.3: Recovering from Too many values for UnInvertedField faceting on field
Our index is too large to uninvert on the fly, so we've been looking into using DocValues to keep a particular field uninverted at index time. See http://wiki.apache.org/solr/DocValues I don't know if this will solve your problem, but it might be worth trying it out. -Greg On Tue, Sep 3, 2013 at 7:04 AM, Dennis Schafroth den...@indexdata.com wrote: We are harvesting and indexing bibliographic data, thus having many distinct author names in our index. While testing Solr 4 I believe I had pushed a single core to 100 million records (91GB of data) and everything was working fine and fast. After adding a little more to the index, the following started to happen:

17328668 [searcherExecutor-4-thread-1] WARN org.apache.solr.core.SolrCore – Approaching too many values for UnInvertedField faceting on field 'author_exact' : bucket size=16726546
17328701 [searcherExecutor-4-thread-1] INFO org.apache.solr.core.SolrCore – UnInverted multi-valued field {field=author_exact,memSize=336715415,tindexSize=5001903,time=31595,phase1=31465,nTerms=12048027,bigTerms=0,termInstances=57751332,uses=0}
18103757 [searcherExecutor-4-thread-1] ERROR org.apache.solr.core.SolrCore – org.apache.solr.common.SolrException: Too many values for UnInvertedField faceting on field author_exact
    at org.apache.solr.request.UnInvertedField.init(UnInvertedField.java:181)
    at org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:664)

I can see that we reached a limit of bucket size. Is there a way to adjust this? The index also seems to explode in size (217GB). Thinking that I had reached a limit for what a single core could handle in terms of facets, I deleted records in the index, but even now at 1/3 (32 million) it still fails with the above error. I have optimised with expungeDeleted=true. The index is somewhat larger (76GB) than I would have expected. While we can still use the index and get facets back using the enum method on that field, I would still like a way to fix the index if possible. Any suggestions? cheers, :-Dennis
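A sketch of what enabling DocValues on that field could look like in schema.xml (Solr 4.2+; the attribute values are illustrative, and the collection must be reindexed for it to take effect):

<field name="author_exact" type="string" indexed="true" stored="false" multiValued="true" docValues="true"/>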
SolrCloud 4.x hangs under high update volume
Hey guys, I am looking into an issue we've been having with SolrCloud since the beginning of our testing, all the way from 4.1 to 4.3 (haven't tested 4.4.0 yet). I've noticed other users with this same issue, so I'd really like to get to the bottom of it. Under a very, very high rate of updates (2000+/sec), after 1-12 hours we see stalled transactions that snowball to consume all Jetty threads in the JVM. This eventually causes the JVM to hang with most threads waiting on the condition/stack provided at the bottom of this message. At this point SolrCloud instances then start to see their neighbors (who also have all threads hung) as down w/Connection Refused, and the shards become down in state. Sometimes a node or two survives and just returns 503s "no server hosting shard" errors. As a workaround/experiment, we have tuned the number of threads sending updates to Solr, as well as the batch size (we batch updates from client -> solr), and the soft/hard autoCommits, all to no avail. We also tried turning off client-to-Solr batching (1 update = 1 call to Solr), which did not help either. Certain combinations of update threads and batch sizes seem to mask/help the problem, but not resolve it entirely. Our current environment is the following:
- 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
- 3 x Zookeeper instances, external Java 7 JVM.
- 1 collection, 3 shards, 2 replicas (each node is a leader of 1 shard and a replica of 1 shard).
- Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a good day.
- 5000 max jetty threads (well above what we use when we are healthy); Linux user-threads ulimit is 6000.
- Occurs under Jetty 8 or 9 (many versions).
- Occurs under Java 1.6 or 1.7 (several minor versions).
- Occurs under several JVM tunings.
- Everything seems to point to Solr itself, and not a Jetty or Java version (I hope I'm wrong).
The stack trace that is holding up all my Jetty QTP threads is the following, which seems to be waiting on a lock that I would very much like to understand further:

java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x0007216e68d8 (a java.util.concurrent.Semaphore$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
    at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
    at org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
    at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
    at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
    at org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
    at org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1178)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1486)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1096)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:432)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1030)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:201)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
    at
mm, tie, qs, ps and CJKBigramFilter and edismax and dismax
When I have a field using CJKBigramFilter, parsed CJK chars have a different parsedQuery than non-CJK queries. (旧小说 is 3 chars, so 2 bigrams)

args sent in: q={!qf=bi_fld}旧小说&pf=&pf2=&pf3=
debugQuery:
<str name="rawquerystring">{!qf=bi_fld}旧小说</str>
<str name="querystring">{!qf=bi_fld}旧小说</str>
<str name="parsedquery">(+DisjunctionMaxQuery((((bi_fld:旧小 bi_fld:小说)~2))~0.01) ())/no_coord</str>
<str name="parsedquery_toString">+(((bi_fld:旧小 bi_fld:小说)~2))~0.01 ()</str>

If I use a non-CJK query string, with the same field:

args sent in: q={!qf=bi_fld}foo bar&pf=&pf2=&pf3=
debugQuery:
<str name="rawquerystring">{!qf=bi_fld}foo bar</str>
<str name="querystring">{!qf=bi_fld}foo bar</str>
<str name="parsedquery">(+((DisjunctionMaxQuery((bi_fld:foo)~0.01) DisjunctionMaxQuery((bi_fld:bar)~0.01))~2))/no_coord</str>
<str name="parsedquery_toString">+(((bi_fld:foo)~0.01 (bi_fld:bar)~0.01)~2)</str>

Why are the parsedquery_toString formulas different? And is there any difference in the actual relevancy formula? How can you tell the difference between the minNrShouldMatch and a qs or ps or tie value, if they are all represented as ~n in the parsedQuery string? To try to get a handle on qs, ps, tie and mm:

args: q={!qf=bi_fld pf=bi_fld}"a b" c d&qs=5&ps=4
debugQuery:
<str name="rawquerystring">{!qf=bi_fld pf=bi_fld}"a b" c d</str>
<str name="querystring">{!qf=bi_fld pf=bi_fld}"a b" c d</str>
<str name="parsedquery">(+((DisjunctionMaxQuery((bi_fld:"a b"~5)~0.01) DisjunctionMaxQuery((bi_fld:c)~0.01) DisjunctionMaxQuery((bi_fld:d)~0.01))~3) DisjunctionMaxQuery((bi_fld:"c d"~4)~0.01))/no_coord</str>
<str name="parsedquery_toString">+(((bi_fld:"a b"~5)~0.01 (bi_fld:c)~0.01 (bi_fld:d)~0.01)~3) (bi_fld:"c d"~4)~0.01</str>

I get that qs, the query slop, is for explicit phrases in the query, so "a b"~5 makes sense. I also get that ps is for boosting of phrases, so I get (bi_fld:"c d"~4) … but where is (cjk_uni_pub_search:"a b c d"~4)? Using dismax (instead of edismax):

args: q={!dismax qf=bi_fld pf=bi_fld}"a b" c d&qs=5&ps=4
debugQuery:
<str name="rawquerystring">{!dismax qf=bi_fld pf=bi_fld}"a b" c d</str>
<str name="querystring">{!dismax qf=bi_fld pf=bi_fld}"a b" c d</str>
<str name="parsedquery">(+((DisjunctionMaxQuery((bi_fld:"a b"~5)~0.01) DisjunctionMaxQuery((bi_fld:c)~0.01) DisjunctionMaxQuery((bi_fld:d)~0.01))~3) DisjunctionMaxQuery((bi_fld:"a b c d"~4)~0.01))/no_coord</str>
<str name="parsedquery_toString">+(((bi_fld:"a b"~5)~0.01 (bi_fld:c)~0.01 (bi_fld:d)~0.01)~3) (bi_fld:"a b c d"~4)~0.01</str>

So is this an edismax bug? FYI, I am running Solr 4.4. I have fields defined like so:

<fieldtype name="text_cjk_bi" class="solr.TextField" positionIncrementGap="1" autoGeneratePhraseQueries="false">
  <analyzer>
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.ICUTransformFilterFactory" id="Traditional-Simplified"/>
    <filter class="solr.ICUTransformFilterFactory" id="Katakana-Hiragana"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.CJKBigramFilterFactory" han="true" hiragana="true" katakana="true" hangul="true" outputUnigrams="false"/>
  </analyzer>
</fieldtype>

The request handler uses edismax:

<requestHandler name="search" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="q.alt">*:*</str>
    <str name="mm">6<-1 6<90%</str>
    <int name="qs">1</int>
    <int name="ps">0</int>
Re: mm, tie, qs, ps and CJKBigramFilter and edismax and dismax
Re the relevancy changes I note below for edismax, there are already some issues filed. Pertaining to the difference in how the phrase queries are merged into the main query, see Michael Dodsworth's comment of 25/Sep/12 on this issue: https://issues.apache.org/jira/browse/SOLR-2058 -- the ticket is closed, but this issue is not addressed. And pertaining to skipping terms in phrase boosting when part of the query is a phrase: https://issues.apache.org/jira/browse/SOLR-4130 - Naomi On Sep 3, 2013, at 5:54 PM, Naomi Dushay wrote: When I have a field using CJKBigramFilter, parsed CJK chars have a different parsedQuery than non-CJK queries. (旧小说 is 3 chars, so 2 bigrams) ...
Re: mm, tie, qs, ps and CJKBigramFilter and edismax and dismax
The query parser sees q=foo bar as two separate source query terms and analyzes each separately, but q=旧小说 is seen by the query parser as a single source query term, and that one source query term then gets tokenized by the query-term analyzer as two CJK bigrams. Try q=foo-bar and you should then get a comparable structure in the generated queries. -- Jack Krupansky -Original Message- From: Naomi Dushay Sent: Tuesday, September 03, 2013 8:54 PM To: solr-user@lucene.apache.org Subject: mm, tie, qs, ps and CJKBigramFilter and edismax and dismax When I have a field using CJKBigramFilter, parsed CJK chars have a different parsedQuery than non-CJK queries. (旧小说 is 3 chars, so 2 bigrams) ...