solr replication
Hi all, I am trying to set up Solr replication by following the http://wiki.apache.org/solr/SolrReplication tutorial. Everything is working OK. My question is: should I define two SolrServer instances in SolrJ (one for the master and one for the slave) in order to direct indexing to the master and queries to the slave? If I want to add a new slave in the future, will I have to change code? What is the recommended way? Thanks in advance. Regards
Re: solr replication
Here is the way I see it (and implemented it). While using the SolrJ API you have to send:
- indexing commands to your indexing Solr instance (the master), for example http://myMaster:80/myCore/
- query commands to your search Solr instance (a slave). You may have several slaves, and you can also put a broker in front of them to load-balance between http://mySlave1:80/myCore/ http://mySlave2:80/myCore/ ...
Normally you do not need any changes in code: replication happens automatically and is defined in your solrconfig.xml configuration file. -- View this message in context: http://lucene.472066.n3.nabble.com/solr-replication-tp3687106p3687168.html Sent from the Solr - User mailing list archive at Nabble.com.
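A minimal way to express that split in code is to keep one handle per role. This is only a sketch: the class and method names are invented for illustration, and in SolrJ each URL would typically back its own CommonsHttpSolrServer instance.

```java
// Sketch only: hold one URL for the indexing (master) instance and one
// for the search (slave) instance, and pick by operation type.
public class SolrRouter {
    private final String masterUrl; // receives add/delete/commit
    private final String slaveUrl;  // receives queries

    public SolrRouter(String masterUrl, String slaveUrl) {
        this.masterUrl = masterUrl;
        this.slaveUrl = slaveUrl;
    }

    /** Index/update requests go to the master. */
    public String urlForIndexing() { return masterUrl; }

    /** Query requests go to the slave. */
    public String urlForQuery() { return slaveUrl; }
}
```

Each URL then backs its own SolrServer object, so indexing code and query code never point at the wrong node.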
Re: index-time over boosted
Hi, it worked (I'm using Solr-3.4.0, not that it matters)!! I'll try to figure out what went wrong ...with my limited skills. The solution omitNorms=true works for now but it's not a long term solution in my opinion. I also need to figure out how to make all that work. Thanks again Jan!! Remi On Tue, Jan 24, 2012 at 5:58 PM, Jan Høydahl jan@cominvent.com wrote: Hi, Well, I think you do it right, but get tricked by either editing the wrong file, a typo or browser caching. Why not try to start with a fresh Solr3.5.0, start the example app, index all exampledocs, search for Podcasts, you get one hit, in fields text and features. Then change solr/example/solr/conf/schema.xml and add omitNorms=true to these two fields. Then stop Solr, delete your index, start Solr, re-index the docs and try again. fieldNorm is now 1.0. Once you get that working you can start debugging where you got it wrong in your own setup. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 24. jan. 2012, at 14:55, remi tassing wrote: Hello, thanks for helping out Jan, I really appreciate that! 
These are full explains of two results: Result#1.-- 3.0412199E-5 = (MATCH) max of: 3.0412199E-5 = (MATCH) weight(content:mobil broadband^0.5 in 19081), product of: 0.13921623 = queryWeight(content:mobil broadband^0.5), product of: 0.5 = boost 6.3531075 = idf(content: mobil=5270 broadband=2392) 0.043826185 = queryNorm 2.1845297E-4 = fieldWeight(content:mobil broadband in 19081), product of: 3.6055512 = tf(phraseFreq=13.0) 6.3531075 = idf(content: mobil=5270 broadband=2392) 9.536743E-6 = fieldNorm(field=content, doc=19081) Result#2.- 2.6991445E-5 = (MATCH) max of: 2.6991445E-5 = (MATCH) weight(content:mobil broadband^0.5 in 15306), product of: 0.13921623 = queryWeight(content:mobil broadband^0.5), product of: 0.5 = boost 6.3531075 = idf(content: mobil=5270 broadband=2392) 0.043826185 = queryNorm 1.9388145E-4 = fieldWeight(content:mobil broadband in 15306), product of: 1.0 = tf(phraseFreq=1.0) 6.3531075 = idf(content: mobil=5270 broadband=2392) 3.0517578E-5 = fieldNorm(field=content, doc=15306) Remi On Tue, Jan 24, 2012 at 3:38 PM, Jan Høydahl jan@cominvent.com wrote: That looks right. Can you restart your Solr, do a new search with debugQuery=true and copy/paste the full EXPLAIN output for your query? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 24. jan. 2012, at 13:22, remi tassing wrote: Any idea? This is a snippet of my schema.xml now: ?xml version=1.0 encoding=UTF-8 ? !-- Licensed to the Apache Software Foundation (ASF) under one or more ... 
!-- fields for index-basic plugin -- field name=host type=url stored=false indexed=true/ field name=site type=string stored=false indexed=true/ field name=url type=url stored=true indexed=true required=true/ field name=content type=text stored=true indexed=true omitNorms=true/ field name=cache type=string stored=true indexed=false/ field name=tstamp type=long stored=true indexed=false/ !-- fields for index-anchor plugin -- field name=anchor type=string stored=true indexed=true multiValued=true/ ... !-- uncomment the following to ignore any fields that don't already match an existing field name or dynamic field, rather than reporting them as an error. alternately, change the type=ignored to some other type e.g. text if you want unknown fields indexed and/or stored by default -- !--dynamicField name=* type=ignored multiValued=true /-- /fields !-- Field to use to determine and enforce document uniqueness. Unless this field is marked with required=false, it will be a required field -- uniqueKeyid/uniqueKey !-- field for the QueryParser to use when an explicit fieldname is absent ... /schema Remi On Sun, Jan 22, 2012 at 6:31 PM, remi tassing tassingr...@gmail.com wrote: Hi, I got wrong in beginning but putting omitNorms in the query url. Now following your advice, I merged the schema.xml from Nutch and Solr and made sure omitNorms was set to true for the content, just as you said. Unfortunately the problem remains :-( On Thursday, January 19, 2012, Jan Høydahl jan@cominvent.com wrote: Hi, The schema you pasted in your mail is NOT Solr3.5's default example schema. Did you get it from the Nutch project? And the omitNorms parameter is supposed to go in the field tag in schema.xml, and the content field in the example schema does not have omitNorms=true. Try to change field name=content type=text stored=false
Re: solr replication
Then, as you say, shouldn't I define three SolrServer instances using SolrJ? One solrMasterServer for indexing, and solrSlaveServer1 or solrSlaveServer2 for querying? On Wed, Jan 25, 2012 at 11:09 AM, darul daru...@gmail.com wrote: Here is the way I see it (and implemented it). While using the SolrJ API you have to send: - indexing commands to your indexing Solr instance (the master), for example http://myMaster:80/myCore/ - query commands to your search Solr instance (a slave). You may have several slaves, and you can also put a broker in front of them to load-balance between http://mySlave1:80/myCore/ http://mySlave2:80/myCore/ ... Normally you do not need any changes in code: replication happens automatically and is defined in your solrconfig.xml configuration file.
Re: solr replication
You may define your specific configuration as a grid of all your Solr instances and then, using SolrJ and CommonsHttpSolrServer, choose the right URL depending on whether it is an indexing or a search task. -- View this message in context: http://lucene.472066.n3.nabble.com/solr-replication-tp3687106p3687208.html Sent from the Solr - User mailing list archive at Nabble.com.
Difference between #indexed documents and #results in *:* query
Hello, I have seen that I am getting 913 documents indexed:

<str name="Total Requests made to DataSource">1</str>
<str name="Total Rows Fetched">913</str>
<str name="Total Documents Skipped">0</str>
<str name="Full Dump Started">2012-01-25 10:22:39</str>
<str name="">Indexing completed. Added/Updated: 913 documents. Deleted 0 documents.</str>
<str name="Committed">2012-01-25 10:22:44</str>
<str name="Optimized">2012-01-25 10:22:44</str>
<str name="Total Documents Processed">913</str>
<str name="Time taken ">0:0:5.10</str>

... and, when I do a search for *:* (all documents) I get 383 results:

<result name="response" numFound="383" start="0" maxScore="1.0">

Is this normal? If it is not, do you know why it could be this way and what could I do to fix it? Thanks in advance! -- View this message in context: http://lucene.472066.n3.nabble.com/Difference-between-indexed-documents-and-results-in-query-tp3687217p3687217.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr replication
Thank you for your response. What do you mean by grid? Can you please send me an example or a link? On Wed, Jan 25, 2012 at 11:30 AM, darul daru...@gmail.com wrote: You may define your specific configuration as a grid of all your Solr instances and then, using SolrJ and CommonsHttpSolrServer, choose the right URL depending on whether it is an indexing or a search task.
solr FieldCollapsing, label and locale parameter
Hi, I'm using FieldCollapsing to group the results. Example: I search for *:* and group by names, like:

http://localhost:port/solr/select/?q=*:* group=true group.limit=200 group.query=Jim group.query=Jon group.query=Frank Sinatra

It looks like Solr runs (internally) a separate query for every name. (Whatever.) The point is that I have to change the local parameters of the search in order to set a different search operator (from OR to AND). To get valid results I need a query like this:

http://localhost:port/solr/select/?q=*:* group=true group.limit=200 group.query={!q.op=AND defType=edismax}Jim group.query={!q.op=AND defType=edismax}Jon group.query={!q.op=AND defType=edismax}Frank Sinatra

This works very well. The problem is that Solr returns the label of the group including the local parameters:

<lst name="grouped"> <lst name="{!q.op=AND defType=edismax}Frank Sinatra"> (wrong label) <int name="matches">785</int> <result name="doclist" numFound="10" start="0"> <doc [...]

A valid result would be:

<lst name="grouped"> <lst name="Frank Sinatra"> <int name="matches">785</int> <result name="doclist" numFound="10" start="0"> <doc [...]

Is there a way to change the label to the real term Solr is searching for? Thanks and best regards Ralf
Re: solr replication
By grid I mean the list of your instances:

String masterUrl = "http://masterUrl/core/...";
String[] slaveUrls = {"http://slaveUrl/core/...", "http://slaveUrl/core/..."};

Then use your business logic to pick the correct one with the HTTP SolrJ facade. -- View this message in context: http://lucene.472066.n3.nabble.com/solr-replication-tp3687106p3687314.html Sent from the Solr - User mailing list archive at Nabble.com.
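One hedged sketch of that grid idea, extended with a simple round-robin over the slaves so that adding a slave means adding a URL and nothing else. Class and method names are made up here; wiring each chosen URL into a CommonsHttpSolrServer is left out.

```java
// Sketch only: the master handles indexing, the slaves take turns
// serving queries.
public class SolrGrid {
    private final String masterUrl;
    private final String[] slaveUrls;
    private int next = 0;

    public SolrGrid(String masterUrl, String[] slaveUrls) {
        this.masterUrl = masterUrl;
        this.slaveUrls = slaveUrls;
    }

    public String indexingUrl() {
        return masterUrl;
    }

    // Round-robin across the slave list; adding a new slave requires no
    // code change beyond appending its URL to slaveUrls.
    public synchronized String searchUrl() {
        String url = slaveUrls[next];
        next = (next + 1) % slaveUrls.length;
        return url;
    }
}
```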
Re: solr replication
Ok, thank you for your response. On Wed, Jan 25, 2012 at 12:24 PM, darul daru...@gmail.com wrote: By grid I mean the list of your instances: String masterUrl = "http://masterUrl/core/..."; String[] slaveUrls = {"http://slaveUrl/core/...", "http://slaveUrl/core/..."}; Then use your business logic to pick the correct one with the HTTP SolrJ facade.
Re: phrase auto-complete with suggester component
Tommy Chheng-2 wrote: Thanks, I'll try out the custom class file. Any possibility this class can be merged into Solr? It seems like expected behavior. On Tue, Jan 24, 2012 at 11:29 AM, O. Klein klein@... wrote: You might wanna read http://lucene.472066.n3.nabble.com/suggester-issues-td3262718.html#a3264740 which contains the solution to your problem. -- View this message in context: http://lucene.472066.n3.nabble.com/phrase-auto-complete-with-suggester-component-tp3685572p3685730.html Sent from the Solr - User mailing list archive at Nabble.com. -- Tommy Chheng I agree. Suggester could use some attention. Looking at the wiki there were some features planned, but not much has happened lately. -- View this message in context: http://lucene.472066.n3.nabble.com/phrase-auto-complete-with-suggester-component-tp3685572p3687495.html Sent from the Solr - User mailing list archive at Nabble.com.
Need help
I want to create one search page. The implementation is like this: I have 4 tables in the database (say profiles, clients, requirement and case) and 4 corresponding entities in the Java code. I created one JSP which has a drop-down (containing all 4 entity names), one search box and one button. How do I need to configure Solr for this, so that I can search any table by changing the drop-down? Regards Shambhu
Re: solr replication
Hi Parvin, I did something that may help you. I set up Apache (with mod_proxy and mod_proxy_balancer) as a front end and use it to distribute the requests of my application. Requests for /update or /optimize I redirect to the master (or masters) server, and search requests (/select) I redirect to the slaves. Example:

<Proxy balancer://solrclusterindex>
  BalancerMember http://127.0.0.1:8080/apache-solr-1.4.1/ disablereuse=On route=jvm1
</Proxy>
<Proxy balancer://solrclustersearch>
  BalancerMember http://127.0.0.1:8080/apache-solr-1.4.1/ disablereuse=On route=jvm1
  BalancerMember http://10.16.129.61:8080/apache-solr-1.4.1/ disablereuse=On route=jvm2
</Proxy>
ProxyPassMatch /solrcluster(.*)/update(.*)$ balancer://solrclusterindex$1/update$2
ProxyPassMatch /solrcluster(.*)/select(.*)$ balancer://solrclustersearch$1/select$2

I hope it helps you
Re: Difference between #indexed documents and #results in *:* query
Hi, No, it's not normal :) Have you tried to hit SHIFT-F5 to make sure you're not getting tricked by browser caching? Or try a slightly different query like id:* You can also visit the Schema browser page of Solr admin and check the stats on how many docs are in the index. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 25. jan. 2012, at 10:35, m0rt0n wrote: Hello, I have seen that I am getting 913 documents indexed: str name=Total Requests made to DataSource1/str str name=Total Rows Fetched913/str str name=Total Documents Skipped0/str str name=Full Dump Started2012-01-25 10:22:39/str str name=Indexing completed. Added/Updated: 913 documents. Deleted 0 documents./str str name=Committed2012-01-25 10:22:44/str str name=Optimized2012-01-25 10:22:44/str str name=Total Documents Processed913/str str name=Time taken 0:0:5.10/str ... and, when I do a search for *:* (all documents) I get 383 results result name=response numFound=383 start=0 maxScore=1.0 Is this normal? if it is not, do you know why it could be this way and what could I do to fix it? Thanks in advance! -- View this message in context: http://lucene.472066.n3.nabble.com/Difference-between-indexed-documents-and-results-in-query-tp3687217p3687217.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr replication
Hi Anderson, Thank you for your effort.I will try this. Hope it will solve my problem. Regards On Wed, Jan 25, 2012 at 2:27 PM, Anderson vasconcelos anderson.v...@gmail.com wrote: Hi Parvin I did something that may help you. I set up apache (with mod_proxy and mode balance) like a front-end and use this to distruted the request of my aplication. Request for /update or /optmize, i'm redirect to master (or masters) server and requests /search i redirect to slaves. Example: Proxy balancer://solrclusterindex BalancerMember http://127.0.0.1:8080/apache-solr-1.4.1/ disablereuse=On route=jvm1 /Proxy Proxy balancer://solrclustersearch BalancerMember http://127.0.0.1:8080/apache-solr-1.4.1/ disablereuse=On route=jvm1 BalancerMember http://10.16.129.61:8080/apache-solr-1.4.1/ disablereuse=On route=jvm2 /Proxy ProxyPassMatch /solrcluster(.*)/update(.*)$ balancer://solrclusterindex$1/update$2 ProxyPassMatch /solrcluster(.*)/select(.*)$ balancer://solrclustersearch$1/select$2 I hope it helps you
Re: Need help
Treat one Solr schema as a database table; one Solr core contains only one schema. So in your case you should define 4 Solr cores, each containing a schema that matches one of your database tables. /shen On Wed, Jan 25, 2012 at 1:08 PM, Shambhu Kumar ss2k...@gmail.com wrote: I want to create one search page. The implementation is like this: I have 4 tables in the database (say profiles, clients, requirement and case) and 4 corresponding entities in the Java code. I created one JSP which has a drop-down (containing all 4 entity names), one search box and one button. How do I need to configure Solr for this, so that I can search any table by changing the drop-down? Regards Shambhu
Re: Difference between #indexed documents and #results in *:* query
Thanks a lot for your answer; really appreciated. Unfortunately, I am still getting the same number of results: - I tried by refreshing the browser cache. - I tried another search by the ID:* - And went to the http://localhost:8983/solr/browse?q= ... and got the same number of results. (383 results found in 13 ms Page 1 of 1) I don't understand why it says that it is indexing 913 (see below) and it just finds 383, that makes no sense to me and I am starting to go crazy :-) Any further help appreciated. Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Difference-between-indexed-documents-and-results-in-query-tp3687217p3687646.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Difference between #indexed documents and #results in *:* query
Do all your 913 documents contain a unique key? The uniqueKey field is "id" by default. -- Sami Siren On Wed, Jan 25, 2012 at 3:16 PM, m0rt0n rau...@gmail.com wrote: Thanks a lot for your answer; really appreciated. Unfortunately, I am still getting the same number of results: - I tried refreshing the browser cache. - I tried another search by ID:* - And went to http://localhost:8983/solr/browse?q= ... and got the same number of results. (383 results found in 13 ms, Page 1 of 1.) I don't understand why it says that it is indexing 913 (see below) and it just finds 383; that makes no sense to me and I am starting to go crazy :-) Any further help appreciated. Thanks!
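The effect behind Sami's question can be sketched outside Solr: adding a document whose uniqueKey already exists replaces the old document instead of creating a new one, so the number of rows fetched can exceed the number of documents found. This is a plain-Java illustration, not Solr code; the class and numbers are invented.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: an index keyed on a non-unique id keeps only one document per
// key value, which is how 913 fetched rows can shrink to 383 documents.
public class UniqueKeyDemo {
    public static int docsAfterIndexing(String[] ids) {
        Set<String> index = new HashSet<>();
        for (String id : ids) {
            index.add(id); // duplicate id -> old doc replaced, count unchanged
        }
        return index.size();
    }
}
```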
Re: highlighter not supporting surround parser
I want to perform span queries using the surround parser and I want to show the results with the highlighter, but the problem is that the highlighter is not working properly with the surround query parser. Are there any plugins or updates available to do this? Hi Manyu, You can use https://issues.apache.org/jira/browse/SOLR-3060 for this.
Re: Do Hignlighting + proximity using surround query parser
I got this working the way you describe it (in the getHighlightQuery() method). The span queries were tripping it up, so I extracted the query terms and created a DisMax query from them. There'll be a loss of accuracy in the highlighting, but in my case that's better than no highlighting. Should I just go ahead and submit a patch to SOLR-2703? I think a separate jira ticket would be more appropriate. Scott, I created SOLR-3060 for this.
Query for exact part of sentence
Hi I'm using the pecl PHP class to query SOLR and was wondering how to query for a part of a sentence exactly. There are 2 data items index in SOLR 1327497476: 123 456 789 1327497521. 1234 5678 9011 However when running the query, both data items are returned as you can see below. Any idea why? Thanks! SolrObject Object ( [responseHeader] = SolrObject Object ( [status] = 0 [QTime] = 5016 [params] = SolrObject Object ( [debugQuery] = true [shards] = solr01:8983/solr,solr02:8983/solr,solr03:8983/solr [fl] = id,smsc_module,smsc_ssid,smsc_description,smsc_content,smsc_courseid,smsc_date_created,smsc_date_edited,score,metadata_stream_size,metadata_stream_source_info,metadata_stream_name,metadata_stream_content_type,last_modified,author,title,subject [sort] = smsc_date_created asc [indent] = on [start] = 0 [q] = (smsc_content:\123 456\ || smsc_description:\123 456\) (smsc_module:Intradesk) (smsc_date_created:[2011-12-25T10:29:51Z TO NOW]) (smsc_ssid:38) [distrib] = true [wt] = xml [version] = 2.2 [rows] = 55 ) ) [response] = SolrObject Object ( [numFound] = 2 [start] = 0 [docs] = Array ( [0] = SolrObject Object ( [smsc_module] = Intradesk [smsc_ssid] = 38 [id] = 1327497476 [smsc_courseid] = 0 [smsc_date_created] = 2011-12-25T10:29:51Z [smsc_date_edited] = 2011-12-25T10:29:51Z [score] = 10.028017 ) [1] = SolrObject Object ( [smsc_module] = Intradesk [smsc_ssid] = 38 [id] = 1327497521 [smsc_courseid] = 0 [smsc_date_created] = 2011-12-25T10:29:51Z [smsc_date_edited] = 2011-12-25T10:29:51Z [score] = 5.541335 ) ) ) [debug] = SolrObject Object ( [rawquerystring] = (smsc_content:\123 456\ || smsc_description:\123 456\) (smsc_module:Intradesk) (smsc_date_created:[2011-12-25T10:29:51Z TO NOW]) (smsc_ssid:38) [querystring] = (smsc_content:\123 456\ || smsc_description:\123 456\) (smsc_module:Intradesk) (smsc_date_created:[2011-12-25T10:29:51Z TO NOW]) (smsc_ssid:38) [parsedquery] = +(smsc_content:123 smsc_content:456 smsc_description:123 smsc_content:456) +smsc_module:intradesk 
+smsc_date_created:[2011-12-25T10:29:51Z TO 2012-01-25T13:33:21.098Z] +smsc_ssid:38 [parsedquery_toString] = +(smsc_content:123 smsc_content:456 smsc_description:123 smsc_content:456) +smsc_module:intradesk +smsc_date_created:[2011-12-25T10:29:51 TO 2012-01-25T13:33:21.098] +smsc_ssid:`#8;#0;#0;#0; [QParser] = LuceneQParser [timing] = SolrObject Object
Re: SpellCheck Help
You have to give us a lot more detail about exactly what you've done and what your results are. Please review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Tue, Jan 24, 2012 at 7:42 PM, vishal_asc vishal.por...@ascendum.com wrote: I have installed Solr 3.5 with Jetty and am integrating it with Magento 1.11, but it seems not to be working: my search results do not show a "Did you mean ...?" suggestion when I misspell a word. I followed all the steps necessary for Magento/Solr integration. Please help ASAP. Thanks Vishal -- View this message in context: http://lucene.472066.n3.nabble.com/SpellCheck-Help-tp3648589p3686756.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Difference between #indexed documents and #results in *:* query
BINGO!! Yep, I actually was assuming that the ID field was unique; after your response I went to my DBA and he told me that it wasn't. Then I made up a unique key by concatenating three fields, and that works. Thanks a lot for your very helpful answer! -- View this message in context: http://lucene.472066.n3.nabble.com/Difference-between-indexed-documents-and-results-in-query-tp3687217p3687970.html Sent from the Solr - User mailing list archive at Nabble.com.
Indexing Using XML Message
I have a local data store containing a host of different document types. This data store is separate from a remote Solr install, making streaming not an option. Instead I'd like to generate an XML file that contains all of the documents, including content and metadata. What would be the most appropriate way to accomplish this? I could use the Tika CLI to generate XML, but I'm not sure it would work or that it's the most efficient way to handle things. Can anyone offer some suggestions? Thanks - Tod
What is the most basic schema.xml you can have for indexing a simple database?
Is it do-able/sensible to build a schema.xml from the ground up? Say that you are feeding the results of a database query into solr containing the fields id(int), title(varchar), description(varchar), pub_date(date) and tags(varchar) What would be the simplest schema.xml that could support this structure in Solr? Fergus
Re: Problem in Accessing DIH
You need to follow the instructions here: http://wiki.apache.org/solr/DataImportHandler In particular, setting up the request handler in solrconfig.xml and creating a data-config.xml file that's referenced in the request handler. When this is done correctly, you should see the request handler you defined as a link on the dataimport.jsp page. Best Erick On Tue, Jan 24, 2012 at 5:31 AM, dsy99 ds...@rediffmail.com wrote: Dear all, I am using Solr 3.5, in which I tried to access the DIH development console with the URL mentioned below, but I am getting the message "Select handler". http://localhost:8983/solr/admin/dataimport.jsp May I know how I can select the handler, so that I will be able to display the DIH control form and command output as raw XML? Thanking you. With Regds: Divakar -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-in-Accessing-DIH-tp3684667p3684667.html Sent from the Solr - User mailing list archive at Nabble.com.
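For reference, the two pieces Erick mentions look roughly like this. The handler name, file paths, JDBC details and query are examples to adapt, not taken from Divakar's setup:

```xml
<!-- solrconfig.xml: register the DataImportHandler -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

<!-- data-config.xml (the file referenced above): a minimal skeleton -->
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb" user="user" password="pass"/>
  <document>
    <entity name="item" query="select id, title from item">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>
```

Once both are in place, the handler shows up as a selectable link on the dataimport.jsp page.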
Re: Currency field type
There's really no roadmap. If you have a big enough need you can work on this patch and submit it for someone to commit, but it looks like Greg had other priorities get in the way, so you can't count on anyone else carrying this forward. If you want to integrate the patch, you check out the source, apply the patch to the source and compile. It may not apply cleanly; see the instructions at: http://wiki.apache.org/solr/HowToContribute But this patch doesn't really look like it is doing what you want either; it's the stats component, which provides summary information. As for a normalized dollar amount, that's really just converting all prices to USD or EUR or whatever. If you also store a value for what currency the original document is in, you can display the correct currency for documents. The problem here, of course, is that the normalization is done at index time, and exchange rates change. Best Erick On Tue, Jan 24, 2012 at 5:37 AM, darul daru...@gmail.com wrote: We may need a specific field to store and search over item prices. Currency can be of different kind, EUR There is an open ticket on Jira, but I do not find a way to integrate the patch sources, and it seems to be not closed yet. Any idea of a roadmap or expected availability date for this powerful enhancement: http://wiki.apache.org/solr/MoneyFieldType Erik said an alternative may be to store it in a normalized way: http://lucene.472066.n3.nabble.com/Stats-help-needed-on-price-field-using-different-currencies-td2978082.html#a2997876 Any ideas ? Thanks, Jul -- View this message in context: http://lucene.472066.n3.nabble.com/Currency-field-type-tp3684682p3684682.html Sent from the Solr - User mailing list archive at Nabble.com.
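A sketch of the normalized-amount approach Erick describes: convert every price to one base currency at index time, and store the original amount and currency for display. The class name, field name and rates are invented for illustration; real code would refresh rates and re-index when they move, which is exactly the drawback Erick points out.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: normalize prices to a base currency (USD here) before indexing.
public class PriceNormalizer {
    private final Map<String, Double> ratesToUsd = new HashMap<>();

    public PriceNormalizer() {
        ratesToUsd.put("USD", 1.0);
        ratesToUsd.put("EUR", 1.5); // made-up rate, frozen at index time
    }

    // Value that would go into a sortable/searchable "price_usd" field;
    // the original amount and currency stay in stored fields for display.
    public double normalize(double amount, String currency) {
        Double rate = ratesToUsd.get(currency);
        if (rate == null) {
            throw new IllegalArgumentException("no rate for " + currency);
        }
        return amount * rate;
    }
}
```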
Re: full import is not working and still not showing any errors
please review: http://wiki.apache.org/solr/UsingMailingLists I infer you're using DIH, but you've never really stated that. What page are you refreshing? What commands have you issued? Have you looked at dataimport.jsp (the DIH debugging page)? Best Erick On Tue, Jan 24, 2012 at 6:01 AM, scabra4 scab...@yahoo.com wrote: hi all, anyone can help me with this please. i am trying to do a full import, i've done everything correctly, now when i try the full import an xml page displays showing the following and i stays like this now matter how i refresh the page: This XML file does not appear to have any style information associated with it. The document tree is shown below. response lst name=responseHeader int name=status0/int int name=QTime0/int /lst lst name=initArgs lst name=defaults str name=configC:\solr\conf\data-config.xml/str /lst /lst str name=commandfull-import/str str name=statusbusy/str str name=importResponseA command is still running.../str lst name=statusMessages str name=Time Elapsed0:5:8.925/str str name=Total Requests made to DataSource1/str str name=Total Rows Fetched0/str str name=Total Documents Processed0/str str name=Total Documents Skipped0/str str name=Full Dump Started2012-01-24 16:29:31/str /lst str name=WARNINGThis response format is experimental. It is likely to change in the future./str/response -- View this message in context: http://lucene.472066.n3.nabble.com/full-import-is-not-working-and-still-not-showing-any-errors-tp3684751p3684751.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: What is the most basic schema.xml you can have for indexing a simple database?
Hi Fergus, The schema.xml declares the fields as well as the analyzers/tokenizers that the application demands. The easiest way is to modify the schema.xml file which is delivered with apache_solr/example/solr/conf. In case you are looking to set up Solr in front of a database with minimal manipulation of DB data, you can check it here: http://www.params.me/2011/03/configure-apache-solr-14-with-mysql.html. I am using this setup in one of my applications in production. -param On 1/25/12 11:10 AM, Fergus McDowall fergusmcdow...@gmail.com wrote: Is it do-able/sensible to build a schema.xml from the ground up? Say that you are feeding the results of a database query into Solr containing the fields id(int), title(varchar), description(varchar), pub_date(date) and tags(varchar). What would be the simplest schema.xml that could support this structure in Solr? Fergus
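To make that concrete, a minimal schema.xml for the five fields Fergus lists might look like this. The type choices are one reasonable mapping for Solr 3.x, not the only one:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema name="minimal" version="1.4">
  <types>
    <fieldType name="int"    class="solr.TrieIntField"  omitNorms="true"/>
    <fieldType name="date"   class="solr.TrieDateField" omitNorms="true"/>
    <fieldType name="text" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="id"          type="int"  indexed="true" stored="true" required="true"/>
    <field name="title"       type="text" indexed="true" stored="true"/>
    <field name="description" type="text" indexed="true" stored="true"/>
    <field name="pub_date"    type="date" indexed="true" stored="true"/>
    <field name="tags"        type="text" indexed="true" stored="true" multiValued="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>title</defaultSearchField>
</schema>
```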
Re: Not getting the expected search results
First thing is that there's a helpful page for debugging this called dataimport.jsp; see: http://wiki.apache.org/solr/DataImportHandler Second, and this is just a guess: what is the uniqueKey defined in your schema? When Solr adds documents, a document with the same value in the field defined in uniqueKey as a document already in the index causes the old doc to be replaced by the new doc. So it's possible that your select is replacing the document you want in the first example, but not in the second. Look on the admin/stats page. There are two numbers reported here, numDocs and maxDoc. The difference between these is the number of documents that have been deleted from your index. The replacement I outlined above is a delete followed by an add, so if you start with a clean index, do your first import, and these numbers are different, then you are having documents replaced... Hope that helps Erick On Tue, Jan 24, 2012 at 7:02 AM, m0rt0n rau...@gmail.com wrote: Hello, I am a newbie in this Solr world and I am getting surprised because I try to do searches, both with the browser interface and by using a Java client, and the expected results do not appear. The issue is: 1) I have set up an entity called via in my data-config.xml with 5 fields.
I do the full-import and it indexes 1.5M records: entity name=via query=select TVIA, NVIAC, CMUM, CVIA, CPRO from INE_VIAS field column=TVIA name=TVIA / field column=NVIAC name=NVIAC / field column=CMUM name=CMUM / field column=CVIA name=CVIA / field column=CPRO name=CPRO / /entity 2) These 5 fields are mapped in the schema.xml, this way: field name=TVIA type=text_general indexed=true stored=true / field name=NVIAC type=text_general indexed=true stored=true / field name=CMUM type=text_general indexed=true stored=true / field name=CVIA type=string indexed=true stored=true / field name=CPRO type=int indexed=true stored=true / 3) I try to do a search for Alcala street in Madrid: NVIAC:ALCALA AND CPRO:28 AND CMUM:079 But it does just get two results (none of them, the desired one): docstr name=CMUM079/strint name=CPRO28/intstr name=CVIA45363/strstr name=NVIACALCALA GAZULES/strstr name=TVIACALLE/str/doc docstr name=CMUM079/strint name=CPRO28/intstr name=CVIA08116/strstr name=NVIACALCALA GUADAIRA/strstr name=TVIACALLE/str/doc 4) When I do the indexing by delimiting the entity search: entity name=via query=select TVIA, NVIAC, CMUM, CVIA, CPRO from INE_VIAS WHERE NVIAC LIKE '%ALCALA%' The full import does 913 documents and I do the same search, but this time I get the desired result: docstr name=CMUM079/strint name=CPRO28/intstr name=CVIA00132/strstr name=NVIACALCALA/strstr name=TVIACALLE/str/doc Anyone can help me with that? I don't know why it does not work as expected when I do the full-import of the whole lot of streets. Thanks a lot in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/Not-getting-the-expected-search-results-tp3684974p3684974.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrCell maximum file size
Mostly it depends on your container settings; quite often that's where the limits are. I don't think Solr imposes any restrictions. What size are we talking about anyway? There are implicit issues with how much memory parsing the file requires, but you can allocate lots of memory to the JVM to handle that. Best Erick On Tue, Jan 24, 2012 at 10:24 AM, Augusto Camarotti augu...@prpb.mpf.gov.br wrote: Hi everybody, Does anyone know if there is a maximum file size that can be uploaded to the ExtractingRequestHandler via HTTP request? Thanks in advance, Augusto Camarotti
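One Solr-side knob worth knowing about, in addition to the container limits Erick mentions, is the multipart upload cap in solrconfig.xml. The value below is only an example; your servlet container may still impose its own, lower limit:

```xml
<!-- solrconfig.xml: Solr's multipart upload limit (value in KB) -->
<requestDispatcher handleSelect="true">
  <requestParsers enableRemoteStreaming="false"
                  multipartUploadLimitInKB="2048000"/>
</requestDispatcher>
```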
Re: Indexing failover and replication
No, there are no good ways to have a single slave know about two masters and just use the right one. It sounds like you've got each machine being both a master and a slave? This is not supported. What you probably want to do is either set up a repeater, or just index to the two masters and manually switch back to the primary if the primary goes down, having all replication happen from the master. Best Erick On Tue, Jan 24, 2012 at 11:36 AM, Anderson vasconcelos anderson.v...@gmail.com wrote: Hi I'm doing a test with replication using Solr 1.4.1. I configured two servers (server1 and server2) as master/slave to synchronize both. I put Apache on the front side, and we index sometimes on server1 and sometimes on server2. I realized that both index servers are now confused. In the Solr data folder, many index folders were created with the timestamp of synchronization (example: index.20120124041340) with some segments inside. I thought it was possible to index on two master servers and then synchronize both using replication. Is it really possible to do this with the replication mechanism? If it is possible, what have I done wrong? I need to have more than one node for indexing to guarantee a failover feature for indexing. Is multi-master the best way to guarantee failover for indexing? Thanks
RE: HTMLStripCharFilterFactory not working in Solr4?
Hi Mike, Yonik committed a fix to Solr trunk - your test on LUCENE-3721 succeeds for me now. (On Solr trunk, *all* CharFilters have been non-functional since LUCENE-3396 was committed in r1175297 on 25 Sept 2011, until Yonik's fix today in r1235810; Solr 3.x was not affected - CharFilters have been working there all along.) Steve -Original Message- From: Mike Hugo [mailto:m...@piragua.com] Sent: Tuesday, January 24, 2012 3:56 PM To: solr-user@lucene.apache.org Subject: Re: HTMLStripCharFilterFactory not working in Solr4? Thanks for the responses, everyone. Steve, the test method you provided also works for me. However, when I try a more end-to-end test with the HTMLStripCharFilterFactory configured for a field, I am still having the same problem. I attached a failing unit test and configuration to the following issue in JIRA: https://issues.apache.org/jira/browse/LUCENE-3721 I appreciate all the prompt responses! Looking forward to finding the root cause of this guy :) If there's something I'm doing incorrectly in the configuration, please let me know! Mike On Tue, Jan 24, 2012 at 1:57 PM, Steven A Rowe sar...@syr.edu wrote: Hi Mike, When I add the following test to TestHTMLStripCharFilterFactory.java on Solr trunk, it passes:

public void testNumericCharacterEntities() throws Exception {
  final String text = "Bose&#174; &#8482;"; // |Bose® ™|
  HTMLStripCharFilterFactory htmlStripFactory = new HTMLStripCharFilterFactory();
  htmlStripFactory.init(Collections.<String,String>emptyMap());
  CharStream charStream = htmlStripFactory.create(CharReader.get(new StringReader(text)));
  StandardTokenizerFactory stdTokFactory = new StandardTokenizerFactory();
  stdTokFactory.init(DEFAULT_VERSION_PARAM);
  Tokenizer stream = stdTokFactory.create(charStream);
  assertTokenStreamContents(stream, new String[] { "Bose" });
}

What's happening: First, htmlStripFactory converts &#174; to ® and &#8482; to ™.
Then stdTokFactory declines to tokenize ® and ™, because they belong to the Unicode general category Symbol, Other, and so are not included in any of the output tokens. StandardTokenizer uses the Word Break rules from UAX#29 http://unicode.org/reports/tr29/ to find token boundaries, and then outputs only alphanumeric tokens. See the JFlex grammar for details: http://svn.apache.org/viewvc/lucene/dev/trunk/modules/analysis/common/src/java/org/apache/lucene/analysis/standard/StandardTokenizerImpl.jflex?view=markup . The behavior you're seeing is not consistent with the above test. Steve -Original Message- From: Mike Hugo [mailto:m...@piragua.com] Sent: Tuesday, January 24, 2012 1:34 PM To: solr-user@lucene.apache.org Subject: HTMLStripCharFilterFactory not working in Solr4? We recently updated to the latest build of Solr4 and everything is working really well so far! There is one case that is not working the same way it was in Solr 3.4 - we strip out certain HTML constructs (like trademark and registered, for example) in a field as defined below - it was working in Solr 3.4 with the configuration shown here, but is not working the same way in Solr4.
The label field is defined as type="text_general":

<field name="label" type="text_general" indexed="true" stored="false" required="false" multiValued="true"/>

Here's the type definition for the text_general field:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

In Solr 3.4, that configuration was completely stripping html constructs out of the indexed field, which is exactly what we wanted. If, for example, we then do a facet on the label field, like in the test below, we're getting some terms in the response that we would not like to be there.

// test case (groovy)
void specialHtmlConstructsGetStripped() {
    SolrInputDocument inputDocument = new SolrInputDocument()
    inputDocument.addField('label', 'Bose&#174; &#8482;')
    solrServer.add(inputDocument)
    solrServer.commit()
    QueryResponse
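The two steps Steve describes can be reproduced outside Solr: the numeric character entities are decoded first, and the decoded symbols (Unicode general category So) then fall outside the alphanumeric tokens that StandardTokenizer emits. A stdlib approximation (Python; `re.findall(r"\w+", ...)` stands in for the tokenizer):

```python
import html
import re

text = "Bose&#174; &#8482;"
decoded = html.unescape(text)       # entity decoding, like HTMLStripCharFilter
tokens = re.findall(r"\w+", decoded)  # ® and ™ are not word characters
print(decoded, tokens)              # Bose® ™ ['Bose']
```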
Re: HTMLStripCharFilterFactory not working in Solr4?
Thanks guys! I'll grab the latest build from the solr4 jenkins server when those commits get picked up and try it out. Thanks for the quick turnaround! Mike On Wed, Jan 25, 2012 at 11:01 AM, Steven A Rowe sar...@syr.edu wrote: Hi Mike, Yonik committed a fix to Solr trunk - your test on LUCENE-3721 succeeds for me now. (On Solr trunk, *all* CharFilters have been non-functional since LUCENE-3396 was committed in r1175297 on 25 Sept 2011, until Yonik's fix today in r1235810; Solr 3.x was not affected - CharFilters have been working there all along.) Steve -Original Message- From: Mike Hugo [mailto:m...@piragua.com] Sent: Tuesday, January 24, 2012 3:56 PM To: solr-user@lucene.apache.org Subject: Re: HTMLStripCharFilterFactory not working in Solr4? Thanks for the responses everyone. Steve, the test method you provided also works for me. However, when I try a more end to end test with the HTMLStripCharFilterFactory configured for a field I am still having the same problem. I attached a failing unit test and configuration to the following issue in JIRA: https://issues.apache.org/jira/browse/LUCENE-3721 I appreciate all the prompt responses! Looking forward to finding the root cause of this guy :) If there's something I'm doing incorrectly in the configuration, please let me know! 
Mike On Tue, Jan 24, 2012 at 1:57 PM, Steven A Rowe sar...@syr.edu wrote: Hi Mike, When I add the following test to TestHTMLStripCharFilterFactory.java on Solr trunk, it passes:

public void testNumericCharacterEntities() throws Exception {
  final String text = "Bose&#174; &#8482;"; // |Bose® ™|
  HTMLStripCharFilterFactory htmlStripFactory = new HTMLStripCharFilterFactory();
  htmlStripFactory.init(Collections.<String,String>emptyMap());
  CharStream charStream = htmlStripFactory.create(CharReader.get(new StringReader(text)));
  StandardTokenizerFactory stdTokFactory = new StandardTokenizerFactory();
  stdTokFactory.init(DEFAULT_VERSION_PARAM);
  Tokenizer stream = stdTokFactory.create(charStream);
  assertTokenStreamContents(stream, new String[] { "Bose" });
}

What's happening: First, htmlStripFactory converts &#174; to ® and &#8482; to ™. Then stdTokFactory declines to tokenize ® and ™, because they belong to the Unicode general category Symbol, Other, and so are not included in any of the output tokens. StandardTokenizer uses the Word Break rules from UAX#29 http://unicode.org/reports/tr29/ to find token boundaries, and then outputs only alphanumeric tokens. See the JFlex grammar for details: http://svn.apache.org/viewvc/lucene/dev/trunk/modules/analysis/common/src/java/org/apache/lucene/analysis/standard/StandardTokenizerImpl.jflex?view=markup . The behavior you're seeing is not consistent with the above test. Steve -Original Message- From: Mike Hugo [mailto:m...@piragua.com] Sent: Tuesday, January 24, 2012 1:34 PM To: solr-user@lucene.apache.org Subject: HTMLStripCharFilterFactory not working in Solr4? We recently updated to the latest build of Solr4 and everything is working really well so far! There is one case that is not working the same way it was in Solr 3.4 - we strip out certain HTML constructs (like trademark and registered, for example) in a field as defined below - it was working in Solr 3.4 with the configuration shown here, but is not working the same way in Solr4.
The label field is defined as type="text_general":

<field name="label" type="text_general" indexed="true" stored="false" required="false" multiValued="true"/>

Here's the type definition for the text_general field:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

In Solr 3.4, that configuration was completely stripping html constructs out of the indexed field, which is exactly what we wanted. If, for example, we then do a facet on the label field, like in the test below, we're getting some terms
Using multiple DirectSolrSpellcheckers for a query
Hi, We are trying to use the DirectSolrSpellChecker to get corrections for mis-spelled query terms directly from fields in the Solr index. However, we need to use multiple fields for spellchecking a query. It looks like you can only use one spellchecker per request, and so the workaround for this is to create a copy field from the fields required for spell correction? We'd like to avoid this because we allow users to perform different kinds of queries on different sets of fields, and so to provide meaningful corrections we'd have to create multiple copy fields - one for each query type. Is there any reason why Solr doesn't support using multiple spellcheckers for a query? Is it because of performance overhead? Thanks, Nalini
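The copy-field workaround mentioned above would look roughly like this in schema.xml (the source field names here are assumptions for illustration):

```xml
<!-- schema.xml: funnel several source fields into one spellcheck field -->
<field name="spell" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="title" dest="spell"/>
<copyField source="author" dest="spell"/>
<copyField source="description" dest="spell"/>
```

The spellcheck component would then be pointed at the single "spell" field; a separate copy field per query type would repeat this pattern.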
Re: Using SOLR Autocomplete for addresses (i.e. multiple terms)
Hi, I don't think that the suggester can output multiple fields. You would have to encode your data in a special way with separators. Using the separate Solr core approach, you may return whatever fields you choose to the suggest Ajax component. I've written up a blog post and uploaded an example to GitHub. See http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/ -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 3. jan. 2012, at 20:41, Dave wrote: I've got another question for anyone that might have some insight - how do you get all of your indexed information along with the suggestions? i.e. if each suggestion has an ID# associated with it, do I have to then query for that ID#, or is there some way or specifying a field list in the URL to the suggester? Thanks! Dave On Tue, Jan 3, 2012 at 9:41 AM, Dave dla...@gmail.com wrote: Hi Jan, Yes, I just saw the answer. I've implemented that, and it's working as expected. I do have Suggest running on its own core, separate from my standard search handler. I think, however, that the custom QueryConverter that was linked to is now too restrictive. For example, it works perfectly when someone enters brooklyn, n, but if they start by entering ny or new york it doesn't return anything. I think what you're talking about, suggesting from whole input and individual tokens is the way to go. Is there anything you can point me to as a starting point? I think I've got the basic setup, but I'm not quite comfortable enough with SOLR and the SOLR architecture yet (honestly I've only been using it for about 2 weeks now). Thanks for the help! Dave On Tue, Jan 3, 2012 at 8:24 AM, Jan Høydahl jan@cominvent.com wrote: Hi, As you see, you've got an answer at StackOverflow already with a proposed solution to implement your own QueryConverter. Another way is to create a Solr core solely for Suggest, and tune it exactly the way you like. 
Then you can have it suggest from the whole input as well as individual tokens and weigh these as you choose, as well as implement phonetic normalization and other useful tricks. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 3. jan. 2012, at 00:52, Dave wrote: Hi, I'm reposting my StackOverflow question to this thread as I'm not getting much of a response there. Thank you for any assistance you can provide! http://stackoverflow.com/questions/8705600/using-solr-autocomplete-for-addresses I'm new to SOLR, but I've got it up and running, indexing data via the DIH, and properly returning results for queries. I'm trying to set up another core to run Suggester, in order to autocomplete geographical locations. We have a web application that needs to take a city, state/region, country input. We'd like to do this in a single entry box. Here are some examples: Brooklyn, New York, United States of America Philadelphia, Pennsylvania, United States of America Barcelona, Catalunya, Spain Assume for now that every location around the world can be split into this 3-form input. I've set up my DIH to create a TemplateTransformer field that combines the 4 tables (city, state and country are all independent tables connected to each other by a master places table) into a field called fullplacename:

<field column="fullplacename" template="${city_join.plainname}, ${region_join.plainname}, ${country_join.plainname}"/>

I've defined a text_auto field type in schema.xml:

<fieldType class="solr.TextField" name="text_auto">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

and have defined these two fields as well:

<field name="name_autocomplete" type="text_auto" indexed="true" stored="true" multiValued="true" />
<copyField source="fullplacename" dest="name_autocomplete" />

Now, here's my problem. This works fine for the first term, i.e.
if I type "brooklyn" I get the results I'd expect, using this URL to query: http://localhost:8983/solr/places/suggest?q=brooklyn However, as soon as I put a comma and/or a space in there, it breaks the input up into 2 suggestions, and I get a suggestion for each: http://localhost:8983/solr/places/suggest?q=brooklyn%2C%20ny gives me a suggestion for "brooklyn" and a suggestion for "ny" instead of a suggestion that matches "brooklyn, ny". I've tried every solution I can find via Google and haven't had any luck. Is there something simple that I've missed, or is this the wrong approach? Just in case, here's the searchComponent and requestHandler definition:

<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
  </lst>
  <arr
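The behavior Dave wants - treating the whole input as one lowercased string (KeywordTokenizer-style) rather than splitting it into tokens - can be illustrated with a toy prefix matcher (Python sketch; the place data is copied from the examples above, and this is only an illustration of the matching semantics, not of Solr's Suggester internals):

```python
places = [
    "Brooklyn, New York, United States of America",
    "Philadelphia, Pennsylvania, United States of America",
    "Barcelona, Catalunya, Spain",
]

def suggest(prefix):
    # Match the whole lowercased string, commas and spaces included,
    # instead of breaking the query into separate tokens.
    p = prefix.lower()
    return [s for s in places if s.lower().startswith(p)]

print(suggest("brooklyn, n"))  # the full Brooklyn entry, not two fragments
```

With per-token matching, "brooklyn, ny" becomes two independent lookups ("brooklyn" and "ny"), which is exactly the splitting described in the message.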
Re: Cluster Resizing question
Jamie, depending on how quickly you need this, it may be better to follow SolrCloud development because cluster resizing will work differently there. Otis Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html From: Jamie Johnson jej2...@gmail.com To: solr-user@lucene.apache.org Sent: Wednesday, January 25, 2012 12:03 PM Subject: Cluster Resizing question Is this the JIRA that I should be tracking for resizing a cluster? https://issues.apache.org/jira/browse/SOLR-2593 If not can someone point me to the appropriate location. Also is there a rough timeline for when this will be available?
Re: Indexing Using XML Message
So you can't even communicate with the remote Solr process by HTTP? Because if you can, SolrJ would work. Otherwise, you're stuck with creating a bunch of Solr-style XML documents, they have a simple format. See the example/exampleDocs directory in the standard distribution. You'll have to parse the separate document types and put your required data into the Solr XML format... But I really don't understand why you need to. A Solr installation that you can't get to via http is pretty useless, although I suppose there can be security setups that preclude this. Assuming you can get there via http, consider a SolrJ program combined with Tika to parse the docs you have in all these formats and send them to Solr via SolrJ... Best Erick On Wed, Jan 25, 2012 at 7:41 AM, Tod listac...@gmail.com wrote: I have a local data store containing a host of different document types. This data store is separate from a remote Solr install making streaming not an option. Instead I'd like to generate an XML file that contains all of the documents including content and metadata. What would be the most appropriate way to accomplish this? I could use the Tika CLI to generate XML but I'm not sure it would work or that its the most efficient way to handle things. Can anyone offer some suggestions? Thanks - Tod
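Generating the Solr-style add/doc XML Erick mentions can be done with nothing but the standard library; a minimal sketch (field names are illustrative, one record per add for simplicity):

```python
import xml.etree.ElementTree as ET

def to_solr_xml(record):
    # Build <add><doc><field name="...">value</field>...</doc></add>
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in record.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")

print(to_solr_xml({"id": "1", "title": "Example document"}))
```

The resulting file can be posted to Solr's update handler (or fed via post.jar from the example distribution) once HTTP access is available.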
Re: What is the most basic schema.xml you can have for indexing a simple database?
Hi Param Yes, refactoring the various example schema.xml's is what I have been doing up to now. The end result is usually quite verbose, with a lot of redundancy. What is the most compact possible schema.xml? Thanks for the link! F On 25. jan. 2012, at 17:31, Sethi, Parampreet parampreet.se...@teamaol.com wrote: Hi Fergus, The schema.xml has declarations of fields as well as analyzers/tokenizers, which are required as per the application's demands. The easiest way is to modify the schema.xml file which is delivered with apache_solr/example/solr/conf. In case you are looking to set up Solr in front of a database with minimal manipulation of DB data, you can check it here: http://www.params.me/2011/03/configure-apache-solr-14-with-mysql.html. I am using this setup in one of my applications in production. -param On 1/25/12 11:10 AM, Fergus McDowall fergusmcdow...@gmail.com wrote: Is it do-able/sensible to build a schema.xml from the ground up? Say that you are feeding the results of a database query into Solr containing the fields id(int), title(varchar), description(varchar), pub_date(date) and tags(varchar). What would be the simplest schema.xml that could support this structure in Solr? Fergus
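For the five columns in the original question, a deliberately minimal schema.xml might look like the sketch below (untested; type and attribute choices are assumptions modeled on the stock example schema, so treat it as a starting point rather than a canonical answer):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema name="minimal" version="1.4">
  <types>
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true"/>
    <fieldType name="date" class="solr.TrieDateField" precisionStep="0" omitNorms="true"/>
    <fieldType name="text" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="id" type="int" indexed="true" stored="true" required="true"/>
    <field name="title" type="text" indexed="true" stored="true"/>
    <field name="description" type="text" indexed="true" stored="true"/>
    <field name="pub_date" type="date" indexed="true" stored="true"/>
    <field name="tags" type="text" indexed="true" stored="true" multiValued="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>
```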
Re: Cluster Resizing question
Thanks Otis. I have been following the SolrCloud development, but I was wondering specifically about elastically expanding the cloud by adding shards. I'm following the distributed indexing JIRA, but I'm having difficulty finding a JIRA which specifically references the issues with elasticity. Are you aware of one? On Wed, Jan 25, 2012 at 1:10 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Jamie, depending on how quickly you need this, it may be better to follow SolrCloud development because cluster resizing will work differently there. Otis Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html From: Jamie Johnson jej2...@gmail.com To: solr-user@lucene.apache.org Sent: Wednesday, January 25, 2012 12:03 PM Subject: Cluster Resizing question Is this the JIRA that I should be tracking for resizing a cluster? https://issues.apache.org/jira/browse/SOLR-2593 If not can someone point me to the appropriate location. Also is there a rough timeline for when this will be available?
Re: Indexing failover and replication
Thanks for the reply, Erick. I will handle the replication to both masters manually. Thanks 2012/1/25, Erick Erickson erickerick...@gmail.com: No, there are no good ways to have a single slave know about two masters and just use the right one. It sounds like you've got each machine being both a master and a slave? This is not supported. What you probably want to do is either set up a repeater, or just index to the two masters and manually switch back to the primary if the primary goes down, having all replication happen from the master. Best Erick On Tue, Jan 24, 2012 at 11:36 AM, Anderson vasconcelos anderson.v...@gmail.com wrote: Hi I'm doing a test with replication using Solr 1.4.1. I configured two servers (server1 and server2) as master/slave to synchronize both. I put Apache on the front side, and we index sometimes on server1 and sometimes on server2. I realized that both index servers are now confused. In the Solr data folder, many index folders were created with the timestamp of synchronization (example: index.20120124041340) with some segments inside. I thought it was possible to index on two master servers and then synchronize both using replication. Is it really possible to do this with the replication mechanism? If it is possible, what have I done wrong? I need to have more than one node for indexing to guarantee a failover feature for indexing. Is multi-master the best way to guarantee failover for indexing? Thanks
Re: What is the most basic schema.xml you can have for indexing a simple database?
Fergus: I have to ask what's driving the push for compactness? General tidiness (of which I actually approve) or something else? What is the redundancy you're seeing? Just the fact that some fieldTypes will contain *almost* the same set of analyzers? Posting your schema and asking "can we make this smaller?" would make this a much easier question to answer, especially if you added some indications of what parts you were dissatisfied with. Best Erick On Wed, Jan 25, 2012 at 10:21 AM, Fergus McDowall fergusmcdow...@gmail.com wrote: Hi Param Yes, refactoring the various example schema.xml's is what I have been doing up to now. The end result is usually quite verbose, with a lot of redundancy. What is the most compact possible schema.xml? Thanks for the link! F On 25. jan. 2012, at 17:31, Sethi, Parampreet parampreet.se...@teamaol.com wrote: Hi Fergus, The schema.xml has declarations of fields as well as analyzers/tokenizers, which are required as per the application's demands. The easiest way is to modify the schema.xml file which is delivered with apache_solr/example/solr/conf. In case you are looking to set up Solr in front of a database with minimal manipulation of DB data, you can check it here: http://www.params.me/2011/03/configure-apache-solr-14-with-mysql.html. I am using this setup in one of my applications in production. -param On 1/25/12 11:10 AM, Fergus McDowall fergusmcdow...@gmail.com wrote: Is it do-able/sensible to build a schema.xml from the ground up? Say that you are feeding the results of a database query into Solr containing the fields id(int), title(varchar), description(varchar), pub_date(date) and tags(varchar). What would be the simplest schema.xml that could support this structure in Solr? Fergus
Re: full import is not working and still not showing any errors
Erick, Thanks for your input, but I've solved the problem, which was caused by the JDBC driver. This is my first time using Solr, and I am doing some searching over the internet just to get familiar with it and see how flexible it is. Do you know whether I can specify complex search, filtration and ranking rules in Solr? Regards, -- View this message in context: http://lucene.472066.n3.nabble.com/full-import-is-not-working-and-still-not-showing-any-errors-tp3684751p3689042.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: JSON response truncated
Two things: 1) I suspect it's your servlet container rather than Solr, since your JSON isn't well formatted. I have no clue where to set that up, but that's where I'd look. 2) A side note. You may run into the default of 10,000 tokens that are indexed; see maxFieldLength in solrconfig.xml. This is NOT what your current problem is, since if you exceed this limit you should still get well-formatted XML. But if you're sending large documents back and forth you might see truncated *fields*. Best Erick On Wed, Jan 25, 2012 at 1:18 PM, Sean Adams-Hiett s...@webgeeksforhire.com wrote: Summary of Issue: When specifying output as JSON, I get a truncated response. Details: The JSON output I get is truncated, causing errors for any parser that requires well-formed JSON. I have tried spot checking at a dozen different records by adjusting the start= attribute. I am using Solr 3.5 running as a Tomcat webapp on a portable hard drive. When getting the response as XML, it appears to work fine. I have provided some examples of the query I am using, as well as JSON and XML responses below. I am definitely new to working directly with Solr, although I have used it via Drupal for years and I have a pretty solid understanding of how it works at a high level. My best guess is that there is some setting that I am not aware of in schema.xml or solrconfig.xml that is causing this outcome. Any help in figuring this out would be greatly appreciated. Example query: http://localhost:8080/solr/rolfe/select?indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&qt=&wt=json&explainOther=&hl.fl= Example JSON response: { responseHeader:{ status:0, QTime:1}, response:{numFound:43678,start:0,maxScore:1.0,docs:[ { contents:idlotoer patriota department of the rolfe arrowlocal itemsrobi anderson of denver, colo., ur-rived in plover the latter part of tho week for a short visit with relatives and iri'-.'idst ; owen returned monday from hampton, iowa.
mr owen reports that .mrs owen w.is successfully operated upon in the hospital there, ;md is now w.'ll on the road to recovery.real estate loans-wo are quoting low rotes on real estate ioudb. if you are expecting to make u loan on your farm this year, it will pay you to see us.wo solicit your banking uusl--ness on the basis of prompt, efficient service to you.peoples saving banki5st. 1883 the community dank:~:~:~m~:~:~x\xkk-:-m~:~®x\¯ 11the variety of our win-ned goods should appeal lo œ i }ou. especially at this sea- ||[ son ('aimed vegetables, canned fruits, meals, soups œ and so on. make your housework lighter during œ this season by being a con- œ slant \\isilor lo our canned goods department| saturday specials19c 19c 19c 19c 19c 19ci t11 i5:i!vlib. of pollock dakingpowder for ... 1 cnn white seal lllnc-kitaspbenlea for \\i i.h oysters forsaturday special þ' cans hominy forsaturday special i pkgs. com starch.i willi spoon) i large can plm applesaturday spi-riultwo kxtk\\ specials white (jrtipcse\\tra special pem'lii'k-kxtra .special15c 15cfred ehler's| the right place to tradem. i helvlg spent last sunday in hampton with his daughter, mikk ivis, who is in a hospital fn that city, recovering from the effects of ® recent operation. martin reports that she is getting rlong nicely and will probably be home some, time the latter part of the weekmr and mrs. f j sarhv were holfi visitors last saturday and sundaythe m y ¯. club met with mrs l. n. moody last thursday afternoonmr and mrs. chas. england, mid mrs. england's father. mr brle.kson. of albert city, mr. and mrs. enoch erlckson and children of marathon, and mr. and mrs. a b. cobbs of rolfe spent sunday at the a. w. hess home in tjii8 city.jack (j ton on was recently thrown from a horse and suffered a broken arm.if you want to buy a ®ood corn planter, buy a. \cbbc\ and pet the rest. see j. w. mangun, plover, iowa.p. j. 
nacke has purchased a buick touring car, and is now busy learning to operate the same.miss freda gcmbler, daughter of pred oembler, was taken suddenly ill while in school one day last week. a physician was called and after examination pronounced it a case of scarlet fevermr sherlock of bmmotsburg was a business visitor here tuesday,1 h pollock has been making an improvement on bis farm residence by tho addition of a largo porch.ii .1. watts of des molncs spent sunday at the home of bis brother, chas. 15. watts.p 11. henderson has sold the building now occupied by the harness shop to geo. jcffriub.a. j eggspuehler has rctirfd from the management of the drury store mid is upending his time on his farm south of townif von want a gang plow that has no side draft, buy an \oliver\ of j w \\!\\vr:!t\\, plover. iowa.m-v. iv. c. a. n'otksthe y w c a of plover lias been þliihe active of late, and the meetings held recently have enjoyed an excellent attendance. much interest has been shown in the work. it is
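The truncation Sean describes is easy to confirm mechanically: a truncated body fails `json.loads`, while a complete response parses. A small stdlib check (the strings here are made-up stand-ins for a real response body):

```python
import json

complete = '{"responseHeader": {"status": 0, "QTime": 1}}'
truncated = complete[:-5]  # chop the tail, as the container appears to do

json.loads(complete)  # parses fine
try:
    json.loads(truncated)
    print("parsed")
except ValueError as e:
    print("truncated JSON:", e)
```

Running a script like this against saved responses makes it easy to tell whether the truncation happens at a consistent byte offset, which would point at a buffer or size limit in the container.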
WARNING: Unable to read: dataimport.properties DIH issue
I have tried to search for my specific problem but have not found a solution. I have also read the wiki on the DIH and seem to have everything set up right, but my query still fails. Thank you for your help. I am running Solr 3.1 with Tomcat 6.0 on Windows Server 2003 R2 and SQL Server 2008. I have sqljdbc4.jar sitting in C:\Program Files\Apache Software Foundation\Tomcat 6.0\lib

My solrconfig.xml:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>

My db-data-config.xml:

<dataConfig>
  <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="://localhost:1433;DatabaseName=KnowledgeBase_DM" user="user" password="password" />
  <document>
    <entity dataSource="ds1" name="Titles" query="SELECT mrID, mrTitle from KnowledgeBase_DM.dbo.AskMe_Data">
      <field column="mrID" name="id" />
      <field column="mrTitle" name="title" />
      <entity name="Desc" query="select meDescription from KnowledgeBase_DM.dbo.AskMe_Data">
        <field column="meDescription" name="description" />
      </entity>
    </entity>
  </document>
</dataConfig>

My logfile output:

Jan 25, 2012 2:17:37 PM org.apache.solr.handler.dataimport.DataImportHandler processConfiguration INFO: Processing configuration from solrconfig.xml: {config=db-data-config.xml} Jan 25, 2012 2:17:37 PM org.apache.solr.handler.dataimport.DataImporter loadDataConfig INFO: Data Configuration loaded successfully Jan 25, 2012 2:17:37 PM org.apache.solr.handler.dataimport.DataImporter doFullImport INFO: Starting Full Import Jan 25, 2012 2:17:37 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties *WARNING: Unable to read: dataimport.properties* Jan 25, 2012 2:17:37 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity Titles with URL: ://localhost:1433;DatabaseName=KnowledgeBase_DM Jan 25, 2012 2:17:37 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Time taken for getConnection(): 0 Jan 25, 2012 2:17:37 PM
org.apache.solr.common.SolrException log *SEVERE: Exception while processing: Titles document : SolrInputDocument[{}]:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT mrID, mrTitle from KnowledgeBase_DM.dbo.AskMe_Data Processing Document # 1* at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:253) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39) at org.apache.solr.handler.dataimport.DebugLogger$2.getData(DebugLogger.java:188) at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:591) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:353) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:411) at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:205) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859) at
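A hedged aside on the error above: the log prints the connection URL as ://localhost:1433;DatabaseName=KnowledgeBase_DM, i.e. with no scheme. For comparison, the documented form of a SQL Server JDBC URL in a DIH dataSource is sketched below, using the values from the post; the name="ds1" attribute is added here only because the Titles entity references dataSource="ds1" while the posted dataSource element carries no name.

```xml
<dataSource name="ds1"
            driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
            url="jdbc:sqlserver://localhost:1433;databaseName=KnowledgeBase_DM"
            user="user" password="password" />
```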
Advice - evaluating Solr for categorization keyword search
Hi all, I've been tasked with evaluating whether Solr is the right solution for my company's search needs. If this isn't the right forum for this kind of question, please let me know where to go instead!

We are currently using SQL queries to find MySQL db results that match a single keyword in one short text field, so our search is pretty crude. What we hope Solr can do initially is:

1. enable more flexible search (booleans, more than one field searched/matched, etc.)
2. live search results (e.g. new records get added to the index upon creation)
3. search rankings (e.g. most relevant to least relevant)
4. categorize our db (take records and at least group them; better if it could assign a label to each record)
5. locate nearby results (geospatial search)

What I hope you can advise on is:

A. How would you go about #2 - making sure that new documents are added/indexed ASAP, based on new rows in the db? Is that as simple as a setting in Solr, or does it take some coding (e.g. a listener object, a cron job, etc.)? I tried looking at the wiki tutorial but wasn't able to find answers - I couldn't make sense of how to use UpdateRequestProcessor to do it. (http://wiki.apache.org/solr/UpdateRequestProcessor)
B. What's the status of document clustering? The wiki says it's not been fully implemented. Would we be able to achieve any of #4 yet? If not, what else should we consider?
C. Would you use Solr over, say, the Google Maps API to run location-aware searches?
D. How long should we expect it to take to configure Solr on our servers with our db, get the initial index set up, and enable live search results? Are we talking one week, or one month? Our db is not tiny, but it's not huge - say around 8k records in each of ~20 tables. Most tables have around 10 fields, including at least one large text field and then a variety of dates, numbers, and small text.

I really appreciate any advice you can offer! Cheers, Becky http://www.coffeeandpower.com
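On question A above: one common pattern (a sketch, not the only way) is to have the application post new rows to Solr's /update handler as they are created, using the commitWithin attribute on the &lt;add&gt; message so Solr batches commits itself. The helper below only builds the update XML payload; the field names are illustrative, and actually POSTing the payload to a Solr URL is left out.

```python
import xml.etree.ElementTree as ET

def build_add_xml(docs, commit_within_ms=5000):
    """Build a Solr <add> payload; the commitWithin attribute asks Solr
    to commit the added documents within the given number of milliseconds."""
    add = ET.Element("add", commitWithin=str(commit_within_ms))
    for d in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in d.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")

# Payload for one freshly created db row (hypothetical fields):
payload = build_add_xml([{"id": "42", "title": "new record"}])
# POST this to http://host:port/solr/update with Content-Type text/xml
```

Whether this runs from a db trigger, an application hook, or a periodic job polling for new rows is a deployment choice; the thread's question about which to pick remains open.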
Re: phrase auto-complete with suggester component
O. Klein wrote I agree. Suggester could use some attention. Looking at Wiki there were some features planned, but not much has happened lately. Or check out this post http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/ looking very promising as an alternative. -- View this message in context: http://lucene.472066.n3.nabble.com/phrase-auto-complete-with-suggester-component-tp3685572p3689240.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Multiple document types
Hi Simon, No, not different entity types, but actually different document types (I think). What would be ideal is if we could have multiple document elements in the data-config.xml file, some way of mapping each document element to a different set of fields in the schema.xml file, and to a different index. Then, when Solr got a search request on one URL (say, for example, http://172.24.1.16:8080/gwsolr/cc/doctype1/select/?q=...), it would search for a document in the first index, and when it got a search request on a different URL (say, for example, http://172.24.1.16:8080/gwsolr/pc/doctype1/select/?q=...), it would search for the document in the second index. In like manner, administrative tasks (like dataimport) would also switch off of the URL, so that the URL would determine which index was to be loaded by the dataimport command. F

-----Original Message-----
From: Simon Willnauer [mailto:simon.willna...@googlemail.com] Sent: Wednesday, January 25, 2012 2:08 PM To: java-user Subject: Re: Multiple document types

hey Frank, can you elaborate what you mean by different doc types? Are you referring to an entity ie. a table per entity to speak in SQL terms? in general you should get better responses for solr related questions on solr-user@lucene.apache.org simon

On Wed, Jan 25, 2012 at 10:49 PM, Frank DeRose fder...@guidewire.com wrote: It seems that it is not possible to have multiple document types defined in a single solr schema.xml file. If, in fact, this is not possible, then, what is the recommended app server deployment strategy for supporting multiple documents on solr? Do I need to have one webapp instance per document type? For example, if I am deploying under tomcat, do I need to have separate webapps, each with its own context-path and set of config files (data-config.xml and schema.xml, in particular)?
_________
Frank DeRose
Guidewire Software | Senior Software Engineer
Cell: 510-589-0752
fder...@guidewire.com | www.guidewire.com
Deliver insurance your way with flexible core systems from Guidewire.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
Query for documents that have ONLY a certain value in a multivalued field
Does anyone know if there's a way, using the Solr query syntax, to filter documents that have only a certain value in a multivalued field? As an example, if I have some multivalued field country, I want q=id:[* TO *]&fq=country:brazil where 'brazil' is the only value present.

I've run through a few possibilities to do this, but I think it would be more common and a better solution would exist:

1) At index creation time, aggregate my source data and create a count_country field that contains the number of terms in the country field. Then the query would be q=id:[* TO *]&fq=country:brazil&fq=count_country:1
2) In the search client, use the terms component to retrieve all terms for country, do the exclusions in the client, and construct the query as q=id:[* TO *]&fq=country:brazil&fq=-country:canada&fq=-country:us etc.
3) Write a function query or similar that could capture the info.

Thanks in advance, Garrett Conaty
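Option 1 above can be sketched as a small indexing-time transform (the field name count_country is from the post; the surrounding indexing pipeline is assumed):

```python
def with_country_count(doc):
    """Return a copy of the document with a count_country field added,
    so 'only brazil' can be queried as:
        q=id:[* TO *]&fq=country:brazil&fq=count_country:1
    """
    out = dict(doc)
    out["count_country"] = len(out.get("country", []))
    return out

# Applied to each document before it is sent to Solr:
doc = with_country_count({"id": "1", "country": ["brazil"]})
```

The trade-off versus option 2 is that the count is fixed at index time but the query stays a single cheap filter, whereas the terms-component approach needs no schema change but builds a long exclusion list per query.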
Re: phrase auto-complete with suggester component
Thanks for the link, that's the approach I'm going to try.

On Wed, Jan 25, 2012 at 2:39 PM, O. Klein kl...@octoweb.nl wrote: O. Klein wrote I agree. Suggester could use some attention. Looking at the Wiki there were some features planned, but not much has happened lately. Or check out this post http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/ looking very promising as an alternative.

--
Tommy Chheng
Re: Cluster Resizing question
I think I need to provide a few more details here. I need the ability to add a shard to the cluster; in doing this I'd like to split an existing index, spin up the new shard with half (or thereabouts) of it, and allow the original to continue serving the pieces it has now.

In our application we are using MurmurHash3 (taken from http://www.yonik.com/) so that updates/deletes are sent to the appropriate servers (we're using an old snapshot of SolrCloud which doesn't support the latest distributed indexing). In my case the hashing is based on the number of shards, which means you add a shard and it breaks. I've read in one of the JIRAs that the hashing should instead be based on some other number (any ideas?) and then used to calculate ranges which would in turn be stored in ZK, so adding another shard would be a matter of updating the range in ZK, stopping the machine serving the index to be split, and splitting said index such that the 2 indexes created would map to the new bins.

All of that being said, I have none of this implemented and would much prefer this work happen within Solr proper, since it's already on the roadmap and my code would ultimately be throwaway. Thus the reason I'd like to understand what the plans are for this in Solr and possibly start contributing to this development, assuming it meets my timelines. Any thoughts/comments are greatly appreciated.

On 1/25/12, Jamie Johnson jej2...@gmail.com wrote: Thanks Otis. I have been following the SolrCloud development, but I was wondering specifically about elastically expanding the cloud by adding shards. I'm following the distributed indexing JIRA, but I'm having difficulty finding a JIRA which specifically references the issues with elasticity. Are you aware of one?
On Wed, Jan 25, 2012 at 1:10 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Jamie, depending on how quickly you need this, it may be better to follow SolrCloud development because cluster resizing will work differently there. Otis Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html From: Jamie Johnson jej2...@gmail.com To: solr-user@lucene.apache.org Sent: Wednesday, January 25, 2012 12:03 PM Subject: Cluster Resizing question Is this the JIRA that I should be tracking for resizing a cluster? https://issues.apache.org/jira/browse/SOLR-2593 If not can someone point me to the appropriate location. Also is there a rough timeline for when this will be available?
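To illustrate why shard-count-based hashing "breaks" when a shard is added, versus the range-based assignment discussed in the thread (where ranges would live in ZK): the sketch below uses CRC32 as a stand-in for MurmurHash3, and the shard names and ranges are made up.

```python
import zlib

def shard_by_modulo(doc_id, num_shards):
    # CRC32 stands in for the MurmurHash3 the thread actually uses.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

def shard_by_range(doc_id, ranges):
    # ranges: (lo_inclusive, hi_exclusive, shard) tuples covering 0..2**32.
    h = zlib.crc32(doc_id.encode("utf-8"))
    for lo, hi, shard in ranges:
        if lo <= h < hi:
            return shard
    raise ValueError("hash %d not covered by any range" % h)

ids = ["doc%d" % i for i in range(1000)]

# Modulo hashing: going from 3 to 4 shards remaps many existing documents.
moved = sum(1 for i in ids if shard_by_modulo(i, 3) != shard_by_modulo(i, 4))

# Range assignment: splitting shardA's range in two leaves shardB untouched,
# so only the split shard's index needs to be divided.
two = [(0, 2**31, "shardA"), (2**31, 2**32, "shardB")]
three = [(0, 2**30, "shardA1"), (2**30, 2**31, "shardA2"), (2**31, 2**32, "shardB")]
```

This is exactly the property the thread is after: updating the stored ranges and splitting one index, rather than rehashing the whole cluster.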
Re: Multiple document types
On Thu, Jan 26, 2012 at 12:05 AM, Frank DeRose fder...@guidewire.com wrote: Hi Simon, No, not different entity types, but actually different document types (I think). What would be ideal is if we could have multiple document elements in the data-config.xml file and some way of mapping each different document element to a different sets of field in the schema.xml file, and to a different index. Then, when Solr got a search request on one url (say, for example, http://172.24.1.16:8080/gwsolr/cc/doctype1/select/?q=...), it would search for a document in the first index and when it got a search request on a different url (say, for example, http://172.24.1.16:8080/gwsolr/pc/doctype1/select/?q=...), it would search for the document in the second index. In like manner, administrative tasks (like dataimport) would also switch off of the url, so that the url would determine which index was to be loaded by the dataimport command.

seems like you should look at solr's multicore feature: http://wiki.apache.org/solr/CoreAdmin

simon

F

-----Original Message-----
From: Simon Willnauer [mailto:simon.willna...@googlemail.com] Sent: Wednesday, January 25, 2012 2:08 PM To: java-user Subject: Re: Multiple document types

hey Frank, can you elaborate what you mean by different doc types? Are you referring to an entity ie. a table per entity to speak in SQL terms? in general you should get better responses for solr related questions on solr-user@lucene.apache.org simon

On Wed, Jan 25, 2012 at 10:49 PM, Frank DeRose fder...@guidewire.com wrote: It seems that it is not possible to have multiple document types defined in a single solr schema.xml file. If, in fact, this is not possible, then, what is the recommended app server deployment strategy for supporting multiple documents on solr? Do I need to have one webapp instance per document type?
For example, if I am deploying under tomcat, do I need to have a separate webapps each with its own context-path and set of config files (data-config.xml and schema.xml, in particular)?
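The multicore setup Simon points to is driven by a solr.xml file rather than separate webapps; a minimal sketch for the two-index case described above (core names cc and pc are taken from the example URLs in the thread; the instanceDir values and persistent flag are assumptions):

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- each core has its own conf/ directory with its own schema.xml
         and data-config.xml, and its own index -->
    <core name="cc" instanceDir="cc" />
    <core name="pc" instanceDir="pc" />
  </cores>
</solr>
```

Requests then route per core, so the URL does select the index: http://host:port/solr/cc/select?q=... searches one index, and http://host:port/solr/pc/dataimport?command=full-import imports into the other.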