MoreLikeThis to extract relevant terms to the query from the index
Hi all, I am using MoreLikeThis.java in Lucene to expand the query with related terms. It works fine and I could retrieve the documents relevant to the query, but I don't know how to extract the terms related to the query from the index. My task: for example, if the query is "bank", related terms could be "money", "credit" and so on, i.e. terms that appear frequently together with "bank" in the index. What should I write in main so that I get the interesting terms for my query? I tried

BooleanQuery result = (BooleanQuery) mlt.like(docNum);
result.add(query, BooleanClause.Occur.MUST_NOT);
System.out.println(result.getClauses().toString());

but it doesn't help. Any ideas?
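If the goal is the expansion terms themselves rather than the expanded BooleanQuery, Lucene's MoreLikeThis exposes them directly through retrieveInterestingTerms. The following is a sketch under stated assumptions (an open IndexReader named reader, a hypothetical field called contents, and the contrib MoreLikeThis class on the classpath); note also that printing a String[] with toString() shows only the array reference, so Arrays.toString is needed to see the contents:

```java
import java.util.Arrays;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.similar.MoreLikeThis;

// Assumes `reader` is an open IndexReader over your index and `docNum`
// identifies the document standing in for the query (there is also an
// overload taking a java.io.Reader with raw query text instead).
MoreLikeThis mlt = new MoreLikeThis(reader);
mlt.setFieldNames(new String[] { "contents" }); // hypothetical field name
mlt.setMinTermFreq(1);
mlt.setMinDocFreq(1);

// The terms MoreLikeThis considers related to the document/query:
String[] terms = mlt.retrieveInterestingTerms(docNum);
System.out.println(Arrays.toString(terms));
```

The thresholds (setMinTermFreq, setMinDocFreq) are illustrative; tune them to your index.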
Tomcat special character problem
Hi list, I have an issue with my Solr environment in Tomcat. First: I am not very familiar with Tomcat, so it might be my fault and not Solr's. It cannot be a Solr-side configuration problem, since everything worked fine with my local Jetty servlet container. However, when I deploy into Tomcat, several special characters are shown in their UTF-8 byte representation. Example: göteburg is displayed as <str name="q">gÃ¶teburg</str> when it comes to search. I tried the following within my server.xml file:

<Connector port="8080" protocol="HTTP/1.1" connectionTimeout="2" redirectPort="8443" URIEncoding="UTF-8" />

and restarted Tomcat afterwards. The problem only occurs when I search for something; indexing that data is no problem. Thank you for any help! Regards, Em -- View this message in context: http://lucene.472066.n3.nabble.com/Tomcat-special-character-problem-tp1857648p1857648.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Tomcat special character problem
On Sun, Nov 7, 2010 at 9:11 AM, Em mailformailingli...@yahoo.de wrote: [...] That is definitely odd. When I tried copying göteburg and doing a manual query in my web browser, everything worked. How are you making the request to Solr? When I viewed the properties/info of the results, the returned charset was UTF-8; can you confirm the same for you? When I grepped for UTF-8 in both my Solr and Tomcat configs, nothing stood out as a special configuration option.
Re: Tomcat special character problem
Hi Ken, thank you for your quick answer! To make sure that no mistakes occur on my application's side, I send my requests with the form that is available at solr/admin/form.jsp. I changed almost nothing from the example configuration within the example package except some auto-commit params. All the special characters within the results are displayed correctly, and as far as I can tell they were also indexed correctly. The only problem is querying with special characters. I can confirm that the page is encoded in UTF-8 within my browser. Is there a possibility that Tomcat did not use the UTF-8 URIEncoding? Maybe I should add that Tomcat is behind an Apache httpd server and is mounted via a JkMount. Thank you!
Re: Tomcat special character problem
On Sun, Nov 7, 2010 at 9:34 AM, Em mailformailingli...@yahoo.de wrote: [...] I am not familiar with your type of setup, but a quick Google search suggested using a second connector on a different port. If you're using mod_jk, you can try setting JkOptions +ForwardURICompatUnparsed to see if that helps (http://markstechstuff.blogspot.com/2008/02/utf-8-problem-between-apache-and-tomcat.html). Sorry I couldn't have been more help. :) - Ken
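For reference, the mod_jk option Ken mentions goes into the Apache httpd configuration (wherever your JkMount directives live); with it, httpd forwards the raw request URI unchanged so Tomcat's own URIEncoding setting does the decoding:

```apache
# httpd.conf / vhost config: pass the request URI through to Tomcat
# unparsed, instead of re-encoding it on the httpd side
JkOptions +ForwardURICompatUnparsed
```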
Re: Tomcat special character problem
This helped a lot, since it solved the göteburg problem. Thank you, Ken! Great help :-). Unfortunately there are still other encoding problems: fq=testcat%3Aacôme worked, however the fully URL-encoded version fq=testcat%3Aac%F4me does not. The first version is the result of submitting the form.jsp; the second is what you get when you click into the address bar and press enter. This is a real problem for me, since applications that send a query send a URL-encoded query like the second one. Any suggestions?
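The two forms likely differ in more than submission method: %F4 is the Latin-1 (ISO-8859-1) percent-encoding of ô, while a connector configured with URIEncoding="UTF-8" expects the two-byte form %C3%B4. A small stdlib check of the difference:

```java
import java.net.URLEncoder;

public class PercentEncoding {
    public static void main(String[] args) throws Exception {
        String s = "ac\u00f4me"; // "acôme"
        // UTF-8 encodes ô as two bytes, so it percent-encodes to %C3%B4:
        System.out.println(URLEncoder.encode(s, "UTF-8"));
        // Latin-1 encodes ô as the single byte 0xF4:
        System.out.println(URLEncoder.encode(s, "ISO-8859-1"));
    }
}
```

So a client sending %F4 is URL-encoding with Latin-1; either the client should encode in UTF-8, or Tomcat would need to be told to expect Latin-1.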
RE: solr 4.0 - pagination
Dear Yonik, this is fantastic, but can you tell me when it will be ready? I need this feature in two weeks. Is it possible to finish and make an update in this time, or should I look for another solution concerning the pagination (like implementing just a "more results" link instead of pagination)? best regards, Rich -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Saturday, October 30, 2010 19:29 To: solr-user@lucene.apache.org Subject: Re: solr 4.0 - pagination On Sat, Oct 30, 2010 at 12:22 PM, Papp Richard ccode...@gmail.com wrote: I'm using Solr 4.0 with grouping (field collapsing), but unfortunately I can't solve the pagination. It's not implemented yet, but I'm working on that right now. -Yonik http://www.lucidimagination.com
Re: solr 4.0 - pagination
On Sun, Nov 7, 2010 at 10:55 AM, Papp Richard ccode...@gmail.com wrote: this is fantastic, but can you tell any time it will be ready ? It already is ;-) Grab the latest trunk or the latest nightly build. -Yonik http://www.lucidimagination.com
RE: solr 4.0 - pagination
thank you very much Yonik! you are a magician! regards, Rich
RE: Corename after Swap in MultiCore
Do you mean solr.core.name has the wrong value after the swap? You swapped doc-temp, so now it's doc, and solr.core.name is still doc-temp? This completely contradicts my experience; what version of Solr are you using? Why use postCommit? You're running the risk of performing a swap when you don't mean to. Are you using DIH? If so, I'd go with querying the status of the import until it's done and then performing the swap. Ephraim Ofir -Original Message- From: sivaram [mailto:yogendra.bopp...@gmail.com] Sent: Wednesday, November 03, 2010 4:46 PM To: solr-user@lucene.apache.org Subject: Corename after Swap in MultiCore Hi everyone, long question but please hold on. I'm using a multicore Solr instance to index different documents from different sources (around 4), and I'm using a common config for all the cores. So, for each source I have a core and a temp core, like 'doc' and 'doc-temp'. Every time I want to get new data, I do a dataimport into the temp core and then swap the cores. For swapping I'm using the postCommit event listener to make sure the swap is done after the commit completes. After the first swap, when I use solr.core.name on doc-temp it returns doc as its name (because the commit is done on doc's data dir after the first swap). How do I get the core name of doc-temp here in order to swap again with .swap? I'm stuck here. Please help me. Also, does anyone know for sure whether, if a dataimport is being done on a core, the next swap query will be executed only after that dataimport is finished? Thanks in advance. Ram.
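For the record, the swap itself is a CoreAdmin call over HTTP; a sketch assuming the default port/path and the core names from the question:

```shell
# Swap the two cores; afterwards 'doc' points at the data that
# 'doc-temp' was just built into, and vice versa.
curl 'http://localhost:8983/solr/admin/cores?action=SWAP&core=doc&other=doc-temp'
```

Driving this from an external script (after polling the DIH status, as suggested above) avoids triggering a swap from postCommit when you don't mean to.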
Re: Removing irrelevant URLS
You can always do a delete-by-query, but that presupposes you can form a query that would match only those documents with URLs you want removed... Assuming you do this, an optimize would then physically remove the documents from your index (delete-by-query just marks the docs as deleted). Solr has nothing specifically for URLs; it's an engine rather than a web-crawling app. Best, Erick On Fri, Nov 5, 2010 at 4:33 PM, Eric Martin e...@makethembite.com wrote: Hi, I have 100k URLs in my index. I specifically crawled sites relating to law. However, during my initial crawls I didn't specify urlfilters, so I am stuck with extrinsic and often irrelevant URLs like twitter, etc. Is there some way in Solr that I can run periodic URL cleanings to remove URLs and search string results? Or should I just dump my index and rebuild using the filter? I have looked on the Solr wiki and came across some candidates that look like what I am trying to accomplish, but am not sure. If anyone knows where I should be looking, I would appreciate it. Eric
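Concretely, a delete-by-query is a small XML body posted to Solr's /update handler; a sketch assuming a hypothetical field `site` that holds the crawled host (replace it with whatever field in your schema the unwanted documents can be matched on; each element is posted separately):

```xml
<!-- POST each of these to http://localhost:8983/solr/update -->
<delete><query>site:twitter.com</query></delete>
<commit/>
<optimize/>
```

The final optimize is what actually reclaims the space, since the delete only marks documents as deleted.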
RE: Removing irrelevant URLS
OK, thanks. I am using Nutch and figuring out how to use urlfilters, unsuccessfully. Just thought there might be a way I could save some trouble this way. Thanks!
Adding Carrot2
Hi, Solr and Nutch have been working fine. I now want to integrate Carrot2. I followed this tutorial/quickstart: http://www.lucidimagination.com/blog/2009/09/28/solrs-new-clustering-capabilities/ I didn't see anything to adjust in my schema, so I didn't do anything there. I did add the code to solrconfig.xml though. I am getting this when I start Solr now:

Command: java -Dsolr.clustering.enabled=true -jar start.jar
Nov 7, 2010 11:35:16 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: [solrconfig.xml] requestHandler: missing mandatory attribute 'class'

Anyone run into issues with Carrot2? Eric
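The error says some <requestHandler> element in solrconfig.xml has no class attribute, which can happen when a snippet loses attributes in copy-paste. For comparison, a clustering request handler along the lines of the wiki example has this shape (a sketch; engine and parameter names may differ in your setup):

```xml
<requestHandler name="/clustering" class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="clustering">true</bool>
    <str name="clustering.engine">default</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>
```

Checking every requestHandler in the file for a class attribute should locate the offending one.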
RE: solr 4.0 - pagination
Hi Yonik, I've just tried the latest version from the nightly build: apache-solr-4.0-2010-11-05_08-06-28.war. I have some concerns, however. I have 3 documents: 2 in the first group, 1 in the second group. 1. I got matches = 3, which is good, but I still don't know how many groups I have (using start=0, rows=10). 2. As far as I can see, start/rows is working now, but matches is returned incorrectly: it said matches = 3 instead of 1 when I used start=1, rows=1. So can you help me: how do I compute how many pages I'll have, since matches can't be used for this? regards, Rich
Re: solr 4.0 - pagination
On Sun, Nov 7, 2010 at 2:45 PM, Papp Richard ccode...@gmail.com wrote:
> 2. as far as I see the start / rows is working now, but the matches is returned incorrectly

matches is the number of documents before grouping, so start/rows or group.offset/group.limit will not affect this number.

> so can you help me, how to compute how many pages I'll have, because the matches can't be used for this.

Solr doesn't even know, given the current algorithm, hence it can't return that info. The issue is that to calculate the total number of groups, we would need to keep every group in memory (which could cause a big blowup if there are tons of groups). The current algorithm only keeps the top 10 groups (assuming rows=10) in memory at any one time, hence it has no idea what the total number of groups is. -Yonik http://www.lucidimagination.com
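The bounded-memory behaviour Yonik describes can be illustrated with a tiny stand-alone sketch (hypothetical names and data, not Solr's actual grouping code): only the best `rows` groups survive, and an evicted group leaves no trace, so the total group count is unrecoverable afterwards.

```java
import java.util.HashMap;
import java.util.Map;

public class TopGroupsSketch {
    public static void main(String[] args) {
        // A stream of matching documents as (group key, score) pairs;
        // three distinct groups appear in total.
        String[][] docs = { {"g1", "0.9"}, {"g2", "0.5"}, {"g1", "0.7"}, {"g3", "0.8"} };
        int rows = 2; // only the top `rows` groups are kept in memory

        Map<String, Double> best = new HashMap<>();
        for (String[] d : docs) {
            best.merge(d[0], Double.parseDouble(d[1]), Double::max);
            if (best.size() > rows) {
                // Bound memory: evict the group with the weakest best score.
                String weakest = null;
                for (Map.Entry<String, Double> e : best.entrySet()) {
                    if (weakest == null || e.getValue() < best.get(weakest)) {
                        weakest = e.getKey();
                    }
                }
                best.remove(weakest);
            }
        }
        // Evicted groups are gone, so only the surviving count is known;
        // the true total (3) cannot be recovered from this state.
        System.out.println(best.size());
    }
}
```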
RE: solr 4.0 - pagination
Hey Yonik, sorry, I think matches is OK, because it probably always returns the total document count; however, I still don't know how to compute the number of pages. thanks, Rich
RE: solr 4.0 - pagination
I see. Let's assume that there are 1000 groups. Can I safely use start=990, rows=10 (with no negative impact on memory usage or speed) to get the last page? Or will this not work, because you would need to compute all the groups up to 1000 in order to return the last 10, so the whole thing would be slow and memory usage would increase considerably? regards, Rich
Re: Tomcat special character problem
Is it possible that your original search is being posted (HTTP POST), and the character encoding of the page with the form is not UTF-8? In that case, I believe a header gets sent with the request specifying a different character set (unlike parameters in the URL, for which it's not possible to specify an encoding explicitly). -Mike On 11/7/2010 10:26 AM, Em wrote: [...]
Re: Adding Carrot2
Carrot2 is already part of the Solr distributions: 1.4.1, 3.x, and the trunk. On 11/7/10, Eric Martin e...@makethembite.com wrote: [...] -- Lance Norskog goks...@gmail.com
RE: Adding Carrot2
Yeah, I know: you have to download the libraries and copy them to /lib inside of Solr. In Solr 1.4 the plugin is available but the libraries are not. http://www.lucidimagination.com/blog/2009/09/28/solrs-new-clustering-capabilities/ I think there is something wrong with the schema and solrconfig (XMLs) integration. Some documentation on Apache says it's already written into the XML and some says it's not; searching the XMLs in Solr, I find no reference to clustering. Now that I think about it, I overwrote solrconfig.xml and schema.xml with my Drupal/ApacheSolr XMLs. I think I may have answered my own question as to why the clustering isn't running correctly. I will go get a copy of the default XMLs, and if I find clustering there, I will try to merge them. Does this sound like I am on the right path now? -Original Message- From: Lance Norskog [mailto:goks...@gmail.com] Sent: Sunday, November 07, 2010 12:41 PM To: solr-user@lucene.apache.org Subject: Re: Adding Carrot2 [...]
Re: Tomcat special character problem
I also thought that this might be the case a few hours ago. However, I have to verify that tomorrow. From a debugging point of view: how can I set the encoding of my browser's address bar? When I pressed enter, the encoding switched from clear text to a URL-encoded version, and the URL-encoded version did not work. Thank you, Mike. I will give you feedback on whether it worked or not!
Re: Tomcat special character problem
In a POST request, or a GET request with URL-encoded variables in the body of the document, it's possible to specify/use encodings different from what the URL alone implies, because they are specified in the headers. For sure in POST, and I'm pretty sure in GET also. Dennis Gearon - Original Message - From: Michael Sokolov soko...@ifactory.com To: solr-user@lucene.apache.org Cc: Em mailformailingli...@yahoo.de Sent: Sun, November 7, 2010 12:40:45 PM Subject: Re: Tomcat special character problem [...]
Re: Adding Carrot2
There are three XML sets: the solr/example set, the Drupal Solr set, AND the set in contrib/clustering/src/test/resources/solr/conf/. The last is what clustering is actually tested with. So the first order of business is to check whether clustering works with example/solr/conf. The diffs looked like the clustering files were just old versions of example/solr, but they might need a little merging. Lance On Sun, Nov 7, 2010 at 12:47 PM, Eric Martin e...@makethembite.com wrote: [...] -- Lance Norskog goks...@gmail.com
faceting when using field collapsing
Hi, I am pondering making use of field collapsing. I am currently indexing clauses (sections) inside UN documents: http://resolutionfinder.org/search/unifiedResults?q=africa=t[22]=medicationdc=st=clause Now, since my data set is still fairly small, I am doing field collapsing in userland: http://resolutionfinder.org/search/unifiedResults?q=africa=t[22]=medicationdc=st=document However, while this works alright (not ideal, since I am essentially fetching the entire result set, not paged as for clauses), I still have no idea how to get the facet filters to display the right counts. So I am wondering whether field collapsing in its current form supports faceting, since it's not mentioned on the wiki page: http://wiki.apache.org/solr/FieldCollapsing regards, Lukas Kahwe Smith m...@pooteeweet.org