MoreLikeThis to extract relevant terms to the query from the index

2010-11-07 Thread farag ahmed
Hi All,
 
I am using MoreLikeThis.java in Lucene to expand the query with related terms.
It works fine and I can retrieve the documents relevant to the query, but I
don't know how to extract the terms related to the query from the index.
 
My task is:
 
 For example, if the query is "bank", related terms could be "money", "credit" and so on,
that is, terms that appear frequently with "bank" in the index.
 What should I write in main so that I get the interesting terms for my query?
 
I tried:
 
BooleanQuery result = (BooleanQuery) mlt.like(docNum); 

result.add(query, BooleanClause.Occur.MUST_NOT); 

System.out.println(result.getClauses().toString());
 
but it doesn't help.

Any ideas?
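
One likely reason the last line of the snippet above prints nothing useful: calling toString() on a Java array returns the array's type and identity hash, not its elements. The sketch below shows the difference using a plain String[] as a stand-in for the BooleanClause[] returned by getClauses(); the clause strings and the class name are made up for illustration. Separately, the MoreLikeThis versions I am aware of also expose retrieveInterestingTerms(int docNum), which may be closer to what is asked for here; please check the Javadoc of your Lucene version.

```java
import java.util.Arrays;

public class PrintClausesSketch {
    // Hypothetical stand-in for result.getClauses(); any Java array behaves the same way.
    static String[] hypotheticalClauses() {
        return new String[] {"contents:money", "contents:credit"};
    }

    // Formats every element of the array instead of the array's identity.
    static String describe(Object[] clauses) {
        return Arrays.toString(clauses);
    }

    public static void main(String[] args) {
        String[] clauses = hypotheticalClauses();
        // Prints the array's type tag and identity hash, with no terms in it.
        System.out.println(clauses.toString());
        // Prints [contents:money, contents:credit]
        System.out.println(describe(clauses));
    }
}
```

With real Lucene objects the same idea would be Arrays.toString(result.getClauses()), or a loop over the clauses that prints each clause's query.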




Tomcat special character problem

2010-11-07 Thread Em

Hi List,

I have an issue with my Solr environment in Tomcat.
First: I am not very familiar with Tomcat, so it might be my fault and not
Solr's.

It cannot be a Solr-side configuration problem, since everything worked
fine with my local Jetty servlet container.

However, when I deploy into Tomcat, several special characters are shown in
their UTF-8 representation.

Example:
göteburg will be displayed as <str name="q">göteburg</str> when it comes to
search.

I tried the following within my server.xml file:

<Connector port="8080" protocol="HTTP/1.1"
   connectionTimeout="2"
   redirectPort="8443"
   URIEncoding="UTF-8" />

And restarted Tomcat afterwards.

The problem only occurs when I try to search for something.
Indexing that data is no problem.

Thank you for any help!

Regards,
Em
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Tomcat-special-character-problem-tp1857648p1857648.html
Sent from the Solr - User mailing list archive at Nabble.com.
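
For readers following the thread, here is a minimal sketch of the kind of mismatch that URIEncoding is meant to prevent. It assumes the browser sends "ö" as the UTF-8 bytes C3 B6 and compares decoding those bytes as UTF-8 with decoding them as ISO-8859-1, which, as far as I know, older Tomcat versions use for the request URI unless URIEncoding is set on the connector that actually receives the request. The class and method names are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UriDecodingSketch {
    // Decodes a percent-encoded value with the given charset, roughly what a
    // servlet container does with the query string of a request URI.
    static String decode(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // "ö" encoded as UTF-8 is the two bytes C3 B6.
        String raw = "g%C3%B6teburg";
        // Decoder and sender agree: one character, U+00F6.
        System.out.println(decode(raw, "UTF-8"));
        // Decoder assumes Latin-1: two characters, U+00C3 and U+00B6.
        System.out.println(decode(raw, "ISO-8859-1"));
    }
}
```

If the second form is what shows up in the echoed q parameter, the connector handling the request is probably not the one that carries the URIEncoding setting.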


Re: Tomcat special character problem

2010-11-07 Thread Ken Stanley
That is definitely odd. When I tried copying göteburg and doing a manual
query in my web browser, everything worked. How are you making the request
to Solr? When I viewed the properties/info of the results, the returned
charset was UTF-8. Can you confirm the same on your side?

When I grepped for UTF-8 in both my Solr and Tomcat configs, nothing stood
out as a special configuration option.


Re: Tomcat special character problem

2010-11-07 Thread Em

Hi Ken,

Thank you for your quick answer!

To rule out mistakes on my application's side, I send my requests with the
form that is available at solr/admin/form.jsp.

I changed almost nothing in the example configuration from the example
package, except some auto-commit params.

All the special characters within the results are displayed correctly, and
so far they have also been indexed correctly.
The only problem is querying with special characters.

I can confirm that the page is encoded in UTF-8 within my browser.

Is it possible that Tomcat did not use the UTF-8 URIEncoding?
Maybe I should mention that Tomcat sits behind an Apache httpd server and is
mounted via jk_mount.

Thank you! 


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Tomcat-special-character-problem-tp1857648p1857729.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Tomcat special character problem

2010-11-07 Thread Ken Stanley
I am not familiar with your type of setup, but a quick Google search
suggested using a second connector on a different port. If you're using
mod_jk, you can try setting JkOptions +ForwardURICompatUnparsed to see if
that helps
(http://markstechstuff.blogspot.com/2008/02/utf-8-problem-between-apache-and-tomcat.html).
Sorry I couldn't be of more help. :)

- Ken


Re: Tomcat special character problem

2010-11-07 Thread Em

This helped a lot, since it solved the göteburg problem.
Thank you, Ken! Great help :-).

Unfortunately there are some other encoding problems.

fq=testcat%3Aacôme works; however, the fully URL-encoded version
fq=testcat%3Aac%F4me does not.

The first version is the result of submitting form.jsp; the second is
what you get when you click into the address bar and press enter.

This is a real problem for me, since applications that send a query send a
URL-encoded query like the second one.

Any suggestions?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Tomcat-special-character-problem-tp1857648p1857963.html
Sent from the Solr - User mailing list archive at Nabble.com.
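
A short sketch that may help when reading the two fq values above: %F4 is the ISO-8859-1 byte for "ô", while the UTF-8 encoding of the same character is %C3%B4. The example percent-encodes the same filter value with both charsets using the standard java.net.URLEncoder; the class and method names are made up for illustration, and which charset a given browser or client library picks is an assumption that needs checking case by case.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class QueryEncodingSketch {
    // Percent-encodes a parameter value using the given charset.
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String value = "testcat:ac\u00f4me"; // "testcat:acôme"
        // Prints fq=testcat%3Aac%F4me (one byte for the accented letter)
        System.out.println("fq=" + encode(value, "ISO-8859-1"));
        // Prints fq=testcat%3Aac%C3%B4me (two bytes for the accented letter)
        System.out.println("fq=" + encode(value, "UTF-8"));
    }
}
```

A server that decodes the URI as UTF-8 would not recover "ô" from the %F4 form, so client applications would need to encode as UTF-8 to match a URIEncoding="UTF-8" connector.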


RE: solr 4.0 - pagination

2010-11-07 Thread Papp Richard
Dear Yonik,

  this is fantastic, but can you say when it will be ready?
  I would need this feature in two weeks. Is it possible to finish it and release
an update in that time, or should I look for another solution concerning the
pagination (like implementing just a "more results" link instead of pagination)?

best regards,
  Rich

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Saturday, October 30, 2010 19:29
To: solr-user@lucene.apache.org
Subject: Re: solr 4.0 - pagination

On Sat, Oct 30, 2010 at 12:22 PM, Papp Richard ccode...@gmail.com wrote:
  I'm using Solr 4.0 with grouping (field collapsing), but unfortunately I
 can't solve the pagination.

It's not implemented yet, but I'm working on that right now.

-Yonik
http://www.lucidimagination.com
 




Re: solr 4.0 - pagination

2010-11-07 Thread Yonik Seeley
On Sun, Nov 7, 2010 at 10:55 AM, Papp Richard ccode...@gmail.com wrote:
  this is fantastic, but can you tell any time it will be ready ?

It already is ;-)  Grab the latest trunk or the latest nightly build.

-Yonik
http://www.lucidimagination.com


RE: solr 4.0 - pagination

2010-11-07 Thread Papp Richard
thank you very much Yonik! 
you are a magician!

regards,
  Rich




RE: Corename after Swap in MultiCore

2010-11-07 Thread Ephraim Ofir
Do you mean solr.core.name has the wrong value after the swap? You
swapped doc-temp so now it's doc and solr.core.name is still doc-temp?
This completely contradicts my experience. What version of Solr are you
using?
Why use postCommit? You're running the risk of performing a swap when
you don't mean to. Are you using DIH? If so, I'd go with querying the
status of the import until it's done and then performing the swap.

Ephraim Ofir


-Original Message-
From: sivaram [mailto:yogendra.bopp...@gmail.com] 
Sent: Wednesday, November 03, 2010 4:46 PM
To: solr-user@lucene.apache.org
Subject: Corename after Swap in MultiCore


Hi everyone,

Long question, but please bear with me. I'm using a multicore Solr instance to
index different documents from different sources (around 4), and I'm using a
common config for all the cores. So, for each source I have a core and a temp
core, like 'doc' and 'doc-temp'. Every time I want to get new data, I do a
dataimport to the temp core and then swap the cores. For swapping I'm using
the postCommit event listener to make sure the swap is done after the
commit completes.

After the first swap, when I use solr.core.name on doc-temp, it
returns doc as its name (because the commit is done on doc's data dir
after the first swap). How do I get the core name of doc-temp here in
order to swap again with .swap?

I'm stuck here. Please help me. Also, does anyone know for sure whether, if a
dataimport is running on a core, the next swap query will be executed only
after this dataimport has finished?

Thanks in advance.
Ram.
-- 
View this message in context:
http://lucene.472066.n3.nabble.com/Corename-after-Swap-in-MultiCore-tp1835325p1835325.html
Sent from the Solr - User mailing list archive at Nabble.com.
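
The "query the import status until it's done, then swap" suggestion above could be sketched roughly as follows. This is only an outline under assumptions: statusSource is a hypothetical hook that would wrap an HTTP call to the DataImportHandler status command and return its status value ("busy" or "idle"), the poll limits are arbitrary, and the swap path follows the CoreAdmin SWAP action with the 'doc' and 'doc-temp' core names from the question.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

public class SwapAfterImportSketch {
    // Polls the (hypothetical) status source until it reports "idle".
    // Returns false if the poll budget runs out or the thread is interrupted.
    static boolean waitUntilIdle(Supplier<String> statusSource, int maxPolls, long sleepMillis) {
        for (int i = 0; i < maxPolls; i++) {
            if ("idle".equals(statusSource.get())) {
                return true;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    // Builds the CoreAdmin request path that swaps two cores.
    static String swapUrl(String core, String other) {
        return "/solr/admin/cores?action=SWAP&core=" + core + "&other=" + other;
    }

    public static void main(String[] args) {
        // Canned statuses standing in for real responses from the import handler.
        Iterator<String> canned = Arrays.asList("busy", "busy", "idle").iterator();
        if (waitUntilIdle(canned::next, 10, 0L)) {
            System.out.println("would request " + swapUrl("doc", "doc-temp"));
        }
    }
}
```

Driving the swap from the client that started the import, rather than from a postCommit listener, avoids swapping on commits that were not triggered by the import.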


Re: Removing irrelevant URLS

2010-11-07 Thread Erick Erickson
You can always do a delete-by-query, but that pre-supposes you can form
a query that would remove only those documents with URLs you want
removed... Assuming you do this, an optimize would then physically
remove the documents from your index (delete by query just marks
the docs as deleted).

Solr has nothing specifically for URLs; it's an engine rather than a web
crawling app.

Best
Erick

On Fri, Nov 5, 2010 at 4:33 PM, Eric Martin e...@makethembite.com wrote:

 Hi,



 I have 100k URLs in my index. I specifically crawled sites relating to law.
 However, during my initial crawls I didn't specify urlfilters, so I am stuck
 with extrinsic and often irrelevant URLs like twitter, etc.



 Is there some way in Solr that I can run periodic URL cleanings to remove
 URLs and search string results? Or should I just dump my index and rebuild
 using the filter?



 I have looked on the Solr wiki and came across some candidates that look
 like what I am trying to accomplish, but I am not sure. If anyone knows
 where I should be looking I would appreciate it.



 Eric
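
As an illustration of the delete-by-query approach described in the reply above, the sketch below builds the XML update message for a list of unwanted hosts. It assumes the index has an indexed field holding the host name; the field name "host" and the second host are placeholders, and the class and method names are made up. The resulting message would be posted to Solr's update handler, followed by separate <commit/> and <optimize/> messages.

```java
import java.util.Arrays;
import java.util.List;

public class DeleteByQuerySketch {
    // Builds a delete-by-query message matching any of the given hosts.
    // The field name is an assumption; it must be an indexed field in your schema.
    static String deleteMessage(String field, List<String> hosts) {
        StringBuilder query = new StringBuilder();
        for (String host : hosts) {
            if (query.length() > 0) {
                query.append(" OR ");
            }
            query.append(field).append(":\"").append(host).append('"');
        }
        return "<delete><query>" + escapeXml(query.toString()) + "</query></delete>";
    }

    // Escapes the characters that are not allowed in XML element text.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // Prints <delete><query>host:"twitter.com" OR host:"example.com"</query></delete>
        System.out.println(deleteMessage("host", Arrays.asList("twitter.com", "example.com")));
    }
}
```

Running the same query as a normal search first is a cheap way to confirm that it matches only the documents meant to be removed.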




RE: Removing irrelevant URLS

2010-11-07 Thread Eric Martin
OK, thanks. I am using Nutch and trying to figure out how to use urlfilters,
so far unsuccessfully. Just thought there might be a way I could save some trouble
this way. Thanks!


Adding Carrot2

2010-11-07 Thread Eric Martin
Hi,

 

Solr and Nutch have been working fine. I now want to integrate Carrot2. I
followed this tutorial/quickstart:
http://www.lucidimagination.com/blog/2009/09/28/solrs-new-clustering-capabilities/

I didn't see anything to adjust in my schema so I didn't do anything there.
I did add the code to the solrconfig.xml though. I am getting this when I
start Solr now:

Command: java -Dsolr.clustering.enabled=true -jar start.jar

Nov 7, 2010 11:35:16 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: [solrconfig.xml] requestHandler: missing
mandatory attribute 'class'

Has anyone run into issues with Carrot2?

 

Eric



RE: solr 4.0 - pagination

2010-11-07 Thread Papp Richard
Hi Yonik,

  I've just tried the latest stable version from the nightly build:
apache-solr-4.0-2010-11-05_08-06-28.war

  I have some concerns, however. I have 3 documents: 2 in the first group, 1
in the 2nd group.
  
  1. I got matches = 3, which is good, but I still don't know how many
groups I have (using start = 0, rows = 10).
  2. As far as I can see, start / rows is working now, but matches is
returned incorrectly: it said matches = 3 instead of 1 when I used
start = 1, rows = 1.

  So can you help me compute how many pages I'll have? I can't use
matches for this.

regards,
  Rich




Re: solr 4.0 - pagination

2010-11-07 Thread Yonik Seeley
On Sun, Nov 7, 2010 at 2:45 PM, Papp Richard ccode...@gmail.com wrote:
 Hi Yonik,

  I've just tried the latest stable version from nightly build:
 apache-solr-4.0-2010-11-05_08-06-28.war

  I have some concerns however: I have 3 documents; 2 in the first group, 1
 in the 2nd group.

  1. I got for matches 3 - which is good, but I still don't know how many
 groups I have. (using start = 0, rows = 10)
  2. as far as I see the start / rows is working now, but the matches is
 returned incorrectly = it said matches = 3 instead of = 1, when I used
 start = 1, rows = 1

matches is the number of documents before grouping, so start/rows or
group.offset/group.limit will not affect this number.

  so can you help me, how to compute how many pages I'll have, because the
 matches can't use for this.

Solr doesn't even know given the current algorithm, hence it can't
return that info.

The issue is that to calculate the total number of groups, we would
need to keep each group in memory (which could cause a big blowup if
there are tons of groups).  The current algorithm only keeps the top
10 groups (assuming rows=10) in memory at any one time, hence it has
no idea what the total number of groups is.

-Yonik
http://www.lucidimagination.com
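
To make the memory argument above concrete, here is a toy sketch and not Solr's actual implementation. It keeps only the n best groups seen so far, ranked by the best score of any document in the group, so its memory stays bounded by n; counting all groups, by contrast, needs a set that grows with the number of distinct groups. All names and the sample data are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TopGroupsSketch {
    // Returns the n groups with the highest best-document score, best first,
    // while holding at most n groups in memory at any time.
    static List<String> topGroups(String[] groups, double[] scores, int n) {
        Map<String, Double> best = new HashMap<>();
        for (int i = 0; i < groups.length; i++) {
            String g = groups[i];
            double s = scores[i];
            Double current = best.get(g);
            if (current != null) {
                if (s > current) {
                    best.put(g, s); // better document for a group we already keep
                }
            } else if (best.size() < n) {
                best.put(g, s); // still room for a new group
            } else {
                // Find the weakest kept group and replace it if this document beats it.
                String weakest = null;
                for (Map.Entry<String, Double> e : best.entrySet()) {
                    if (weakest == null || e.getValue() < best.get(weakest)) {
                        weakest = e.getKey();
                    }
                }
                if (weakest != null && s > best.get(weakest)) {
                    best.remove(weakest);
                    best.put(g, s);
                }
            }
        }
        List<String> names = new ArrayList<>(best.keySet());
        names.sort((a, b) -> Double.compare(best.get(b), best.get(a)));
        return names;
    }

    // Counting all groups needs memory proportional to the number of distinct groups.
    static int countGroups(String[] groups) {
        Set<String> seen = new HashSet<>(Arrays.asList(groups));
        return seen.size();
    }

    public static void main(String[] args) {
        String[] groups = {"A", "B", "A", "C", "D"};
        double[] scores = {1.0, 3.0, 5.0, 2.0, 4.0};
        System.out.println(topGroups(groups, scores, 2)); // prints [A, D]
        System.out.println(countGroups(groups));          // prints 4
    }
}
```

The first method never learns that there are four groups in total, which is the reason the response cannot report a group count.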


RE: solr 4.0 - pagination

2010-11-07 Thread Papp Richard
Hey Yonik,

  Sorry, I think matches is OK, because it probably always returns the
total number of documents. However, I don't know how to compute the number of
pages.

thanks,
  Rich




RE: solr 4.0 - pagination

2010-11-07 Thread Papp Richard
I see. Let's assume that there are 1000 groups.
Can I safely use start = 990, rows = 10 to get the last page, with no
negative impact on memory usage or speed?
Or will this not work well, because you need to compute all the groups up to
1000 in order to return the last 10, so that the whole request becomes
slow and memory usage increases considerably?

regards,
  Rich
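
The paging arithmetic behind the numbers in this message can be written down as a small sketch. It assumes the total number of groups is known from somewhere else, which the thread says Solr does not report at this point; the class and method names are made up for illustration, and nothing here says anything about how expensive a deep page is on the server.

```java
public class GroupPagingSketch {
    // Zero-based "start" offset for a one-based page number.
    static int startFor(int page, int rows) {
        return (page - 1) * rows;
    }

    // Number of pages needed to show totalGroups groups, using ceiling division.
    static int pageCount(int totalGroups, int rows) {
        return (totalGroups + rows - 1) / rows;
    }

    public static void main(String[] args) {
        int rows = 10;
        int pages = pageCount(1000, rows);
        System.out.println(pages);                 // prints 100
        System.out.println(startFor(pages, rows)); // prints 990
    }
}
```

When the total is unknown, a "more results" link that simply requests the next start offset avoids needing pageCount at all.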




Re: Tomcat special character problem

2010-11-07 Thread Michael Sokolov
Is it possible that your original search is being posted (HTTP POST),
and the character encoding of the page with the form is not UTF-8?  In
that case, I believe a header gets sent with the request specifying a
different character set (unlike parameters in the URL, for
which it's not possible to specify an encoding explicitly).


-Mike




Re: Adding Carrot2

2010-11-07 Thread Lance Norskog
Carrot2 is already part of the Solr distributions: 1.4.1, 3.x and the trunk.

-- 
Lance Norskog
goks...@gmail.com


RE: Adding Carrot2

2010-11-07 Thread Eric Martin
Yeah, I know: you have to download the libraries and copy them to your /lib
inside of Solr. In Solr 1.4 the plugin is available but the libraries are not.
http://www.lucidimagination.com/blog/2009/09/28/solrs-new-clustering-capabilities/

I think there is something wrong with the schema and solrconfig (XML files)
integration. Some documentation on Apache says it's already written into the
XML and some says it's not. Searching the XML files in Solr, I find no reference to
clustering. Now that I think about it, I overwrote the solrconfig.xml and
schema.xml with my Drupal/ApacheSolr versions.

I think I may have answered my own question as to why the clustering isn't
running correctly. I will go get a copy of the default XML files and, if I find it
there, I will try to merge them. Does this sound like I am on the right path now?



Re: Tomcat special character problem

2010-11-07 Thread Em

I also thought that this might be the case a few hours ago.
However, I have to verify that tomorrow.

From a debugging point of view:
How can I set the encoding of my browser's address bar?
When I pressed enter, the query switched from clear text to a URL-encoded
version.
The URL-encoded version did not work.

Thank you, Mike.

I will let you know whether it worked or not!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Tomcat-special-character-problem-tp1857648p1859259.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Tomcat special character problem

2010-11-07 Thread Dennis Gearon
In a POST request, or a GET request with URL-encoded variables in the BODY of
the request, it's possible to use different encodings, which are then
specified in the headers. For SURE in POST, and I'm pretty sure in GET also.

 Dennis Gearon



Re: Adding Carrot2

2010-11-07 Thread Lance Norskog
There are three sets of XML files: the solr/example set, the Drupal Solr set, AND
the set in contrib/clustering/src/test/resources/solr/conf/. The last set is
what clustering is actually tested with. So, the first order of
business is to check whether clustering works with example/solr/conf. In the
diffs, the clustering files looked like just old versions of
example/solr, but they might need a little merging.

Lance

-- 
Lance Norskog
goks...@gmail.com


facetting when using field collapsing

2010-11-07 Thread Lukas Kahwe Smith
Hi,

I am pondering making use of field collapsing. I am currently indexing clauses
(sections) inside UN documents:
http://resolutionfinder.org/search/unifiedResults?q=africa=t[22]=medicationdc=st=clause

Since my data set is still fairly small right now, I am doing field
collapsing in userland:
http://resolutionfinder.org/search/unifiedResults?q=africa=t[22]=medicationdc=st=document

However, while this works all right (not ideal, since I am fetching essentially
the entire result set rather than a page, as I do for clauses), I still have no idea
how to get the facet filters to display the right counts. So I am wondering whether
field collapsing in its current form supports faceting, since it's not mentioned
on the wiki page:
http://wiki.apache.org/solr/FieldCollapsing

regards,
Lukas Kahwe Smith
m...@pooteeweet.org