Re: Unsubscribe me

2015-06-08 Thread François Schiettecatte
Please follow instructions here: http://lucene.apache.org/solr/resources.html

F.


 On Jun 8, 2015, at 1:06 AM, Dylan dylan.h...@gmail.com wrote:
 
 On 30 May 2015 12:08, Lalit Kumar 4 lkum...@sapient.com wrote:
 
 Please unsubscribe me as well
 
 On May 30, 2015 15:23, Neha Jatav neha.ja...@gmail.com wrote:
 Unsubscribe me
 



Re: Unsubscribe me

2015-05-30 Thread François Schiettecatte
Quoting Erik from two days ago:

Please follow the instructions here:

http://lucene.apache.org/solr/resources.html. Be sure to use the exact same 
e-mail you used to subscribe.


 On May 30, 2015, at 6:07 AM, Lalit Kumar 4 lkum...@sapient.com wrote:
 
 Please unsubscribe me as well
 
 On May 30, 2015 15:23, Neha Jatav neha.ja...@gmail.com wrote:
 Unsubscribe me



Re: YAJar

2015-05-26 Thread François Schiettecatte
Run whatever tests you want with 14.0.1, replace it with 18.0, rerun the tests 
and compare.

François

 On May 26, 2015, at 10:25 AM, Robust Links pey...@robustlinks.com wrote:
 
 by dumping you mean recompiling solr with guava 18?
 
 On Tue, May 26, 2015 at 10:22 AM, François Schiettecatte 
 fschietteca...@gmail.com wrote:
 
 Have you tried dumping guava 14.0.1 and using 18.0 with Solr? I did a
 while ago and it worked fine for me.
 
 François
 
 On May 26, 2015, at 10:11 AM, Robust Links pey...@robustlinks.com
 wrote:
 
 i have a minhash logic that uses guava 18.0 method that is not in guava
 14.0.1. This minhash logic is a separate maven project. I'm including it
 in
 my project via maven.the code is being used as a search component on the
 set of results. The logic goes through the search results and deletes
 duplicates. here is the solrconfig.xml
 
  <requestHandler name="/select" class="solr.SearchHandler" default="true">
    <arr name="last-components">
      <str>tvComponent</str>
      <str>terms</str>
      <str>minHashDedup</str>
    </arr>
  </requestHandler>
 
  <searchComponent name="minHashDedup" class="com.xyz.DedupSearchHits">
    <str name="MAX_COMPARISONS">5</str>
  </searchComponent>
 
 DedupSearchHits class is the one implementing the minhash (hence using
 guava 18). I start solr via the solr.in.sh script. The error I am
 getting
 is:
 
 
 Caused by: java.lang.NoSuchMethodError:
 
 com.google.common.hash.HashFunction.hashUnencodedChars(Ljava/lang/CharSequence;)Lcom/google/common/hash/HashCode;
 
 at com.xyz.incrementToken(MinHashTokenFilter.java:54)
 
 at com.xyz.MinHash.calculate(MinHash.java:131)
 
 at com.xyz.Algorithms.minhash.MinHasher.compare(MinHasher.java:89)
 
 at
 com.xyz.Algorithms.minhash.DedupSearchHits.init(DedupSearchHits.java:74)
 
 at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:619)
 
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2311)
 
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2305)
 
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2338)
 
 at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:1297)
 
 at org.apache.solr.core.SolrCore.init(SolrCore.java:813)
 
 
 What is the best design to solve this problem?I understand the point of
 modularity but how can i include logic into solr that does result
 processing without loading that jar into solr?
 
 thank you
 
 
 On Tue, May 26, 2015 at 8:00 AM, Daniel Collins danwcoll...@gmail.com
 wrote:
 
 I guess this is one reason why the whole WAR approach is being removed!
 Solr should be a black-box that you talk to, and get responses from.
 What
 it depends on and how it is deployed, should be irrelevant to you.
 
 If you are wanting to override the version of guava that Solr uses, then
 you'd have to rebuild Solr (can be done with maven) and manually update
 the
 pom.xml to use guava 18.0, but why would you? You need to test Solr
 completely (in case any guava bugs affect Solr), deal with any build
 issues
 that arise (if guava changes any APIs), and cause yourself a world of
 pain,
 for what gain?
 
 
 On 26 May 2015 at 11:29, Robust Links pey...@robustlinks.com wrote:
 
 i have custom search components.
 
 On Tue, May 26, 2015 at 4:34 AM, Upayavira u...@odoko.co.uk wrote:
 
 Why is your app tied that closely to Solr? I can understand if you are
 talking about SolrJ, but normal usage you use a different application
 in
 a different JVM from Solr.
 
 Upayavira
 
 On Tue, May 26, 2015, at 05:14 AM, Robust Links wrote:
 I am stuck in Yet Another Jarmagedon of SOLR. this is a basic
 question. i
 noticed solr 5.0 is using guava 14.0.1. My app needs guava 18.0. What
 is
 the pattern to override a jar version uploaded into jetty?
 
 I am using maven, and solr is being started the old way
 
 java -jar start.jar
 -Dsolr.solr.home=...
 -Djetty.home=...
 
 I tried to edit jetty's start.config (then run java
 -DSTART=/my/dir/start.config
 -jar start.jar) but got no where...
 
 any help would be much appreciated
 
 Peyman
 
 
 
 
 



Re: YAJar

2015-05-26 Thread François Schiettecatte
What I am suggesting is that you set up a stand-alone Solr with guava 14.0.1 and 
run a test suite similar to how you would normally use Solr in your app. Then 
replace the guava jar with 18.0 and re-run the tests. If all works well, and I 
suspect it will because it did for me, then you can use 18.0. 
Simple really.
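
For what it's worth, a rough sketch of that swap on a stand-alone Solr 5.x install 
(the install path below is illustrative; in a default install the guava jar sits 
under the webapp's WEB-INF/lib):

    cd /opt/solr-5.x.x/server/solr-webapp/webapp/WEB-INF/lib
    mv guava-14.0.1.jar /tmp/            # keep the original around
    cp /path/to/guava-18.0.jar .         # drop in the newer version
    # restart Solr and re-run the same test suite against it

If the tests pass, your custom search component and Solr can share the single 18.0 jar.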

François

 On May 26, 2015, at 10:30 AM, Robust Links pey...@robustlinks.com wrote:
 
 i can't run 14.0.1. that is the problem. 14 does not have the interfaces i
 need
 
 On Tue, May 26, 2015 at 10:28 AM, François Schiettecatte 
 fschietteca...@gmail.com wrote:
 
 Run whatever tests you want with 14.0.1, replace it with 18.0, rerun the
 tests and compare.
 
 François
 
 On May 26, 2015, at 10:25 AM, Robust Links pey...@robustlinks.com
 wrote:
 
 by dumping you mean recompiling solr with guava 18?
 
 On Tue, May 26, 2015 at 10:22 AM, François Schiettecatte 
 fschietteca...@gmail.com wrote:
 
 Have you tried dumping guava 14.0.1 and using 18.0 with Solr? I did a
 while ago and it worked fine for me.
 
 François
 
 On May 26, 2015, at 10:11 AM, Robust Links pey...@robustlinks.com
 wrote:
 
 i have a minhash logic that uses guava 18.0 method that is not in guava
 14.0.1. This minhash logic is a separate maven project. I'm including
 it
 in
 my project via maven.the code is being used as a search component on
 the
 set of results. The logic goes through the search results and deletes
 duplicates. here is the solrconfig.xml
 
  <requestHandler name="/select" class="solr.SearchHandler" default="true">
    <arr name="last-components">
      <str>tvComponent</str>
      <str>terms</str>
      <str>minHashDedup</str>
    </arr>
  </requestHandler>
 
  <searchComponent name="minHashDedup" class="com.xyz.DedupSearchHits">
    <str name="MAX_COMPARISONS">5</str>
  </searchComponent>
 
 DedupSearchHits class is the one implementing the minhash (hence using
 guava 18). I start solr via the solr.in.sh script. The error I am
 getting
 is:
 
 
 Caused by: java.lang.NoSuchMethodError:
 
 
 com.google.common.hash.HashFunction.hashUnencodedChars(Ljava/lang/CharSequence;)Lcom/google/common/hash/HashCode;
 
 at com.xyz.incrementToken(MinHashTokenFilter.java:54)
 
 at com.xyz.MinHash.calculate(MinHash.java:131)
 
 at com.xyz.Algorithms.minhash.MinHasher.compare(MinHasher.java:89)
 
 at
 com.xyz.Algorithms.minhash.DedupSearchHits.init(DedupSearchHits.java:74)
 
 at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:619)
 
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2311)
 
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2305)
 
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2338)
 
 at
 org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:1297)
 
 at org.apache.solr.core.SolrCore.init(SolrCore.java:813)
 
 
 What is the best design to solve this problem?I understand the point of
 modularity but how can i include logic into solr that does result
 processing without loading that jar into solr?
 
 thank you
 
 
 On Tue, May 26, 2015 at 8:00 AM, Daniel Collins danwcoll...@gmail.com
 
 wrote:
 
 I guess this is one reason why the whole WAR approach is being
 removed!
 Solr should be a black-box that you talk to, and get responses from.
 What
 it depends on and how it is deployed, should be irrelevant to you.
 
 If you are wanting to override the version of guava that Solr uses,
 then
 you'd have to rebuild Solr (can be done with maven) and manually
 update
 the
 pom.xml to use guava 18.0, but why would you? You need to test Solr
 completely (in case any guava bugs affect Solr), deal with any build
 issues
 that arise (if guava changes any APIs), and cause yourself a world of
 pain,
 for what gain?
 
 
 On 26 May 2015 at 11:29, Robust Links pey...@robustlinks.com wrote:
 
 i have custom search components.
 
 On Tue, May 26, 2015 at 4:34 AM, Upayavira u...@odoko.co.uk wrote:
 
 Why is your app tied that closely to Solr? I can understand if you
 are
 talking about SolrJ, but normal usage you use a different
 application
 in
 a different JVM from Solr.
 
 Upayavira
 
 On Tue, May 26, 2015, at 05:14 AM, Robust Links wrote:
 I am stuck in Yet Another Jarmagedon of SOLR. this is a basic
 question. i
 noticed solr 5.0 is using guava 14.0.1. My app needs guava 18.0.
 What
 is
 the pattern to override a jar version uploaded into jetty?
 
 I am using maven, and solr is being started the old way
 
 java -jar start.jar
 -Dsolr.solr.home=...
 -Djetty.home=...
 
 I tried to edit jetty's start.config (then run java
 -DSTART=/my/dir/start.config
 -jar start.jar) but got no where...
 
 any help would be much appreciated
 
 Peyman
 
 
 
 
 
 
 



Re: YAJar

2015-05-26 Thread François Schiettecatte
Have you tried dumping guava 14.0.1 and using 18.0 with Solr? I did a while ago 
and it worked fine for me.

François

 On May 26, 2015, at 10:11 AM, Robust Links pey...@robustlinks.com wrote:
 
 i have a minhash logic that uses guava 18.0 method that is not in guava
 14.0.1. This minhash logic is a separate maven project. I'm including it in
 my project via maven.the code is being used as a search component on the
 set of results. The logic goes through the search results and deletes
 duplicates. here is the solrconfig.xml
 
  <requestHandler name="/select" class="solr.SearchHandler" default="true">
    <arr name="last-components">
      <str>tvComponent</str>
      <str>terms</str>
      <str>minHashDedup</str>
    </arr>
  </requestHandler>
 
  <searchComponent name="minHashDedup" class="com.xyz.DedupSearchHits">
    <str name="MAX_COMPARISONS">5</str>
  </searchComponent>
 
 DedupSearchHits class is the one implementing the minhash (hence using
 guava 18). I start solr via the solr.in.sh script. The error I am getting
 is:
 
 
 Caused by: java.lang.NoSuchMethodError:
 com.google.common.hash.HashFunction.hashUnencodedChars(Ljava/lang/CharSequence;)Lcom/google/common/hash/HashCode;
 
 at com.xyz.incrementToken(MinHashTokenFilter.java:54)
 
 at com.xyz.MinHash.calculate(MinHash.java:131)
 
 at com.xyz.Algorithms.minhash.MinHasher.compare(MinHasher.java:89)
 
 at com.xyz.Algorithms.minhash.DedupSearchHits.init(DedupSearchHits.java:74)
 
 at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:619)
 
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2311)
 
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2305)
 
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2338)
 
 at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:1297)
 
 at org.apache.solr.core.SolrCore.init(SolrCore.java:813)
 
 
 What is the best design to solve this problem?I understand the point of
 modularity but how can i include logic into solr that does result
 processing without loading that jar into solr?
 
 thank you
 
 
 On Tue, May 26, 2015 at 8:00 AM, Daniel Collins danwcoll...@gmail.com
 wrote:
 
 I guess this is one reason why the whole WAR approach is being removed!
 Solr should be a black-box that you talk to, and get responses from.  What
 it depends on and how it is deployed, should be irrelevant to you.
 
 If you are wanting to override the version of guava that Solr uses, then
 you'd have to rebuild Solr (can be done with maven) and manually update the
 pom.xml to use guava 18.0, but why would you? You need to test Solr
 completely (in case any guava bugs affect Solr), deal with any build issues
 that arise (if guava changes any APIs), and cause yourself a world of pain,
 for what gain?
 
 
 On 26 May 2015 at 11:29, Robust Links pey...@robustlinks.com wrote:
 
 i have custom search components.
 
 On Tue, May 26, 2015 at 4:34 AM, Upayavira u...@odoko.co.uk wrote:
 
 Why is your app tied that closely to Solr? I can understand if you are
 talking about SolrJ, but normal usage you use a different application
 in
 a different JVM from Solr.
 
 Upayavira
 
 On Tue, May 26, 2015, at 05:14 AM, Robust Links wrote:
 I am stuck in Yet Another Jarmagedon of SOLR. this is a basic
 question. i
 noticed solr 5.0 is using guava 14.0.1. My app needs guava 18.0. What
 is
 the pattern to override a jar version uploaded into jetty?
 
 I am using maven, and solr is being started the old way
 
 java -jar start.jar
 -Dsolr.solr.home=...
 -Djetty.home=...
 
 I tried to edit jetty's start.config (then run java
 -DSTART=/my/dir/start.config
 -jar start.jar) but got no where...
 
 any help would be much appreciated
 
 Peyman
 
 
 



Re: how to debug solr performance degradation

2015-02-24 Thread François Schiettecatte
Rebecca

You don’t want to give all the memory to the JVM. You want to give it just 
enough for it to work optimally and leave the rest of the memory for the OS to 
use for caching data. Giving the JVM too much memory can result in worse 
performance because of GC. There is no magic formula for figuring out the memory 
allocation for the JVM; it is very dependent on the workload. In your case I 
would start with 5GB, and increment by 5GB with each run.

I also use these settings for the JVM

-XX:+UseG1GC -Xms1G -Xmx1G

-XX:+AggressiveOpts -XX:+OptimizeStringConcat -XX:+ParallelRefProcEnabled 
-XX:MaxGCPauseMillis=200

I got them from this list so can’t take credit for them but they work for me.


Cheers

François


 On Feb 24, 2015, at 7:45 PM, Tang, Rebecca rebecca.t...@ucsf.edu wrote:
 
 We gave the machine 180G mem to see if it improves performance.  However,
 after we increased the memory, Solr started using only 5% of the physical
 memory.  It has always used 90-something%.
 
 What could be causing solr to not grab all the physical memory (grabbing
 so little of the physical memory)?
 
 
 Rebecca Tang
 Applications Developer, UCSF CKM
 Industry Documents Digital Libraries
 E: rebecca.t...@ucsf.edu
 
 
 
 
 
 On 2/24/15 12:44 PM, Shawn Heisey apa...@elyograg.org wrote:
 
 On 2/24/2015 1:09 PM, Tang, Rebecca wrote:
 Our solr index used to perform OK on our beta production box (anywhere
 between 0-3 seconds to complete any query), but today I noticed that the
  performance is very bad (queries take between 12 and 15 seconds).
 
 I haven't updated the solr index configuration
  (schema.xml/solrconfig.xml) lately.  All that's changed is the data:
 every month, I rebuild the solr index from scratch and deploy it to the
 box.  We will eventually go to incremental builds. But for now, all
 indexes are built from scratch.
 
 Here are the stats:
 Solr index size 183G
 Documents in index 14364201
 We just have single solr box
 It has 100G memory
 500G Harddrive
 16 cpus
 
 The bottom line on this problem, and I'm sure it's not something you're
 going to want to hear:  You don't have enough memory available to cache
 your index.  I'd plan on at least 192GB of RAM for an index this size,
 and 256GB would be better.
 
 Depending on the exact index schema, the nature of your queries, and how
 large your Java heap for Solr is, 100GB of RAM could be enough for good
 performance on an index that size ... or it might be nowhere near
 enough.  I would imagine that one of two things is true here, possibly
 both:  1) Your queries are very complex and involve accessing a very
 large percentage of the index data.  2) Your Java heap is enormous,
 leaving very little RAM for the OS to automatically cache the index.
 
 Adding more memory to the machine, if that's possible, might fix some of
 the problems.  You can find a discussion of the problem here:
 
 http://wiki.apache.org/solr/SolrPerformanceProblems
 
 If you have any questions after reading that wiki article, feel free to
 ask them.
 
 Thanks,
 Shawn
 
 



Re: American British Dictionary for Solr

2015-02-12 Thread François Schiettecatte
Dinesh


See this:

http://wordlist.aspell.net/varcon/

You will need to do some work to convert to a SOLR friendly format though.
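
For example, if you feed the output into a synonyms file for SynonymFilterFactory, 
the Solr-friendly format is just one comma-separated equivalence group per line 
(these entries are illustrative, not taken from VarCon directly):

    # synonyms_en_gb_us.txt
    colour,color
    organise,organize
    analyse,analyze
    theatre,theater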

Cheers

François

 On Feb 12, 2015, at 12:22 AM, dinesh naik dineshkumarn...@gmail.com wrote:
 
 Hi ,
 We are looking for a dictionary to support American/British English synonym.
 Could you please let us know what all dictionaries are available ?
 -- 
 Best Regards,
 Dinesh Naik



Re: Solr: How to delete a document

2014-09-13 Thread François Schiettecatte
How about adding 'expungeDeletes=true' as well as 'commit=true'?
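
A sketch of what that might look like against the update handler (the core name 
'lexikos' is taken from your select URL; adjust the delete query to match the 
document you want removed):

    curl 'http://localhost:8983/solr/lexikos/update' -H 'Content-Type: text/xml' \
      --data-binary '<delete><query>phrase:"qwerty"</query></delete>'
    curl 'http://localhost:8983/solr/lexikos/update' -H 'Content-Type: text/xml' \
      --data-binary '<commit expungeDeletes="true"/>'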

François

On Sep 13, 2014, at 4:09 PM, FiMka maximfil...@gmail.com wrote:

 Hi guys, could you say how to delete a document in Solr? After I delete a
 document it still persists in the search results. For example there is the
 following document saved in Solr:
 After I POST the following data to localhost:8983/solr/update/?commit=true:
 Solr each time says 200 OK and responds the following:
 If I try to search
  localhost:8983/solr/lexikos/select?q=phrase%3A+%22qwerty%22&wt=json&indent=true
 for the document once again, it still shown in the results. So how to remove
 the document from Solr index as well or what else to do? Thanks in advance
 for any assistance!
 
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-How-to-delete-a-document-tp4158649.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Date field related query

2014-09-02 Thread François Schiettecatte
How about :

datefield:[NOW-1DAY/DAY TO *]
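
In other words (assuming the field really is called 'datefield' and holds the 
index timestamp), the /DAY date math rounds down to midnight, so:

    fq=datefield:[NOW/DAY TO *]        # everything indexed since midnight today
    fq=datefield:[NOW-1DAY/DAY TO *]   # everything since midnight yesterday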

François

On Sep 2, 2014, at 6:54 AM, Aman Tandon amantandon...@gmail.com wrote:

 Hi,
 
 I did it using this, fq=datefield:[2014-09-01T23:59:59Z TO
 2014-09-02T23:59:59Z].
 Correct me if i am wrong.
 
 Is there any way to find this using the NOW?
 
 
 With Regards
 Aman Tandon
 
 
 On Tue, Sep 2, 2014 at 4:08 PM, Aman Tandon amantandon...@gmail.com wrote:
 
 Hi,
 
 I am working on date and i want to find all those records which are
 indexed today.
 
 With Regards
 Aman Tandon
 



Re: Random OOM Exceptions

2014-08-14 Thread François Schiettecatte
I would also get some metrics when SOLR is doing nothing; the JVM does work in 
the background, and looking at the memory graph in VisualVM will show a nice 
sawtooth.

François


On Aug 14, 2014, at 1:16 PM, Erick Erickson erickerick...@gmail.com wrote:

 bq: I just don’t know why Solr is suddenly going nuts.
 
 Hmmm, as Shawn says, hard to say at this remove. But
 I've personally doubled the memory requirements for Solr
 on the _same_ index by altering the query to a pathological
 one. Something like
 q=*:*&facet.field=whatever
 where the field whatever contains a billion unique strings is
 an example of a pathological query.
 
 So you may have to do the ugly work of correlating memory spikes
 with the queries just prior to the spike. Which you should be able
 to do from the Solr logs.
 
 Sorry I can't be more help...
 Erick
 
 On Thu, Aug 14, 2014 at 9:45 AM, Shawn Heisey s...@elyograg.org wrote:
 On 8/14/2014 10:06 AM, Scott Rankin wrote:
 My question was actually more about what in Solr might cause the
 server to suddenly go from a very consistent heap size of 300-400 MB
 to over 2 GB in a matter of minutes with no changes in traffic. I get
 why the VM is crashing, I just don’t know why Solr is suddenly going nuts.
 
 That's nearly impossible to answer.  Chances are that something has
 changed about the requests that Solr is receiving and now it's required
 to do something that it wasn't before, something that uses a lot of heap
 memory.
 
 The other likely possibilities are:
 
 * There's a bug in your solr version or in some software component that
 you are using with Solr.  That can include the Java virtual machine, the
 servlet container, and/or any third-party Solr components.
 
 * You were running on the hairy edge of heap usage already, and
 something (a traffic increase, a slight change to your requests) pushed
 you over the edge into OutOfMemory.
 
 Thanks,
 Shawn
 



Re: Character encoding problems

2014-07-29 Thread François Schiettecatte
Hi

If you are seeing  appelé au téléphone in the browser, I would guess that 
the data is being rendered in UTF-8 by your server while the content type of the 
HTML is set to iso-8859-1 (or not set at all, with your browser defaulting to 
iso-8859-1). 
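
A quick way to see which charset the server is actually declaring (the URL is a 
placeholder; the HTTP Content-Type header takes precedence over any meta tag):

    curl -sI 'http://yourserver/yourpage' | grep -i '^Content-Type'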

You can force the encoding to utf-8 in the browser, usually this is a menu item 
(in Chrome/Safari/Firefox).

FWIW having messed around with this kind of stuff in the past, I always 
generate utf-8 and always set the HTML content type to utf-8 with:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Cheers

François


On Jul 29, 2014, at 3:59 PM, Gulliver Smith gulliver.m.sm...@gmail.com wrote:

 Thanks for the information about URIEncoding=UTF-8 in the tomcat
 conf file, but that doesn't answer my main concerns:
 - what is the character encoding of the text in the title_fr field?
 - is there any way to force it to be UTF-8?
 
 On Tue, Jul 29, 2014 at 8:35 AM,  aurelien.mazo...@francelabs.com wrote:
 Hi,
 
 If you use solr 4.8.1, you don't have to add URIEncoding=UTF-8 in the
 tomcat conf file anymore :
 https://wiki.apache.org/solr/SolrTomcat
 
 
 Regards,
 
 Aurélien MAZOYER
 
 
 On 29.07.2014 14:22, Gulliver Smith wrote:
 
 I have solr 4.8.1 under Tomcat 7 on Debian Linux. The connector in
 Tomcat's server.xml has been changed to include character encoding
 UTF-8:
 
  <Connector port="8080" protocol="HTTP/1.1"
             URIEncoding="UTF-8"
             connectionTimeout="2"
             redirectPort="8443" />
 
 
 I am posting to the server from PHP 5.5 curl. The extract POST was
 intercepted and confirmed that everything is being encode in UTF-8.
 
 However, the responses to query commands, whether XML or JSON are
 returning field values such as title_fr in something that looks like
 latin1 or iso-8859-1 when displayed in a browser or editor.
 
 E.g.: title_fr:[ appelé au téléphone]
 
 The highlights in the query response do have correctly displaying
 character codes.
 
 E.g. text_fr:[ \n \n  \n  \n  \n  \n  \n  \n  \n \n \nappelé au
 téléphone\nappelé au téléphone\n
 
 PHP's utf8_decode doesn't make sense of the title_fr.
 
 Is there something to configure to fix this and get proper UTF8
 results for everything?
 
 Thanks
 Gulliver



Re: Java heap space error

2014-07-24 Thread François Schiettecatte
A default garbage collector will be chosen for you by the VM; it might help to 
post the stack trace so we can take a look.

François

On Jul 24, 2014, at 10:06 AM, Ameya Aware ameya.aw...@gmail.com wrote:

 ooh ok.
 
 So you want to say that since i am using large heap but didnt set my
 garbage collection, thats why i why getting java heap space error?
 
 
 
 
 
 On Thu, Jul 24, 2014 at 9:58 AM, Marcello Lorenzi mlore...@sorint.it
 wrote:
 
 I think that on large heap is suggested to monitor the garbage collection
 behavior and try to add a strategy adapted to your performance.  On my
 production environment with a heap of 6 GB I set this parameter (server
 with 8 cores):
 
 -server -Xms6144m -Xmx6144m -XX:MaxPermSize=512m
 -Dcom.sun.management.jmxremote -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
 -XX:+CMSIncrementalMode -XX:+CMSParallelRemarkEnabled
 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70
 -XX:ConcGCThreads=6 -XX:ParallelGCThreads=6
 
 Marcello
 
 
 On 07/24/2014 03:53 PM, Ameya Aware wrote:
 
 I did not make any other change than this.. rest of the settings are
 default.
 
 Do i need to set garbage collection strategy?
 
 
 On Thu, Jul 24, 2014 at 9:49 AM, Marcello Lorenzi mlore...@sorint.it
 wrote:
 
 Hi,
 Did you set a Garbage collection strategy on your JVM ?
 
 Marcello
 
 
 On 07/24/2014 03:32 PM, Ameya Aware wrote:
 
 Hi
 
 I am in process of indexing around 2,00,000 documents.
 
 I have increase java jeap space to 4 GB using below command :
 
 java -Xmx4096M -Xms4096M -jar start.jar
 
 Still after indexing around 15000 documents it gives java heap space
 error
 again.
 
 
 Any fix for this?
 
 Thanks,
 Ameya
 
 
 
 
 



Re: Garbage collection issue and RELOADing cores

2014-07-01 Thread François Schiettecatte
Hi

Just following up on my previous post about a memory leak when RELOADing cores: 
I narrowed it down to the SuggestComponent, specifically '<searchComponent 
name="suggest" class="solr.SuggestComponent">...</searchComponent>' in 
solrconfig.xml. Comment that out and the leak goes away.

The leak occurs in 4.7, 4.8 and 4.9. It occurs when a core is RELOADed, but not 
if it is UNLOADed and then LOADed. It occurs whether G1, CMS or ParallelGC is 
used for garbage collection.

I used JDK 1.7.0_60 and Tomcat 7.0.54 for the underlying layers.

Not sure where to take it from here?

Cheers

François


On Jun 16, 2014, at 4:50 PM, François Schiettecatte fschietteca...@gmail.com 
wrote:

 Hi
 
 I am running into an interesting garbage collection issue and am looking for 
 suggestions/thoughts. 
 
 Because some word lists such as synonyms, plurals, protected words need to be 
 updated on a regular basis I have to RELOAD a number of cores in order to 
 'pick up' the new lists. 
 
 What I have found is that I get a memory leak when I do a RELOAD rather than 
 an UNLOAD/CREATE with core admin. This is most pronounced with the G1 GC and 
 much less so with the CMS GC. The former will cause the VM to run out of 
 memory after 5/6 RELOADs, while the latter does so after 30/35 RELOADs. We 
 are not talking about large indices here, the files footprint totals 470MB.
 
 I am using SOLR 4.8.1, Tomcat 7.0.53, jdk1.7.0_60, on Fedora Core 20. I am 
 not using any fancy GC parameters, I cut everything back to basics, just:
 
   -Xmx1G -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
 
 and 
 
   -Xmx1G -XX:+UseG1GC
 
 I was curious if anyone else had run into this issue and managed to fix it?
 
 Thanks
 
 François
 
 
 



Garbage collection issue and RELOADing cores

2014-06-16 Thread François Schiettecatte
Hi

I am running into an interesting garbage collection issue and am looking for 
suggestions/thoughts. 

Because some word lists such as synonyms, plurals, protected words need to be 
updated on a regular basis I have to RELOAD a number of cores in order to 'pick 
up' the new lists. 

What I have found is that I get a memory leak when I do a RELOAD rather than an 
UNLOAD/CREATE with core admin. This is most pronounced with the G1 GC and much 
less so with the CMS GC. The former will cause the VM to run out of memory 
after 5/6 RELOADs, while the latter does so after 30/35 RELOADs. We are not 
talking about large indices here, the files footprint totals 470MB.
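
For reference, the two CoreAdmin call patterns being compared look roughly like 
this (host, core name and instanceDir are placeholders):

    # RELOAD - leaks in this setup
    curl 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=core1'

    # UNLOAD then CREATE - no leak observed
    curl 'http://localhost:8983/solr/admin/cores?action=UNLOAD&core=core1'
    curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=core1&instanceDir=/path/to/core1'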

I am using SOLR 4.8.1, Tomcat 7.0.53, jdk1.7.0_60, on Fedora Core 20. I am not 
using any fancy GC parameters, I cut everything back to basics, just:

-Xmx1G -XX:+UseConcMarkSweepGC -XX:+UseParNewGC

and 

-Xmx1G -XX:+UseG1GC

I was curious if anyone else had run into this issue and managed to fix it?

Thanks

François





Re: Any way to view lucene files

2014-06-09 Thread François Schiettecatte
Just click the 'Releases' link:

https://github.com/DmitryKey/luke/releases

François

On Jun 9, 2014, at 10:43 AM, Aman Tandon amantandon...@gmail.com wrote:

 No, Anyways thanks Alex, but where is the luke jar?
 
 With Regards
 Aman Tandon
 
 
 On Mon, Jun 9, 2014 at 6:54 AM, Alexandre Rafalovitch arafa...@gmail.com
 wrote:
 
 Have you looked at:
 https://github.com/DmitryKey/luke
 
 Regards,
   Alex.
 Personal website: http://www.outerthoughts.com/
 Current project: http://www.solr-start.com/ - Accelerating your Solr
 proficiency
 
 
 On Mon, Jun 9, 2014 at 8:12 AM, Aman Tandon amantandon...@gmail.com
 wrote:
 I guess this is not available now. I am trying to download from the
 google,
 please take a look https://code.google.com/p/luke/downloads/list
 
 If you have any link please share
 
 With Regards
 Aman Tandon
 
 
 On Sat, Jun 7, 2014 at 10:32 PM, Summer Shire shiresum...@gmail.com
 wrote:
 
 
 Did u try  luke 47
 
 
 
 On Jun 6, 2014, at 11:59 PM, Aman Tandon amantandon...@gmail.com
 wrote:
 
 I also tried with solr 4.2 and with luke version Luke 4.0.0-ALPHA
 
 but got this error:
 java.lang.IllegalArgumentException: A SPI class of type
 org.apache.lucene.codecs.Codec with name 'Lucene42' does not exist.
 You
 need to add the corresponding JAR file supporting this SPI to your
 classpath.The current classpath supports the following names:
 [Lucene40,
 Lucene3x, SimpleText, Appending]
 
 With Regards
 Aman Tandon
 
 
 On Sat, Jun 7, 2014 at 12:22 PM, Aman Tandon amantandon...@gmail.com
 
 wrote:
 
 My solr version is 4.8.1 and luke is 3.5
 
 With Regards
 Aman Tandon
 
 
 On Sat, Jun 7, 2014 at 12:21 PM, Chris Collins ch...@geekychris.com
 
 wrote:
 
 What version of Solr / Lucene are you using?  You have to match the
 Luke
 version to the same version of Lucene.
 
 C
 On Jun 6, 2014, at 11:42 PM, Aman Tandon amantandon...@gmail.com
 wrote:
 
 Yes  tried, but it not working at all every time i choose my index
 directory it shows me EOF past
 
 With Regards
 Aman Tandon
 
 
 On Sat, Jun 7, 2014 at 12:01 PM, Chris Collins 
 ch...@geekychris.com
 
 wrote:
 
 Have you tried:
 
 https://code.google.com/p/luke/
 
 Best
 
 Chris
 On Jun 6, 2014, at 11:24 PM, Aman Tandon amantandon...@gmail.com
 
 wrote:
 
 Hi,
 
 Is there any way so that i can view what information and which is
 there
 in
 my _e.fnm, etc files. may be with the help of any application or
 any
 viewer
 tool.
 
 With Regards
 Aman Tandon
 
 
 



Re: OutOfMemoryError while merging large indexes

2014-04-08 Thread François Schiettecatte
Have you tried using:

-XX:-UseGCOverheadLimit 

François

On Apr 8, 2014, at 6:06 PM, Haiying Wang haiyingwa...@yahoo.com wrote:

 Hi,
 
 We were trying to merge a large index (9GB, 21 million docs) into current 
 index (only 13MB), using mergeindexes command ofCoreAdminHandler, but always 
 run into OOM error. We currently set the max heap size to 4GB for the Solr 
 server. We are using 4.6.0, and did not change the original solrconfig.xml. 
 
 Is there any setting/configure that could help to complete the mergeindexes 
 process without running into OOM error? I can increase the max jvm heap size, 
 but am afraid that may not scale in case larger index need to be merged in 
 the future, and hoping the index merge can be performed with limited memory 
 foorprint. Please help. Thanks!
 
 The jvm heap setting:   -Xmx4096M -Xms512M
 
 Command used:
 
 
  curl "http://dev101:8983/solr/admin/cores?action=mergeindexes&core=collection1&indexDir=/solr/tmp/data/snapshot.20140407194442777"
 
 OOM error stack trace:
 
 Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
 at
 java.lang.StringCoding$StringDecoder.decode(StringCoding.java:133)
 at java.lang.StringCoding.decode(StringCoding.java:179)
  at java.lang.String.<init>(String.java:483)
  at java.lang.String.<init>(String.java:539)
 at 
 org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.readField(CompressingStoredFieldsReader.java:187)
 at 
 org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:351)
 at 
 org.apache.lucene.index.SegmentReader.document(SegmentReader.java:276)
 at
 org.apache.lucene.index.IndexReader.document(IndexReader.java:436)
 at 
 org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:345)
 at 
 org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:316)
 at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:94)
 at 
 org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:2555)
 at 
 org.apache.solr.update.DirectUpdateHandler2.mergeIndexes(DirectUpdateHandler2.java:449)
 at 
 org.apache.solr.update.processor.RunUpdateProcessor.processMergeIndexes(RunUpdateProcessorFactory.java:88)
 at
 org.apache.solr.update.processor.UpdateRequestProcessor.processMergeIndexes(UpdateRequestProcessor.java:59)
 at 
 org.apache.solr.update.processor.LogUpdateProcessor.processMergeIndexes(LogUpdateProcessorFactory.java:149)
 at 
 org.apache.solr.handler.admin.CoreAdminHandler.handleMergeAction(CoreAdminHandler.java:384)
 at 
 org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:188)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:662)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
 at 
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
 at 
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
 at 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
 at 
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
 at 
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 at
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
 at 
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
 at 
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 at 
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
 at 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
 
 Regards,
 
 Haiying





Re: Reading Solr index

2014-04-07 Thread François Schiettecatte
Maybe you should try a more recent release of Luke:

https://github.com/DmitryKey/luke/releases

François

On Apr 7, 2014, at 12:27 PM, azhar2007 azhar2...@outlook.com wrote:

 Hi All,
 
 I have a solr index which is indexed ins Solr.4.7.0.
 
 Ive attempted to open the index with Luke4.0.0 and also other verisons with
 no luck.
 Gives me an error message.
 
 Is there a way of reading the data?
 
 I would like to convert the file to a readable format where i can see the
 terms it holds from the documents etc. 
 
 Please Help!!
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Reading-Solr-index-tp4129662.html
 Sent from the Solr - User mailing list archive at Nabble.com.





Re: The word no in a query

2014-04-02 Thread François Schiettecatte
Have you looked at the debugging output?

http://wiki.apache.org/solr/CommonQueryParameters#Debugging

François

On Apr 2, 2014, at 1:37 AM, Bob Laferriere spongeb...@icloud.com wrote:

 
 I have built an commerce search engine. I am struggling with the word “no” in 
 queries. We have products that are “No Smoking Sign.” When the query is 
 “Smoking AND Sign” the product is found. If I query as “No AND Sign” I get no 
 results? I do not have no as a stop word. Any ideas why I would get zero 
 results back?
 
 Regards,
 
 Bob





Re: AND not as a boolean operator in Phrase

2014-03-25 Thread François Schiettecatte
Better to use '+A +B' rather than AND/OR, see:

http://searchhub.org/2011/12/28/why-not-and-or-and-not/

François

On Mar 25, 2014, at 10:21 PM, Koji Sekiguchi k...@r.email.ne.jp wrote:

 (2014/03/26 2:29), abhishek jain wrote:
 hi friends,
 
 when i search for A and B it gives me result for A , B , i am not sure
 why?
 
 Please guide how can i exact match when it is within phrase/quotes.
 
 Generally speaking (w/ LuceneQParser), if you want phrase match results,
 use quotes, i.e. q=A B. If you want results which contain both terms A
 and B, do not use quotes but boolean operator AND, i.e. q=A AND B.
 
 koji
 -- 
 http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html





Re: Solr cores across multiple machines

2013-12-17 Thread François Schiettecatte
Hi

Why not copy the core directory instead of the data directory? The conf 
directory is very small and that would ensure that you don't get schema 
mismatch issues.

If you are stuck with copying the data directory, then I would replace the data 
directory in the target core and reload that core, though I would guess that 
YMMV given that this is probably not supported.
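
A sketch of what the whole-core copy might look like (paths, host and core name 
are placeholders):

    # on the serving machine, pull the freshly built core from the indexer
    rsync -a indexer:/var/solr/cores/products/ /var/solr/cores/products/

    # then pick up the new index and config
    curl 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=products'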

François

On Dec 17, 2013, at 1:35 AM, sivaprasad sivaprasa...@echidnainc.com wrote:

 Hi,
 
 In my project, we are doing full index on dedicated machine and the index
 will be copied to other search serving machine. For this, we are copying the
 data folder from indexing machine to serving machine manually. Now, we
 wanted to use Solr's SWAP configuration to do this job. Looks like the SWAP
 will work between the cores. Based on our setup, any one has any idea how to
 move the data from indexing machine to serving machine? Is there any other
 alternatives?
 
 Regards,
 Siva
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-cores-across-multiple-machines-tp4107035.html
 Sent from the Solr - User mailing list archive at Nabble.com.





Re: Stop/Restart Solr

2013-10-22 Thread François Schiettecatte
If you are on linux/unix, use the kill command.

François

On Oct 22, 2013, at 12:42 PM, Raheel Hasan raheelhasan@gmail.com wrote:

 Hi,
 
 is there a way to stop/restart java? I lost control over it via SSH and
 connection was closed. But the Solr (start.jar) is still running.
 
 thanks.
 
 -- 
 Regards,
 Raheel Hasan



Re: Stop/Restart Solr

2013-10-22 Thread François Schiettecatte
A few more specifics about the environment would help, Windows/Linux/...? 
Jetty/Tomcat/...?

François

On Oct 22, 2013, at 12:50 PM, Yago Riveiro yago.rive...@gmail.com wrote:

 If you are asking about if solr has a way to restart himself, I think that 
 the answer is no.
 
 If you lost control of the remote machine someone will need to go and restart 
 the machine ...
 
 You can try use a kvm or other remote control system
 
 --  
 Yago Riveiro
 Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
 
 
 On Tuesday, October 22, 2013 at 5:46 PM, François Schiettecatte wrote:
 
 If you are on linux/unix, use the kill command.
 
 François
 
 On Oct 22, 2013, at 12:42 PM, Raheel Hasan raheelhasan@gmail.com 
 (mailto:raheelhasan@gmail.com) wrote:
 
 Hi,
 
 is there a way to stop/restart java? I lost control over it via SSH and
 connection was closed. But the Solr (start.jar) is still running.
 
 thanks.
 
 --  
 Regards,
 Raheel Hasan
 
 
 
 
 
 



Re: Stop/Restart Solr

2013-10-22 Thread François Schiettecatte
Yago has the right command to search for the process; that will get you the 
process ID, specifically the first number on the output line. Then do 'kill 
###', and if that fails, 'kill -9 ###'.
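
Something along these lines (the PID shown is a placeholder):

    ps aux | grep '[s]tart.jar'   # the PID is the first number on the line
    kill 12345                    # polite shutdown
    kill -9 12345                 # only if the plain kill does not work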

François

On Oct 22, 2013, at 12:56 PM, Raheel Hasan raheelhasan@gmail.com wrote:

 its CentOS...
 
 and using jetty with solr here..
 
 
 On Tue, Oct 22, 2013 at 9:54 PM, François Schiettecatte 
 fschietteca...@gmail.com wrote:
 
 A few more specifics about the environment would help, Windows/Linux/...?
 Jetty/Tomcat/...?
 
 François
 
 On Oct 22, 2013, at 12:50 PM, Yago Riveiro yago.rive...@gmail.com wrote:
 
 If you are asking about if solr has a way to restart himself, I think
 that the answer is no.
 
 If you lost control of the remote machine someone will need to go and
 restart the machine ...
 
 You can try use a kvm or other remote control system
 
 --
 Yago Riveiro
 Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
 
 
 On Tuesday, October 22, 2013 at 5:46 PM, François Schiettecatte wrote:
 
 If you are on linux/unix, use the kill command.
 
 François
 
 On Oct 22, 2013, at 12:42 PM, Raheel Hasan 
 raheelhasan@gmail.com(mailto:
 raheelhasan@gmail.com) wrote:
 
 Hi,
 
 is there a way to stop/restart java? I lost control over it via SSH and
 connection was closed. But the Solr (start.jar) is still running.
 
 thanks.
 
 --
 Regards,
 Raheel Hasan
 
 
 
 
 
 
 
 
 
 
 -- 
 Regards,
 Raheel Hasan



Re: Solr timeout after reboot

2013-10-21 Thread François Schiettecatte
To put the file data into the file system cache, which would make for faster access.

François


On Oct 21, 2013, at 8:33 AM, michael.boom my_sky...@yahoo.com wrote:

 Hmm, no, I haven't...
 
 What would be the effect of this ?
 
 
 
 -
 Thanks,
 Michael
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-timeout-after-reboot-tp4096408p4096809.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Exact Match Results

2013-10-21 Thread François Schiettecatte
Kumar

You might want to look into the 'pf' parameter:


https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser
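
A rough example with the eDisMax parser (field names and boosts here are made up; 
use whatever your schema actually has):

    q=Okkadu telugu movie&defType=edismax&qf=title_text&pf=title_text^10

With pf, documents where the whole query appears as a phrase in title_text are 
boosted above partial matches, which should push "Okkadu telugu movie stills" up 
the list.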

François

On Oct 21, 2013, at 9:24 AM, kumar pavan2...@gmail.com wrote:

 I am querying solr for exact match results. But it is showing some other
 results also.
 
 Examle :
 
 User Query String : 
 
 Okkadu telugu movie
 
 Results :
 
 1.Okkadu telugu movie
 2.Okkadunnadu telugu movie
 3.YuganikiOkkadu telugu movie
 4.Okkadu telugu movie stills
 
 
 how can we order these results that 4th result has to come second.
 
 
 Please anyone can you give me any idea?
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Exact-Match-Results-tp4096816.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr timeout after reboot

2013-10-21 Thread François Schiettecatte
Well no, the OS is smarter than that: it manages the file system cache along with 
other memory requirements. If applications need more memory, the file system 
cache will likely be reduced. 

The command is a cheap trick to get the OS to fill the file system cache as 
quickly as possible, though I am not sure how much it will help with a 100GB 
index on a 15GB machine. This might work if you 'cat' only the index files other 
than the '.fdx' and '.fdt' files.
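
A sketch of that selective warm-up (the index path is a placeholder):

    find /path/to/solr/index -type f ! -name '*.fdt' ! -name '*.fdx' -exec cat {} + > /dev/null

That reads everything except the stored-field files into the OS cache, so the hot 
parts of a 100GB index are not competing with the stored data for the 15GB of RAM.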

François

On Oct 21, 2013, at 10:03 AM, michael.boom my_sky...@yahoo.com wrote:

 I'm using the m3.xlarge server with 15G RAM, but my index size is over 100G,
 so I guess putting running the above command would bite all available
 memory.
 
 
 
 -
 Thanks,
 Michael
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-timeout-after-reboot-tp4096408p4096827.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Can I use app specific document id as the document id that Solr uses for internal purposes?

2013-10-06 Thread François Schiettecatte
Hi

The approach I take is to store enough data in the SOLR index to render the 
results page, and go to the database if the user want to view a document. 

Cheers

François

On Oct 6, 2013, at 9:45 AM, user 01 user...@gmail.com wrote:

 @Gora:
 you understood the schema correctly, but I can't believe it's strange but i
 think it is actually the recommended way.. you index your data but don't
 store in a Search engine, you store your actual data in DB, which is the
 right place for it. Data in SE should be just used for indexing. Isn't it ?
 
 @maephisto: ok, thanks!
 
 
 On Sun, Oct 6, 2013 at 6:07 PM, Gora Mohanty g...@mimirtech.com wrote:
 
 On 6 October 2013 16:36, Ertio Lew ertio...@gmail.com wrote:
 I meant that solr should not be thinking that it has to retrieve any
 thing
 further (as in any stored document data) after once it gets the doc id,
 so
 that one further look up for doc data is prevented.
 [...]
 
 If I understood your setup correctly, the doc ID is the only field
 in the Solr schema, and the only data stored in the Solr index.
 So there is no question of recovering any other data.
 
 Having said that, this is a strange setup and seems to defeat the
 whole purpose of a search engine. Maybe you could explain further
 as to what you are trying to achieve: What does storing only doc
 IDs in Solr gain you? You could as well get these from a database
 lookup  which it seems that you would be doing anyway.
 
 Regards,
 Gora
 



Re: setQuery in SolrJ

2013-09-02 Thread François Schiettecatte
Shouldn't the search be more like this if you are searching in the 
'descricaoRoteiro' field:

descricaoRoteiro:(BPS 8D BEACH*)

or in your example you have a space in between 'descricaoRoteiro' and 'BPS':

descricaoRoteiro:BPS 8D BEACH*
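
A minimal SolrJ sketch of the field-qualified form (the core URL is a placeholder; 
the commented line shows the quoted-phrase variant, bearing in mind that wildcards 
are not expanded inside quotes):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class FieldQualifiedQuery {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
            SolrQuery query = new SolrQuery();
            // group every term under the field so none of them falls back to the default field
            query.setQuery("descricaoRoteiro:(BPS 8D BEACH*)");
            // exact phrase variant:
            // query.setQuery("descricaoRoteiro:\"BPS 8D BEACH\"");
            query.setStart(200);
            query.setRows(10);
            query.addField("descricaoRoteiro");
            QueryResponse rsp = server.query(query);
            System.out.println(rsp.getResults().getNumFound() + " hits");
        }
    }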

François


On Sep 2, 2013, at 8:08 AM, Dmitry Kan solrexp...@gmail.com wrote:

 Hi,
 
 What's your default query field in solrconfig.xml?
 
 requestHandler name=/select class=solr.SearchHandler
 str name=df[WHAT IS IN HERE?]/str
 
 I think what's happening is that the query:
 
 (descricaoRoteiro: BPS 8D BEACH*)
 
 gets interpreted as:
 
 descricaoRoteiro:BPS (8D BEACH*)
 
 then on the (8D BEACH*) a default field name is applied.
 
 You can use debugQuery parameter to see how the query was parsed.
 
 HTH,
 Dmitry
 
 
 On Mon, Sep 2, 2013 at 2:53 PM, Sergio Stateri stat...@gmail.com wrote:
 
 hi,
 
 How can I looking for an exact phrase in query.setQuery method (SolrJ)?
 
 Like this:
 
 SolrQuery query = new SolrQuery();
 query.setQuery( (descricaoRoteiro: BPS 8D BEACH*) );
 query.set(start, 200);
 query.set(rows, 10);
 query.addField(descricaoRoteiro);
 QueryResponse rsp = server.query( query );
 
 
 When I run this code, the following exception is thrown:
 
 Exception in thread main
 org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: no
 field name specified in query and no default specified via 'df' param
 at
 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:424)
 at
 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
 at
 
 org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
 at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
 at
 
 com.teste.SearchRoteirosFromCollection.extrairEApresentarResultados(SearchRoteirosFromCollection.java:65)
 ...
 
 
 But If I search one a word od put * between two words, the search works
 fine.
 
 
 Thanks in advance,
 
 
 --
 Sergio Stateri Jr.
 stat...@gmail.com
 





Re: Mandatory words search in SOLR

2013-05-13 Thread François Schiettecatte
Kamal

You could also use the 'mm' parameter to require a minimum match, or you could 
prepend '+' to each required term.
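
For example, any of these will require both terms to be present (mm only applies 
with the dismax/edismax parsers):

    q.op=AND            # change the default operator for the query
    mm=100%             # minimum-should-match, with defType=dismax or edismax
    q=+java +mysql      # mark each term as required explicitly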

Cheers

François


On May 13, 2013, at 7:57 AM, Kamal Palei palei.ka...@gmail.com wrote:

 Hi Rafał Kuć
 I added q.op=AND as per you suggested. I see though some initial record
 document contains both keywords (*java* and *mysql*), towards end I see
 still there are number of
 documents, they have only one key word either *java* or *mysql*.
 
 Is it the SOLR behaviour or can I ask for a *strict search only if all my
 keywords are present, then only* *fetch record* else not.
 
 BR,
 Kamal
 
 
 
 On Mon, May 13, 2013 at 4:02 PM, Rafał Kuć r@solr.pl wrote:
 
 Hello!
 
 Change  the  default  query  operator. For example add the q.op=AND to
 your query.
 
 --
 Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
 
 Hi SOLR Experts
 When I search documents with keyword as *java, mysql* then I get the
 documents containing either *java* or *mysql* or both.
 
 Is it possible to get the documents those contains both *java* and
 *mysql*.
 
 In that case, how the query would look like.
 
 Thanks a lot
 Kamal
 
 



Re: Indexing only on change

2012-11-24 Thread François Schiettecatte
I would create a hash of the document content and store that in SOLR along with 
any other document info you wish to store. When a document is presented for 
indexing, hash it and compare that to the hash of the stored document; index if 
they differ and skip if they do not.
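
A bare-bones sketch of the hashing side (the field names and the fetch/compare 
wiring are up to your indexing code; Solr's SignatureUpdateProcessorFactory covers 
a related, built-in flavour of this idea):

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    public final class ContentHash {
        // Hex-encoded SHA-256 of the document body; store this in a field such as content_hash
        public static String sha256(String content) throws Exception {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        }
    }

At index time, compute sha256(newContent), look up the stored content_hash for 
that document id, and only send the update when the two values differ.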

François
 

On Nov 24, 2012, at 3:30 PM, Pratyul Kapoor praty...@gmail.com wrote:

 Hi,
 
 I just discovered that solr while editing a particular field of a document,
 removes the entire document and recreates.
 
 I have a list of 1000s of documents to be indexed. But I am aware that only
 some of those documents would be changed and rest all would already be
 there. Is there any way, I can check whether the incoming and already
 existing document is same, and there is no need of indexing it again.
 
 Pratyul



Re: Is leading wildcard search turned on by default in Solr 3.6.1?

2012-11-12 Thread François Schiettecatte
John

You can still use leading wildcards even if you don't have the 
ReversedWildcardFilterFactory in your analysis, but it means the entire term 
dictionary will be scanned when the search is run, which can be a performance 
issue. If you do use ReversedWildcardFilterFactory you won't have that performance 
issue, but you will increase the overall size of your index. It's a tradeoff. 

When I looked into it for a site I built, I decided (after benchmarking) that the 
tradeoff was not worth it given how few leading-wildcard searches the site was 
getting.
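
For reference, the index-time analyzer that enables the reversed-token trick looks 
roughly like this (lifted in spirit from the stock example schema's 
text_general_rev type; the tokenizer choice and thresholds are up to you):

    <fieldType name="text_rev" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
                maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

The withOriginal="true" is what roughly doubles the indexed terms for that field, 
which is the index-size cost mentioned above.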

Best regards

François


On Nov 12, 2012, at 5:33 PM, johnmu...@aol.com wrote:

 
 
 Hi,
 
 
 I'm migrating from Solr 1.2 to 3.6.1.  I used the same analyzer as I was, and 
 re-indexed my data.  I did not add 
 solr.ReversedWildcardFilterFactory to my index analyzer, but yet leading wild 
 cards are working!!  Does this mean it's turned on by default?  If so, how do 
 I turn it off, and what are the implication of leaving ON?  Won't my searches 
 be slower and consume more memory?
 
 
 Thanks,
 
 
 --MJ
 



Re: Is leading wildcard search turned on by default in Solr 3.6.1?

2012-11-12 Thread François Schiettecatte
I suspect it is just part of the wildcard handling; maybe someone can chime in 
here. You may need to catch this before it gets to SOLR.

François

On Nov 12, 2012, at 5:44 PM, johnmu...@aol.com wrote:

 Thanks for the quick response.
 
 
 So, I do not want to use ReversedWildcardFilterFactory, but leading wildcard 
 is working and thus is ON by default.  How do I disable it to prevent the use 
 of it and the issues that come with it?
 
 
 -- MJ
 
 
 
 -Original Message-
 From: François Schiettecat
 te fschietteca...@gmail.com
 To: solr-user solr-user@lucene.apache.org
 Sent: Mon, Nov 12, 2012 5:39 pm
 Subject: Re: Is leading wildcard search turned on by default in Solr 3.6.1?
 
 
 John
 
 You can still use leading wildcards even if you dont have the 
 ReversedWildcardFilterFactory in your analysis but it means you will be 
 scanning 
 the entire dictionary when the search is run which can be a performance 
 issue. 
 If you do use ReversedWildcardFilterFactory you wont have that performance 
 issue 
 but you will increase the overall size of your index. Its a tradeoff. 
 
 When I looked into it for a site I built I decided that the tradeoff was not 
 worth it (after benchmarking) given how few leading wildcards searches it was 
 getting.
 
 Best regards
 
 François
 
 
 On Nov 12, 2012, at 5:33 PM, johnmu...@aol.com wrote:
 
 
 
 Hi,
 
 
 I'm migrating from Solr 1.2 to 3.6.1.  I used the same analyzer as I was, 
 and 
 re-indexed my data.  I did not add 
 solr.ReversedWildcardFilterFactory to my index analyzer, but yet leading 
 wild 
 cards are working!!  Does this mean it's turned on by default?  If so, how do 
 I 
 turn it off, and what are the implication of leaving ON?  Won't my searches 
 be 
 slower and consume more memory?
 
 
 Thanks,
 
 
 --MJ
 
 
 
 
 



Re: MMapDirectory, demand paging, lazy evaluation, ramfs and the much maligned RAMDirectory (oh my!)

2012-10-24 Thread François Schiettecatte
Aaron

The best way to make sure the index is cached by the OS is to just cat it on 
startup:

cat `find /path/to/solr/index` > /dev/null

Just make sure your index is smaller than RAM otherwise data will be rotated 
out.

Memory mapping is built on the virtual memory system, and I suspect that ramfs 
is too, so I doubt very much that copying your index to ramfs will help at all. 
Sidebar - a while ago I did a bunch of testing copying indices to shared memory 
(/dev/shm in this case) and there was no advantage compared to just accessing 
indices on disc when using memory mapping once the system got to a steady state.

There has been a lot written about this topic on the list. Basically it comes 
down to using MMapDirectory (which is the default), making sure your index is 
smaller than your RAM, and allocating just enough memory to the Java VM. That 
last part requires some benchmarking because it is so workload dependent.

Best regards

François

On Oct 24, 2012, at 8:29 PM, Aaron Daubman daub...@gmail.com wrote:

 Greetings,
 
 Most times I've seen the topic of storing one's index in memory, it
 seems the asker was referring (or understood to be referring) to the
 (in)famous not intended to work with huge indexes Solr RAMDirectory.
 
 Let me be clear that that I am not interested in RAMDirectory.
 However, I would like to better understand the oft-recommended and
 currently-default MMapDirectory, and what the tradeoffs would be, when
 using a 64-bit linux server dedicated to this single solr instance,
 with plenty (more than 2x index size) of RAM, of storing the index
 files on SSDs versus on a ramfs mount.
 
 I understand that using the default MMapDirectory will allow caching
 of the index in-memory, however, my understanding is that mmaped files
 are demand-paged (lazy evaluated), meaning that only after a block is
 read from disk will it be paged into memory - is this correct? is it
 actually block-by-block (page size by page size?) - any pointers to
 decent documentation on this regardless of the effectiveness of the
 approach would be appreciated...
 
 My concern with using MMapDirectory for an index stored on disk (even
 SSDs), if my understanding is correct, is that there is still a large
 startup cost to MMapDirectory, as it may take many queries before even
 most of a 20G index has been loaded into memory, and there may yet
 still be dark corners that only come up in edge-case queries that
 cause QTime spikes should these queries ever occur.
 
 I would like to ensure that, at startup, no query will incur
 disk-seek/read penalties.
 
 Is the right way to achieve this to copy the index to a ramfs (NOT
 ramdisk) mount and then continue to use MMapDirectory in Solr to read
 the index? I am under the impression that when using ramfs (rather
 than ramdisk, for which this would not work) a file mmaped on a ramfs
 mount will actually share the same address space, and so would not
 incur the typical double-ram overhead of mmaping a file in memory just
 o have yet another copy of the file created in a second memory
 location. Is this correct? If not, would you please point me to
 documentation stating otherwise (I haven't found much documentation
 either way).
 
 Finally, given the desire to be quick at startup with a large index
 that will still easily fit within a system's memory, am I thinking
 about this wrong or are there other better approaches?
 
 Thanks, as always,
 Aaron



Re: The way to customize ranking?

2012-08-23 Thread François Schiettecatte
I would create two indices, one with your content and one with your ads. This 
approach would allow you to precisely control how many ads you pull back and 
how you merge them into the results, and you would be able to control schemas, 
boosting, default fields, etc. for each index independently. 

Best regards

François

On Aug 23, 2012, at 11:45 AM, Nicholas Ding nicholas...@gmail.com wrote:

 Thank you, but I don't want to filter those ads.
 
 For example, when user make a search like q=Car
 Result list:
 1. Ford Automobile (score 10)
 2. Honda Civic (score 9)
 ...
 ...
 ...
 99. Paid Ads (score 1, Ad has own field to identify it's an Ad)
 
 What I want to find is a way to make the score of Paid Ads higher than
 Ford Automobile. Basically, the result structure will look like
 
 - [Paid Ads Section]
[Most valuable Ads 1]
[Most valuable Ads 2]
[Less valuable Ads 1]
[Less valuable Ads 2]
 - [Relevant Results Section]
 
 
 On Thu, Aug 23, 2012 at 11:33 AM, Karthick Duraisamy Soundararaj 
 karthick.soundara...@gmail.com wrote:
 
 Hi
 You might add an int  field Search Rule that identifies the type of
 search.
 example
Search Rule  Description
 0  Unpaid Search
 1  Paid Search - Rule
 1
 2  Paid Serch - Rule 2
 
 You can use filterqueries (
 http://wiki.apache.org/solr/CommonQueryParameters)
 like fq:  Search Rule :[1 TO *]
 
 Alternatively, you can even use a boolean field to identify whether or not
 a search is paid and then an additional field that identifies the type of
 paid search.
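 
 For example, a request along these lines would keep the two cases separate
 (the field names search_rule and is_paid are only illustrative):
 
 q=car&fq=search_rule:[1 TO *]                                   (restrict to paid results only)
 q=car&defType=dismax&qf=name description&bq=is_paid:true^100    (boost paid results to the top)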
 
 --
 karthick
 
 On Thu, Aug 23, 2012 at 11:16 AM, Nicholas Ding nicholas...@gmail.com
 wrote:
 
 Hi
 
 I'm working on Solr to build a local business search in China. We have a
 special requirement from advertiser. When user makes a search, if the
 results contain paid advertisements, those ads need to be moved on the
 top
 of results. For different ads, they have detailed rules about which comes
 first.
 
 Could anyone offer me some suggestions how I customize the ranking based
 on
 my requirement?
 
 Thanks
 Nicholas
 
 



Re: recommended SSD

2012-08-23 Thread François Schiettecatte
You should check this at pcper.com:

http://pcper.com/ssd-decoder

http://pcper.com/content/SSD-Decoder-popup

Specs for a wide range of SSDs.

Best regards

François


On Aug 23, 2012, at 5:35 PM, Peyman Faratin pey...@robustlinks.com wrote:

 Hi
 
 Is there a SSD brand and spec that the community recommends for an index of 
 size 56G with mostly reads? We are evaluating this one
 
 http://www.newegg.com/Product/Product.aspx?Item=N82E16820227706
 
 thank you
 
 Peyman
 
 



Re: Can't find solr.xml

2012-07-11 Thread François Schiettecatte
On Jul 11, 2012, at 2:52 PM, Shawn Heisey wrote:

 On 7/2/2012 2:33 AM, Nabeel Sulieman wrote:
 Argh! (and hooray!)
 
 I started from scratch again, following the wiki instructions. I did only
 one thing differently; put my data directory in /opt instead of /home/dev.
 And now it works!
 
 I'm glad it's working now. I just wish I knew exactly what the difference
 is. The directory in /opt has exactly the same permissions as the one in
 /home/dev (chown -R tomcat solr).
 
 This could be selinux.  I tend to disable it, as configuring it for proper 
 operation with custom software can be tricky.  If this is the problem, there 
 will hopefully be a record of the denial in one of the files in /var/log.  
 CentOS has selinux enabled by default.
 
 In case you don't know how to turn it off: in /etc/selinux/config, set 
 SELINUX=disabled and reboot.  There may be a way to disable it without 
 rebooting, but I've found that to be the path of least resistance.
 
 Thanks,
 Shawn
 


You can temporarily disable selinux until the next reboot with this:

echo 0 > /selinux/enforce
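
On systems that ship the setenforce utility, this should (if I remember right) do the same thing:

setenforce 0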

Cheers

François




Re: difference between stored=false and stored=true ?

2012-06-30 Thread François Schiettecatte
Giovanni

stored=true means the data is stored in the index and can be returned with 
the search results (see the 'fl' parameter). This is independent of indexed=..

Which means that you can store but not index a field:

indexed=false stored=true
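
For example, roughly (the field names here are made up):

<field name="body"      type="text"   indexed="true"  stored="false"/>  <!-- searchable, not returned -->
<field name="body_raw"  type="string" indexed="false" stored="true"/>   <!-- returned, not searchable -->
<field name="title"     type="text"   indexed="true"  stored="true"/>   <!-- both -->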

Best regards

François

On Jun 30, 2012, at 9:57 AM, Giovanni Gherdovich wrote:

 Hi all,
 
 when declaring a field in the schema.xml file you can
 set the attributes 'indexed' and 'stored' to true or false.
 
 What is the difference between a indexed=true stored=false
 and a indexed=true stored=true?
 
 I guess understanding this would require me to have
 a closer look to lucene's index data structures;
 what's the pointer to some doc I can read?
 
 Cheers,
 GGhh



Re: Indexation Speed?

2012-06-19 Thread François Schiettecatte
Just a suggestion, you might want to monitor CPU usage and disk I/O, there 
might be a bottleneck.

Cheers

François

On Jun 19, 2012, at 7:07 AM, Bruno Mannina wrote:

 Actually -Xmx512m and no effect
 
 Concerning  maxFieldLength, no problem it's commented
 
 Le 19/06/2012 13:02, Erick Erickson a écrit :
 Then try -Xmx600M
 next try -Xmx900M
 
 
 etc. The idea is to bump things on separate runs.
 
 But be a little cautious here. Look in your solrconfig.xml file, you'll see
 a commented-out line
 maxFieldLength1/maxFieldLength
 
 The default behavior for Solr/Lucene is to index the first 10,000 tokens
 (not characters, think of tokens as words for not) in each
 document and throw the rest on the floor. At the sizes you're talking about,
 that's probably not a problem, but do be aware of it.
 
 Best
 Erick
 
 On Tue, Jun 19, 2012 at 5:44 AM, Bruno Manninabmann...@free.fr  wrote:
 Like that?
 
 java -Xmx300m -jar post.jar myfile.xml
 
 
 
 Le 19/06/2012 11:11, Lance Norskog a écrit :
 
 Ah! Java memory size is a java command line option:
 
 http://javahowto.blogspot.com/2006/06/6-common-errors-in-setting-java-heap.html
 
 You would try increasing the memory size in stages up to maybe 300m.
 
 On Tue, Jun 19, 2012 at 2:04 AM, Bruno Manninabmann...@free.frwrote:
 
 Le 19/06/2012 10:51, Lance Norskog a écrit :
 
 675 doc/s is respectable for that server. You might move the memory
 allocated to Java up and down- there is a balance between amount of
 memory in Java v.s. the OS disk buffer.
 
 How can I do that ? is there an option during my command line or in a
 config
 file?
 sorry for this newbie question :(
 
 
 And, of course, use the latest trunk.
 Solr 3.6
 
 
 On Tue, Jun 19, 2012 at 12:10 AM, Bruno Manninabmann...@free.fr
  wrote:
 Correction: file size is 40 Mo !!!
 
 Le 19/06/2012 09:09, Bruno Mannina a écrit :
 
 Dear All,
 
 I would like to know if the indexation speed is right.
 
 I have a 40Go file size with around 27 000 docs inside.
 I index around 20 fields,
 
 My (old) test server is a DualCore 3.06GHz Intel Xeon with only 1Go
 Ram
 
 The file takes 40 seconds with the command line:
 java -jar post.jar myfile.xml
 
 Could I increase this speed or reduce this time?
 
 Thanks a lot,
 PS: Newbie user
 
 
 
 
 



Re: Indexation Speed?

2012-06-19 Thread François Schiettecatte
Well that depends on the platform you are on, you did not mention that.

If you are using linux, you could use atop ( http://www.atoptool.nl/ ), or top, 
or  iostat or stat, or all four.
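
For example (the interval is arbitrary):

iostat -x 5    # extended per-device I/O statistics every 5 seconds
top            # per-process CPU and memory usage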

Cheers

François

On Jun 19, 2012, at 8:55 AM, Bruno Mannina wrote:

 CPU is not used, just 50-60% sometimes during the process but How can I check 
 IO HDD ?
 
 Le 19/06/2012 14:13, François Schiettecatte a écrit :
 Just a suggestion, you might want to monitor CPU usage and disk I/O, there 
 might be a bottleneck.
 
 Cheers
 
 François
 
 On Jun 19, 2012, at 7:07 AM, Bruno Mannina wrote:
 
 Actually -Xmx512m and no effect
 
 Concerning  maxFieldLength, no problem it's commented
 
 Le 19/06/2012 13:02, Erick Erickson a écrit :
 Then try -Xmx600M
 next try -Xmx900M
 
 
 etc. The idea is to bump things on separate runs.
 
 But be a little cautious here. Look in your solrconfig.xml file, you'll see
 a commented-out line
 maxFieldLength1/maxFieldLength
 
 The default behavior for Solr/Lucene is to index the first 10,000 tokens
 (not characters, think of tokens as words for not) in each
 document and throw the rest on the floor. At the sizes you're talking 
 about,
 that's probably not a problem, but do be aware of it.
 
 Best
 Erick
 
 On Tue, Jun 19, 2012 at 5:44 AM, Bruno Manninabmann...@free.fr   wrote:
 Like that?
 
 java -Xmx300m -jar post.jar myfile.xml
 
 
 
 Le 19/06/2012 11:11, Lance Norskog a écrit :
 
 Ah! Java memory size is a java command line option:
 
 http://javahowto.blogspot.com/2006/06/6-common-errors-in-setting-java-heap.html
 
 You would try increasing the memory size in stages up to maybe 300m.
 
 On Tue, Jun 19, 2012 at 2:04 AM, Bruno Manninabmann...@free.fr 
 wrote:
 Le 19/06/2012 10:51, Lance Norskog a écrit :
 
 675 doc/s is respectable for that server. You might move the memory
 allocated to Java up and down- there is a balance between amount of
 memory in Java v.s. the OS disk buffer.
 How can I do that ? is there an option during my command line or in a
 config
 file?
 sorry for this newbie question :(
 
 
 And, of course, use the latest trunk.
 Solr 3.6
 
 
 On Tue, Jun 19, 2012 at 12:10 AM, Bruno Manninabmann...@free.fr
  wrote:
 Correction: file size is 40 Mo !!!
 
 Le 19/06/2012 09:09, Bruno Mannina a écrit :
 
 Dear All,
 
 I would like to know if the indexation speed is right.
 
 I have a 40Go file size with around 27 000 docs inside.
 I index around 20 fields,
 
 My (old) test server is a DualCore 3.06GHz Intel Xeon with only 1Go
 Ram
 
 The file takes 40 seconds with the command line:
 java -jar post.jar myfile.xml
 
 Could I increase this speed or reduce this time?
 
 Thanks a lot,
 PS: Newbie user
 
 
 
 
 



Re: Indexation Speed?

2012-06-19 Thread François Schiettecatte
There is a lot of good information about that on the web, just google for 
'ubuntu performance monitor'

Also the ubuntu website has a pretty good help section:

https://help.ubuntu.com/

and a community wiki:

https://help.ubuntu.com/community

Cheers

François

On Jun 19, 2012, at 9:03 AM, Bruno Mannina wrote:

 Linux Ubuntu :) since 2 months ! so I'm a new in this world :)
 
 Le 19/06/2012 15:01, François Schiettecatte a écrit :
 Well that depends on the platform you are on, you did not mention that.
 
 If you are using linux, you could use atop ( http://www.atoptool.nl/ ), or 
 top, or  iostat or stat, or all four.
 
 Cheers
 
 François
 
 On Jun 19, 2012, at 8:55 AM, Bruno Mannina wrote:
 
 CPU is not used, just 50-60% sometimes during the process but How can I 
 check IO HDD ?
 
 Le 19/06/2012 14:13, François Schiettecatte a écrit :
 Just a suggestion, you might want to monitor CPU usage and disk I/O, there 
 might be a bottleneck.
 
 Cheers
 
 François
 
 On Jun 19, 2012, at 7:07 AM, Bruno Mannina wrote:
 
 Actually -Xmx512m and no effect
 
 Concerning  maxFieldLength, no problem it's commented
 
 Le 19/06/2012 13:02, Erick Erickson a écrit :
 Then try -Xmx600M
 next try -Xmx900M
 
 
 etc. The idea is to bump things on separate runs.
 
 But be a little cautious here. Look in your solrconfig.xml file, you'll 
 see
 a commented-out line
 maxFieldLength1/maxFieldLength
 
 The default behavior for Solr/Lucene is to index the first 10,000 tokens
 (not characters, think of tokens as words for not) in each
 document and throw the rest on the floor. At the sizes you're talking 
 about,
 that's probably not a problem, but do be aware of it.
 
 Best
 Erick
 
 On Tue, Jun 19, 2012 at 5:44 AM, Bruno Manninabmann...@free.fr
 wrote:
 Like that?
 
 java -Xmx300m -jar post.jar myfile.xml
 
 
 
 Le 19/06/2012 11:11, Lance Norskog a écrit :
 
 Ah! Java memory size is a java command line option:
 
 http://javahowto.blogspot.com/2006/06/6-common-errors-in-setting-java-heap.html
 
 You would try increasing the memory size in stages up to maybe 300m.
 
 On Tue, Jun 19, 2012 at 2:04 AM, Bruno Manninabmann...@free.fr  
 wrote:
 Le 19/06/2012 10:51, Lance Norskog a écrit :
 
 675 doc/s is respectable for that server. You might move the memory
 allocated to Java up and down- there is a balance between amount of
 memory in Java v.s. the OS disk buffer.
 How can I do that ? is there an option during my command line or in a
 config
 file?
 sorry for this newbie question :(
 
 
 And, of course, use the latest trunk.
 Solr 3.6
 
 
 On Tue, Jun 19, 2012 at 12:10 AM, Bruno Manninabmann...@free.fr
  wrote:
 Correction: file size is 40 Mo !!!
 
 Le 19/06/2012 09:09, Bruno Mannina a écrit :
 
 Dear All,
 
 I would like to know if the indexation speed is right.
 
 I have a 40Go file size with around 27 000 docs inside.
 I index around 20 fields,
 
 My (old) test server is a DualCore 3.06GHz Intel Xeon with only 1Go
 Ram
 
 The file takes 40 seconds with the command line:
 java -jar post.jar myfile.xml
 
 Could I increase this speed or reduce this time?
 
 Thanks a lot,
 PS: Newbie user
 
 
 
 
 
 



Re: Solr out of memory exception

2012-03-15 Thread François Schiettecatte
FWIW it looks like this feature has been enabled by default since JDK 6 Update 
23:


http://blog.juma.me.uk/2008/10/14/32-bit-or-64-bit-jvm-how-about-a-hybrid/
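
If you do want to set it explicitly anyway, something along these lines in Tomcat's startup script should work (setenv.sh is an assumption about how your Tomcat is launched):

# bin/setenv.sh
export CATALINA_OPTS="-Xms2g -Xmx2g -XX:+UseCompressedOops"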

François

On Mar 15, 2012, at 6:39 AM, Husain, Yavar wrote:

 Thanks a ton.
 
 From: Li Li [fancye...@gmail.com]
 Sent: Thursday, March 15, 2012 12:11 PM
 To: Husain, Yavar
 Cc: solr-user@lucene.apache.org
 Subject: Re: Solr out of memory exception
 
 it seems you are using 64bit jvm(32bit jvm can only allocate about 1.5GB). 
 you should enable pointer compression by -XX:+UseCompressedOops
 
 On Thu, Mar 15, 2012 at 1:58 PM, Husain, Yavar 
 yhus...@firstam.commailto:yhus...@firstam.com wrote:
 Thanks for helping me out.
 
 I have allocated Xms-2.0GB Xmx-2.0GB
 
 However i see Tomcat is still using pretty less memory and not 2.0G
 
 Total Memory on my Windows Machine = 4GB.
 
 With smaller index size it is working perfectly fine. I was thinking of 
 increasing the system RAM  tomcat heap space allocated but then how come on 
 a different server with exactly same system and solr configuration  memory 
 it is working fine?
 
 
 -Original Message-
 From: Li Li [mailto:fancye...@gmail.commailto:fancye...@gmail.com]
 Sent: Thursday, March 15, 2012 11:11 AM
 To: solr-user@lucene.apache.orgmailto:solr-user@lucene.apache.org
 Subject: Re: Solr out of memory exception
 
 how many memory are allocated to JVM?
 
 On Thu, Mar 15, 2012 at 1:27 PM, Husain, Yavar 
 yhus...@firstam.commailto:yhus...@firstam.com wrote:
 
 Solr is giving out of memory exception. Full Indexing was completed fine.
 Later while searching maybe when it tries to load the results in memory it
 starts giving this exception. Though with the same memory allocated to
 Tomcat and exactly same solr replica on another server it is working
 perfectly fine. I am working on 64 bit software's including Java  Tomcat
 on Windows.
 Any help would be appreciated.
 
 Here are the logs:
 
 The server encountered an internal error (Severe errors in solr
 configuration. Check your log files for more detailed information on what
 may be wrong. If you want solr to continue after configuration errors,
 change: abortOnConfigurationErrorfalse/abortOnConfigurationError in
 null -
 java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space at
 org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068) at
 org.apache.solr.core.SolrCore.init(SolrCore.java:579) at
 org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
 at
 org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
 at
 org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
 at
 org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
 at
 org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:115)
 at
 org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4072)
 at
 org.apache.catalina.core.StandardContext.start(StandardContext.java:4726)
 at
 org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
 at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
 at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601) at
 org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:943) at
 org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:778) at
 org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:504) at
 org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317) at
 org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
 at
 org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
 at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065) at
 org.apache.catalina.core.StandardHost.start(StandardHost.java:840) at
 org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057) at
 org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463) at
 org.apache.catalina.core.StandardService.start(StandardService.java:525) at
 org.apache.catalina.core.StandardServer.start(StandardServer.java:754) at
 org.apache.catalina.startup.Catalina.start(Catalina.java:595) at
 sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
 sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at
 java.lang.reflect.Method.invoke(Unknown Source) at
 org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289) at
 org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414) Caused by:
 java.lang.OutOfMemoryError: Java heap space at
 org.apache.lucene.index.SegmentTermEnum.termInfo(SegmentTermEnum.java:180)
 at org.apache.lucene.index.TermInfosReader.init(TermInfosReader.java:91)
 at
 

Re: Development inside or outside of Solr?

2012-02-20 Thread François Schiettecatte
You could take a look at this:

http://www.let.rug.nl/vannoord/TextCat/

Will probably require some work to integrate/implement though.
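
If you would rather stay on the Java side, Tika ships a simple n-gram based identifier (this is Tika's LanguageIdentifier, not TextCat); a minimal sketch, assuming tika-core is on the classpath:

import org.apache.tika.language.LanguageIdentifier;

public class LangCheck {
    public static void main(String[] args) {
        // guesses the ISO 639-1 code of the text, e.g. "fr"
        LanguageIdentifier identifier = new LanguageIdentifier("Ceci est un petit texte en français.");
        System.out.println(identifier.getLanguage() + " certain=" + identifier.isReasonablyCertain());
    }
}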

François

On Feb 20, 2012, at 3:37 AM, bing wrote:

 I have looked into the TikaCLI with -language option, and learned that Tika
 can output only the language metadata. It cannot help me to solve my problem
 though, as my main concern is whether to change Solr or not.  Thank you all
 the same. 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Development-inside-or-outside-of-Solr-tp3759680p3760131.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr logging

2012-02-20 Thread François Schiettecatte
Ola

Here is what I have for this:


##
#
# Log4J configuration for SOLR
#
#   http://wiki.apache.org/solr/SolrLogging
#
#
# 1) Download LOG4J:
#   http://logging.apache.org/log4j/1.2/
#   http://logging.apache.org/log4j/1.2/download.html
#   
http://www.apache.org/dyn/closer.cgi/logging/log4j/1.2.16/apache-log4j-1.2.16.tar.gz
#   
http://newverhost.com/pub//logging/log4j/1.2.16/apache-log4j-1.2.16.tar.gz
#
# 2) Download SLF4J:
#   http://www.slf4j.org/
#   http://www.slf4j.org/download.html
#   http://www.slf4j.org/dist/slf4j-1.6.4.tar.gz
#
# 3) Unpack Solr:
#   jar xvf apache-solr-3.5.0.war
#
# 4) Delete:
#   WEB-INF/lib/log4j-over-slf4j-1.6.4.jar
#   WEB-INF/lib/slf4j-jdk14-1.6.4.jar
#
# 5) Copy:
#   apache-log4j-1.2.16/log4j-1.2.16.jar  ->  WEB-INF/lib
#   slf4j-1.6.4/slf4j-log4j12-1.6.4.jar   ->  WEB-INF/lib
#   log4j.properties (this file)          ->  WEB-INF/classes/ (needs to be created)
#
# 6) Pack Solr:
#   jar cvf apache-solr-3.4.0-omim.war admin favicon.ico index.jsp META-INF 
WEB-INF
#
#
#   Author: Francois Schiettecatte
#   Version:1.0
#
##



##
#
# Logging levels (helpful reminder)
#
# DEBUG  INFO  WARN  ERROR  FATAL
#



##
#
# Logging setup
#

log4j.rootLogger=WARN, SOLR


# Daily Rolling File Appender (SOLR)
log4j.appender.SOLR=org.apache.log4j.DailyRollingFileAppender
log4j.appender.SOLR.File=${catalina.base}/logs/solr.log
log4j.appender.SOLR.Append=true
log4j.appender.SOLR.Encoding=UTF-8
log4j.appender.SOLR.DatePattern='-'yyyy-MM-dd
log4j.appender.SOLR.layout=org.apache.log4j.PatternLayout
log4j.appender.SOLR.layout.ConversionPattern=%d [%t] %-5p %c - %m%n



##
#
# Logging levels for SOLR
#

# Default logging level
log4j.logger.org.apache.solr=WARN



##



On Feb 20, 2012, at 5:15 AM, ola nowak wrote:

 Yep. I suppose it is. But I have several applications installed on
 glassfish and I want each one of them to write into separate file. And Your
 solution with this jvm option was redirecting all messages from all apps to
 one file. Does anyone knows how to accomplish that?
 
 
 On Mon, Feb 20, 2012 at 11:09 AM, darul daru...@gmail.com wrote:
 
 Hmm, I did not try to achieve this but interested if you find a way...
 
 After I believe than having log4j config file outside war archive is a
 better solution, if you may need to update its content for example.
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-logging-tp3760171p3760322.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 



Re: Help:Solr can't put all pdf files into index

2012-02-09 Thread François Schiettecatte
Have you tried checking any logs?

Have you tried identifying a file which did not make it in and submitting just 
that one and seeing what happens?

François

On Feb 9, 2012, at 10:37 AM, Rong Kang wrote:

 
 Yes, I put all file in one directory and I have tested file names using 
 code.  
 
 
 
 
 At 2012-02-09 20:45:49,Jan Høydahl jan@cominvent.com wrote:
 Hi,
 
 Are you 100% sure that the filename is globally unique, since you use it as 
 the uniqueKey?
 
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com
 
 On 9. feb. 2012, at 08:30, 荣康 wrote:
 
 Hey ,
 I am using solr as my search engine to search my pdf files. I have 18219 
 files(different file names) and all the files are in one same directory。But 
 when I use solr to import the files into index using Dataimport method, 
 solr report only import 17233 files. It's very strange. This problem has 
 stoped out project for a few days. I can't handle it.
 
 
 please help me!
 
 
 Schema.xml
 
 
 fields
  field name=text type=text indexed=true multiValued=true 
 termVectors=true termPositions=true termOffsets=true/
  field name=filename type=filenametext indexed=true required=true 
 termVectors=true termPositions=true termOffsets=true/
  field name=id type=string stored=true/ 
 /fields
 uniqueKeyid/uniqueKey 
 copyField source=filename dest=text/
 
 
 and 
 dataConfig 
   dataSource type=BinFileDataSource name=bin/ 
 document 
 entity name=f processor=FileListEntityProcessor recursive=true 
 rootEntity=false 
 dataSource=null  baseDir=H:/pdf/cls_1_16800_OCRed/1 
 fileName=.*\.(PDF)|(pdf)|(Pdf)|(pDf)|(pdF)|(PDf)|(PdF)|(pDF) 
 onError=skip 
 
 
 entity name=tika-test processor=TikaEntityProcessor 
 url=${f.fileAbsolutePath} format=text dataSource=bin onError=skip
   field column=text name=text/  
 /entity 
 field column=file name=id/
 field column=file name=filename/ 
 /entity 
   /document 
 /dataConfig 
 
 
 
 
 sincerecly
 Rong Kang
 
 
 
 



Re: Using UUID for uniqueId

2012-02-08 Thread François Schiettecatte
Anderson

I would say that this is highly unlikely, but you would need to pay attention 
to how they are generated; this would be a good place to start:

http://en.wikipedia.org/wiki/Universally_unique_identifier

Cheers

François

On Feb 8, 2012, at 1:31 PM, Anderson vasconcelos wrote:

 HI all
 
 If i use the UUID like a uniqueId in the future if i break my index in
 shards, i will have problems? The UUID generation could generate the same
 UUID in differents machines?
 
 Thanks



Re: Question on Reverse Indexing

2012-01-17 Thread François Schiettecatte
Using ReversedWildcardFilterFactory will double the size of your dictionary 
(more or less), maybe the drop in performance that you are seeing is a result 
of that?

François

On Jan 17, 2012, at 9:01 PM, Shyam Bhaskaran wrote:

 Hi,
 
 For reverse indexing we are using the ReversedWildcardFilterFactory on Solr 
 4.0
 
 
 filter class=solr.ReversedWildcardFilterFactory withOriginal=true
 
 maxPosAsterisk=3 maxPosQuestion=2 maxFractionAsterisk=0.33/
 
 
 ReversedWildcardFilterFactory was helping us to perform leading wild card 
 searches like *lock.
 
 But it was observed that the performance of the searches was not good after 
 introducing ReversedWildcardFilterFactory filter.
 
 Hence we disabled ReversedWildcardFilterFactory filter and re-created the 
 indexes and this time we found the performance of Solr query to be faster.
 
 But surprisingly it is observed that leading wild card searches were still 
 working inspite of disabling the ReversedWildcardFilterFactory filter.
 
 
 This behavior is puzzling everyone and wanted to know how this behavior of 
 reverse indexing works?
 
 Can anyone share with me on this Solr behavior.
 
 -Shyam
 



Re: best query for one-box search string over multiple types fields?

2012-01-15 Thread François Schiettecatte
Johnny 

What you are going to want to do is boost the artist field with respect to the 
others, for example using edismax my 'qf' parameter is:

number^5 title^3 default

so hits in the number field get a five-fold boost and hits in the title field 
get a three-fold boost. In your case you might want to start with:

artist^5 album^3 song

Getting these parameters right will take a little work, and I would suggest you 
build a set of searches with known results so you can quickly check the effect 
of any tweaks you do.
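
For instance, a first cut might look like this (the weights are just a starting point to tune from):

q=queen bohemian rhapsody&defType=edismax&qf=artistName^5 albumName^3 songName&pf=songName^2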

Useful reading would include:

http://wiki.apache.org/solr/SolrRelevancyFAQ

http://wiki.apache.org/solr/SolrRelevancyCookbook


http://www.lucidimagination.com/blog/2011/12/14/options-to-tune-document’s-relevance-in-solr/


http://www.lucidimagination.com/blog/2011/03/10/solr-relevancy-function-queries/

Cheers

François


On Jan 15, 2012, at 1:19 AM, Johnny Marnell wrote:

 hi all,
 
 short of it: i want queen bohemian rhapsody to return that song named
 Bohemian Rhapsody by the artist named Queen, rather than songs with
 titles like Bohemian Rhapsody (Queen Cover).
 
 i'm indexing a catalog of music with these types of docs and their fields:
 
 artist (artistName), album (albumName, artistName), and song (songName,
 albumName, artistName).
 
 the client is one search box, and i'm having trouble handling searching
 over multiple multifields and weighting their exactness.  when a user types
 queen, i want the artist Queen to be the first hit, and then albums 
 songs titled queen.
 
 if queen bohemian rhapsody is searched, i want to return that song, but
 instead i'm getting songs like Bohemian Rhapsody (Queen Cover) by Stupid
 Queen Tribute Band because all three terms are in the songName, i'm
 guessing.  what kind of query do i need?
 
 i'm indexing all of these fields as multi-fields with ngram, shingle (i
 think this might be really useful for my use case?), keyword, and standard.
 that appears to be working, but i'm not sure how to combine all of this
 together over multiple multi-fields.
 
 if anyone has good links to broadly summarized use cases of Indexing and
 Querying, that would be great - i would think this would be a common
 situation but i can't find any good resources on the web.  and i'm having
 trouble understanding scoring and boosting.
 
 this was my first post, hope i did it right, thanks so much!
 
 -j



Re: Doing url search in solr is slow

2012-01-09 Thread François Schiettecatte
About the search 'referal_url:*www.someurl.com*', having a wildcard at the 
start will cause a dictionary scan for every term you search on unless you use 
ReversedWildcardFilterFactory. That could be the cause of your slowdown if you 
are I/O bound, and even if you are CPU bound for that matter.

François


On Jan 8, 2012, at 8:44 PM, yu shen wrote:

 Hi,
 
 My solr document has up to 20 fields, containing data from product name,
 date, url etc.
 
 The volume of documents is around 1.5m.
 
 My symptom is when doing url search like [ url:*www.someurl.com*
 referal_url:*www.someurl.com* page_url:*www.someurl.com*] will get a
 extraordinary long response time, while search against all other fields,
 the response time will be normal.
 
 Can anyone share any insights on this?
 
 Spark



Re: Shutdown hook issue

2011-12-14 Thread François Schiettecatte
I am not an expert on this but the oom-killer will kill off the process 
consuming the greatest amount of memory if the machine runs out of memory, and 
you should see something to that effect in the system log, /var/log/messages I 
think.

François

On Dec 14, 2011, at 2:54 PM, Adolfo Castro Menna wrote:

 I think I found the issue. The ubuntu server is running OOM-Killer which
 might be sending a SIGINT to the java process, probably because of memory
 consumption.
 
 Thanks,
 Adolfo.
 
 On Wed, Dec 14, 2011 at 12:44 PM, Otis Gospodnetic 
 otis_gospodne...@yahoo.com wrote:
 
 Hi,
 
 Solr won't shut down by itself just because it's idle. :)
 You could run it with debugger attached and breakpoint set in the shutdown
 hook you are talking about and see what calls it.
 
 Otis
 
 
 Performance Monitoring SaaS for Solr -
 http://sematext.com/spm/solr-performance-monitoring/index.html
 
 
 
 
 
 From: Adolfo Castro Menna adolfo.castrome...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Wednesday, December 14, 2011 8:17 AM
 Subject: Shutdown hook issue
 
 Hi All,
 
 I'm experiencing some issues with solr. From time to time solr goes down.
 After checking the logs, I see that it's due to the shutdown hook being
 triggered.
 I still don't know why it happens but it seems to be related to solr being
 idle. Does anyone have any insights?
 
 I'm using Ubuntu 10.04.2 LTS and solr 3.1.0 running on Jetty (default
 configuration). Solr runs in background, so it doesn't seem to be related
 to a SIGINT unless ubuntu is sending it for some odd reason.
 
 Thanks,
 Adolfo.
 
 
 
 



Re: Don't snowball depending on terms

2011-11-29 Thread François Schiettecatte
It won't, and depending on how your analyzer is set up, the terms are most likely 
stemmed at index time.

You could create a separate field for unstemmed terms though, or use a less 
aggressive stemmer such as EnglishMinimalStemFilterFactory.

François

On Nov 29, 2011, at 12:33 PM, Robert Brown wrote:

 Is it possible to search a field but not be affected by the snowball filter?
 
 ie, searching for manage is matching management, but a user may want to 
 restrict results to only containing manage.
 
 I was hoping that simply quoting the term would do this, but it doesn't 
 appear to make any difference.
 
 
 
 
 --
 
 IntelCompute
 Web Design  Local Online Marketing
 
 http://www.intelcompute.com
 



Re: how index words with their perfix in solr?

2011-11-29 Thread François Schiettecatte
You might try the snowball stemmer too, I am not sure how closely that will fit 
your requirements though.

Alternatively you could use synonyms.

François

On Nov 29, 2011, at 1:08 AM, mina wrote:

 thank you for your answer.i read it and i use this filter in my schema.xml in
 solr:
 
 filter class=solr.PorterStemFilterFactory/
 
 but this filter doesn't understand all words with their suffix and prefix.
 this means when i search 'rain' solr doesn't show me any document that have
 'rainy'.
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/how-index-words-with-their-perfix-in-solr-tp3542300p3544319.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: how index words with their perfix in solr?

2011-11-28 Thread François Schiettecatte
It looks like you are using the plural stemmer, you might want to look into 
using the Porter stemmer instead:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming

François

On Nov 28, 2011, at 9:14 AM, mina wrote:

 I use solr 3.3,I want solr index words with their suffixes. when i index
 'book' and 'books' and search 'book', solr show any document that has 'book'
 or 'books' but when I index 'rain' and 'rainy' and search 'rain', solr show
 any document that has 'rain' but i whant that solr show any document that
 has 'rain' or 'rainy'.help me.
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/how-index-words-with-their-perfix-in-solr-tp3542300p3542300.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: query within search results

2011-11-08 Thread François Schiettecatte
Wouldn't 'diseases AND water' or '+diseases +water' return you that result? Or 
you could search on 'water' while filtering on 'diseases'.
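
For instance (assuming your default search field is named text):

q=water&fq=text:diseases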

Or am I missing something here?

François

On Nov 8, 2011, at 4:19 PM, sharnel pereira wrote:

 Hi,
 
 I have 10k records indexed using solr 1.4
 
 We have a requirement to search within search results.
 
 example: query for 'water' returns 2000 results. I need the second query
 for 'diseases' to search within those 2000 results.(I cant add a facet as
 the second search should also check non faceted fields)
 
 Is there a way to get this working.
 
 Thanks
 Sharnel



Re: Is SQL Like operator feature available in Apache Solr query

2011-11-01 Thread François Schiettecatte
Arshad

Actually it is available, you need to use the ReversedWildcardFilterFactory 
which I am sure you can Google for.

Solr and SQL address different problem sets with some overlaps but there are 
significant differences between the two technologies. Actually '%Solr%' is a 
worst case for SQL but handled quite elegantly in Solr.

Hope this helps!

Cheers

François


On Nov 1, 2011, at 7:46 AM, arshad ansari wrote:

 Hi,
 
 Is SQL Like operator feature available in Apache Solr Just like we have it
 in SQL.
 
 SQL example below -
 
 *Select * from Employee where employee_name like '%Solr%'*
 
 If not is it a Bug with Solr. If this feature available, please tell the
 examples available.
 
 Thanks!
 
 -- 
 Best Regards,
 Arshad



Re: Is SQL Like operator feature available in Apache Solr query

2011-11-01 Thread François Schiettecatte
Kuli

Good point about just tokenizing the fields :)

I ran a couple of tests to double-check my understanding and you can have a 
wildcard operator at either or both ends of a term. Adding 
ReversedWildcardFilterFactory to your field analyzer will make leading wildcard 
searches a lot faster of course but at the expense of index size.

Cheers

François


On Nov 1, 2011, at 9:07 AM, Michael Kuhlmann wrote:

 Hi,
 
 this is not exactly true. In Solr, you can't have the wildcard operator on 
 both sides of the operator.
 
 However, you can tokenize your fields and simply query for Solr. This is 
 what's Solr made for. :)
 
 -Kuli
 
 Am 01.11.2011 13:24, schrieb François Schiettecatte:
 Arshad
 
 Actually it is available, you need to use the ReversedWildcardFilterFactory 
 which I am sure you can Google for.
 
 Solr and SQL address different problem sets with some overlaps but there are 
 significant differences between the two technologies. Actually '%Solr%' is a 
 worse case for SQL but handled quite elegantly in Solr.
 
 Hope this helps!
 
 Cheers
 
 François
 
 
 On Nov 1, 2011, at 7:46 AM, arshad ansari wrote:
 
 Hi,
 
 Is SQL Like operator feature available in Apache Solr Just like we have it
 in SQL.
 
 SQL example below -
 
 *Select * from Employee where employee_name like '%Solr%'*
 
 If not is it a Bug with Solr. If this feature available, please tell the
 examples available.
 
 Thanks!
 
 --
 Best Regards,
 Arshad
 
 



Re: Uncomplete date expressions

2011-10-29 Thread François Schiettecatte
Erik

I would complement the date with default values as you suggest, and also store 
either a boolean flag indicating whether the date was complete, or the original 
date string when it is incomplete. The latter is probably better: the presence of 
that data tells you the original date was incomplete, and you keep the original 
value as well.
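
On the schema side that could look roughly like this (the field names are made up):

<field name="pubdate"          type="date"    indexed="true"  stored="true"/>
<field name="pubdate_original" type="string"  indexed="false" stored="true"/>
<field name="pubdate_complete" type="boolean" indexed="true"  stored="true"/>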

Cheers

François

On Oct 29, 2011, at 9:12 AM, Erik Fäßler wrote:

 Hi all,
 
 I want to index MEDLINE documents which not always contain complete dates of 
 publication. The year is known always. Now the Solr documentation states, 
 dates must have the format 1995-12-31T23:59:59Z for which month, day and 
 even the time of the day must be known.
 I could, of course, just complement uncomplete dates with default values, 
 01-01 for example. But then I won't be able to distinguish between complete 
 and uncomplete dates afterwards which is of importance when displaying the 
 documents.
 
 I could just store the known information, e.g. the year, into an 
 integer-typed field, but then I won't have date math.
 
 Is there a good solution to my problem? Probably I'm just missing the 
 obvious, perhaps you can help me :-)
 
 Best regards,
 
   Erik



Re: drastic performance decrease with 20 cores

2011-09-26 Thread François Schiettecatte
You have not said how big your index is but I suspect that allocating 13GB for 
your 20 cores is starving the OS of memory for caching file data. Have you 
tried 6GB with 20 cores? I suspect you will see the same performance as with 6GB 
and 10 cores.

Generally it is better to allocate just enough memory to SOLR to run optimally 
rather than as much as possible. 'Just enough' varies from setup to setup. You will need 
to try out different allocations and see where the sweet spot is.

Cheers

François


On Sep 26, 2011, at 9:53 AM, Bictor Man wrote:

 Hi everyone,
 
 Sorry if this issue has been discussed before, but I'm new to the list.
 
 I have a solr (3.4) instance running with 20 cores (around 4 million docs
 each).
 The instance has allocated 13GB in a 16GB RAM server. If I run several sets
 of queries sequentially in each of the cores, the I/O access goes very high,
 so does the system load, while the CPU percentage remains always low.
 It takes almost 1 hour to complete the set of queries.
 
 If I stop solr and restart it with 6GB allocated and 10 cores, after a bit
 the I/O access goes down and the CPU goes up, taking only around 5 minutes
 to complete all sets of queries.
 
 Meaning that for me is MUCH more performant having 2 solr instances running
 with half the data and half the memory than a single instance will all the
 data and memory.
 
 It would be even way faster to have 1 instance with half the cores/memory,
 run the queues, shut it down, start a new instance and repeat the process
 than having a big instance running everything.
 
 Furthermore, if I take the 20cores/13GB instance, unload 10 of the cores,
 trigger the garbage collector and run the sets of queries again, the
 behavior still remains slow taking like 30 minutes.
 
 am I missing something here? does solr change its caching policy depending
 on the number of cores at startup or something similar?
 
 Any hints will be very appreciated.
 
 Thanks,
 Victor



Re: synonyms.txt: different results on admin and on site..

2011-09-08 Thread François Schiettecatte
Wildcard terms are not analyzed, so your synonyms.txt may come into play here; 
have you checked the analysis for deniz* ?

François

On Sep 7, 2011, at 10:08 PM, deniz wrote:

 well yea you are right... i realised that lack of detail issue here... so
 here it comes... 
 
 
 This is from my schema.xml and basically i have a synonyms.txt file which
 contains
 
 deniz,denis,denise
 
 
 After posting here, I have checked some stuff that I have faced before,
 while trying to add accented letters to the system... so it seems like same
 or similar stuff... so...
 
 As i want to support partial matches, the search string is modified on php
 side. if user enters deniz, it is sent to solr as deniz*
 
 when i check on solr admin, i was able to make searches with 
 deniz,denise,denis and they all return correct results, but when i put the
 wildcard, i get nothing...
 
 so with the above settings;
 
 deniz
 denise
 denis
 works smoothly
 
 deniz*
 denise*
 denis*
 returns nothing...
 
 
 should i implement some kinda analyzer or tokenizer or any kinda component
 to overtime this thing? 
 
 
 
 
 
 
 
 
 
 
 Rob Casson wrote:
 
 you should probably post your schema.xml and some parts of your
 synonyms.txt.  it could be differences between your index and query
 analysis chains, synonym expansion errors, etc, but folks will likely
 need more details to help you out.
 
 cheers,
 rob
 
 On Wed, Sep 7, 2011 at 9:46 PM, deniz lt;denizdurmu...@gmail.comgt;
 wrote:
 could it be related with analysis issue about synonyms once again?
 
 
 
 -
 Zeki ama calismiyor... Calissa yapar...
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/synonyms-txt-different-results-on-admin-and-on-site-tp3318338p3318464.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
 -
 Zeki ama calismiyor... Calissa yapar...
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/synonyms-txt-different-results-on-admin-and-on-site-tp3318338p3318503.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: MMapDirectory failed to map a 23G compound index segment

2011-09-07 Thread François Schiettecatte
My memory of this is a little rusty but isn't mmap also limited by mem + swap 
on the box? What does 'free -g' report?

François

On Sep 7, 2011, at 12:25 PM, Rich Cariens wrote:

 Ahoy ahoy!
 
 I've run into the dreaded OOM error with MMapDirectory on a 23G cfs compound
 index segment file. The stack trace looks pretty much like every other trace
 I've found when searching for OOM  map failed[1]. My configuration
 follows:
 
 Solr 1.4.1/Lucene 2.9.3 (plus
 SOLR-1969https://issues.apache.org/jira/browse/SOLR-1969
 )
 CentOS 4.9 (Final)
 Linux 2.6.9-100.ELsmp x86_64 yada yada yada
 Java SE (build 1.6.0_21-b06)
 Hotspot 64-bit Server VM (build 17.0-b16, mixed mode)
 ulimits:
core file size (blocks, -c) 0
data seg size(kbytes, -d) unlimited
file size (blocks, -f) unlimited
pending signals(-i) 1024
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files(-n) 256000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
stack size(kbytes, -s) 10240
cpu time(seconds, -t) unlimited
max user processes (-u) 1064959
virtual memory(kbytes, -v) unlimited
file locks(-x) unlimited
 
 Any suggestions?
 
 Thanks in advance,
 Rich
 
 [1]
 ...
 java.io.IOException: Map failed
 at sun.nio.ch.FileChannelImpl.map(Unknown Source)
 at org.apache.lucene.store.MMapDirectory$MMapIndexInput.init(Unknown
 Source)
 at org.apache.lucene.store.MMapDirectory$MMapIndexInput.init(Unknown
 Source)
 at org.apache.lucene.store.MMapDirectory.openInput(Unknown Source)
 at org.apache.lucene.index.SegmentReader$CoreReaders.init(Unknown Source)
 
 at org.apache.lucene.index.SegmentReader.get(Unknown Source)
 at org.apache.lucene.index.SegmentReader.get(Unknown Source)
 at org.apache.lucene.index.DirectoryReader.init(Unknown Source)
 at org.apache.lucene.index.ReadOnlyDirectoryReader.init(Unknown Source)
 at org.apache.lucene.index.DirectoryReader$1.doBody(Unknown Source)
 at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(Unknown
 Source)
 at org.apache.lucene.index.DirectoryReader.open(Unknown Source)
 at org.apache.lucene.index.IndexReader.open(Unknown Source)
 ...
 Caused by: java.lang.OutOfMemoryError: Map failed
 at sun.nio.ch.FileChannelImpl.map0(Native Method)
 ...



Re: Solr and wikipedia for schools

2011-09-04 Thread François Schiettecatte
I note that there is a full download option available, might be easier than 
crawling.

François

On Sep 4, 2011, at 9:56 AM, Markus Jelsma wrote:

 Hi,
 
 Solr is a search engine, not a crawler. You can use Apache Nutch to crawl 
 your 
 site and have it indexed in Solr.
 
 Cheers,
 
 Hi,
 
 I am new to Solr/Lucene, and have some problems trying to figure out the
 best way to perform indexing. I think I understand the general principles,
 but have some trouble translating this to my specific goal, which is the
 following:
 
 I want to use SolR as a search engine based on general (English) keywords,
 that has indexed Wikipedia for Schools
 (http://www.soschildrensvillages.org.uk/charity-news/archive/2008/10/2008-
 wikipedia-for-schools).
 
 I initially thought that it would be sufficient to add the root document
 (index.html) to Solr, after which everything would be automagically
 indexed, but this does not seem to work. I have also tried to use
 urldatasource in data-config.xml, but there I get a bit confused by the
 settings.
 
 Could anyone help me understand how I can achieve my goal?
 
 Thanks
 
 Kees



Re: shareSchema=true - location of schema.xml?

2011-08-31 Thread François Schiettecatte
Satish

You don't say which platform you are on, but have you tried symbolic links 
(created with ln -s on linux/unix)?
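
For example (the paths are illustrative):

ln -s /opt/solr/shared/conf/schema.xml multicore/core0/conf/schema.xml
ln -s /opt/solr/shared/conf/schema.xml multicore/core1/conf/schema.xml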

François

On Aug 31, 2011, at 12:25 AM, Satish Talim wrote:

 I have 1000's of cores and to reduce the cost of loading unloading
 schema.xml, I have my solr.xml as mentioned here -
 http://wiki.apache.org/solr/CoreAdmin
 namely:
 
 solr
  cores adminPath=/admin/cores shareSchema=true
...
  /cores
 /solr
 
 However, I am not sure where to keep the common schema.xml file? In which
 case, do I need the schema.xml in the conf folder of each and every core?
 
 My folder structure is:
 
 multicore (contains solr.xml)
|_ core0
 |_ conf
 ||_ schema.xml
 ||_ solrconfig.xml
 ||_ other files
   core1
 |_ conf
 ||_ schema.xml
 ||_ solrconfig.xml
 ||_ other files
 |
   exampledocs (contains 1000's of .csv files and post.jar)
 
 Satish



Re: Error while decoding %DC (Ü) from URL - results in ?

2011-08-29 Thread François Schiettecatte
Merlin

Just to make sure I understand what is going on here, you are getting searches 
from external crawlers. These are coming in the form of an HTTP request I 
assume?

Have you checked the encoding specified in these requests (in the content type 
header). If the encoding is not specified then iso-8859-1 is usually assumed. 
Also have you checked the default encoding of your container? If you are using 
tomcat that is set using URIEncoding, for example:

<Connector address="localhost" port="8000" protocol="HTTP/1.1"
           connectionTimeout="20000" URIEncoding="UTF-8" />

François

On Aug 28, 2011, at 3:10 PM, Merlin Morgenstern wrote:

 I double checked all code on that page and it looks like everything is in
 utf-8 and works just perfect. The problematic URLs are called always by bots
 like google bot. Looks like they are operating with a different encoding.
 The page itself has an utf-8 meta tag.
 
 So it looks like I have to find a way that checks for the encoding and
 encodes apropriatly. this should be a common solr problem if all search
 engines treat utf-8 that way, right?
 
 Any ideas how to fix that? Is there maybe a special solr functionality for
 this?
 
 2011/8/27 François Schiettecatte fschietteca...@gmail.com
 
 Merlin
 
 Ü encodes to two characters in utf-8 (C39C), and one in iso-8859-1 (%DC) so
 it looks like there is a charset mismatch somewhere.
 
 
 Cheers
 
 François
 
 
 
 On Aug 27, 2011, at 6:34 AM, Merlin Morgenstern wrote:
 
 Hello,
 
 I am having problems with searches that are issued from spiders that
 contain
 the ASCII encoded character ü
 
 For example in : Übersetzung
 
 The solr log shows following query request: /suche/%DCbersetzung
 which has been translated into solr query: q=?ersetzung
 
 If you enter the search term directly as a user into the search box it
 will
 result into:
 /suche/Übersetzung which returns perfect results.
 
 I am decoding the URL within PHP: $term = trim(urldecode($q));
 
 Somehow urldecode() translates the Character Ü (%DC) into a ? which is a
 illigeal first character in Solr.
 
 I tried it without urldecode(), with rawurldecode() and with
 utf8_decode()
 but all of those did not help.
 
 Thank you for any help or hint on how to solve that problem.
 
 Regards, Merlin
 
 



Re: Error while decoding %DC (Ü) from URL - results in ?

2011-08-27 Thread François Schiettecatte
Merlin

Ü encodes to two characters in utf-8 (C39C), and one in iso-8859-1 (%DC) so it 
looks like there is a charset mismatch somewhere.


Cheers

François



On Aug 27, 2011, at 6:34 AM, Merlin Morgenstern wrote:

 Hello,
 
 I am having problems with searches that are issued from spiders that contain
 the ASCII encoded character ü
 
 For example in : Übersetzung
 
 The solr log shows following query request: /suche/%DCbersetzung
 which has been translated into solr query: q=?ersetzung
 
 If you enter the search term directly as a user into the search box it will
 result into:
 /suche/Übersetzung which returns perfect results.
 
 I am decoding the URL within PHP: $term = trim(urldecode($q));
 
 Somehow urldecode() translates the Character Ü (%DC) into a ? which is a
 illigeal first character in Solr.
 
 I tried it without urldecode(), with rawurldecode() and with utf8_decode()
 but all of those did not help.
 
 Thank you for any help or hint on how to solve that problem.
 
 Regards, Merlin



Re: SolrServer instances

2011-08-26 Thread François Schiettecatte
Sounds to me like you are looking for HTTP Persistent Connections (connection 
keep-alive as opposed to close), and a singleton object. This would be outside 
SOLR per se.

A few caveats though, I am not sure if tomcat supports keep-alive, and I am not 
sure how SOLR deals with multiple requests coming down the pipe, and you will 
need to deal with concurrency, and I am not sure what you are looking to gain 
from this, opening an http connection is pretty cheap.
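
If you go the singleton route, a rough sketch (the URL is a placeholder, and this assumes SolrJ 3.x with commons-httpclient on the classpath):

import java.net.URL;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public final class SolrServerHolder {

    private static SolrServer instance;

    private SolrServerHolder() { }

    // lazily create one CommonsHttpSolrServer backed by a multi-threaded connection manager
    public static synchronized SolrServer get() throws Exception {
        if (instance == null) {
            HttpClient client = new HttpClient(new MultiThreadedHttpConnectionManager());
            instance = new CommonsHttpSolrServer(new URL("http://localhost:8983/solr"), client);
        }
        return instance;
    }
}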

François

On Aug 26, 2011, at 2:09 AM, Jonty Rhods wrote:

 do I also required to close the connection from solr server
 (CommonHttpSolrServer).
 
 regards
 
 On Fri, Aug 26, 2011 at 9:45 AM, Jonty Rhods jonty.rh...@gmail.com wrote:
 
 Deal all please help I am stuck here as I have not much experience..
 
 thanks
 
 On Thu, Aug 25, 2011 at 6:51 PM, Jonty Rhods jonty.rh...@gmail.comwrote:
 
 Hi All,
 
 I am using SolrJ (3.1) and Tomcat 6.x. I want to open solr server once (20
 concurrence) and reuse this across all the site. Or something like
 connection pool like we are using for DB (ie Apache DBCP). There is a way to
 use static method which is a way but I want better solution from you people.
 
 
 
 I read one threade where Ahmet suggest to use something like that
 
 String serverPath = http://localhost:8983/solr;;
 HttpClient client = new HttpClient(new
 MultiThreadedHttpConnectionManager());
 URL url = new URL(serverPath);
 CommonsHttpSolrServer solrServer = new CommonsHttpSolrServer(url, client);
 
 But how to use instance of this across all class.
 
 Please suggest.
 
 regards
 Jonty
 
 
 



Re: Solr 3.3 crashes after ~18 hours?

2011-08-02 Thread François Schiettecatte
Assuming you are running on Linux, you might want to check /var/log/messages 
too (the location might vary), I think the kernel logs forced process 
termination there. I recall that the kernel will usually pick the process 
consuming the most memory, though there may be other factors involved too.
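
Something along these lines usually turns it up (the log location varies by distribution):

grep -iE "out of memory|killed process" /var/log/messages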

François

On Aug 2, 2011, at 9:04 AM, wakemaster 39 wrote:

 Monitor your memory usage.  I use to encounter a problem like this before
 where nothing was in the logs and the process was just gone.
 
 Turned out my system was out odd memory and swap got used up because of
 another process which then forced the kernel to start killing off processes.
 Google OOM linux and you will find plenty of other programs and people with
 a similar problem.
 
 Cameron
 On Aug 2, 2011 6:02 AM, alexander sulz a.s...@digiconcept.net wrote:
 Hello folks,
 
 I'm using the latest stable Solr release - 3.3 and I encounter strange
 phenomena with it.
 After about 19 hours it just crashes, but I can't find anything in the
 logs, no exceptions, no warnings,
 no suspicious info entries..
 
 I have an index-job running from 6am to 8pm every 10 minutes. After each
 job there is a commit.
 An optimize-job is done twice a day at 12:15pm and 9:15pm.
 
 Does anyone have an idea what could possibly be wrong or where to look
 for further debug info?
 
 regards and thank you
 alex



Re: Solr can not index F**K!

2011-07-31 Thread François Schiettecatte
That seems a little far fetched, have you checked your analysis?

François

On Jul 31, 2011, at 4:58 PM, randohi wrote:

 One of our clients (a hot girl!) brought this to our attention: 
 In this document there are many f* words:
 
 http://sec.gov/Archives/edgar/data/1474227/00014742271032/d424b3.htm
 
 and we have indexed it with latest version of Solr (ver 3.3). But, we if we
 search F**K, it does not return the document back!
 
 We have tried to index it with different text types, but still not working.
 
 Any idea why F* can not be indexed - being censored by the government? :D
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-can-not-index-F-K-tp3214246p3214246.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr can not index F**K!

2011-07-31 Thread François Schiettecatte
Indeed, the analysis will show if the term is a stop word: the term gets 
removed by the stop filter, and turning on verbose output will show that.

François

On Jul 31, 2011, at 6:27 PM, Shashi Kant wrote:

 Check your Stop words list
 On Jul 31, 2011 6:25 PM, François Schiettecatte fschietteca...@gmail.com
 wrote:
 That seems a little far fetched, have you checked your analysis?
 
 François
 
 On Jul 31, 2011, at 4:58 PM, randohi wrote:
 
 One of our clients (a hot girl!) brought this to our attention:
 In this document there are many f* words:
 
 http://sec.gov/Archives/edgar/data/1474227/00014742271032/d424b3.htm
 
 and we have indexed it with latest version of Solr (ver 3.3). But, we if
 we
 search F**K, it does not return the document back!
 
 We have tried to index it with different text types, but still not
 working.
 
 Any idea why F* can not be indexed - being censored by the government? :D
 
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-can-not-index-F-K-tp3214246p3214246.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 



Re: schema.xml changes, need re-indexing ?

2011-07-27 Thread François Schiettecatte
I have not seen this mentioned anywhere, but I found a useful 'trick' to 
restart solr without having to restart tomcat. All you need to do is 'touch' 
the solr.xml in the solr.home directory. It can take a few seconds but solr 
will restart and reload any config.
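
In other words, something like this (adjust the path to your solr.home):

touch /opt/solr/solr.xml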

Cheers

François 

On Jul 27, 2011, at 2:56 PM, Alexei Martchenko wrote:

 I believe you're fine with that. Don't need to reindex all solr database.
 
 2011/7/27 Charles-Andre Martin charles-andre.mar...@sunmedia.ca
 
 Hi,
 
 
 
 We currently have a big index in production. We would like to add 2
 non-required fields to our schema.xml :
 
 
 
 field name=myfield type=boolean indexed=true stored=true
 required=false/
 
 field name=myotherfield type=string indexed=true stored=true
 required=false multiValued=true/
 
 
 
 I made some tests:
 
 
 
 -  I stopped tomcat
 
 -  I changed the schema.xml
 
 -  I started tomcat
 
 
 
 The data was still there and I was able to add new document with theses 2
 fields.
 
 
 
 So far, it looks I won't need to re-index all my data. Am I right ? Do I
 need to re-index all my data or in that case I'm fine ?
 
 
 
 Thank you !
 
 
 
 Charles-André Martin
 
 
 
 
 -- 
 
 *Alexei Martchenko* | *CEO* | Superdownloads
 ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
 5083.1018/5080.3535/5080.3533



Re: Spellcheck compounded words

2011-07-26 Thread François Schiettecatte
FWIW, here is the process I follow to create a log4j aware version of the 
apache solr war file and the corresponding log4j.properties file.

Have fun :)

François


##
#
# Log4J configuration for SOLR
#
#   http://wiki.apache.org/solr/SolrLogging
#
#
# 1) Download SLF4J:
#   http://www.slf4j.org/
#   http://www.slf4j.org/download.html
#   http://www.slf4j.org/dist/slf4j-1.6.1.tar.gz
#
# 2) Unpack Solr:
#   jar xvf apache-solr-3.3.0.war
#
# 3) Delete:
#   WEB-INF/lib/log4j-over-slf4j-1.6.1.jar
#   WEB-INF/lib/slf4j-jdk14-1.6.1.jar
#
# 4) Copy:
#   slf4j-1.6.1/slf4j-log4j12-1.6.1.jar  ->  WEB-INF/lib
#   log4j.properties (this file)        ->  WEB-INF/classes/ (needs to be created)
#
# 5) Pack Solr:
#   jar cvf apache-solr-3.3.0.war admin favicon.ico index.jsp 
META-INF WEB-INF
#
#
#   Author: Francois Schiettecatte
#   Version:1.0
#
##



##
#
# Logging levels (helpful reminder)
#
# DEBUG  INFO  WARN  ERROR  FATAL
#



##
#
# Logging setup
#

log4j.rootLogger=ERROR, SOLR


# Daily Rolling File Appender (SOLR)
log4j.appender.SOLR=org.apache.log4j.DailyRollingFileAppender
log4j.appender.SOLR.File=${catalina.base}/logs/solr.log
log4j.appender.SOLR.Append=true
log4j.appender.SOLR.Encoding=UTF-8
log4j.appender.SOLR.DatePattern='-'yyyy-MM-dd
log4j.appender.SOLR.layout=org.apache.log4j.PatternLayout
log4j.appender.SOLR.layout.ConversionPattern=%d [%t] %-5p %c - %m%n



##
#
# Logging levels for SOLR
#

# Default logging level
log4j.logger.org.apache.solr=ERROR



##




On Jul 26, 2011, at 2:49 PM, O. Klein wrote:

 Adding log4j-1.2.16.jar and deleting slf4j-jdk14-1.6.1.jar does not fix
 logging for 4.0 for me.
 
 Anyways, tried it on 3.3 and Solr just hangs here also. No logging, no
 exceptions.
 
 I'll let you know if I manage to find source of problem.
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p3201202.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Spellcheck compounded words

2011-07-26 Thread François Schiettecatte
I get slf4j-log4j12-1.6.1.jar from 
http://www.slf4j.org/dist/slf4j-1.6.1.tar.gz, it is what interfaces  slf4j to 
log4j, you will also need to add log4j-1.2.16.jar to WEB-INF/lib.


François 


On Jul 26, 2011, at 3:40 PM, O. Klein wrote:

 
 François Schiettecatte wrote:
 
 #
 # 4) Copy:
 #slf4j-1.6.1/slf4j-log4j12-1.6.1.jar -  
 WEB-INF/lib
 #log4j.properties (this file)-  
 WEB-INF/classes/ (needs to be
 created)
 #
 
 
 Don't you mean log4j-1.2.16/slf4j-log4j12-1.6.1.jar ?
 
 Anyways. I was testing on 3.3 and found that when I added
 spellcheck.maxCollations=2spellcheck.maxCollationTries=2 as parameters to
 the URL there was no problem at all.
 
 Adding 
 
  str name=spellcheck.maxCollations2/str
  str name=spellcheck.maxCollationTries2/str
 
 to the default requestHandler in solrconfig.xml caused request to hang.
 
 Can someone verify if this is a bug?
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p3201332.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: problem searching on non standard characters

2011-07-22 Thread François Schiettecatte
Check your analyzers to make sure that these characters are not getting 
stripped out in the tokenization process; the URL for 3.3 is something along 
the lines of:

http://localhost/solr/admin/analysis.jsp?highlight=on

And you should indeed be searching on \#test.

François

On Jul 22, 2011, at 10:34 AM, Jason Toy wrote:

 How does one search for words with characters like # and +.   I have tried
 searching solr with #test and \#test but all my results always come up
 with test and not #test. Is this some kind of configuration option I
 need to set in solr?
 
 -- 
 - sent from my mobile
 6176064373



Re: problem searching on non standard characters

2011-07-22 Thread François Schiettecatte
Adding to my previous reply, I just did a quick check on the 'text_en' and 
'text_en_splitting' field types and they both strip leading '#'.

Cheers

François

On Jul 22, 2011, at 10:49 AM, Shawn Heisey wrote:

 On 7/22/2011 8:34 AM, Jason Toy wrote:
 How does one search for words with characters like # and +.   I have tried
 searching solr with #test and \#test but all my results always come up
 with test and not #test. Is this some kind of configuration option I
 need to set in solr?
 
 I would guess that your analysis chain (in schema.xml) includes something 
 that removes and/or splits terms at non-alphanumeric characters.  There are a 
 several components that do this, but WordDelimiterFilter is the one that 
 comes to mind most readily.  I've never used the StandardTokenizer, but I 
 believe it might do something similar.
 
 Thanks,
 Shawn
 



Re: How to find whether solr server is running or not

2011-07-19 Thread François Schiettecatte
I think anything but a 200 OK means it is dead like the proverbial parrot :)
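
For example, a quick check from a script could look something like this (just a
sketch; adjust host, port and path to your install):

curl --silent --max-time 5 --output /dev/null --write-out '%{http_code}' \
    http://localhost:8983/solr/admin/ping

It prints 200 when Solr answers the ping; if the server is not running, curl
cannot connect, exits with a non-zero status and prints 000, which you can
treat as 'dead'.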

François

On Jul 19, 2011, at 7:42 AM, Romi wrote:

 But the problem is when solr server is not runing 
 *http://host:port/solr/admin/ping*
 
 will not give me any json response
 then how will i get the status :(
 
 when i run this url browser gives me following error
 *Unable to connect
 Firefox can't establish a connection to the server at 192.168.1.9:8983.*
 
 -
 Thanks  Regards
 Romi
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/How-to-find-whether-solr-server-is-running-or-not-tp3181870p3182202.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: - character in search query

2011-07-14 Thread François Schiettecatte
Easy, the hyphen is out on its own (with spaces on either side) and is probably 
getting removed from the search by the tokenizer. Check your analysis.

François

On Jul 14, 2011, at 6:05 AM, roySolr wrote:

 It looks like it's still not working.
 
 I send this to SOLR: q=arsenal \- london
 
 I get no results. When i look at the debugQuery i see this:
 
 (name: arsenal | city:arsenal)~1.0 (name: \ | city:\)~1.0 (name: london |
 city: london)~1.0
 
 
 my requesthandler:
 
 <requestHandler name="dismax" class="solr.SearchHandler" default="true">
   <lst name="defaults">
     <str name="defType">dismax</str>
     <str name="qf">
       name city
     </str>
   </lst>
 </requestHandler>
 
 What is going wrong?
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/character-in-search-query-tp3168604p3168666.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Wildcard

2011-07-13 Thread François Schiettecatte
http://lucene.apache.org/java/2_9_1/queryparsersyntax.html

http://wiki.apache.org/solr/SolrQuerySyntax

François

On Jul 13, 2011, at 1:29 PM, GAURAV PAREEK wrote:

 Hello,
 
 What are wildcards we can use with the SOLR ?
 
 Regards,
 Gaurav



Re: Result list order in case of ties

2011-07-12 Thread François Schiettecatte
You just need to provide a second sort field along the lines of:

sort=score desc, author desc

François

On Jul 12, 2011, at 6:13 AM, Lox wrote:

 Hi,
 
 In the case where two or more documents are returned with the same score, is
 there a way to tell Solr to sort them alphabetically?
 
 I have already tried to use the tie-breaker, but I have just one field to
 search.
 
 Thank you.
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Result-list-order-in-case-of-ties-tp3162001p3162001.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: performance variation with respect to the index size

2011-07-08 Thread François Schiettecatte
Hi

I don't think that anyone has run such benchmarks. In fact this topic came up 
two weeks ago and I volunteered some time to do it; since I have some spare 
time this week, I am going to run some benchmarks this weekend and report back.

The machine I have to do this is a Core i7 960 with 24GB of RAM and 4TB of disk. 
I am going to run SOLR 3.3 under Tomcat 7.0.16. I have three databases I can use 
for this: icwsm-2009 (38.5GB compressed), cdip (24GB compressed), and trec vlc2 
(31GB compressed). I could also use a copy of Wikipedia. I have lots of user 
searches I can use (saved from Feedster days).

I would like some input on a couple of things to make this test as real-world 
as possible: one is any optimizations I should set in solrconfig.xml, and the 
other is the heap/GC settings I should use for Tomcat. Anything else?

Cheers

François

On Jul 8, 2011, at 4:08 AM, jame vaalet wrote:

 hi,
 
 is there any performance degradation (response time etc ) if the index has
 document content text stored in it  (stored=true)?
 
 -JAME



Re: Wildcard search not working if full word is queried

2011-07-01 Thread François Schiettecatte
Celso

You are very welcome, and yes, I should have mentioned that wildcard searches 
are not analyzed (which is a recurring theme). This also means that they are 
not downcased, so the search TEST* will probably not find anything either in 
your setup.

Cheers

François

On Jul 1, 2011, at 5:16 AM, Celso Pinto wrote:

 Hi again,
 
 read (past tense) TFM :-) and:
 
 On wildcard and fuzzy searches, no text analysis is performed on the
 search word.
 
 Thanks a lot François!
 
 Regards,
 Celso
 
 On Fri, Jul 1, 2011 at 10:02 AM, Celso Pinto cpi...@yimports.com wrote:
 Hi François,
 
 it is indeed being stemmed, thanks a lot for the heads up. It appears
 that stemming is also configured for the query so it should work just
 the same, no?
 
 Thanks again.
 
 Regards,
 Celso
 
 
 2011/6/30 François Schiettecatte fschietteca...@gmail.com:
 I would run that word through the analyzer, I suspect that the word 'teste' 
 is being stemmed to 'test' in the index, at least that is the first place I 
 would check.
 
 François
 
 On Jun 30, 2011, at 2:21 PM, Celso Pinto wrote:
 
 Hi everyone,
 
 I'm having some trouble figuring out why a query with an exact word
 followed by the * wildcard, eg. teste*, returns no results while a
 query for test* returns results that have the word teste in them.
 
 I've created a couple of pasties:
 
 Exact word with wildcard : http://pastebin.com/n9SMNsH0
 Similar word: http://pastebin.com/jQ56Ww6b
 
 Parameters other than title, description and content have no effect
 other than filtering out unwanted results. In a two of the four
 results, the title has the complete word teste. On the other two,
 the word appears in the other fields.
 
 Does anyone have any insights about what I'm doing wrong?
 
 Thanks in advance.
 
 Regards,
 Celso
 
 
 



Re: Wildcard search not working if full word is queried

2011-06-30 Thread François Schiettecatte
I would run that word through the analyzer, I suspect that the word 'teste' is 
being stemmed to 'test' in the index, at least that is the first place I would 
check.

François

On Jun 30, 2011, at 2:21 PM, Celso Pinto wrote:

 Hi everyone,
 
 I'm having some trouble figuring out why a query with an exact word
 followed by the * wildcard, eg. teste*, returns no results while a
 query for test* returns results that have the word teste in them.
 
 I've created a couple of pasties:
 
 Exact word with wildcard : http://pastebin.com/n9SMNsH0
 Similar word: http://pastebin.com/jQ56Ww6b
 
 Parameters other than title, description and content have no effect
 other than filtering out unwanted results. In a two of the four
 results, the title has the complete word teste. On the other two,
 the word appears in the other fields.
 
 Does anyone have any insights about what I'm doing wrong?
 
 Thanks in advance.
 
 Regards,
 Celso



Re: filters effect on search results

2011-06-29 Thread François Schiettecatte
Indeed, I find the Porter stemmer to be too 'aggressive' for my taste; I prefer 
the EnglishMinimalStemFilterFactory, with the caveat that it depends on your 
data set.
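
For instance, a minimal sketch of a field type using the lighter stemmer (the
type name and tokenizer are just illustrative):

<fieldType name="text_en_light" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- lighter alternative to the Porter stemmer -->
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>

Swapping the stemmer is the only change; run a few representative words through
the analysis page to see whether the lighter stemming suits your data.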

Cheers

François

On Jun 29, 2011, at 6:21 AM, Ahmet Arslan wrote:

 Hi, when i query for elegant in
 solr i get results for elegance too. 
 
 *I used these filters for index analyze*
 WhitespaceTokenizerFactory 
 StopFilterFactory 
 WordDelimiterFilterFactory
 LowerCaseFilterFactory 
 SynonymFilterFactory
 EnglishPorterFilterFactory
 RemoveDuplicatesTokenFilterFactory
 ReversedWildcardFilterFactory 
 
 *
 and for query analyze:*
 
 .WhitespaceTokenizerFactory
 SynonymFilterFactory
 StopFilterFactory
 WordDelimiterFilterFactory 
 LowerCaseFilterFactory 
 EnglishPorterFilterFactory 
 RemoveDuplicatesTokenFilterFactory 
 
 I want to know which filter affecting my search result.
 
 
 It is EnglishPorterFilterFactory, you can verify it from admin/analysis.jsp 
 page.



Re: Include synonys in solr

2011-06-28 Thread François Schiettecatte
Well you need to find word lists and/or a thesaurus.

This is one place to start:

http://wordlist.sourceforge.net/

I used the US/UK English word list for my synonyms for an index I have because 
it contains both US and UK English terms; the list lacks some medical terms, 
though, so we just added them.

Cheers

François

On Jun 28, 2011, at 6:55 AM, Romi wrote:

 Please see
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
 
 No offence, but a simple Google search, or a search of the Wiki
 would have turned this up. Please try such simpler avenues before
 dashing off a message to the list.
 
 
 Gora, I heve already read the document and also included synonyms in my
 search results :)
 
 My question is , when i use this *filter class=solr.SynonymFilterFactory
 synonyms=syn.txt ignoreCase=true expand=false/
 * i need to enter synonyms manually in synonyms.txt. which is really tough
 if you have many words for synonyms. i wanted to ask is there any other
 option so that i need not to enter synonyms manually.. i hope you got my
 point :)
 
 
 -
 Thanks  Regards
 Romi
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Include-synonyms-in-solr-tp3116836p3117365.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Removing duplicate documents from search results

2011-06-28 Thread François Schiettecatte
Create a hash from the URL and use that as the unique key; MD5 or SHA-1 would 
probably be good enough.
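
Something along these lines would do it in Java (just a sketch; it assumes the
URL has already been normalized, e.g. lowercased host, no trailing slash):

import java.security.MessageDigest;

public class UrlKey {

    // Turn a (normalized) URL into an MD5 hex string usable as the uniqueKey value.
    public static String hash(String url) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest(url.getBytes("UTF-8"));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }
}

Two articles with the same URL then map to the same key, so re-adding one simply
overwrites the other.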

Cheers

François

On Jun 28, 2011, at 7:29 AM, Mohammad Shariq wrote:

 I also have the problem of duplicate docs.
 I am indexing news articles, Every news article will have the source URL,
 If two news-article has the same URL, only one need to index,
 removal of duplicate at index time.
 
 
 
 On 23 June 2011 21:24, simon mtnes...@gmail.com wrote:
 
 have you checked out the deduplication process that's available at
 indexing time ? This includes a fuzzy hash algorithm .
 
 http://wiki.apache.org/solr/Deduplication
 
 -Simon
 
 On Thu, Jun 23, 2011 at 5:55 AM, Pranav Prakash pra...@gmail.com wrote:
 This approach would definitely work is the two documents are *Exactly*
 the
 same. But this is very fragile. Even if one extra space has been added,
 the
 whole hash would change. What I am really looking for is some %age
 similarity between documents, and remove those documents which are more
 than
 95% similar.
 
 *Pranav Prakash*
 
 temet nosce
 
 Twitter http://twitter.com/pranavprakash | Blog 
 http://blog.myblive.com |
 Google http://www.google.com/profiles/pranny
 
 
 On Thu, Jun 23, 2011 at 15:16, Omri Cohen o...@yotpo.com wrote:
 
 What you need to do, is to calculate some HASH (using any message digest
 algorithm you want, md5, sha-1 and so on), then do some reading on solr
 field collapse capabilities. Should not be too complicated..
 
 *Omri Cohen*
 
 
 
 Co-founder @ yotpo.com | o...@yotpo.com | +972-50-7235198 |
 +972-3-6036295
 
 
 
 
 My profiles: [image: LinkedIn] http://www.linkedin.com/in/omric
 [image:
 Twitter] http://www.twitter.com/omricohe [image:
 WordPress]http://omricohen.me
 Please consider your environmental responsibility. Before printing this
 e-mail message, ask yourself whether you really need a hard copy.
 IMPORTANT: The contents of this email and any attachments are
 confidential.
 They are intended for the named recipient(s) only. If you have received
 this
 email by mistake, please notify the sender immediately and do not
 disclose
 the contents to anyone or make copies thereof.
 Signature powered by
 
 
 http://www.wisestamp.com/email-install?utm_source=extensionutm_medium=emailutm_campaign=footer
 
 WiseStamp
 
 http://www.wisestamp.com/email-install?utm_source=extensionutm_medium=emailutm_campaign=footer
 
 
 
 
 -- Forwarded message --
 From: Pranav Prakash pra...@gmail.com
 Date: Thu, Jun 23, 2011 at 12:26 PM
 Subject: Removing duplicate documents from search results
 To: solr-user@lucene.apache.org
 
 
 How can I remove very similar documents from search results?
 
 My scenario is that there are documents in the index which are almost
 similar (people submitting same stuff multiple times, sometimes
 different
 people submitting same stuff). Now when a search is performed for
 keyword,
 in the top N results, quite frequently, same document comes up multiple
 times. I want to remove those duplicate (or possible duplicate)
 documents.
 Very similar to what Google does when they say In order to show you
 most
 relevant result, duplicates have been removed. How can I achieve this
 functionality using Solr? Does Solr has an implied or plugin which could
 help me with it?
 
 
 *Pranav Prakash*
 
 temet nosce
 
 Twitter http://twitter.com/pranavprakash | Blog 
 http://blog.myblive.com
 
 |
 Google http://www.google.com/profiles/pranny
 
 
 
 
 
 
 -- 
 Thanks and Regards
 Mohammad Shariq



Re: Removing duplicate documents from search results

2011-06-28 Thread François Schiettecatte
Maybe there is a way to get Solr to reject documents that already exist in the 
index, but I doubt it; maybe someone else can chime in here. You could do a 
search for each document prior to indexing it to see if it is already in the 
index, but that is probably non-optimal. It may be easiest to check whether the 
document already exists in your Riak repository: if not, add it and index it; 
if it does, drop it.

François

On Jun 28, 2011, at 8:24 AM, Mohammad Shariq wrote:

 I am making the Hash from URL, but I can't use this as UniqueKey because I
 am using UUID as UniqueKey,
 Since I am using SOLR as  index engine Only and using Riak(key-value
 storage) as storage engine, I dont want to do the overwrite on duplicate.
 I just need to discard the duplicates.
 
 
 
 2011/6/28 François Schiettecatte fschietteca...@gmail.com
 
 Create a hash from the url and use that as the unique key, md5 or sha1
 would probably be good enough.
 
 Cheers
 
 François
 
 On Jun 28, 2011, at 7:29 AM, Mohammad Shariq wrote:
 
 I also have the problem of duplicate docs.
 I am indexing news articles, Every news article will have the source URL,
 If two news-article has the same URL, only one need to index,
 removal of duplicate at index time.
 
 
 
 On 23 June 2011 21:24, simon mtnes...@gmail.com wrote:
 
 have you checked out the deduplication process that's available at
 indexing time ? This includes a fuzzy hash algorithm .
 
 http://wiki.apache.org/solr/Deduplication
 
 -Simon
 
 On Thu, Jun 23, 2011 at 5:55 AM, Pranav Prakash pra...@gmail.com
 wrote:
 This approach would definitely work is the two documents are *Exactly*
 the
 same. But this is very fragile. Even if one extra space has been added,
 the
 whole hash would change. What I am really looking for is some %age
 similarity between documents, and remove those documents which are more
 than
 95% similar.
 
 *Pranav Prakash*
 
 temet nosce
 
 Twitter http://twitter.com/pranavprakash | Blog 
 http://blog.myblive.com |
 Google http://www.google.com/profiles/pranny
 
 
 On Thu, Jun 23, 2011 at 15:16, Omri Cohen o...@yotpo.com wrote:
 
 What you need to do, is to calculate some HASH (using any message
 digest
 algorithm you want, md5, sha-1 and so on), then do some reading on
 solr
 field collapse capabilities. Should not be too complicated..
 
 *Omri Cohen*
 
 
 
 Co-founder @ yotpo.com | o...@yotpo.com | +972-50-7235198 |
 +972-3-6036295
 
 
 
 
 My profiles: [image: LinkedIn] http://www.linkedin.com/in/omric
 [image:
 Twitter] http://www.twitter.com/omricohe [image:
 WordPress]http://omricohen.me
 Please consider your environmental responsibility. Before printing
 this
 e-mail message, ask yourself whether you really need a hard copy.
 IMPORTANT: The contents of this email and any attachments are
 confidential.
 They are intended for the named recipient(s) only. If you have
 received
 this
 email by mistake, please notify the sender immediately and do not
 disclose
 the contents to anyone or make copies thereof.
 Signature powered by
 
 
 
 http://www.wisestamp.com/email-install?utm_source=extensionutm_medium=emailutm_campaign=footer
 
 WiseStamp
 
 
 http://www.wisestamp.com/email-install?utm_source=extensionutm_medium=emailutm_campaign=footer
 
 
 
 
 -- Forwarded message --
 From: Pranav Prakash pra...@gmail.com
 Date: Thu, Jun 23, 2011 at 12:26 PM
 Subject: Removing duplicate documents from search results
 To: solr-user@lucene.apache.org
 
 
 How can I remove very similar documents from search results?
 
 My scenario is that there are documents in the index which are almost
 similar (people submitting same stuff multiple times, sometimes
 different
 people submitting same stuff). Now when a search is performed for
 keyword,
 in the top N results, quite frequently, same document comes up
 multiple
 times. I want to remove those duplicate (or possible duplicate)
 documents.
 Very similar to what Google does when they say In order to show you
 most
 relevant result, duplicates have been removed. How can I achieve this
 functionality using Solr? Does Solr has an implied or plugin which
 could
 help me with it?
 
 
 *Pranav Prakash*
 
 temet nosce
 
 Twitter http://twitter.com/pranavprakash | Blog 
 http://blog.myblive.com
 
 |
 Google http://www.google.com/profiles/pranny
 
 
 
 
 
 
 --
 Thanks and Regards
 Mohammad Shariq
 
 
 
 
 -- 
 Thanks and Regards
 Mohammad Shariq



Re: Removing duplicate documents from search results

2011-06-28 Thread François Schiettecatte
Indeed, take a look at this:

http://wiki.apache.org/solr/Deduplication

I have not used it but it looks like it will do the trick.
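
For reference, the update chain described on that page looks roughly like this
in solrconfig.xml (the field names here are placeholders, and the signature
field has to be declared in schema.xml):

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <!-- fields used to compute the signature; a fuzzy signatureClass such as
         TextProfileSignature is meant for near-duplicate detection -->
    <str name="fields">title,body</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

The chain then has to be referenced from your update request handler for it to
run on incoming documents.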

François

On Jun 28, 2011, at 8:44 AM, Pranav Prakash wrote:

 I found the deduplication thing really useful. Although I have not yet
 started to work on it, as there are some other low hanging fruits I've to
 capture. Will share my thoughts soon.
 
 
 *Pranav Prakash*
 
 temet nosce
 
 Twitter http://twitter.com/pranavprakash | Blog http://blog.myblive.com |
 Google http://www.google.com/profiles/pranny
 
 
 2011/6/28 François Schiettecatte fschietteca...@gmail.com
 
 Maybe there is a way to get Solr to reject documents that already exist in
 the index but I doubt it, maybe someone else with can chime here here. You
 could do a search for each document prior to indexing it so see if it is
 already in the index, that is probably non-optimal, maybe it is easiest to
 check if the document exists in your Riak repository, it no add it and index
 it, and drop if it already exists.
 
 François
 
 On Jun 28, 2011, at 8:24 AM, Mohammad Shariq wrote:
 
 I am making the Hash from URL, but I can't use this as UniqueKey because
 I
 am using UUID as UniqueKey,
 Since I am using SOLR as  index engine Only and using Riak(key-value
 storage) as storage engine, I dont want to do the overwrite on duplicate.
 I just need to discard the duplicates.
 
 
 
 2011/6/28 François Schiettecatte fschietteca...@gmail.com
 
 Create a hash from the url and use that as the unique key, md5 or sha1
 would probably be good enough.
 
 Cheers
 
 François
 
 On Jun 28, 2011, at 7:29 AM, Mohammad Shariq wrote:
 
 I also have the problem of duplicate docs.
 I am indexing news articles, Every news article will have the source
 URL,
 If two news-article has the same URL, only one need to index,
 removal of duplicate at index time.
 
 
 
 On 23 June 2011 21:24, simon mtnes...@gmail.com wrote:
 
 have you checked out the deduplication process that's available at
 indexing time ? This includes a fuzzy hash algorithm .
 
 http://wiki.apache.org/solr/Deduplication
 
 -Simon
 
 On Thu, Jun 23, 2011 at 5:55 AM, Pranav Prakash pra...@gmail.com
 wrote:
 This approach would definitely work is the two documents are
 *Exactly*
 the
 same. But this is very fragile. Even if one extra space has been
 added,
 the
 whole hash would change. What I am really looking for is some %age
 similarity between documents, and remove those documents which are
 more
 than
 95% similar.
 
 *Pranav Prakash*
 
 temet nosce
 
 Twitter http://twitter.com/pranavprakash | Blog 
 http://blog.myblive.com |
 Google http://www.google.com/profiles/pranny
 
 
 On Thu, Jun 23, 2011 at 15:16, Omri Cohen o...@yotpo.com wrote:
 
 What you need to do, is to calculate some HASH (using any message
 digest
 algorithm you want, md5, sha-1 and so on), then do some reading on
 solr
 field collapse capabilities. Should not be too complicated..
 
 *Omri Cohen*
 
 
 
 Co-founder @ yotpo.com | o...@yotpo.com | +972-50-7235198 |
 +972-3-6036295
 
 
 
 
 My profiles: [image: LinkedIn] http://www.linkedin.com/in/omric
 [image:
 Twitter] http://www.twitter.com/omricohe [image:
 WordPress]http://omricohen.me
 Please consider your environmental responsibility. Before printing
 this
 e-mail message, ask yourself whether you really need a hard copy.
 IMPORTANT: The contents of this email and any attachments are
 confidential.
 They are intended for the named recipient(s) only. If you have
 received
 this
 email by mistake, please notify the sender immediately and do not
 disclose
 the contents to anyone or make copies thereof.
 Signature powered by
 
 
 
 
 http://www.wisestamp.com/email-install?utm_source=extensionutm_medium=emailutm_campaign=footer
 
 WiseStamp
 
 
 
 http://www.wisestamp.com/email-install?utm_source=extensionutm_medium=emailutm_campaign=footer
 
 
 
 
 -- Forwarded message --
 From: Pranav Prakash pra...@gmail.com
 Date: Thu, Jun 23, 2011 at 12:26 PM
 Subject: Removing duplicate documents from search results
 To: solr-user@lucene.apache.org
 
 
 How can I remove very similar documents from search results?
 
 My scenario is that there are documents in the index which are
 almost
 similar (people submitting same stuff multiple times, sometimes
 different
 people submitting same stuff). Now when a search is performed for
 keyword,
 in the top N results, quite frequently, same document comes up
 multiple
 times. I want to remove those duplicate (or possible duplicate)
 documents.
 Very similar to what Google does when they say In order to show you
 most
 relevant result, duplicates have been removed. How can I achieve
 this
 functionality using Solr? Does Solr has an implied or plugin which
 could
 help me with it?
 
 
 *Pranav Prakash*
 
 temet nosce
 
 Twitter http://twitter.com/pranavprakash | Blog 
 http://blog.myblive.com
 
 |
 Google http://www.google.com/profiles/pranny
 
 
 
 
 
 
 --
 Thanks and Regards
 Mohammad Shariq

Re: Removing duplicate documents from search results

2011-06-28 Thread François Schiettecatte
Yeah, I read the overview, which suggests that duplicates can be prevented from 
entering the index, and scanned the rest; it does not look like you can actually 
drop the document entirely. Maybe I am missing something here.

François

On Jun 28, 2011, at 9:14 AM, Mohammad Shariq wrote:

 Hey François,
 thanks for your suggestion, I followed the same link (
 http://wiki.apache.org/solr/Deduplication)
 
 they have the solution*, either make Hash as uniqueKey OR overwrite on
 duplicate,
 I dont need either.
 
 I need Discard on Duplicate.
 *
 
 
 
 I have not used it but it looks like it will do the trick.
 
 François
 
 On Jun 28, 2011, at 8:44 AM, Pranav Prakash wrote:
 
 I found the deduplication thing really useful. Although I have not yet
 started to work on it, as there are some other low hanging fruits I've to
 capture. Will share my thoughts soon.
 
 
 *Pranav Prakash*
 
 temet nosce
 
 Twitter http://twitter.com/pranavprakash | Blog 
 http://blog.myblive.com |
 Google http://www.google.com/profiles/pranny
 
 
 2011/6/28 François Schiettecatte fschietteca...@gmail.com
 
 Maybe there is a way to get Solr to reject documents that already exist
 in
 the index but I doubt it, maybe someone else with can chime here here.
 You
 could do a search for each document prior to indexing it so see if it is
 already in the index, that is probably non-optimal, maybe it is easiest
 to
 check if the document exists in your Riak repository, it no add it and
 index
 it, and drop if it already exists.
 
 François
 
 On Jun 28, 2011, at 8:24 AM, Mohammad Shariq wrote:
 
 I am making the Hash from URL, but I can't use this as UniqueKey
 because
 I
 am using UUID as UniqueKey,
 Since I am using SOLR as  index engine Only and using Riak(key-value
 storage) as storage engine, I dont want to do the overwrite on
 duplicate.
 I just need to discard the duplicates.
 
 
 
 2011/6/28 François Schiettecatte fschietteca...@gmail.com
 
 Create a hash from the url and use that as the unique key, md5 or sha1
 would probably be good enough.
 
 Cheers
 
 François
 
 On Jun 28, 2011, at 7:29 AM, Mohammad Shariq wrote:
 
 I also have the problem of duplicate docs.
 I am indexing news articles, Every news article will have the source
 URL,
 If two news-article has the same URL, only one need to index,
 removal of duplicate at index time.
 
 
 
 On 23 June 2011 21:24, simon mtnes...@gmail.com wrote:
 
 have you checked out the deduplication process that's available at
 indexing time ? This includes a fuzzy hash algorithm .
 
 http://wiki.apache.org/solr/Deduplication
 
 -Simon
 
 On Thu, Jun 23, 2011 at 5:55 AM, Pranav Prakash pra...@gmail.com
 wrote:
 This approach would definitely work is the two documents are
 *Exactly*
 the
 same. But this is very fragile. Even if one extra space has been
 added,
 the
 whole hash would change. What I am really looking for is some %age
 similarity between documents, and remove those documents which are
 more
 than
 95% similar.
 
 *Pranav Prakash*
 
 temet nosce
 
 Twitter http://twitter.com/pranavprakash | Blog 
 http://blog.myblive.com |
 Google http://www.google.com/profiles/pranny
 
 
 On Thu, Jun 23, 2011 at 15:16, Omri Cohen o...@yotpo.com wrote:
 
 What you need to do, is to calculate some HASH (using any message
 digest
 algorithm you want, md5, sha-1 and so on), then do some reading on
 solr
 field collapse capabilities. Should not be too complicated..
 
 *Omri Cohen*
 
 
 
 Co-founder @ yotpo.com | o...@yotpo.com | +972-50-7235198 |
 +972-3-6036295
 
 
 
 
 My profiles: [image: LinkedIn] http://www.linkedin.com/in/omric
 [image:
 Twitter] http://www.twitter.com/omricohe [image:
 WordPress]http://omricohen.me
 Please consider your environmental responsibility. Before printing
 this
 e-mail message, ask yourself whether you really need a hard copy.
 IMPORTANT: The contents of this email and any attachments are
 confidential.
 They are intended for the named recipient(s) only. If you have
 received
 this
 email by mistake, please notify the sender immediately and do not
 disclose
 the contents to anyone or make copies thereof.
 Signature powered by
 
 
 
 
 
 http://www.wisestamp.com/email-install?utm_source=extensionutm_medium=emailutm_campaign=footer
 
 WiseStamp
 
 
 
 
 http://www.wisestamp.com/email-install?utm_source=extensionutm_medium=emailutm_campaign=footer
 
 
 
 
 -- Forwarded message --
 From: Pranav Prakash pra...@gmail.com
 Date: Thu, Jun 23, 2011 at 12:26 PM
 Subject: Removing duplicate documents from search results
 To: solr-user@lucene.apache.org
 
 
 How can I remove very similar documents from search results?
 
 My scenario is that there are documents in the index which are
 almost
 similar (people submitting same stuff multiple times, sometimes
 different
 people submitting same stuff). Now when a search is performed for
 keyword,
 in the top N results, quite frequently, same document comes up
 multiple
 times. I want to remove those duplicate (or possible

Re: Include synonys in solr

2011-06-28 Thread François Schiettecatte
Well no, you need to see which files (if any) suit your needs; they are not all 
synonym files. I only needed the UK/US English file, and I had to process it 
into a format suitable for the synonyms file.

There may well be other word lists on the net suitable for your needs. I would 
not recommend the use of synonyms unless you have a specific need for them. I 
needed them because we have documents which mix UK/US english, and we need to 
be able to search on medical terms e.g. hemoglobin/haemoglobin and get the same 
results.

Cheers 

François

On Jun 28, 2011, at 9:21 AM, Romi wrote:

 Thanks François Schiettecatte, information you provided is very helpful.
 i need to know one more thing, i downloaded one of the given dictionary but
 it contains many files, do i need to add all this files data in to
 synonyms.text ??
 
 -
 Thanks  Regards
 Romi
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Include-synonyms-in-solr-tp3116836p3117733.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Extending Solr Highlighter to pull information from external source

2011-06-20 Thread François Schiettecatte
Mike

I would be very interested in the answer to that question too. My hunch is that 
the answer is no too. I have a few text databases that range from 200MB to 
about 60GB with which I could run some tests. I will have some downtime in 
early July and will post results.

From what I can tell the Guardian newspaper is doing just that:


http://www.guardian.co.uk/open-platform/blog/what-is-powering-the-content-api

http://www.lucidimagination.com/blog/2010/04/29/for-the-guardian-solr-is-the-new-database/

Cheers

François


On Jun 20, 2011, at 9:05 AM, Mike Sokolov wrote:

 I'd be very interested in this, as well, if you do it before me and are 
 willing to share...
 
 A related question I have tried to ask on this list, and have never really 
 gotten a good answer to, is whether it makes sense to just chuck the external 
 storage and treat the lucene index as the primary storage for documents.  I 
 have a feeling the answer is no; perhaps because of increased I/O costs for 
 lucene and solr, but I don't really know.  I've been considering doing some 
 experimentation, but would really love an expert opinion...
 
 -Mike
 
 On 06/20/2011 08:41 AM, Jamie Johnson wrote:
 I am trying to index data where I'm concerned that storing the contents of a
 specific field will be a bit of a hog so we are planning to retrieve this
 information as needed for highlighting from an external source.  I am
 looking to extend the default solr highlighting capability to work with
 information pulled from this external source and it looks like this is
 possible by extending DefaultSolrHighlighter (line 418 to pull a particular
 field from external source) for standard highlighting and
 BaseFragmentsBuilder (line 99) for FastVectorHighlighter.  I could just hard
 code this to say if the field name is a specific value look into the
 external source, is this the best way to accomplish this?  Are there any
 other extension points to do what I'm suggesting?
 
   



Re: Searching in Traditional / Simplified Chinese Record

2011-06-20 Thread François Schiettecatte
Wayne

I am not sure what you mean by 'changing the record'.

One option would be to implement something like the synonyms filter to generate 
the TC for SC when you index the document, which would index both the TC and 
the SC in the same location. That way your users would be able to search with 
either TC or SC.

Another option would be to use the same synonyms filter but do the expansion at 
search time.
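
As a rough sketch (field type, tokenizer and file name are only examples), the
index-time variant in schema.xml could look like this, with tc-sc-synonyms.txt
mapping each SC form to its TC equivalent:

<fieldType name="text_zh" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- a CJK-aware tokenizer may be preferable depending on your setup -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="tc-sc-synonyms.txt"
            ignoreCase="false" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>

With expand="true" both forms end up indexed at the same position, so a TC or SC
query matches either; the query-time variant would move the filter into the
query analyzer instead.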

Cheers

François


On Jun 20, 2011, at 5:41 AM, waynelam wrote:

 Hi,
 
 I've recently made a change to my schema.xml to support importing Chinese
 records. What I want to do is to search both Traditional Chinese (TC) (e.g. ?? )
 and Simplified Chinese (SC) (e.g. ??) records in the same query. I know I can
 do that by converting all SC records to TC. I want to change the way I index
 rather than change the records.
 
 Anyone should show me the way in much appreciated.
 
 
 Thanks
 
 Wayne
 
 
 -- 
 -
 Wayne Lam
 Assistant Library Officer I
 Systems Development  Support
 Fong Sum Wood Library
 Lingnan University
 8 Castle Peak Road
 Tuen Mun, New Territories
 Hong Kong SAR
 China
 Phone:   +852 26168585
 Email:   wayne...@ln.edu.hk
 Website: http://www.library.ln.edu.hk
 



Re: Is it true that I cannot delete stored content from the index?

2011-06-19 Thread François Schiettecatte
That is correct, but you only need to commit; optimize is not a requirement 
here.
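
In SolrJ that is just something along these lines (a sketch; the URL and key are
placeholders):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class DeleteByKey {

    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        server.deleteById("some-unique-key"); // marks the document as deleted
        server.commit();                      // makes the deletion visible to searchers
    }
}

The deleted document is only physically removed when its segment gets merged
away (or on optimize), but after the commit it no longer shows up in results.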

François

On Jun 18, 2011, at 11:54 PM, Mohammad Shariq wrote:

 I have define uniqueKey in my solr and Deleting the docs from solr using
 this uniqueKey.
 and then doing optimization once in a day.
 is this right way to delete ???
 
 On 19 June 2011 05:14, Erick Erickson erickerick...@gmail.com wrote:
 
 Yep, you've got to delete and re-add. Although if you have a
 uniqueKey defined you
 can just re-add that document and Solr will automatically delete the
 underlying
 document.
 
 You might have to optimize the index afterwards to get the data to really
 disappear since the deletion process just marks the document as
 deleted.
 
 Best
 Erick
 
 On Sat, Jun 18, 2011 at 1:20 PM, Gabriele Kahlout
 gabri...@mysimpatico.com wrote:
 Hello,
 
 I've indexing with the content field stored. Now I'd like to delete all
 stored content, is there how to do that without re-indexing?
 
 It seems not from lucene
 FAQ
 http://wiki.apache.org/lucene-java/LuceneFAQ#How_do_I_update_a_document_or_a_set_of_documents_that_are_already_indexed.3F
 
 :
 How do I update a document or a set of documents that are already
 indexed? There
 is no direct update procedure in Lucene. To update an index incrementally
 you must first *delete* the documents that were updated, and *then
 re-add*them to the index.
 
 --
 Regards,
 K. Gabriele
 
 --- unchanged since 20/9/10 ---
 P.S. If the subject contains [LON] or the addressee acknowledges the
 receipt within 48 hours then I don't resend the email.
 subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
 time(x)
  Now + 48h) ⇒ ¬resend(I, this).
 
 If an email is sent by a sender that is not a trusted contact or the
 email
 does not contain a valid code then the email is not received. A valid
 code
 starts with a hyphen and ends with X.
 ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
 L(-[a-z]+[0-9]X)).
 
 
 
 
 
 -- 
 Thanks and Regards
 Mohammad Shariq



Re: Why does paste get parsed into past?

2011-06-18 Thread François Schiettecatte
What do you have set up for stemming?

François

On Jun 18, 2011, at 8:00 AM, Gabriele Kahlout wrote:

 Hello,
 
 Debugging query results I find that:
  <str name="querystring">paste</str>
  <str name="parsedquery">content:past</str>
 
 Now paste and past are two different words. Why does Solr not consider
 that? How do I make it?
 
 --
 Regards,
 K. Gabriele
 
 --- unchanged since 20/9/10 ---
 P.S. If the subject contains [LON] or the addressee acknowledges the
 receipt within 48 hours then I don't resend the email.
 subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
 time(x)  Now + 48h) ⇒ ¬resend(I, this).
 
 If an email is sent by a sender that is not a trusted contact or the
 email does not contain a valid code then the email is not received. A
 valid code starts with a hyphen and ends with X.
 ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y
 ∈ L(-[a-z]+[0-9]X)).



Re: Why does paste get parsed into past?

2011-06-18 Thread François Schiettecatte
What I meant was: which stemmer are you using? Maybe it is the stemmer that is 
cutting the 'e'. You can check that on the field analysis page in the Solr admin.

François

On Jun 18, 2011, at 11:42 AM, Gabriele Kahlout wrote:

 I'm !sure where those are set, but on reflection I'd keep the default
 settings. My real issue is why are not query keywords treated as a
 set?http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201106.mbox/%3CBANLkTikHunhyWc2WVTofRYU4ZW=c8oe...@mail.gmail.com%3E
 2011/6/18 François Schiettecatte fschietteca...@gmail.com
 
 What do you have set up for stemming?
 
 François
 
 On Jun 18, 2011, at 8:00 AM, Gabriele Kahlout wrote:
 
 Hello,
 
 Debugging query results I find that:
  <str name="querystring">paste</str>
  <str name="parsedquery">content:past</str>
 
 Now paste and past are two different words. Why does Solr not consider
 that? How do I make it?
 
 --
 Regards,
 K. Gabriele
 
 --- unchanged since 20/9/10 ---
 P.S. If the subject contains [LON] or the addressee acknowledges the
 receipt within 48 hours then I don't resend the email.
 subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
 time(x)  Now + 48h) ⇒ ¬resend(I, this).
 
 If an email is sent by a sender that is not a trusted contact or the
 email does not contain a valid code then the email is not received. A
 valid code starts with a hyphen and ends with X.
 ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y
 ∈ L(-[a-z]+[0-9]X)).
 
 
 
 
 -- 
 Regards,
 K. Gabriele
 
 --- unchanged since 20/9/10 ---
 P.S. If the subject contains [LON] or the addressee acknowledges the
 receipt within 48 hours then I don't resend the email.
 subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
  Now + 48h) ⇒ ¬resend(I, this).
 
 If an email is sent by a sender that is not a trusted contact or the email
 does not contain a valid code then the email is not received. A valid code
 starts with a hyphen and ends with X.
 ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
 L(-[a-z]+[0-9]X)).



Re: Multiple indexes

2011-06-18 Thread François Schiettecatte
Sure.

François

On Jun 18, 2011, at 2:25 PM, shacky wrote:

 2011/6/15 Edoardo Tosca e.to...@sourcesense.com:
 Try to use multiple cores:
 http://wiki.apache.org/solr/CoreAdmin
 
 Can I do concurrent searches on multiple cores?



Re: Multiple indexes

2011-06-18 Thread François Schiettecatte
You would need to run two independent searches and then 'join' the results.

It is best not to apply a SQL mindset to SOLR when it comes to 
(de)normalization: whereas you strive for normalization in SQL, that is usually 
counter-productive in SOLR. For example, I am working on a project with 30+ 
normalized tables, but only 4 cores.

Perhaps describing what you are trying to achieve would give us greater insight 
and allow us to make a more concrete recommendation?

Cheers

François 

On Jun 18, 2011, at 2:36 PM, shacky wrote:

 Il 18 giugno 2011 20:27, François Schiettecatte
 fschietteca...@gmail.com ha scritto:
 Sure.
 
 So I can have some searches similar to JOIN on MySQL?
 The problem is that I need at least two tables in which search data..



Re: Performance loss - querying more than 64 cores (randomly)

2011-06-16 Thread François Schiettecatte
I am assuming that you are running on Linux here; I have found atop to be very 
useful for seeing what is going on.

http://freshmeat.net/projects/atop/

dstat is also very useful but needs a little more work to 'decode'.

Obviously there is contention going on; you just need to figure out where it 
is. Most likely it is disk I/O, but it could also be the number of CPU cores you 
have. Also, I would not say that performance is decreasing rapidly; it is 
probably more of a gentle slope down if you plot it (you double the number of 
cores every time).

I would be very interested in hearing about what you find.

Cheers

François

On Jun 16, 2011, at 10:00 AM, Andrzej Bialecki wrote:

 On 6/16/11 3:22 PM, Mark Schoy wrote:
 Hi,
 
 I set up a Solr instance with 512 cores. Each core has 100k documents and 15
 fields. Solr is running on a CPU with 4 cores (2.7Ghz) and 16GB RAM.
 
 Now I've done some benchmarks with JMeter. On each thread iteration JMeter
 queriing another Core by random. Here are the results (Duration:  each with
 180 second):
 
 Randomly queried cores | queries per second
 1| 2016
 2 | 2001
 4 | 1978
 8 | 1958
 16 | 2047
 32 | 1959
 64 | 1879
 128 | 1446
 256 | 1009
 512 | 428
 
 Why are the queries per second until 64 constant and then the performance is
 degreasing rapidly?
 
 Solr only uses 10GB of the 16GB memory so I think it is not a memory issue.
 
 
 This may be an OS-level disk buffer issue. With a limited disk buffer space 
 the more random IO occurs from different files, the higher is the churn rate, 
 and if the buffers are full then the churn rate may increase dramatically 
 (and the performance will drop then). Modern OS-es try to keep as much data 
 in memory as possible, so the memory usage itself is not that informative - 
 but check what are the pagein/pageout rates when you start hitting the 32 vs 
 64 cores.
 
 -- 
 Best regards,
 Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
 [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
 ___|||__||  \|  ||  |  Embedded Unix, System Integration
 http://www.sigram.com  Contact: info at sigram dot com
 



Re: Strange behavior

2011-06-14 Thread François Schiettecatte
I think you will need to provide more information than this, no-one on this 
list is omniscient AFAIK.

François

On Jun 14, 2011, at 10:44 AM, Denis Kuzmenok wrote:

 Hi.
 
 I've  debugged search on test machine, after copying to production server
 the  entire  directory  (entire solr directory), i've noticed that one
 query  (SDR  S70EE  K)  does  match  on  test  server, and does not on
 production.
 How can that be?
 



Re: Solr Field name restrictions

2011-06-04 Thread François Schiettecatte
Underscores and dashes are fine, but I would think that colons (:) are verboten.

François

On Jun 4, 2011, at 9:49 PM, Jamie Johnson wrote:

 Is there a list anywhere detailing field name restrictions.  I imagine
 fields containing periods (.) are problematic if you try to use that field
 when doing faceted queries, but are there any others?  Are underscores (_)
 or dashes (-) ok?



Re: synonyms problem

2011-06-02 Thread François Schiettecatte
Are you sure solr.StrField is the way to go with this? solr.StrField stores the 
entire text verbatim and, as far as I know, skips any analysis. Perhaps you 
should use solr.TextField instead.
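
A minimal sketch of what that could look like (the type name is illustrative):

<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Any field that should pick up the synonyms then needs to use this type (and be
re-indexed); you can confirm the expansion on the analysis page.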

François

On Jun 2, 2011, at 2:28 AM, deniz wrote:

 Hi all,
 
 here is a piece from my solfconfig:   
 
  <fieldType name="string" class="solr.StrField" sortMissingLast="true"
      omitNorms="true">
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>
    </analyzer>
  </fieldType>
 
 
 but somehow synonyms are not read... I mean there is no match when i use a
 word in the synonym file... any ideas?
 
 -
 Zeki ama calismiyor... Calissa yapar...
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/synonyms-problem-tp3014006p3014006.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: how to request for Json object

2011-06-02 Thread François Schiettecatte
This is not really an issue with SOLR per se, and I have run into this before. 
You will need to read up on 'Access-Control-Allow-Origin', which needs to be 
set in the HTTP response headers returned by the server your AJAX pager is 
querying (the Solr server in your case). Beware that not all browsers obey it; 
Olivier is right when he suggests creating a proxy, which is what I did.
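
If you go the proxy route, a minimal sketch for Apache httpd with mod_proxy
(path and port are assumptions) is enough to make Solr appear same-origin to
the browser:

ProxyPass        /solr http://localhost:8983/solr
ProxyPassReverse /solr http://localhost:8983/solr

Your AJAX code then requests /solr/select from the same host that serves the
page, so the cross-origin restriction never kicks in; just do not expose such a
proxy publicly without restricting which handlers can be reached.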

François

On Jun 2, 2011, at 3:27 AM, Romi wrote:

 How to parse Json through ajax when your ajax pager is on one
 server(Tomcat)and Json object is of onther server(solr server). i mean i
 have to make a request to another server, how can i do it .
 
 -
 Thanks  Regards
 Romi
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/how-to-request-for-Json-object-tp3014138p3014138.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: DIH: Exception with Too many connections

2011-05-31 Thread François Schiettecatte
Hi

You might also check the 'max_user_connections' settings too if you have that 
set:

# Maximum number of connections, and per user
max_connections   = 2048
max_user_connections  = 2048

http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html

Cheers

François


On May 31, 2011, at 7:39 AM, Stefan Matheis wrote:

 Tiffany,
 
 On Tue, May 31, 2011 at 12:45 PM, tiffany tiffany.c...@future.co.jp wrote:
 I executed the  SHOW PROCESSLIST; command. (Is it what you mean? I've never
 tried it before...)
 
 Exactly this, yes :)
 
 On Tue, May 31, 2011 at 12:45 PM, tiffany tiffany.c...@future.co.jp wrote:
 So, if the number of threads in the process list is larger than
 max_connections, I would get the too many connections error.  Am I
 thinking the right way?
 
 Yepp, right
 
 On Tue, May 31, 2011 at 12:45 PM, tiffany tiffany.c...@future.co.jp wrote:
 If it is right, maybe I should think of the commit timing, changing the
 number of max_connections, and/or some other ways...
 
 You may lift the allowed Number of Connections for the MySQL-Server?
 Or, of course - if possible - tweak your SOLR-Settings, correct
 
 Regards
 Stefan



Re: UniqueKey field in schema.xml

2011-05-26 Thread François Schiettecatte
You concatenate the two keys into a single string, with some sort of delimiter 
between the two keys.
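
For example, a small SolrJ sketch (field names are illustrative) that builds the
combined uniqueKey at indexing time:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CombinedKeyIndexer {

    // Index one customer/product pair, using "customerID:productID" as the uniqueKey.
    public static void add(SolrServer server, String customerId, String productId)
            throws Exception {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", customerId + ":" + productId); // the combined uniqueKey
        doc.addField("customerID", customerId);
        doc.addField("productID", productId);
        server.add(doc);
    }
}

Pick a delimiter that cannot occur in either value, so two different pairs can
never collapse into the same key.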

François

On May 26, 2011, at 6:05 AM, Romi wrote:

 what do you mean by combine two fields customerID and ProductId. 
 what i tried is 
 1. make both fields unique but it doesnot server my purpose
 2. make a new field ID and copy both customerID , ProductId into ID using
 CopyField and now make ID as uniqueKey
 but i got a error saying: Document specifies multiple unique ids
 
 -
 Romi
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/UniqueKey-field-in-schema-xml-tp2987807p2988168.html
 Sent from the Solr - User mailing list archive at Nabble.com.



  1   2   >