Re: Custom filter development

2011-05-09 Thread Tom Hill
On Mon, May 9, 2011 at 5:07 AM, solrfan a2701...@jnxjn.com wrote:
 Hi, I would like to write my own filter. I try to use the following class:
 But this is a problem for me. The one-to-one mapping. I want to map a given
 Token, for example "a", to three Tokens "a1", "a2", "a3". I want to do a
 one-to-one mapping "b" -> "c" too, and I want to have the possibility to
 remove a Token ("d" -> nothing).

 How can I do this, when the next methods returns only one Token, not a
 collection?

Buffer them internally. Look at SynonymFilter.java; it does exactly this.
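
Below is a minimal sketch of the buffering idea, not SynonymFilter itself.
It assumes the Lucene 3.x attribute API, and for brevity it only copies the
term text; a real filter would also manage offsets and position increments.
The mappings ("a" -> "a1"/"a2"/"a3", "b" -> "c", drop "d") are the
hypothetical ones from the question.

    import java.io.IOException;
    import java.util.LinkedList;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public final class OneToManyFilter extends TokenFilter {
      // Tokens we still owe the consumer from an earlier one-to-many mapping.
      private final LinkedList<String> pending = new LinkedList<String>();
      private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

      public OneToManyFilter(TokenStream input) {
        super(input);
      }

      @Override
      public boolean incrementToken() throws IOException {
        if (!pending.isEmpty()) {           // emit buffered tokens first
          termAtt.setEmpty().append(pending.removeFirst());
          return true;
        }
        while (input.incrementToken()) {
          String term = termAtt.toString();
          if ("a".equals(term)) {           // one-to-many: a -> a1, a2, a3
            termAtt.setEmpty().append("a1");
            pending.add("a2");
            pending.add("a3");
            return true;
          } else if ("b".equals(term)) {    // one-to-one: b -> c
            termAtt.setEmpty().append("c");
            return true;
          } else if ("d".equals(term)) {    // removal: skip the token entirely
            continue;
          }
          return true;                      // pass everything else through
        }
        return false;                       // input (and buffer) exhausted
      }
    }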

Tom



 Thanks!

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Custom-filter-development-tp2918459p2918459.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-10 Thread Tom Hill
Hi John,

WeakReferences allow things to get GC'd, if there are no other
references to the object referred to.

My understanding is that WeakHashMaps use weak references for the Keys
in the HashMap.

What this means is that the keys in HashMap can be GC'd, once there
are no other references to the key. I _think_ this occurs when the
IndexReader is closed.

It does not mean that objects in the FieldCache will get evicted in
low memory conditions, unless that field cache entry is no longer
needed (i.e. the IndexReader has closed). It just means they can be
collected, when they are no longer needed (but not before).
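
A tiny standalone sketch of that behavior (GC timing is not guaranteed, so
the second print is "typically", not "always"):

    import java.util.Map;
    import java.util.WeakHashMap;

    public class WeakHashMapDemo {
      public static void main(String[] args) throws InterruptedException {
        Map<Object, String> cache = new WeakHashMap<Object, String>();
        Object key = new Object();        // stands in for an IndexReader
        cache.put(key, "field cache entry");

        System.gc();
        System.out.println(cache.size()); // 1 - key is still strongly referenced

        key = null;                       // "close the reader": drop the last strong reference
        System.gc();                      // only a hint, but usually collects the weak key
        Thread.sleep(100);
        System.out.println(cache.size()); // typically 0 - the entry was evicted
      }
    }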

So, if you are seeing the FieldCache for the current IndexReader
taking up 2.1GB, that's probably the current cache usage.

There isn't a knob you can turn to cut the cache size, but you can
evaluate your usage of the cache. Some ideas:

How many fields are you searching on? Sorting on? Are you sorting on
String fields, where you could be using a numeric field? Numerics save
space. Do you need to sort on every field that you are sorting on?
Could you facet on fewer fields? For a String field, do you have too
many distinct values? If so, can you reduce the number of unique
terms? You might check your faceting algorithms, and see if you could
use enum instead of fc for some of them.
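
As a sketch of that last idea, the facet method can be set per field with
request parameters (assuming a Solr 1.4+ server; "category" is a hypothetical
facet field):

    q=*:*&facet=true&facet.field=category&f.category.facet.method=enum

facet.method=enum walks the terms and uses filters/bitsets rather than the
FieldCache that fc leans on, which can change the memory profile.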

Check your statistics page, what's your insanity count?

Tom



On Fri, Dec 10, 2010 at 12:17 PM, John Russell jjruss...@gmail.com wrote:
 I have been load testing solr 1.4.1 and have been running into OOM errors.
 Not out of heap but with the GC overhead limit exceeded message meaning that
 it didn't actually run out of heap space but just spent too much CPU time
 trying to make room and gave up.

 I got a heap dump and sent it through the Eclipse MAT and found that a
 single WeakHashMap in FieldCacheImpl called readerCache is taking up 2.1GB
 of my 2.6GB heap.

 From my understanding of WeakHashMaps the GC should be able to collect those
 references if it needs to but for some reason it isn't here.

 My questions are:

 1) Any ideas why the GC is not collecting those weak references in that
 single hashmap?
 2) Is there a knob in the solr config that can limit the size of that cache?


 Also, after the OOM is thrown solr doesn't respond much at all and throws
 the exception below, however when I go to the code I see this

 try {
                  processor.processAdd(addCmd);
                  addCmd.clear();
                } catch (IOException e) {
                  throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
                      "ERROR adding document " + document);
                }
              }

 So it's swallowing the IOException and throwing a new one without setting
 the cause, so I can't see what the IOException is.  Is this fixed in any
 newer version? Should I open a bug?
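
 For reference, a sketch of the fix: pass the caught exception as the cause,
 using the SolrException constructor that takes a Throwable.

    } catch (IOException e) {
      // chain the cause so the original IOException stays visible
      throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
          "ERROR adding document " + document, e);
    }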


 Thanks a lot for your help

 John


 SEVERE: org.apache.solr.common.SolrException: ERROR adding document
 SolrInputDocument[{de.id=de.id(1.0)={C2B3B03F112C549254560A568C18},
 de.type=de.type(1.0)={Social Contact}, sc.author=sc.author(1.0)={Author-3944},
 sc.sourceType=sc.sourceType(1.0)={rss},
 sc.link=sc.link(1.0)={http://www.cisco.com/feed/date_12.07.10_16.18.03/idx/10752},
 sc.title=sc.title(1.0)={Title-erat metus eget vestibulum},
 sc.publishedDate=sc.publishedDate(1.0)={Tue Dec 07 16:22:09 EST 2010},
 sc.createdDate=sc.createdDate(1.0)={Tue Dec 07 16:20:20 EST 2010},
 sc.socialContactStatus=sc.socialContactStatus(1.0)={unread},
 sc.socialContactStatusUserId=sc.socialContactStatusUserId(1.0)={},
 sc.socialContactStatusDate=sc.socialContactStatusDate(1.0)={Tue Dec 07 16:20:20 EST 2010},
 sc.tags=sc.tags(1.0)={[]}, sc.authorId=sc.authorId(1.0)={},
 sc.replyToId=sc.replyToId(1.0)={}, sc.replyToAuthor=sc.replyToAuthor(1.0)={},
 sc.replyToAuthorId=sc.replyToAuthorId(1.0)={},
 sc.feedId=sc.feedId(1.0)={[124852]},
 filterResult_124932_ti=filterResult_124932_ti(1.0)={67},
 filterStatus_124932_s=filterStatus_124932_s(1.0)={COMPLETED},
 filterResult_124937_ti=filterResult_124937_ti(1.0)={67},
 filterStatus_124937_s=filterStatus_124937_s(1.0)={COMPLETED},
 campaignDateAdded_124957_tdt=campaignDateAdded_124957_tdt(1.0)={Tue Dec 07 16:20:20 EST 2010},
 campaignStatus_124957_s=campaignStatus_124957_s(1.0)={NEW},
 campaignDateAdded_124947_tdt=campaignDateAdded_124947_tdt(1.0)={Tue Dec 07 16:20:20 EST 2010},
 campaignStatus_124947_s=campaignStatus_124947_s(1.0)={NEW},
 sc.campaignResultsSummary=sc.campaignResultsSummary(1.0)={[NEW, NEW]}}]
        at org.apache.solr.handler.BinaryUpdateRequestHandler$2.document(BinaryUpdateRequestHandler.java:81)
        at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$2.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:136)
        at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$2.readIterator(JavaBinUpdateRequestCodec.java:126)
        at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:210)
        at

Re: singular/plurals

2010-12-10 Thread Tom Hill
Check out this page: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

Look, in particular, for stemming.
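
A typical stemming setup is a filter at the end of the analyzer chain; here
is a sketch of one possible field type (the name text_stem is made up):

    <fieldType name="text_stem" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"/>
      </analyzer>
    </fieldType>

With the stemmer applied at both index and query time, "bike" and "bikes"
reduce to the same term and match each other.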

On Fri, Dec 10, 2010 at 7:58 PM, Jack O jack_...@yahoo.com wrote:
 Hello,

 Need one more help:

 What do I have to do so that search will work for singulars and plurals ?



 I would really appreciate all your help.

 /J





Re: command line parameters for solr

2010-12-10 Thread Tom Hill
java -jar start.jar --help

More docs here 
http://docs.codehaus.org/display/JETTY/A+look+at+the+start.jar+mechanism

Personally, I usually limit access to localhost by using whatever
firewall the machine uses.
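
If you'd rather do it in Jetty itself, the usual approach is to bind the
connector to the loopback address in etc/jetty.xml and point start.jar at
that config. A sketch, assuming the Jetty 6 that ships with Solr's example:

    <Call name="addConnector">
      <Arg>
        <New class="org.mortbay.jetty.nio.SelectChannelConnector">
          <!-- bind to loopback only, so only localhost can connect -->
          <Set name="host">127.0.0.1</Set>
          <Set name="port">8983</Set>
        </New>
      </Arg>
    </Call>

then start with: java -jar start.jar etc/jetty.xml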

Tom

On Fri, Dec 10, 2010 at 7:55 PM, Jack O jack_...@yahoo.com wrote:
 Hello,

 For starting solr, from where do i find the list of command line parameters.

 java -jar start.jar blahblah...

 I am especially looking for how to specify my own jetty config file. I want to
 allow access of solr from localhost only.


 I would really appreciate all your help.

 /J





Re: Delete by query or Id very slow

2010-12-09 Thread Tom Hill
I'd bet it's the optimize that's taking the time, and not the delete.
You don't really need to optimize these days, and you certainly don't
need to do it on every delete.

And you can give solr a list of ids to delete, which would be more efficient.

I don't believe you can tell which ones have failed, if any do, when you
delete with a list, but you are not using the "unsuccessful" list now anyway.
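
A sketch of the list form as an XML update message (the ids are made up);
POST it to the /update handler, followed by a single commit:

    <delete>
      <id>doc1</id>
      <id>doc2</id>
      <id>doc3</id>
    </delete>
    <commit/>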

Tom
On Thu, Dec 9, 2010 at 7:55 AM, Ravi Kiran ravi.bhas...@gmail.com wrote:
 Thank you Tom for responding. On an average the docs are around 25-35 KB.
 The code is as follows, Kindly let me know if you see anything weird, a
 second pair of eyes always helps :-)

     public List<String> deleteDocs(List<String> ids) throws SolrCustomException {
         CommonsHttpSolrServer server = (CommonsHttpSolrServer) getServerInstance();
         List<String> unsuccessful = new ArrayList<String>();
         try {
             if (ids != null && !ids.isEmpty()) {
                 for (String id : ids) {
                     server.deleteById(id);
                 }
                 server.commit();
                 server.optimize();
             }
         } catch (IOException ioex) {
             throw new SolrCustomException("IOException while deleting : ", ioex);
         } catch (SolrServerException solrex) {
             throw new SolrCustomException("Could not delete : ", solrex);
         }

         return unsuccessful;
     }

     private SolrServer getServerInstance() throws SolrCustomException {
         if (server != null) {
             return server;
         } else {
             String url = getServerURL();
             log.debug("Server URL: " + url);
             try {
                 server = new CommonsHttpSolrServer(url);
                 server.setSoTimeout(100); // socket read timeout
                 server.setConnectionTimeout(100);
                 server.setDefaultMaxConnectionsPerHost(1000);
                 server.setMaxTotalConnections(1000);
                 server.setFollowRedirects(false); // defaults to false
                 // allowCompression defaults to false. Server side must
                 // support gzip or deflate for this to have any effect.
                 server.setAllowCompression(true);
                 server.setMaxRetries(1); // defaults to 0. > 1 not recommended.
             } catch (MalformedURLException mex) {
                 throw new SolrCustomException("Cannot resolve Solr Server at '" + url + "'\n", mex);
             }
             return server;
         }
     }

 Thanks,

 Ravi Kiran Bhaskar

 On Wed, Dec 8, 2010 at 6:16 PM, Tom Hill solr-l...@worldware.com wrote:

 That's a pretty low number of documents for autoCommit. It means
 that by the time you get to 850,000 documents, you will have created 8,500
 segments, and that's not counting merges.

 How big are your documents? I just created an 850,000 document index (and a
 3.5M doc index) with tiny documents (id and title), and they deleted
 quickly (17 milliseconds).

 Maybe if you post your delete code? Are you doing anything else (like
 commit/optimize?)

 Tom



 On Wed, Dec 8, 2010 at 12:55 PM, Ravi Kiran ravi.bhas...@gmail.com
 wrote:
  Hello,
 
              I am using solr 1.4.1; when I delete by query or Id from solrj
  it is very very slow, almost like a hang. The core from which I am deleting
  has close to 850K documents in the index. In the solrconfig.xml autocommit is
  set as follows. Any idea how to speed up the deletion process? Please let
  me know if any more info is required.
 
 
 
    <updateHandler class="solr.DirectUpdateHandler2">

      <!-- Perform a <commit/> automatically under certain conditions:
           maxDocs - number of updates since last commit is greater than this
           maxTime - oldest uncommitted update (in ms) is this long ago -->

      <autoCommit>
        <maxDocs>100</maxDocs>
        <maxTime>12</maxTime>
      </autoCommit>

    </updateHandler>
 
 
 
  Thanks,
 
 
 
  *Ravi Kiran Bhaskar*
 




Re: Triggering a reload of replicated configuration files

2010-12-09 Thread Tom Hill
On Thu, Dec 9, 2010 at 4:49 AM, Ophir Adiv firt...@gmail.com wrote:
 On Thu, Dec 9, 2010 at 2:25 PM, Upayavira u...@odoko.co.uk wrote:


 On Thu, 09 Dec 2010 13:34 +0200, Ophir Adiv firt...@gmail.com wrote:
 Hi,

 I added a configuration file which is updated on one of the master
 cores' conf directory, and also added the file name to the list of
 confFiles.
 As as expected, after index change and commit, this file gets
 replicated to the slave core.
 However, the problem that remains is how to reload this file's data
 after it's replicated.

 What I did on the master core, is to initiate a core reload, and
 through a custom CoreAdminHandler override handleReloadAction() to
 reload the new file too.
 But this cannot be done on the slave, since the master, which triggers
 the update, is unaware who is slaves are.

 Any ideas on how to do this?

 http://wiki.apache.org/solr/CoreAdmin#RELOAD

 Doesn't this do it?

 Upayavira


 This works on the master core, since the application knows its master
 cores - but this does not trigger a reload on the slave cores.


I believe it does. See SnapPuller.java

  if (successfulInstall) {
    LOG.info("Configuration files are modified, core will be reloaded");
    logReplicationTimeAndConfFiles(modifiedConfFiles, successfulInstall);
    // write to a file the time of replication and the conf files
    reloadCore();
  }

And I tested it awhile ago, and it seemed to be working.

Check your logs for errors, perhaps?

Tom


Re: How badly does NTFS file fragmentation impact search performance? 1.1X? 10X? 100X?

2010-12-08 Thread Tom Hill
If you can benchmark before and after, please post the results when
you are done!

Things like your index's size, and the amount of RAM in your computer
will help make it meaningful. If all of your index can be cached, I
don't think fragmentation is going matter much, once you get warmed
up.

Tom




On Wed, Dec 8, 2010 at 9:59 AM, Will Milspec will.mils...@gmail.com wrote:
 Hi all,

 Pardon if this isn't the best place to post this email...maybe it belongs on
 the lucene-user list .  Also, it's basically windows-specific,so not of use
 to everyone...

 The question: does NTFS fragmentation affect search performance a little
 bit or a lot? It's obvious that fragmentation will slow things down,
 but is it a factor of 1.1, 10, or 100? (i.e., what order of magnitude?)

 As a follow up: should solr/lucene users periodically remind Windows
 sysadmins to defrag their drives ?

 On a production system, I ran the windows defrag analyzer and found heavy
 fragmentation on the lucene index.

 Fragments       File Size       File
 11,839          492 MB          \data\index\search\_6io5.cfs
 7,153           433 MB          \data\index\search\_5ld6.cfs
 6,953           661 MB          \data\index\search\_8jvj.cfs
 5,824           74 MB           \data\index\search\_5ld7.frq
 5,691           356 MB          \data\index\search\_9eev.fdt
 5,638           352 MB          \data\index\search\_8mqi.fdt
 5,629           352 MB          \data\index\search\_8jvj.fdt
 5,609           351 MB          \data\index\search\_88z8.fdt
 5,590           355 MB          \data\index\search\_96l5.fdt
 5,568           354 MB          \data\index\search\_8zjn.fdt
 5,471           342 MB          \data\index\search\_5wgo.fdt
 5,466           342 MB          \data\index\search\_5uo1.fdt
 5,450           340 MB          \data\index\search\_5hrn.fdt
 5,429           345 MB          \data\index\search\_6nyy.fdt
 5,371           353 MB          \data\index\search\_8sob.fdt

 Incidentally, we periodically experience some *very* slow searches. Out of
 curiosity, I checked for file fragmentation (using 'analyze' mode of the
 NTFS defragger).

 Nota bene: Windows sysinternals has a utility, Contig.exe, which allows you
 to defragment individual drives/directories. We'll use that to defragment
 the index directories.

 will

 will



Re: Delete by query or Id very slow

2010-12-08 Thread Tom Hill
That's a pretty low number of documents for autoCommit. It means
that by the time you get to 850,000 documents, you will have created 8,500
segments, and that's not counting merges.

How big are your documents? I just created an 850,000 document index (and a
3.5M doc index) with tiny documents (id and title), and they deleted
quickly (17 milliseconds).

Maybe if you post your delete code? Are you doing anything else (like
commit/optimize?)

Tom



On Wed, Dec 8, 2010 at 12:55 PM, Ravi Kiran ravi.bhas...@gmail.com wrote:
 Hello,

             I am using solr 1.4.1; when I delete by query or Id from solrj it
 is very very slow, almost like a hang. The core from which I am deleting has
 close to 850K documents in the index. In the solrconfig.xml autocommit is
 set as follows. Any idea how to speed up the deletion process? Please let me
 know if any more info is required.



   <updateHandler class="solr.DirectUpdateHandler2">

     <!-- Perform a <commit/> automatically under certain conditions:
          maxDocs - number of updates since last commit is greater than this
          maxTime - oldest uncommitted update (in ms) is this long ago -->

     <autoCommit>
       <maxDocs>100</maxDocs>
       <maxTime>12</maxTime>
     </autoCommit>

   </updateHandler>



 Thanks,



 *Ravi Kiran Bhaskar*



Re: only index synonyms

2010-12-07 Thread Tom Hill
Hi Lee,

Sorry, I think Erick and I both thought the issue was converting the
synonyms, not removing the other words.

To keep only a set of words that match a list, use the
KeepWordFilterFactory, with your list of synonyms.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.KeepWordFilterFactory

I'd put the synonym filter first in your configuration for the field,
then the keep words filter factory.
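
A sketch of that chain in schema.xml (keepwords.txt is a hypothetical file
listing the words you want to survive; synonyms.txt would use the "=>"
mapping syntax):

    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true"/>
      <filter class="solr.KeepWordFilterFactory" words="keepwords.txt"
              ignoreCase="true"/>
    </analyzer>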

Tom




On Tue, Dec 7, 2010 at 12:06 PM, lee carroll
lee.a.carr...@googlemail.com wrote:
 ok thanks for your response

 To summarise the solution then:

 To only index synonyms you must only send words that will match the synonym
 list. If words without synonym matches are in the field to be indexed, these
 words will be indexed. No way to avoid this by using schema.xml config.

 thanks lee c

 On 7 December 2010 13:21, Erick Erickson erickerick...@gmail.com wrote:

 OK, the light finally dawns

 *If* you have a defined list of words to remove, you can put them in
 with your stopwords and add a stopword filter to the field in
 schema.xml.

 Otherwise, you'll have to do some pre-processing and only send to
 solr words you want. I'm assuming you have a list of valid words
 (i.e. the words in your synonyms file) and could pre-filter the input
 to remove everything else. In that case you don't need a synonyms
 filter since you're controlling the whole process anyway

 Best
 Erick

 On Tue, Dec 7, 2010 at 6:07 AM, lee carroll lee.a.carr...@googlemail.com
 wrote:

  Hi tom
 
  This seems to place in the index
  This is a scenic line of words
  I just want scenic and words in the index
 
  I'm not at a terminal at the moment but will try again to make sure. I'm
  sure I'm missing the obvious
 
  Cheers lee
  On 7 Dec 2010 07:40, Tom Hill solr-l...@worldware.com wrote:
   Hi Lee,
  
  
   On Mon, Dec 6, 2010 at 10:56 PM, lee carroll
   lee.a.carr...@googlemail.com wrote:
   Hi Erik
  
   Nope, Erik is the other one. :-)
  
   thanks for the reply. I only want the synonyms to be in the index
   how can I achieve that ? Sorry probably missing something obvious in
 the
   docs
  
   Exactly what he said, use the => syntax. You've already got it. Add the
  lines
  
   pretty => scenic
   text => words
  
   to synonyms.txt, and it will do what you want.
  
   Tom
  
   On 7 Dec 2010 01:28, Erick Erickson erickerick...@gmail.com
 wrote:
   See:
  
  
 
 
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
  
   with the => syntax, I think that's what you're looking for
  
   Best
   Erick
  
   On Mon, Dec 6, 2010 at 6:34 PM, lee carroll 
  lee.a.carr...@googlemail.com
  wrote:
  
   Hi Can the following usecase be achieved.
  
   value to be analysed at index time: "this is a pretty line of text"
  
   synonym list is: pretty => scenic, text => words
  
   value placed in the index is "scenic words"
  
   That is to say only the matching synonyms. Basically i want to
 produce
  a
   normalised set of phrases for faceting.
  
   Cheers Lee C
  
  
 




Re: customer ping response

2010-12-07 Thread Tom Hill
Hi Tri,

Well, I wouldn't really recommend this, but I just tried making a
custom XMLResponseWriter that wrote the response you wanted. So you can
use it with any request handler you want. Works fine, but it's pretty
hack-y.

The downside is, you are writing code, and you have to modify
SolrCore. But it's trivial to do.

So, I wouldn't recommend it, but it was fun to play around with. :)

It's probably easier to fix the load balancer, which is almost
certainly just looking for any string you specify. Just change what
it's expecting. They are built so you can configure this.

Tom

On Tue, Dec 7, 2010 at 5:56 PM, Erick Erickson erickerick...@gmail.com wrote:
 That's the query term being sent to the server.

 On Tue, Dec 7, 2010 at 8:50 PM, Tri Nguyen tringuye...@yahoo.com wrote:

 Hi,

 I'm reading the wiki.

 What does q=apache mean in the url?


 http://localhost:8983/solr/select/?stylesheet=q=apachewt=xslttr=example.xsl

 thanks,

 tri





 
 From: Markus Jelsma markus.jel...@openindex.io
 To: Tri Nguyen tringuye...@yahoo.com
 Cc: solr-user@lucene.apache.org
 Sent: Tue, December 7, 2010 4:35:28 PM
 Subject: Re: customer ping response

 Well, you can go a long way with xslt but i wouldn't know how to embed the
 server name in the response as Solr simply doesn't return that information.

 You'd have to patch the response Solr's giving or put a small script in
 front
 that can embed the server name.

  I need to return this:
 
  <?xml version="1.0" encoding="UTF-8"?>
  <admin>
  <status>
  <name>Server</name>
  <value>ok</value>
  </status>
  </admin>
 
 
 
 
  
  From: Markus Jelsma markus.jel...@openindex.io
  To: solr-user@lucene.apache.org
  Cc: Tri Nguyen tringuye...@yahoo.com
  Sent: Tue, December 7, 2010 4:27:32 PM
  Subject: Re: customer ping response
 
  Of course! The ping request handler behaves like any other request
 handler
  and accepts at last the wt parameter [1]. Use xslt [2] to transform the
  output to any desirable form or use other response writers [1].
 
  Why anyway, is it a load balancer that only wants an OK output or
  something?
 
  [1]: http://wiki.apache.org/solr/CoreQueryParameters
  [2]: http://wiki.apache.org/solr/XsltResponseWriter
  [3]: http://wiki.apache.org/solr/QueryResponseWriter
 
   Can I have a custom xml response for the ping request?
  
   thanks,
  
   Tri




Re: complex boolean filtering in fq queries

2010-12-07 Thread Tom Hill
For one thing, you wouldn't have fq= in there, except at the beginning.

fq=location:national OR (location:CA AND city:San Francisco)

more below...

On Tue, Dec 7, 2010 at 10:25 PM, Andy angelf...@yahoo.com wrote:
 Forgot to add, my defaultOperator is AND.

 --- On Wed, 12/8/10, Andy angelf...@yahoo.com wrote:

 From: Andy angelf...@yahoo.com
 Subject: complex boolean filtering in fq queries
 To: solr-user@lucene.apache.org
 Date: Wednesday, December 8, 2010, 1:21 AM
 I have a facet query that requires
 some complex boolean filtering. Something like:

 fq=location:national OR (fq=location:CA AND fq=city:"San
 Francisco")

 1) How do I turn the above filters into a REST query
 string?

Do you mean URL encoding it? You can just type your query into the
search box in the admin UI, and copy from the resulting URL.
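
URL-encoded, the corrected filter above would look something like this (a
sketch; "+" stands for an encoded space and %22 for a double quote):

    fq=location:national+OR+(location:CA+AND+city:%22San+Francisco%22)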

 2) Do I need the double quotes around San Francisco?

Yes. Otherwise it will be parsed as
(city:San) (Francisco)
Probably not what you want.

 3) Will complex boolean filters like this substantially
 slow down query performance?

That's not very complex, and the filter may be cached. Probably won't
be a problem.

Tom


 Thanks










Re: Index version on slave nodes

2010-12-07 Thread Tom Hill
Just off the top of my head, aren't you able to use a slave as a
repeater, so it's configured as both a master and a slave?

http://wiki.apache.org/solr/SolrReplication#Setting_up_a_Repeater

This would seem to require that the slave return the same values as
its master for indexversion. What happens if you configure your slave
as a master, also? Does that get the behavior you want?
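
A sketch of a repeater configuration, adapted from the wiki page above (the
masterUrl host is made up):

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
        <str name="confFiles">schema.xml,stopwords.txt</str>
      </lst>
      <lst name="slave">
        <str name="masterUrl">http://master.example.com:8983/solr/replication</str>
        <str name="pollInterval">00:00:60</str>
      </lst>
    </requestHandler>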

Tom



On Tue, Dec 7, 2010 at 8:16 AM, Markus Jelsma
markus.jel...@openindex.io wrote:
 Yes, i read that too in the replication request handler's source comments. But
 i would find it convenient if it would just use the same values as we see 
 using
 the details command.

 Any devs agree? Then i'd open a ticket for this one.

 On Tuesday 07 December 2010 17:14:09 Xin Li wrote:
 I read it somewhere (sorry for not remembering the source).. the
 indexversion command gets the replicable index version #. Since it
 is a slave machine, so the result is 0.

 Thanks,

 On Tue, Dec 7, 2010 at 11:06 AM, Markus Jelsma

 markus.jel...@openindex.io wrote:
  But why? I'd expect valid version numbers although the replication
  handler's source code seems to agree with you judging from the comments.
 
  On Monday 06 December 2010 17:49:16 Xin Li wrote:
  I think this is expected behavior. You have to issue the details
  command to get the real indexversion for slave machines.
 
  Thanks,
  Xin
 
  On Mon, Dec 6, 2010 at 11:26 AM, Markus Jelsma
 
  markus.jel...@openindex.io wrote:
   Hi,
  
   The indexversion command in the replicationHandler on slave nodes
   returns 0 for indexversion and generation while the details command
   does return the correct information. I haven't found an existing
   ticket on this one although
   https://issues.apache.org/jira/browse/SOLR-1573 has
   similarities.
  
   Cheers,
  
   --
   Markus Jelsma - CTO - Openindex
   http://www.linkedin.com/in/markus17
   050-8536620 / 06-50258350
 
  --
  Markus Jelsma - CTO - Openindex
  http://www.linkedin.com/in/markus17
  050-8536620 / 06-50258350

 --
 Markus Jelsma - CTO - Openindex
 http://www.linkedin.com/in/markus17
 050-8536620 / 06-50258350



Re: only index synonyms

2010-12-06 Thread Tom Hill
Hi Lee,


On Mon, Dec 6, 2010 at 10:56 PM, lee carroll
lee.a.carr...@googlemail.com wrote:
 Hi Erik

Nope, Erik is the other one. :-)

 thanks for the reply. I only want the synonyms to be in the index
 how can I achieve that ? Sorry probably missing something obvious in the
 docs

Exactly what he said, use the => syntax. You've already got it. Add the lines

pretty => scenic
text => words

to synonyms.txt, and it will do what you want.

Tom

 On 7 Dec 2010 01:28, Erick Erickson erickerick...@gmail.com wrote:
 See:

 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

  with the => syntax, I think that's what you're looking for

 Best
 Erick

 On Mon, Dec 6, 2010 at 6:34 PM, lee carroll lee.a.carr...@googlemail.com
wrote:

 Hi Can the following usecase be achieved.

 value to be analysed at index time: "this is a pretty line of text"

 synonym list is: pretty => scenic, text => words

 value placed in the index is "scenic words"

 That is to say only the matching synonyms. Basically i want to produce a
 normalised set of phrases for faceting.

 Cheers Lee C




Re: Need help with spellcheck city name

2010-09-27 Thread Tom Hill
Maybe process the city name as a single token?
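
For example, a field type built on KeywordTokenizerFactory treats the whole
value as one token, so "San Jose" is checked as a unit rather than as "San"
and "Jose" (a sketch; the type name city_exact is made up):

    <fieldType name="city_exact" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

You would then point the spellchecker's field at a field of this type.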

On Mon, Sep 27, 2010 at 3:25 PM, Savannah Beckett
savannah_becket...@yahoo.com wrote:
 Hi,
   I have city name as a text field, and I want to do spellcheck on it.  I use
 setting in http://wiki.apache.org/solr/SpellCheckComponent

 If I set up city name as a text field and do spell check on "San Jos" for "San
 Jose",
 I get the suggestion "ojos" for "Jos".  I checked the extended result and I found
 that "Jose" is in the middle of all 10 suggestions in terms of score and
 frequency.  I then set city name as a string field and spell checked again; I got
 "Van" for "San" and "Ross" for "Jos", which is weird because "San" is correct.


 How do you setup spellchecker to spellcheck city names?  City name can have
 multiple words.
 Thanks.





Re: Delete Dynamic Fields

2010-09-22 Thread Tom Hill
Delete all docs with the dynamic fields, and then optimize.
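
For example, a delete-by-query on the old field can target every document
that still has a value in it (a sketch posted to the /update handler;
price_d stands in for your dynamic field, and it must still be queryable
when you run this):

    <delete><query>price_d:[* TO *]</query></delete>
    <commit/>
    <optimize/>

Once no live document carries the field and the index is optimized, the
field should drop out of the admin FIELD LIST.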

On Wed, Sep 22, 2010 at 1:58 PM, Moiz Bhukhiya moiz.bhukh...@gmail.com wrote:
 Hi All:

 I had used dynamic fields for some of my fields and then later decided to
 make them static. I removed the dynamic field from the schema but I still see
 it on the admin interface (FIELD LIST). Could somebody please point out how
 I can remove these dynamic fields?

 Thanks,
 Moiz



Re: Searching solr with a two word query

2010-09-20 Thread Tom Hill
It will probably be clearer if you don't use the pseudo-boolean
operators, and just use + for required terms.

If you look at your output from debug, you see your query becomes:

    all_text:open +all_text:excel +presentation_id:294 +type:blob

Note that all_text:open does not have a + sign, but
all_text:excel has one. So all_text:open is not required, but
all_text:excel is.

I think this is because AND marks both of its operands as required.
(which puts the + on +all_text:excel), but the open has no explicit
op, so it uses OR, which marks that term as optional.

What I would suggest you do is:

   opening excellent +presentation_id:294 +type:blob

Which is think is much clearer.

I think you could also do
  opening excellent presentation_id:294 AND type:blob
but I think it's  non-obvious how the result will differ from
  opening excellent AND presentation_id:294 AND type:blob
So I wouldn't use either of the last two.


Tom
p.s. Not sure what is going on with the last lines of your debug
output for the query. Is that really what shows up after presentation
ID? I see Euro, hash mark, zero, semi-colon, and H with stroke

<str name="parsedquery_toString">
all_text:open +all_text:excel +presentation_id:€#0;Ħ +type:blob
</str>

On Mon, Sep 20, 2010 at 12:46 PM, n...@frameweld.com wrote:

 Say if I had a two word query that was opening excellent, I would like it 
 to return something like:

 opening excellent
 opening
 opening
 opening
 excellent
 excellent
 excellent

 Instead of:
 opening excellent
 excellent
 excellent
 excellent

 If I did a search, I would like the first word alone to also show up in the 
 results, because currently my results show both words in one result and only 
 the second word for the rest of the results. I've done a search on each word 
 by itself, and there are results for them.

 Thanks.

 -Original Message-
 From: Erick Erickson erickerick...@gmail.com
 Sent: Monday, September 20, 2010 2:37pm
 To: solr-user@lucene.apache.org
 Subject: Re: Searching solr with a two word query

 I'm missing what you really want out of your query, your
 phrase either word as a single result just isn't connecting
 in my grey matter.. Could you give some example inputs and
 outputs that demonstrates what you want?

 Best
 Erick

 On Mon, Sep 20, 2010 at 11:41 AM, n...@frameweld.com wrote:

  I noticed that my defaultOperator is OR, and that does have an effect on
  what does come up. If I were to change that to AND, it's an exact match to
  my query, but I would like similar matches with either word as a single
  result. Is there another value I can use? Or maybe I should use another
  query parser?
 
  Thanks.
  - Noel
 
  -Original Message-
  From: Erick Erickson erickerick...@gmail.com
  Sent: Monday, September 20, 2010 10:05am
  To: solr-user@lucene.apache.org
  Subject: Re: Searching solr with a two word query
 
  Here's an excellent description of the Lucene query operators and how they
  differ from strict
  boolean logic:
  http://www.gossamer-threads.com/lists/lucene/java-user/47928
 
  But the short
  form is that (and boy, doesn't the fact that the URL escapes spaces as '+',
  which is also a Lucene operator, make looking at these interesting) the
  first term is essentially a SHOULD clause in a Lucene BooleanQuery and is
  matching your docs all by itself.
 
  HTH
  Erick
 
  On Mon, Sep 20, 2010 at 8:58 AM, n...@frameweld.com wrote:
 
   Here is my raw query:
  
   q=opening+excellent+AND+presentation_id%3A294+AND+type%3Ablob&version=1.3&json.nl=map&rows=10&start=0&wt=xml&hl=true&hl.fl=text&hl.simple.pre=<span+class%3Dhl>&hl.simple.post=<%2Fspan>&hl.fragsize=0&hl.mergeContiguous=false&debugQuery=on
  
   and here is what I get on the debugQuery:
   <lst name="debug">
   <str name="rawquerystring">
   opening excellent AND presentation_id:294 AND type:blob
   </str>
   <str name="querystring">
   opening excellent AND presentation_id:294 AND type:blob
   </str>
   <str name="parsedquery">
   all_text:open +all_text:excel +presentation_id:294 +type:blob
   </str>
   <str name="parsedquery_toString">
   all_text:open +all_text:excel +presentation_id:€#0;Ħ +type:blob
   </str>
   <lst name="explain">
   <str name="1435675blob">

   3.1143723 = (MATCH) sum of:
    0.46052343 = (MATCH) weight(all_text:open in 4457), product of:
      0.5531408 = queryWeight(all_text:open), product of:
        5.3283896 = idf(docFreq=162, maxDocs=12359)
        0.10381013 = queryNorm
      0.8325609 = (MATCH) fieldWeight(all_text:open in 4457), product of:
        1.0 = tf(termFreq(all_text:open)=1)
        5.3283896 = idf(docFreq=162, maxDocs=12359)
        0.15625 = fieldNorm(field=all_text, doc=4457)
    0.74662465 = (MATCH) weight(all_text:excel in 4457), product of:
      0.7043054 = queryWeight(all_text:excel), product of:
        6.7845535 = idf(docFreq=37, maxDocs=12359)
        0.10381013 = queryNorm
      1.0600865 = (MATCH)

Re: Odd query result

2010-04-20 Thread Tom Hill
When I run it, with that fieldType, it seems to work for me. Here's a sample
query output

<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">17</int>
 <lst name="params">
  <str name="indent">on</str>
  <str name="start">0</str>
  <str name="q">xtext:I-Car</str>
  <str name="version">2.2</str>
  <str name="rows">10</str>
 </lst>
</lst>
<result name="response" numFound="2" start="0">
 <doc>
  <str name="id">ALLCAPS</str>
  <str name="xtext">I-CAR</str>
 </doc>
 <doc>
  <str name="id">CAMEL</str>
  <str name="xtext">I-Car</str>
 </doc>
</result>
</response>


Did I miss something?

Could you show the output with debugQuery=on for the user's failing query?
Assuming I did this right, I'd next look for is a copyField. Is the user's
query really being executed against this field?

Schema.xml could be useful, too.

Tom

On Tue, Apr 20, 2010 at 10:19 AM, Charlie Jackson 
charlie.jack...@cision.com wrote:

 I've got an odd scenario with a query a user's running. The user is
 searching for the term "I-Car". It will hit if the document contains the
 term "I-CAR" (all caps) but not if it's "I-Car".  When I throw the terms
 into the analysis page, the resulting tokens look identical, and my
 "I-Car" tokens hit on either term.



 Here's the definition of the field:



    <fieldType name="text" class="solr.TextField"
               positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory"
                synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>



 I'm pretty sure this has to do with the settings on the
 WordDelimiterFactory, but I must be missing something because I don't
 see anything that would cause the behavior I'm seeing.




Re: Problem with suggest search

2010-03-15 Thread Tom Hill
You need a query string with the standard request handler. (dismax has
q.alt)

Try q=*:*, if you are trying to get facets for all documents.

And yes, a friendlier error message would be a good thing.

Tom

On Mon, Mar 15, 2010 at 9:03 AM, David Rühr d...@marketing-factory.de wrote:

 Hi List.

 We have two Servers dev and live.
 Dev is not our Problem but on live we see with the facet.prefix paramter -
 if there is no q param - for suggest search this error:

 HTTP Status 500 - null java.lang.NullPointerException at
 java.io.StringReader.<init>(StringReader.java:54) at
 org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:197) at
 org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:78) at
 org.apache.solr.search.QParser.getQuery(QParser.java:137) at
 org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:85)
 at
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1313) at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:341)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244)
 at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
 at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
 at
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
 at
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
 at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
 at java.lang.Thread.run(Thread.java:811)

 The Query looks like:
 facet=on&facet.mincount=1&facet.limit=10&json.nl=map&wt=json&rows=0&version=1.2&omitHeader=true&fl=content&start=0&q=&facet.prefix=mate&facet.field=content&fq=group:0+OR+group:-2+OR+group:1+OR+group:11+-group:-1&fq=language:0

 When we add the q param f.e. q=material we have no error.
 Anyone have the same error or can help?

 Thanks to all.
 David



Re: java.lang.OutOfMemoryError, VM may need to be forcibly terminated

2010-03-12 Thread Tom Hill
Hi -

The best way is probably to add more ram. :-)

That error apparently results from running out of perm gen space, and with
512m, you may not have much perm gen space.

Options for increasing this can be found
http://java.sun.com/javase/technologies/hotspot/vmoptions.jsp
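
For example, on the Sun HotSpot JVM the perm gen ceiling is raised with a
flag like this (128m is an arbitrary illustration, not a recommendation):

    java -Xmx512m -XX:MaxPermSize=128m -jar start.jar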

But, if you don't have enough memory, that's just going to move the problem.

You can watch memory usage with jconsole, or get more detail with something
like yourkit.

Tom


On Fri, Mar 12, 2010 at 10:17 AM, Oleg Burlaca o...@burlaca.com wrote:

 Hello,

 I've searched the list for this kind of error but never find one that is
 similar to my case:

Java HotSpot(TM) Client VM warning:
Exception java.lang.OutOfMemoryError occurred dispatching signal
SIGTERM to handler- the VM may need to be forcibly terminated


 I use the latest stable SOLR 1.4 and start it with Jetty from the /example/
 folder.

 Sometimes SOLR dies without writing to the stderrout.log
 (I use the script from http://wiki.apache.org/solr/SolrJetty)

 The messages above appears in the standard error stream instead of the log
 file. (i.e. it appears directly in the SSH window).

 I've set:
  <New class="org.mortbay.thread.BoundedThreadPool">
    <Set name="minThreads">2</Set>
    <Set name="lowThreads">2</Set>
    <Set name="maxThreads">2</Set>
  </New>

 Is there a way to solve this? SOLR is on a VPS with 512MB of RAM.

 Regards,
 Oleg Burlaca



Re: Warning : no lockType configured for...

2010-03-02 Thread Tom Hill

Hi Mani,

Mani EZZAT wrote:
 I'm dynamically creating cores with a new index, using the same schema 
 and solrconfig.xml

Does the problem occur if you use the same configuration in a single, static
core?

Tom

-- 
View this message in context: 
http://old.nabble.com/Re%3A-Warning-%3A-no-lockType-configured-for...-tp27740724p27758951.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multiple Cores Vs. Single Core for the following use case

2010-01-27 Thread Tom Hill
Hi -

I'd probably go with a single core on this one, just for ease of operations.

But here are some thoughts:

One advantage I can see to multiple cores, though, would be better idf
calculations. With individual cores, each user only sees the idf for his own
documents. With a single core, the idf will be across all documents. In
theory, better relevance.

Multi-core will use more RAM to start with, and I would expect it to
use more disk (one term dictionary per core). Filters would add to the memory
footprint of the multiple core setup.

However, if you only end up sorting/faceting on some of the cores, your
memory use with multiple cores may actually be less. With multiple cores,
each field cache only covers one user's docs. With single core, you have one
field cache entry per doc in the whole corpus. Depending on usage patterns,
index sizes, etc, this could be a significant amount of memory.

Tom


On Wed, Jan 27, 2010 at 11:38 AM, Amit Nithian anith...@gmail.com wrote:

 It sounds to me that multiple cores won't scale.. wouldn't you have to
 create multiple configurations per each core and does the ranking function
 change per user?

 I would imagine that the filter method would work better.. the caching is
 there and as mentioned earlier would be fast for multiple searches. If you
 have searches for the same user, then add that to your warming queries list
 so that on server startup, the cache will be warm for certain users that
 you
 know tend to do a lot of searches. This can be known empirically or by log
 mining.

 I haven't used multiple cores but I suspect that having that many
 configuration files parsed and loaded in memory can't be good for memory
 usage over filter caching.

 Just my 2 cents
 Amit

 On Wed, Jan 27, 2010 at 8:58 AM, Matthieu Labour
 matthieu_lab...@yahoo.comwrote:

  Thanks Didier for your response
  And in your opinion, this should be as fast as if I would getCore(userId)
  -- provided that the core is already open -- and then search for Paris
 ?
  matt
 
  --- On Wed, 1/27/10, didier deshommes dfdes...@gmail.com wrote:
 
  From: didier deshommes dfdes...@gmail.com
  Subject: Re: Multiple Cores Vs. Single Core for the following use case
  To: solr-user@lucene.apache.org
  Date: Wednesday, January 27, 2010, 10:52 AM
 
  On Wed, Jan 27, 2010 at 9:48 AM, Matthieu Labour
  matthieu_lab...@yahoo.com wrote:
   What I am trying to understand is the search/filter algorithm. If I
 have
  1 core with all documents and I  search for Paris for userId=123, is
  lucene going to first search for all Paris documents and then apply a
 filter
  on the userId ? If this is the case, then I am better off having a
 specific
  index for the user=123 because this will be faster
 
  If you want to apply the filter to userid first, use filter queries
  (http://wiki.apache.org/solr/CommonQueryParameters#fq). This will
  filter by userid first then search for Paris.
 
  didier
 
  
  
  
  
  
   --- On Wed, 1/27/10, Marc Sturlese marc.sturl...@gmail.com wrote:
  
   From: Marc Sturlese marc.sturl...@gmail.com
   Subject: Re: Multiple Cores Vs. Single Core for the following use case
   To: solr-user@lucene.apache.org
   Date: Wednesday, January 27, 2010, 2:22 AM
  
  
   In case you are going to use core per user take a look to this patch:
   http://wiki.apache.org/solr/LotsOfCores
  
   Trey-13 wrote:
  
   Hi Matt,
  
   In most cases you are going to be better off going with the userid
  method
   unless you have a very small number of users and a very large number
 of
   docs/user. The userid method will likely be much easier to manage, as
  you
   won't have to spin up a new core every time you add a new user.  I
 would
   start here and see if the performance is good enough for your
  requirements
   before you start worrying about it not being efficient.
  
   That being said, I really don't have any idea what your data looks
 like.
   How many users do you have?  How many documents per user?  Are any
   documents
   shared by multiple users?
  
   -Trey
  
  
  
   On Tue, Jan 26, 2010 at 7:27 PM, Matthieu Labour
   matthieu_lab...@yahoo.comwrote:
  
   Hi
  
  
  
   Shall I set up Multiple Core or Single core for the following use
 case:
  
  
  
   I have X number of users.
  
  
  
   When I do a search, I always know for which user I am doing a search
  
  
  
   Shall I set up X cores, 1 for each user ? Or shall I set up 1 core
 and
   add
   a userId field to each document?
  
  
  
   If I choose the 1 core solution then I am concerned with performance.
   Let's say I search for NewYork ... If lucene returns all New York
   matches for all users and then filters based on the userId, then this
   is going to be less efficient than if I have sharded per user and
 send
   the request for New York to the user's core
  
  
  
   Thank you for your help
  
  
  
   matt
  
  
  
  
  
  
  
  
  
  
   --
   View this message in context:
 
 

Re: Plurals in solr indexing

2010-01-27 Thread Tom Hill
I recommend getting familiar with the analysis tool included with solr. From
Solr's main admin screen, click on analysis, Check verbose, and enter your
text, and you can see the changes that happen during analysis.

It's really helpful, especially when getting started.

Tom


On Wed, Jan 27, 2010 at 2:41 AM, murali k ilar...@gmail.com wrote:


 Hi,
 I am having trouble with indexing plurals,

 I have the schema with following fields
 gender (field) - string (field type) (eg. data Boys)
 all (field) - text (field type)  - solr.WhitespaceTokenizerFactory,
 solr.SynonymFilterFactory, solr.WordDelimiterFilterFactory,
 solr.LowerCaseFilterFactory, SnowballPorterFilterFactory

 i am using copyField from gender to all

 and searching on all field

 When i search for "Boy", I get results. If i search for "Boys" i don't get
 results.
 I have tried things like "boys bikes" - no results
 "boy bikes" - works

 kid and kids are synonyms for boy and boys, so i tried adding
 kid,kids,boy,boys in synonyms hoping it will work; it doesn't work that way

 I also have other content fields which are copied to all , and it
 contains
 words like kids, boys etc...
 any idea?





 --
 View this message in context:
 http://old.nabble.com/Plurals-in-solr-indexing-tp27335639p27335639.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Improvising solr queries

2010-01-04 Thread Tom Hill
Hi -

Something doesn't make sense to me here:

On Mon, Jan 4, 2010 at 5:55 AM, dipti khullar dipti.khul...@gmail.comwrote:

 - optimize runs on master in every 7 minutes
 - using postOptimize , we execute snapshooter on master
 - snappuller/snapinstaller on 2 slaves runs after every 10 minutes


Why would you optimize every 7 minutes, and update the slaves every ten?
After 70 minutes you'll be doing both at the same time.

How about optimizing every ten minutes, at :00,:10, :20, :30, :40, :50 and
then pulling every ten minutes at :01, :11, :21, :31, :41, :51 (assuming
your optimize completes in one minute).

Or did I misunderstand something?


 The issue gets resolved as soon as we optimize the slave index. In the solr
 admin, it shows only 4 requests/sec is handled with 400 ms response time.


From your earlier description, it seems like you should only be distributing
an optimized index, so optimizing the slave should be a no-op. Check to see
what files you have on the slave after snappulling.

Tom


Re: Case Insensitive search not working

2009-12-08 Thread Tom Hill
Did you rebuild the index? Changing the analyzer for the index doesn't
affect already indexed documents.

Tom


On Tue, Dec 8, 2009 at 11:57 AM, insaneyogi3008 insaney...@gmail.comwrote:


 Hello,

 I tried to force case-insensitive search by having the following setting in
 my schema.xml file, which I guess is standard for case-insensitive searches:

 <fieldType name="text_ws" class="solr.TextField"
            positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>


 However, when I perform searches on "San Jose" and "san jose", I get 16 and 0
 responses back respectively. Is there anything else I'm missing here?


 --
 View this message in context:
 http://old.nabble.com/Case-Insensitive-search-not-working-tp26699734p26699734.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: why no results?

2009-12-07 Thread Tom Hill
Hi -

That's a common one to get bit by. The string

On Mon, Dec 7, 2009 at 7:44 PM, regany re...@newzealand.co.nz wrote:


 hi all - newbie solr question - I've indexed some documents and can search /
 receive results using the following schema - BUT ONLY when searching on the
 id field. If I try searching on the title, subtitle, body or text field I
 receive NO results. Very confused. Can anyone see anything
 obvious I'm doing wrong? Regan.



 <?xml version="1.0" ?>

 <schema name="core0" version="1.1">

 <types>
    <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true" />
 </types>

  <fields>
  <!-- general -->
    <field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true" />
    <field name="title" type="string" indexed="true" stored="true" multiValued="false" />
    <field name="subtitle" type="string" indexed="true" stored="true" multiValued="false" />
    <field name="body" type="string" indexed="true" stored="true" multiValued="false" />
    <field name="text" type="string" indexed="true" stored="false" multiValued="true" />
  </fields>

  <!-- field to use to determine and enforce document uniqueness. -->
  <uniqueKey>id</uniqueKey>

  <!-- field for the QueryParser to use when an explicit fieldname is absent -->
  <defaultSearchField>text</defaultSearchField>

  <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
  <solrQueryParser defaultOperator="OR"/>

  <!-- copyFields group fields into one single searchable indexed field for speed. -->
 <copyField source="title" dest="text" />
 <copyField source="subtitle" dest="text" />
 <copyField source="body" dest="text" />

 </schema>

 --
 View this message in context:
 http://old.nabble.com/why-no-results--tp26688249p26688249.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: why no results?

2009-12-07 Thread Tom Hill
Sorry, just discovered a keyboard shortcut for send. :-)

That's a common one to get bit by. The fieldtype StrField indexes the entire
field as one item. So you can only find it if your search term is everything
in the field. That is, "fox" will not find "The Quick Brown Fox", because
it's not the whole field.

The ID field probably works because it has one term in it. "1" finds "1"
just fine.

Try solr.TextField instead.
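
A sketch of what that change might look like in schema.xml (the type name
"text" and its analyzer are illustrative, not the only option):

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <field name="title" type="text" indexed="true" stored="true" multiValued="false" />

Remember to reindex after changing the field types.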

Tom


On Mon, Dec 7, 2009 at 7:47 PM, Tom Hill solr-l...@worldware.com wrote:

 Hi -

 That's a common one to get bit by. The string


 On Mon, Dec 7, 2009 at 7:44 PM, regany re...@newzealand.co.nz wrote:


 hi all - newbie solr question - I've indexed some documents and can search /
 receive results using the following schema - BUT ONLY when searching on the
 id field. If I try searching on the title, subtitle, body or text field I
 receive NO results. Very confused. Can anyone see anything
 obvious I'm doing wrong? Regan.



 <?xml version="1.0" ?>

 <schema name="core0" version="1.1">

 <types>
    <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true" />
 </types>

  <fields>
  <!-- general -->
    <field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true" />
    <field name="title" type="string" indexed="true" stored="true" multiValued="false" />
    <field name="subtitle" type="string" indexed="true" stored="true" multiValued="false" />
    <field name="body" type="string" indexed="true" stored="true" multiValued="false" />
    <field name="text" type="string" indexed="true" stored="false" multiValued="true" />
  </fields>

  <!-- field to use to determine and enforce document uniqueness. -->
  <uniqueKey>id</uniqueKey>

  <!-- field for the QueryParser to use when an explicit fieldname is absent -->
  <defaultSearchField>text</defaultSearchField>

  <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
  <solrQueryParser defaultOperator="OR"/>

  <!-- copyFields group fields into one single searchable indexed field for speed. -->
 <copyField source="title" dest="text" />
 <copyField source="subtitle" dest="text" />
 <copyField source="body" dest="text" />

 </schema>

 --
 View this message in context:
 http://old.nabble.com/why-no-results--tp26688249p26688249.html
 Sent from the Solr - User mailing list archive at Nabble.com.





Re: deleteById without solrj?

2009-12-03 Thread Tom Hill
http://wiki.apache.org/solr/UpdateXmlMessages#A.22delete.22_by_ID_and_by_Query
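
A sketch using curl against the update handler (host, port, and the id
12345 are made up; send a commit to make the delete visible):

    curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' \
      --data-binary '<delete><id>12345</id></delete>'
    curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' \
      --data-binary '<commit/>'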

On Thu, Dec 3, 2009 at 11:57 AM, Joel Nylund jnyl...@yahoo.com wrote:

 Is there a url based approach to delete a document?

 thanks
 Joel




Re: Multi-Term Synonyms

2009-11-24 Thread Tom Hill
Hi Brad,


I suspect that this section from the wiki for SynonymFilterFactory might be
relevant:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

Keep in mind that while the SynonymFilter will happily work with synonyms
containing multiple words (ie: "sea biscuit, sea biscit, seabiscuit"),
the recommended approach for dealing with synonyms like this is to expand
the synonym when indexing. This is because there are two potential issues
that can arise at query time:

   1. The Lucene QueryParser tokenizes on white space before giving any text
      to the Analyzer, so if a person searches for the words "sea biscit" the
      analyzer will be given the words "sea" and "biscit" separately, and will
      not know that they match a synonym.

   ...
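
If you do move the synonyms to index time, the chain is just the synonym
filter in the index analyzer with expansion on, and no synonym filter on the
query side. A sketch:

    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
    </analyzer>

Note that changing this requires reindexing.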

Tom

On Tue, Nov 24, 2009 at 10:47 AM, brad anderson solrinter...@gmail.comwrote:

 Hi Folks,

 I was trying to get multi term synonyms to work. I'm experiencing some
 strange behavior and would like some feedback.

 In the synonyms file I have the line:

 thomas, boll holly, thomas a, john q => tom

 And I have a document with the text field as:

 tom

 However, when I do a search on boll holly, it does not return the document
 with tom. The same thing happens if I do a query on john q. But if I do a
 query on thomas, it gives me the document. Also, if I quote "boll holly" or
 "john q", it gives back the document.

 When I look at the analyzer page on the solr admin page, it is transforming
 "boll holly" to "tom" when it isn't quoted. Why is it that it is not
 returning the document? Is there some configuration I can make so it does
 return the document if I do an unquoted search on boll holly?

 My synonym filter is defined as follows, and is only defined on the query
 side:

 filter class=solr.SynonymFilterFactory
 synonyms=synonyms.txt ignoreCase=true expand=true/


 I've also tried changing the synonym file to be

 tom, thomas, boll holly, thomas a, john q

 This produces the same results.

 Thanks,
 Brad



Webinar: An Introduction to Basics of Search and Relevancy with Apache Solr hosted by Lucid Imagination

2009-11-23 Thread Tom Hill
In this introductory technical presentation, renowned search expert Mark
Bennett, CTO of Search Consultancy New Idea Engineering,

will present practical tips and examples to help you quickly get productive
with Solr, including:

* Working with the web command line and controlling your inputs and
outputs
* Understanding the DISMAX parser
* Using the Explain output to tune your results relevance
* Using the Schema browser

Wednesday, December 2, 2009
11:00am PST / 2:00pm EST

Click here to sign up:
http://www.eventsvc.com/lucidimagination/120209?trk=WR-DEC2009-AP


Talk on Solr - Oakland, CA June 18, 2008

2008-06-17 Thread Tom Hill - Solr

Hi -

I'll be giving a talk on Solr at the East Bay Innovations Group (eBig) Java
SIG on Wed, June 18.

http://www.ebig.org/index.cfm?fuseaction=Calendar.eventDetaileventID=16

This is an introductory / overview talk intended to get you from What is
Solr  Why Would I Use It to Cool, now I know enough go home and start
playing with Solr.

Tom

-- 
View this message in context: 
http://www.nabble.com/Talk-on-Solr---Oakland%2C-CA-June-18%2C-2008-tp17880636p17880636.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr + Tomcat Undeploy Leaks

2007-10-18 Thread Tom Hill
I certainly have seen memory problems when I just drop a new war file in
place. So now I usually stop tomcat and restart.

I used to see problems (pre-1.0) when I just redeployed repeatedly, without
even accessing the app, but I've got a little script running in the
background that has done that 50 times now, without running out of space.
Are you on a current version? I'm on 1.2

Tom

On 10/18/07, Mike Klaas [EMAIL PROTECTED] wrote:

 I'm not sure that many people are dynamically taking down/starting up
 Solr webapps in servlet containers.  I certainly perfer process-level
 management of my (many) Solr instances.

 -Mike

 On 18-Oct-07, at 10:40 AM, Stu Hood wrote:

  Any ideas?
 
  Has anyone had experienced this problem with other containers? I'm
  not tied to Tomcat if I can find another servlet host with a REST
  api for deploying apps.
 
  Thanks,
  Stu
 
  -Original Message-
  From: Stu Hood [EMAIL PROTECTED]
  Sent: Wednesday, October 17, 2007 4:46pm
  To: solr-user@lucene.apache.org
  Subject: Solr + Tomcat Undeploy Leaks
 
  Hello,
 
  I'm using the Tomcat Manager app with 6.0.14 to start and stop Solr
  instances, and I believe I am running into a variant of the linked
  issue:
 
  http://wiki.apache.org/jakarta-commons/Logging/UndeployMemoryLeak?action=print
 
  According to `top`, the 'size' of the Tomcat process reaches the
  limit I have set for it with the Java -Xmx flag soon after starting
  and launching a few instances. The 'RSS' varies based on how full
  the caches are at any particular time, but I don't think it ever
  reaches the 'size'.
 
  After a few days, I will get OOM errors in the logs when I try and
  start new instances (note: this is typically in the middle of the
  night, when usage is low), and all of the instances will stop
  responding until I (hard) restart Tomcat.
 
  
 
  Has anyone run into this issue before? Is logging the culprit? If
  so, what options do I have (besides setting up a cron job to
  restart Tomcat nightly...)
 
  Thanks,
 
  Stu Hood
  Webmail.us
  You manage your business. We'll manage your email.(R)
 
 
 




Re: Availability Issues

2007-10-08 Thread Tom Hill
Hi -

We're definitely not seeing that. What do your logs show? What do your
schema/solrconfig look like?

Tom


On 10/8/07, David Whalen [EMAIL PROTECTED] wrote:

 Hi All.

 I'm seeing all these threads about availability and I'm
 wondering why my situation is so different than others'.

 We're running SOLR 1.2 with a 2.5G heap size.  On any
 given day, the system becomes completely unresponsive.
 We can't even get /solr/admin/ to come up, much less
 any select queries.

 The only thing we can do is kill the SOLR process and
 re-start it.

 We are indexing over 25 million documents and we add
 about as much as we remove daily, so the number remains
 fairly constant.

 Again, it seems like other folks are having a much
 easier time with SOLR than we are.  Can anyone help
 by sharing how you've got it configured?  Does anyone
 have a similar experience?

 TIA.

 DW




Re: Solr live at Netflix

2007-10-02 Thread Tom Hill
Nice!

And there seem to be some improvements. For example, Gamers and Gamera
no longer stem to the same word :-)

Tom

On 10/2/07, Walter Underwood [EMAIL PROTECTED] wrote:

 Here at Netflix, we switched over our site search to Solr two weeks ago.
 We've seen zero problems with the server. We average 1.2 million
 queries/day on a 250K item index. We're running four Solr servers
 with simple round-robin HTTP load-sharing.

 This is all on 1.1. I've been too busy tuning to upgrade.

 Thanks everyone, this is a great piece of software.

 wunder
 --
 Walter Underwood
 Search Guy, Netflix




Re: pluggable functions

2007-09-18 Thread Tom Hill
Hi -

I'm not sure what you mean by a reflection-based approach, but I've been
thinking about doing this for a bit, since we needed it, too.

I'd just thought about listing class names in the config file. The functions
would probably need to extend a subclass of ValueSource which will handle
argument parsing for the function, so you won't need to hard code the
parsing in a VSParser subclass. I think this might simplify the existing
code a bit.

You might have to do a bit of reflection to instantiate the function. Did
you have an alternate approach in mind? Are there any other things this
would need to do?
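
Something along these lines is what I had pictured for the instantiation
step (just a sketch; loadFunction and its caller are hypothetical, not
existing Solr APIs):

  import org.apache.solr.search.function.ValueSource;

  public class FunctionLoader {
    // Sketch: instantiate a pluggable function class named in solrconfig.xml.
    // The named class must extend ValueSource and have a no-arg constructor.
    public static ValueSource loadFunction(String className) throws Exception {
      Class<?> clazz = Class.forName(className);
      return (ValueSource) clazz.newInstance();
    }
  }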

Is anyone else working on this?

Tom




On 9/18/07, Jon Pierce [EMAIL PROTECTED] wrote:

 I see Yonik recently opened an issue in JIRA to track the addition of
 pluggable functions (https://issues.apache.org/jira/browse/SOLR-356).
 Any chance this will be implemented soon?  It would save users like me
 from having to hack the Solr source or write custom request handlers
 for trivial additions (e.g., adding a distance function), not to
 mention changes to downstream dependencies (e.g., solr-ruby).  Perhaps
 a reflection-based approach would do the trick?

 - Jon



Re: Query for German Special Characters (i.e., ä, ö, ß)

2007-09-14 Thread Tom Hill
Hi Marc,

Are you using the same stemmer on your queries that you use when indexing?

Try the analysis function in the admin UI, to see how things are stemmed for
indexing vs. querying. If they don't match for really and fünny, and do
match for kraßen, then that's your problem.

Tom


On 9/14/07, Marc Bechler [EMAIL PROTECTED] wrote:

 Hi,

 oops, the URIEncoding was lost during the update to tomcat 6.0.14.
 Thanks for the advice.

 But now I am really curioused. After indexing the document from scratch,
 I have the effect that queries to this and is work fine, whereas
 queries to really and fünny do not return the result. Fünnily ;-) ,
 after extending my sometext to This is really fünny kraßen., queries
 to really and fünny still do not work, but kraßen is found.
  Now I am somehow confused -- hopefully someone has a good explanation ;-)

 Regards,

   marc

  Tom Hill wrote:
  If you are using tomcat, try adding URIEncoding=UTF-8 to your
  tomcat connector.
 
  <Connector port="8080" maxHttpHeaderSize="8192" maxThreads="150"
  minSpareThreads="25" maxSpareThreads="75" enableLookups="false"
  redirectPort="8443" acceptCount="100" connectionTimeout="2"
  disableUploadTimeout="true" URIEncoding="UTF-8" />
 
  use the analysis page of the admin interface to check to see what's
   happening to your queries, too.
 
  http://localhost:8080/solr/admin/analysis.jsp?highlight=on  (your
  port # may vary)
 
  Tom
 
  On 9/13/07, Marc Bechler [EMAIL PROTECTED] wrote:
  Hi SOLR kings,
 
  I'm just playing around with queries, but I was not able to query
  for any special characters like the German Umlaute (i.e., ä, ö,
  ü). Maybe others might have the same effects and already found a
  solution ;-)
 
  Here is my example: I have one field called sometext of type
  text (the one delivered with the SOLR example). I indexed a few
  words similar to
 
  <field name="sometext"><![CDATA[ This is really fünny
  ]]></field>
 
  Works fine, and searching for really shows the result and fünny
  will be displayed correctly. However, the query for fünny using
  the /solr/admin page is resolved (correctly) to the URL
  ...q=f%C3%BCnny... but does not find the document.
 
  And now the question: Any ideas? ;-)
 
  Cheers,
 
  marc
 
 
 



Re: Query for German Special Characters (i.e., ä, ö, ß)

2007-09-14 Thread Tom Hill
Hi Marc,

The searches are going to look for an exact match of the query (after
analysis) in the index (after analysis).

So, realli will not match really.

So you want to have the same stemmer (probably not the English one, given
your examples) in both the index analyzer and the query analyzer. I've
appended the section from the solr 1.2 example schema.xml; note
EnglishPorterFilterFactory is in both sections. That would be what you want
to do, with the appropriate stemmer for your application.

Or, you could use no stemmer for BOTH, but I think most people go with
stemming. At least, I do. :-)
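
To make the failure mode concrete: with the stemmer only on one side, the
index holds the token really while the query is rewritten to realli, and
since realli is not an exact match for really, nothing comes back. With the
same stemmer on both sides, both become realli and the match succeeds.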

Tom

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

On 9/14/07, Marc Bechler [EMAIL PROTECTED] wrote:

 Index for really: 5* really. Query for really: 5* really, 2* realli
 (from: EnglishPorterFilterFactory {protected=protwords.txt},
 RemoveDuplicatesTokenFilterFactory {})

 For this everything is completely fine.

 Is a complete matching required between index and query or is a partial
 matching also okay?

 Thanks for helping me

   marc




 Tom Hill wrote:
  Hi Marc,
 
  Are you using the same stemmer on your queries that you use when
 indexing?
 
  Try the analysis function in the admin UI, to see how things are stemmed
 for
  indexing vs. querying. If they don't match for really and fünny, and do
  match for kraßen, then that's your problem.
 
  Tom
 
 
  On 9/14/07, Marc Bechler [EMAIL PROTECTED] wrote:
  Hi,
 
  oops, the URIEncoding was lost during the update to tomcat 6.0.14.
  Thanks for the advice.
 
  But now I am really curious. After indexing the document from
 scratch,
  I have the effect that queries to this and is work fine, whereas
  queries to really and fünny do not return the result. Fünnily ;-) ,
  after extending my sometext to This is really fünny kraßen., queries
  to really and fünny still do not work, but kraßen is found.
  Now I am somehow confused -- hopefully someone has a good explanation
 ;-)
 
  Regards,
 
marc
 
  Tom Hill wrote:
  If you are using tomcat, try adding URIEncoding=UTF-8 to your
  tomcat connector.
 
  <Connector port="8080" maxHttpHeaderSize="8192" maxThreads="150"
  minSpareThreads="25" maxSpareThreads="75" enableLookups="false"
  redirectPort="8443" acceptCount="100" connectionTimeout="2"
  disableUploadTimeout="true" URIEncoding="UTF-8" />
 
  use the analysis page of the admin interface to check to see what's
   happening to your queries, too.
 
  http://localhost:8080/solr/admin/analysis.jsp?highlight=on  (your
  port # may vary)
 
  Tom
 
  On 9/13/07, Marc Bechler  [EMAIL PROTECTED] wrote:
  Hi SOLR kings,
 
  I'm just playing around with queries, but I was not able to query
  for any special characters like the German Umlaute ( i.e., ä, ö,
  ü). Maybe others might have the same effects and already found a
  solution ;-)
 
  Here is my example: I have one field called sometext of type
  text (the one delivered with the SOLR example). I indexed a few
  words similar to
 
  <field name="sometext"><![CDATA[ This is really fünny
  ]]></field>
 
  Works fine, and searching for really shows the result and fünny
  will be displayed correctly. However, the query for fünny using
  the /solr/admin page is resolved (correctly) to the URL
  ...q=f%C3%BCnny... but does not find the document.
 
  And now the question: Any ideas? ;-)
 
  Cheers,
 
  marc
 
 



Re: Slow response

2007-09-14 Thread Tom Hill
Hi Mike,

Thanks for clarifying what has been a bit of a black box to me.

A couple of questions, to increase my understanding, if you don't mind.

If I am only using fields with multiValued=false, with a type of string
or integer  (untokenized), does solr automatically use approach 2? Or is
this something I have to actively configure?
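
For concreteness, I mean fields declared along these lines (a sketch):

  <field name="author" type="string" indexed="true" stored="true"
         multiValued="false"/>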

And is approach 2 better than 1? Or vice versa? Or is the answer "it
depends"? :-)

If, as I suspect, the answer was "it depends", are there any general
guidelines on when to use one approach or the other?

Thanks,

Tom

On 9/6/07, Mike Klaas [EMAIL PROTECTED] wrote:


 On 6-Sep-07, at 3:25 PM, Mike Klaas wrote:

 
  There are essentially two facet computation strategies:
 
  1. cached bitsets: a bitset for each term is generated and
  intersected with the query result bitset.  This is more general and
  performs well up to a few thousand terms.
 
  2. field enumeration: cache the field contents, and generate counts
  using this data.  Relatively independent of #unique terms, but
  requires at most a single facet value per field per document.
 
  So, if you factor author into Primary author/Secondary author,
  where each is guaranteed to only have one value per doc, this could
  greatly accelerate your faceting.  There are probably fewer unique
  subjects, so strategy 1 is likely fine.
 
  To use strategy 2, just make sure that multivalued=false is set
  for those fields in schema.xml

 I forgot to mention that strategy 2 also requires a single token for
  each doc (see http://wiki.apache.org/solr/FAQ#head-14f9f2d84fb2cd1ff389f97f19acdb6ca55e4cd3)

 -Mike



Re: Query for German Special Characters (i.e., ä, ö, ß)

2007-09-13 Thread Tom Hill
If you are using tomcat, try adding URIEncoding=UTF-8 to your tomcat
connector.

<Connector port="8080" maxHttpHeaderSize="8192"
   maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
   enableLookups="false" redirectPort="8443" acceptCount="100"
   connectionTimeout="2" disableUploadTimeout="true"
   URIEncoding="UTF-8" />

use the analysis page of the admin interface to check to see what's
happening to your queries, too.

http://localhost:8080/solr/admin/analysis.jsp?highlight=on  (your port # may
vary)
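
You can also sanity-check the encoding from the command line (a sketch; your
port and path may differ):

curl 'http://localhost:8080/solr/select?q=f%C3%BCnny'

%C3%BC is the UTF-8 percent-encoding of ü, which is what a correctly
configured client should be sending.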

Tom

On 9/13/07, Marc Bechler [EMAIL PROTECTED] wrote:

 Hi SOLR kings,

 I'm just playing around with queries, but I was not able to query for
 any special characters like the German Umlaute (i.e., ä, ö, ü). Maybe
 others might have the same effects and already found a solution ;-)

 Here is my example: I have one field called sometext of type text
 (the one delivered with the SOLR example). I indexed a few words similar
 to

 <field name="sometext">
 <![CDATA[
 This is really fünny
 ]]></field>

 Works fine, and searching for really shows the result and fünny will
 be displayed correctly. However, the query for fünny using the
 /solr/admin page is resolved (correctly) to the URL ...q=f%C3%BCnny...
 but does not find the document.

 And now the question: Any ideas? ;-)

 Cheers,

   marc



Re: update servlet not working

2007-09-06 Thread Tom Hill
I don't use the java client, but when I switched to 1.2, I'd get that
message when I forgot to add the content type header, as described in
CHANGES.txt

  9. The example solrconfig.xml maps /update to XmlUpdateRequestHandler using
the new request dispatcher (SOLR-104).  This requires posted content to
have a valid contentType: curl -H 'Content-type:text/xml; charset=utf-8'
The response format matches that of /select and returns standard error
codes.  To enable solr1.1 style /update, do not map /update to any
handler in solrconfig.xml (ryan)
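
For example, a minimal update request now looks like this (a sketch,
assuming the example Jetty port):

curl -H 'Content-type:text/xml; charset=utf-8' --data-binary '<commit/>' http://localhost:8983/solr/update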

But your request log shows a GET; it should be a POST, I would think. I'd
double-check the parameters on post.jar


On 9/6/07, Benjamin Li [EMAIL PROTECTED] wrote:
 oops, sorry, it says "missing content stream"

 as far as logs go: i have a request log, didn't find anything with
 stack traces though. where is it? we're using the example one packaged
 with solr.
 GET /solr/update HTTP/1.1 400 1401

 just to make sure, i typed java -jar post.jar solrfile.xml

 thanks!

 On 9/6/07, Chris Hostetter [EMAIL PROTECTED] wrote:
  : We are able to navigate to the solr/admin page, but when we try to
  : POST an xml document via the command line, there is a fatal error. It
  : seems that the solr/update servlet isnt running, giving a http 400
  : error.
 
  a 400 could mean a lot of things ... what is the full HTTP response you
  get back from Solr?  what kinds of Stack traces show up in the Jetty log
  output?
 
 
 
 
  -Hoss
 
 


 --
 cheers,
 ben



Re: Facet for multiple values field

2007-08-30 Thread Tom Hill
Hi -

I wouldn't facet on a text field; I tend to use string for the reasons
you describe. For example, use

   <field name="neighborhood_id" type="string" indexed="true" stored="true"
multiValued="true"/>
or in your example
  <field name="sensor" type="string" indexed="true" stored="true"
multiValued="true"/>

If I have multiple values, I add them as separate occurrences of the field I
am faceting on.
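
For example, for your keywords the add document would carry the field once
per phrase (a sketch):

<doc>
  <field name="keywords">relative humidity</field>
  <field name="keywords">air temperature</field>
  <field name="keywords">atmospheric moisture</field>
</doc>

Faceting on a string field fed this way counts the whole phrases, which is
the display you are after.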

If you still need them all in one field for other reasons, use copyField to
assemble them.

Tom

On 8/30/07, Giri [EMAIL PROTECTED] wrote:

 Hi,

 I am trying to get the facet values from a field that contains multiple
 words, for example:

 I have a field keywords

 and values for this: Keywords=  relative humidity, air temperature,
 atmospheric moisture

 Please note: I am combining multiple keywords into one single field, with
 a comma delimiter

 When I query for facets, I am getting something like:

 - relative (10)

 - humidity (10)

 - temperature (5)


 But I really need to display:

 - relative humidity(10)

 - air temperature(5)


 How can I do this? I know I am missing something in my schema field type
 declaration. I would appreciate it if anyone could post me an example schema
 field type that can handle this.

 Thanks!


 Here is my schema excerpt:

    <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldtype>



 And I declared the field as:

 <field name="sensor" type="text" indexed="true" stored="true"/>


Re: How to realize index spaces

2007-08-23 Thread Tom Hill
Hi -

On 8/23/07, Marc Bechler [EMAIL PROTECTED] wrote:

 I was wondering whether or not it is possible to realize different index
 spaces with one solr instance.

 Example: imagine, you want to have 2 index spaces that coexist
 independently (and wich can be identified, e.g., by a unique id). In
 your query, you specify an id, and the query should be performed only in
 the index space with the respective id. Moreover, it should be possible
 to add/remove additional index spaces dynamically.


Just add a field that tells which index space the document belongs to
(belonging to multiple is OK). And then to query only that index space, add,
for example
fq=space:product
to your query URL. Assuming you named the field 'space', and wanted the
'product' space.
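
For example (a sketch; host, port and query terms are made up):

http://localhost:8983/solr/select?q=some+words&fq=space:product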

There's a related example in the example solrconfig; look at
<requestHandler name="partitioned" class="solr.DisMaxRequestHandler">

Tom


Synonym questions

2007-08-09 Thread Tom Hill
Hi -

Just looking at synonyms, and had a couple of questions.

1) For some of my synonyms, it seems to make sense to simply replace the
original word with the other (e.g. theatre => theater, so searches for
either will find either). For others, I want to add an alternate term while
preserving the original (e.g. cirque => circus, so searches for circus
find Cirque du Soleil, but searches for cirque only match cirque, not
circus).

I was thinking that the best way to do this was with two different synonym
filters. The replace filter would be used both at index and query time, the
other only at index time.

Does doing this using two synonym filters make sense?

section from my schema.xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_replace.txt" ignoreCase="true" expand="false" includeOrig="false"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_add.txt" ignoreCase="true" expand="false" includeOrig="true"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_replace.txt" ignoreCase="true" expand="false" includeOrig="false"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
  </analyzer>
</fieldType>
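
And the two synonym files would hold entries along these lines (a sketch):

synonyms_replace.txt (index and query time):
theatre => theater

synonyms_add.txt (index time only, relying on includeOrig=true):
cirque => circus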

2) For this to work, I need to use includeOrig. It appears that
includeOrig is hard-coded to be false in SynonymFilterFactory. Is there
any reason for this? It's pretty easy to change (diff below); any reason
this should not be supported?

Thanks,

Tom

Diffing vs. my local copy of 1.2, but it appears to be the same in HEAD.

--- src/java/org/apache/solr/analysis/SynonymFilterFactory.java
+++ src/java/org/apache/solr/analysis/SynonymFilterFactory.java (working copy)
@@ -37,6 +37,7 @@

     ignoreCase = getBoolean("ignoreCase",false);
     expand = getBoolean("expand",true);
+    includeOrig = getBoolean("includeOrig",false);

     if (synonyms != null) {
       List<String> wlist=null;
@@ -57,8 +58,9 @@
   private SynonymMap synMap;
   private boolean ignoreCase;
   private boolean expand;
+  private boolean includeOrig;

-  private static void parseRules(List<String> rules, SynonymMap map, String mappingSep, String synSep, boolean ignoreCase, boolean expansion) {
+  private void parseRules(List<String> rules, SynonymMap map, String mappingSep, String synSep, boolean ignoreCase, boolean expansion) {
     int count=0;
     for (String rule : rules) {
       // To use regexes, we need an expression that specifies an odd number of chars.
@@ -88,7 +90,6 @@
     }
   }

-    boolean includeOrig=false;
     for (List<String> fromToks : source) {
       count++;
       for (List<String> toToks : target) {


Returning errors from request handler

2007-07-26 Thread Tom Hill

Hi -

With solr 1.2, when using XmlUpdateRequestHandler, if I post a valid
command like <commit/> I get a response like

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst>
</response>

Nice, valid xml. But if I have an error (for example, <comit/> instead of
<commit/>) I get an HTML page back.

This tends to confuse the client software. Is there a way to get a return
like:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">1</int><exception>blah, blah, blah</exception><int name="QTime">0</int></lst>
</response>


I've seen comments in solrconfig about setting handleSelect to true or
false, but didn't see any difference with either setting.

I've actually written my own handler, but since XmlUpdateHandler does the
same thing, I thought it would make a simple example.

Am I doing something wrong? Or is there some config I need to do, or is that
just how it is?

Tom


Using request parameters in dismax boost functions

2007-06-04 Thread Tom Hill

Hi -

Perhaps I'm missing something obvious, but is there a way to get values from
the user's request as arguments to boost functions in dismax?

I'm thinking about distance-based weighting for search results, which
requires the user's x,y.

Tom


Optimizing frequently updated index

2007-05-29 Thread Tom Hill

Hi -

I have an index that is updated fairly frequently (every few seconds), and
I'm replicating to several slave servers.

Because of the frequent updates, I'm usually pushing an index that is not
optimized. And, as it takes several minutes to optimize, I don't want to do
it every time I replicate (at least not on the master).

I was wondering if it makes sense to replicate to a slave instance, optimize
it there, and then distribute the optimized index from the first-level
slave?

Any thoughts?

Thanks,

Tom


Re: optimize/ takes an hour

2007-05-18 Thread Tom Hill

Hi -

What happens if updates occur during the optimize?

Thanks,

Tom


Re: Index corruptions?

2007-05-07 Thread Tom Hill

Hi Charlie,

On 5/3/07, Charlie Jackson [EMAIL PROTECTED] wrote:


I have a couple of questions regarding index corruptions.

1) Has anyone using Solr in a production environment ever experienced an
index corruption? If so, how frequently do they occur?



I once had all slaves complain about a missing file in the index. The master
never had a problem. The problem went away at the next snapshot.

Is the cp -lr in snapshooter really guaranteed to be atomic? Or is it just
fast, and unlikely to be interrupted?

This has only occurred once over the last 5 months.

2) It seems like the CollectionDistribution setup would be a good way to

put in place a recovery plan for (or at least have some viable backups
of) the index. However, I have a small concern that if the index gets
corrupted on the master server, the corruption would propagate down to
the slave servers as well. Is this concern unfounded?



I would expect this to be true.

Also, each of the

snapshots taken by snapshooter are viable full indexes, correct? If so,
that means I'd have a backup of the index each and every time a commit
(or optimize for that matter) is done, which would be awesome.



That's my understanding.

Tom


Re: Group results by field?

2007-05-02 Thread Tom Hill

Hi Matthew,

You might be able to get away with just using facets, depending on
whether your goal is to provide a clickable list of style_ids to the user,
or if you want to return only one search result for each style_id.

For a list of clickable styles, it's basic faceting, and works really well.

http://wiki.apache.org/solr/SimpleFacetParameters
Facet on style_id, present the list of facets to the user, and if the user
selects style_id=37, then reissue the query with one more clause
(+style_id:37).
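
In URL terms (a sketch; the query terms are made up):

.../select?q=shoes&facet=true&facet.field=style_id

and then, once the user picks style 37:

.../select?q=shoes&facet=true&facet.field=style_id&fq=style_id:37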

If you want the ability to only show one search result from each group, then
you might consider the structure of your data. Is each style/size a separate
record? Or is each style a record with multi-valued sizes? The latter might
give you what you really want.

Or, if you really want to remove dups from search results, you could do what
I've done. I ended up modifying SolrIndexSearcher, and replacing
FieldSortedHitQueue and ScorePriorityQueue with versions that remove dups
based on a particular field.

Tom


On 5/2/07, Matthew Runo [EMAIL PROTECTED] wrote:


Hello!

I was wondering - is it possible to search and group the results by a
given field?

For example, I have an index with several million records. Most of
them are different sizes of the same style_id.

I'd love to be able to do group.by=style_id or something like that
in the results, and provide the style_id as a clickable link to see
all the sizes of that style.

Any ideas?

++
  | Matthew Runo
  | Zappos Development
  | [EMAIL PROTECTED]
  | 702-943-7833
++





Re: browse a facet without a query?

2007-04-23 Thread Tom Hill

Hi -

On 4/23/07, Yonik Seeley [EMAIL PROTECTED] wrote:


On 4/23/07, Jennifer Seaman [EMAIL PROTECTED] wrote:
 When there is no q Solr complains. How can I browse a facet without
 a keyword query? For example, I want to view all documents for a given
state;

 ?q=&fq=state:California

With a relatively recent nightly build, you can use q=*:*
Before that, use an open-ended range query like q=state:[* TO *]



I was doing the q=state:[* TO *] for a short time, and found it very slow. I
switched to doing a query on a single field that covered the part of the
index I was interested in, for example:

inStock:true

And got much faster performance. I was getting execution times in seconds
(for example, I just manually did this and got 2.2 seconds for the [* TO
*] query, and 50 milliseconds for the latter (inStock:true), uncached).

In my case the filter query hits about 80% of the docs, so it's doing a
similar amount of work. I don't know how well *:* performs, but if it is
similar to state:[* TO *], I would benchmark it before using it.

For us, facet queries are a high percentage of all queries, so the time was critical. It
might even be worth adding a field, if you don't already have an appropriate
one.

Tom