Re: Collection Distribution in Windows
damn, there goes the platform independence ... is there anybody with a little more experience when it comes to collection distribution on Windows? thanks in advance!

Bill Au [EMAIL PROTECTED] 02/05/2007 15:09
Please respond to solr-user@lucene.apache.org
To solr-user@lucene.apache.org
cc
Subject Re: Collection Distribution in Windows

The collection distribution scripts rely on hard links and rsync. It seems that both may be available on Windows:

hard links: http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/fsutil_hardlink.mspx?mfr=true
rsync: http://samba.anu.edu.au/rsync/download.html

I say "may" because I don't know whether hard links on Windows work the same way as hard links on Linux/Unix. You will also need something like cygwin to run the bash scripts.

Bill

On 5/2/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
i know this is a stupid question, but are there any collection distribution scripts for Windows available? thanks!
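For what it's worth, the hard-link trick the snapshot scripts depend on is easy to try out by hand. A minimal sketch (the file names here are illustrative, not Solr's actual index layout); on Windows, `fsutil hardlink create` would play the role of `ln`:

```shell
# A snapshot made of hard links costs no extra disk space: both directory
# entries point at the same inode, so no data is copied.
rm -rf /tmp/hardlink_demo
mkdir -p /tmp/hardlink_demo/snapshot.1
echo "segment data" > /tmp/hardlink_demo/segments.gen
ln /tmp/hardlink_demo/segments.gen /tmp/hardlink_demo/snapshot.1/segments.gen

# The link count is now 2; deleting either name leaves the data intact.
stat -c %h /tmp/hardlink_demo/segments.gen 2>/dev/null \
  || stat -f %l /tmp/hardlink_demo/segments.gen
```

Whether `fsutil` hard links behave identically under concurrent deletion (which is what the scripts rely on during index rotation) is exactly the open question from the thread.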
Collection Distribution in Windows
i know this is a stupid question, but are there any collection distribution scripts for Windows available? thanks!
Re: AW: Leading wildcards
hey, we've stumbled on something weird while using wildcards. we enabled leading wildcards in solr (see previous message from Christian Burkamp). when we do a search on a nonexistent field, we get a SolrException: undefined field (this was for the query nonfield:test). but when we use wildcards in our query, we don't get the undefined field exception, so the query nonfield:*test works fine ... just zero results... is this normal behaviour?

Burkamp, Christian [EMAIL PROTECTED] 19/04/2007 12:37
Please respond to solr-user@lucene.apache.org
To solr-user@lucene.apache.org
cc
Subject AW: Leading wildcards

Hi there,

Solr does not support leading wildcards, because it uses Lucene's standard QueryParser class without changing the defaults. You can easily change this by inserting the line parser.setAllowLeadingWildcards(true); in QueryParsing.java line 92 (this is after creating a QueryParser instance in QueryParsing.parseQuery(...)), and it obviously means that you have to change Solr's source code. It would be nice to have an option in the schema to switch leading wildcards on or off per field. Leading wildcards really make no sense on richly populated fields, because queries tend to result in TooManyClauses exceptions most of the time.

This works for leading wildcards. Unfortunately it does not enable searches with leading AND trailing wildcards. E.g. searching for *lega* does not find results even if the term elegance is in the index. If you put a second asterisk at the end, the term elegance is found (search for *lega** to get hits). Can anybody explain this? It seems to be more of a Lucene QueryParser issue.

-- Christian

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Thursday, 19 April 2007 08:35
To: solr-user@lucene.apache.org
Subject: Leading wildcards

hi, we have been trying to get the leading wildcards to work. we have been looking around the Solr website, the Lucene website, wikis and the mailing lists etc ... but we found a lot of contradictory information. so we have a few questions:
- is the latest version of lucene capable of handling leading wildcards?
- is the latest version of solr capable of handling leading wildcards?
- do we need to make adjustments to the solr source code?
- if we need to adjust the solr source, what do we need to change?
thanks in advance! Maarten
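For anyone wanting to try Christian's change, it is a one-liner inside Solr's query-parsing code. A non-runnable sketch of the patch (the surrounding lines are approximate, paraphrased from the 1.x source; only the setAllowLeadingWildcards call is the actual change being described):

```java
// org/apache/solr/search/QueryParsing.java, in parseQuery(...) -- sketch,
// context approximate; this patches Solr's source and is not standalone code.
QueryParser parser = new SolrQueryParser(schema, defaultField);
parser.setAllowLeadingWildcards(true); // the added line: permits queries like *foo
Query query = parser.parse(qs);
```

As the thread notes, this only relaxes the parser's syntax check; the rewrite of a leading-wildcard query still enumerates every matching term, which is why it can be slow and can hit the Boolean clause limit.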
Re: AW: Leading wildcards
thanks, this worked like a charm!! we built a custom QueryParser and we integrated the *foo** trick in it, so basically we can now search leading, trailing and both ... only crappy thing is the max Boolean clauses, but i'm going to look into that after the weekend.

for the next release of Solr: do not make this the default (too many risks), but do make an option in the config to enable it; it's a very nice feature.

thanks everybody for the help and have a nice weekend, maarten

Burkamp, Christian [EMAIL PROTECTED] 19/04/2007 12:37
Please respond to solr-user@lucene.apache.org
To solr-user@lucene.apache.org
cc
Subject AW: Leading wildcards

Hi there,

Solr does not support leading wildcards, because it uses Lucene's standard QueryParser class without changing the defaults. You can easily change this by inserting the line parser.setAllowLeadingWildcards(true); in QueryParsing.java line 92 (this is after creating a QueryParser instance in QueryParsing.parseQuery(...)), and it obviously means that you have to change Solr's source code. It would be nice to have an option in the schema to switch leading wildcards on or off per field. Leading wildcards really make no sense on richly populated fields, because queries tend to result in TooManyClauses exceptions most of the time.

This works for leading wildcards. Unfortunately it does not enable searches with leading AND trailing wildcards. E.g. searching for *lega* does not find results even if the term elegance is in the index. If you put a second asterisk at the end, the term elegance is found (search for *lega** to get hits). Can anybody explain this? It seems to be more of a Lucene QueryParser issue.

-- Christian

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Thursday, 19 April 2007 08:35
To: solr-user@lucene.apache.org
Subject: Leading wildcards

hi, we have been trying to get the leading wildcards to work. we have been looking around the Solr website, the Lucene website, wikis and the mailing lists etc ... but we found a lot of contradictory information. so we have a few questions:
- is the latest version of lucene capable of handling leading wildcards?
- is the latest version of solr capable of handling leading wildcards?
- do we need to make adjustments to the solr source code?
- if we need to adjust the solr source, what do we need to change?
thanks in advance! Maarten
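On the "max Boolean clauses" issue: a wildcard or prefix query is rewritten into a BooleanQuery with one clause per matching index term, and Lucene caps that at 1024 clauses by default. Solr exposes the cap in solrconfig.xml (a config fragment; check that your Solr version supports this element before relying on it):

```xml
<!-- solrconfig.xml (fragment) -->
<query>
  <!-- Raises the limit on clauses that wildcard/prefix queries expand
       into; 1024 is Lucene's default. Raising it trades memory and
       query time for fewer TooManyClauses exceptions. -->
  <maxBooleanClauses>10000</maxBooleanClauses>
</query>
```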
Leading wildcards
hi, we have been trying to get the leading wildcards to work. we have been looking around the Solr website, the Lucene website, wikis and the mailing lists etc ... but we found a lot of contradictory information. so we have a few questions:
- is the latest version of lucene capable of handling leading wildcards?
- is the latest version of solr capable of handling leading wildcards?
- do we need to make adjustments to the solr source code?
- if we need to adjust the solr source, what do we need to change?
thanks in advance! Maarten
Re: C# API for Solr
Well, i think there will be a lot of people who will be very happy with this C# client. grts,m

Jeff Rodenburg [EMAIL PROTECTED] 31/03/2007 18:00
Please respond to solr-user@lucene.apache.org
To solr-user@lucene.apache.org
cc
Subject C# API for Solr

We built our first search system architecture around Lucene.Net back in 2005 and continued to make modifications through 2006. We quickly learned that search management is so much more than query algorithms and indexing choices. We were not readily prepared for the operational overhead that our Lucene-based search required: always-on availability, fast response times, batch and real-time updates, etc.

Fast forward to 2007. Our front-end is Microsoft-based, but we needed to support parallel development on non-Microsoft architecture, and thus needed a cross-platform search system. Hello Solr! We've transitioned our search system to Solr with a Linux/Tomcat back-end, and it's been a champ. We now use Solr not only for standard keyword search, but also to drive queries for lots of different content sections on our site. Solr has moved beyond mission critical in our operation.

As we've proceeded, we've built out a nice C# client library to abstract the interaction from C# to Solr. It's mostly generic and designed for extensibility. With a few modifications, this could be a stand-alone library that works for others. I have clearance from the organization to contribute our library to the community if there's interest. I'd first like to gauge the interest of everyone before doing so; please reply if you do.

cheers, jeff r.
Re: Reposting unABLE to match
what exactly is the problem? seems like you end up with the same term text in both query and index analyzer ... you should have found a match...

Shridhar Venkatraman [EMAIL PROTECTED] 27/03/2007 14:08
Please respond to solr-user@lucene.apache.org
To solr-user@lucene.apache.org
cc
Subject Reposting unABLE to match

Solr http://localhost:8084/Genie/ Solr Admin (GENIE) ShridharVAIO:8084
cwd=C:\Program Files\netbeans-5.5\enterprise3\apache-tomcat-5.5.17\bin
SolrHome=c:\Documents and Settings\Shridhar\Desktop\Public\Sana\KN\Genie\GenieConf/

Field Analysis
Field value (Index): unABLE TO CONNECT
Field value (Query): unABLE TO CONNECT

Index Analyzer:
org.apache.solr.analysis.HTMLStripWhitespaceTokenizerFactory {}
  position: 1 2 3 | text: unABLE TO CONNECT | type: word word word | start,end: 0,7 8,10 11,19
org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt, expand=true, ignoreCase=true}
  position: 1 2 3 | text: unABLE TO CONNECT | type: word word word | start,end: 0,7 8,10 11,19
org.apache.solr.analysis.StandardFilterFactory {}
  position: 1 2 3 | text: unABLE TO CONNECT | type: word word word | start,end: 0,7 8,10 11,19
org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true}
  position: 1 2 | text: unABLE CONNECT | type: word word | start,end: 0,7 11,19
org.apache.solr.analysis.WordDelimiterFilterFactory {generateNumberParts=1, catenateWords=1, generateWordParts=1, catenateAll=1, catenateNumbers=1}
  position: 1 2 3 | text: un ABLE CONNECT unABLE | type: word word word word | start,end: 1,3 3,7 11,18 1,7
org.apache.solr.analysis.LowerCaseFilterFactory {}
  position: 1 2 3 | text: un able connect unable | type: word word word word | start,end: 1,3 3,7 11,18 1,7
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
  position: 1 2 3 | text: un able connect unable | type: word word word word | start,end: 1,3 3,7 11,18 1,7

Query Analyzer:
org.apache.solr.analysis.HTMLStripStandardTokenizerFactory {}
  position: 1 2 3 | text: unABLE TO CONNECT | type: ALPHANUM ALPHANUM ALPHANUM | start,end: 1,7 8,10 11,18
org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt, expand=true, ignoreCase=true}
  position: 1 2 3 | text: unABLE TO CONNECT | type: ALPHANUM ALPHANUM ALPHANUM | start,end: 1,7 8,10 11,18
org.apache.solr.analysis.StandardFilterFactory {}
  position: 1 2 3 | text: unABLE TO CONNECT | type: ALPHANUM ALPHANUM ALPHANUM | start,end: 1,7 8,10 11,18
org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true}
  position: 1 2 | text: unABLE CONNECT | type: ALPHANUM ALPHANUM | start,end: 1,7 11,18
Re: Reposting unABLE to match
the only thing i can think of is the fact that in the index analysis the term type is word and in the query analysis the term type is ALPHANUM. you should be getting a match if that doesn't matter ... you get exactly the same term texts ...

Shridhar Venkatraman [EMAIL PROTECTED] 27/03/2007 14:08
Please respond to solr-user@lucene.apache.org
To solr-user@lucene.apache.org
cc
Subject Reposting unABLE to match

Solr http://localhost:8084/Genie/ Solr Admin (GENIE) ShridharVAIO:8084
cwd=C:\Program Files\netbeans-5.5\enterprise3\apache-tomcat-5.5.17\bin
SolrHome=c:\Documents and Settings\Shridhar\Desktop\Public\Sana\KN\Genie\GenieConf/

Field Analysis
Field value (Index): unABLE TO CONNECT
Field value (Query): unABLE TO CONNECT

Index Analyzer:
org.apache.solr.analysis.HTMLStripWhitespaceTokenizerFactory {}
  position: 1 2 3 | text: unABLE TO CONNECT | type: word word word | start,end: 0,7 8,10 11,19
org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt, expand=true, ignoreCase=true}
  position: 1 2 3 | text: unABLE TO CONNECT | type: word word word | start,end: 0,7 8,10 11,19
org.apache.solr.analysis.StandardFilterFactory {}
  position: 1 2 3 | text: unABLE TO CONNECT | type: word word word | start,end: 0,7 8,10 11,19
org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true}
  position: 1 2 | text: unABLE CONNECT | type: word word | start,end: 0,7 11,19
org.apache.solr.analysis.WordDelimiterFilterFactory {generateNumberParts=1, catenateWords=1, generateWordParts=1, catenateAll=1, catenateNumbers=1}
  position: 1 2 3 | text: un ABLE CONNECT unABLE | type: word word word word | start,end: 1,3 3,7 11,18 1,7
org.apache.solr.analysis.LowerCaseFilterFactory {}
  position: 1 2 3 | text: un able connect unable | type: word word word word | start,end: 1,3 3,7 11,18 1,7
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
  position: 1 2 3 | text: un able connect unable | type: word word word word | start,end: 1,3 3,7 11,18 1,7

Query Analyzer:
org.apache.solr.analysis.HTMLStripStandardTokenizerFactory {}
  position: 1 2 3 | text: unABLE TO CONNECT | type: ALPHANUM ALPHANUM ALPHANUM | start,end: 1,7 8,10 11,18
org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt, expand=true, ignoreCase=true}
  position: 1 2 3 | text: unABLE TO CONNECT | type: ALPHANUM ALPHANUM ALPHANUM | start,end: 1,7 8,10 11,18
org.apache.solr.analysis.StandardFilterFactory {}
  position: 1 2 3 | text: unABLE TO CONNECT | type: ALPHANUM ALPHANUM ALPHANUM | start,end: 1,7 8,10 11,18
org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true}
  position: 1 2 | text: unABLE CONNECT | type: ALPHANUM
Re: multiple indexes
Why not create a multivalued field that stores the customer perms? add has_access:cust1, has_access:cust2, etc. to the document at index time, and turn this into a filter query at query time?

that is what we are doing at the moment, and i must say, it works very well and does not slow the server down at all (because of the efficient indexes that solr builds)

Mike Klaas [EMAIL PROTECTED] 22/03/2007 19:15
Please respond to solr-user@lucene.apache.org
To solr-user@lucene.apache.org
cc
Subject Re: multiple indexes

On 3/22/07, Kevin Osborn [EMAIL PROTECTED] wrote:

Here is an issue that I am trying to resolve. We have a large catalog of documents, but our customers (several hundred) can only see a subset of those documents. And the subsets vary greatly in size. And some of these customers will be creating a lot of traffic. Also, there is no way to map the subsets to a query. The customer either has access to a document or they don't. Has anybody worked on this issue before?

If I use one large index and do the filtering in my application, then Solr will be serving a lot of useless documents. The counts would also be screwed up for facet queries. Is the best solution to extend Solr and do the filtering there?

The other potential solution is to have one index per customer. This would require one instance of the servlet per index, correct? It just seems like this would require a lot of hardware and complexity (configuring the memory of each servlet instance to index size and traffic).

Why not create a multivalued field that stores the customer perms? add has_access:cust1, has_access:cust2, etc. to the document at index time, and turn this into a filter query at query time?

-Mike
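A sketch of what Mike's suggestion looks like in the schema (the field and customer names here are made up for illustration):

```xml
<!-- schema.xml (fragment): one untokenized value per customer that may
     see the document; not stored, since it is only used for filtering -->
<field name="has_access" type="string" indexed="true" stored="false"
       multiValued="true"/>
```

At query time the restriction goes into an fq parameter, e.g. `select?q=widget&fq=has_access:cust1`, which keeps facet counts correct and lets Solr cache the per-customer filter independently of the main query.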
Re: Problems with special characters
we didn't use it, but i took a quick look: you need to implement the hl=on attribute in the getQueryString() method of the SolrQueryImpl. the result docs already contain highlighting; that's why you found processHighlighting in the ResultParser. good luck! m

Thierry Collogne [EMAIL PROTECTED] 21/03/2007 17:04
Please respond to solr-user@lucene.apache.org
To solr-user@lucene.apache.org
cc
Subject Re: Problems with special characters

Thank you. When I add the code you described, the Solr Java Client works. One more question about the Solr Java Client: does it allow the use of highlighting? I found a processHighlighting method in ResultsParser.java, but I can't find a way of enabling it. Did you use highlighting?

On 21/03/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
hey, we had the same problem with the Solr Java Client ... they forgot to put UTF-8 encoding on the stream ... i posted our fix on http://issues.apache.org/jira/browse/SOLR-20
it's this post: http://issues.apache.org/jira/browse/SOLR-20#action_12478810 Frederic Hennequin [07/Mar/07 08:27 AM]
grts,m

Bertrand Delacretaz [EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 21/03/2007 11:19
Please respond to solr-user@lucene.apache.org
To solr-user@lucene.apache.org
cc
Subject Re: Problems with special characters

On 3/21/07, Thierry Collogne [EMAIL PROTECTED] wrote:
I used the new jar file and removed -Dfile.encoding=UTF-8 from my jar call and the problem isn't there anymore...

ok, thanks for the feedback! -Bertrand
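For reference, the request the client would need to emit looks roughly like this (standard Solr highlighting parameters; host, field names, and the exact parameter set depend on your deployment and Solr version):

```
http://localhost:8983/solr/select?q=title:solr&hl=on&hl.fl=title,body
```

hl=on switches highlighting on and hl.fl names the fields to highlight; the snippets then come back in a separate highlighting section of the response, which is what processHighlighting in the ResultParser consumes.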
Re: How to assure a permanent index.
well, yes indeed :) but i do think it is easier to set up synchronisation for deleted documents as well; clearing the whole index is kind of overkill.

when you do this:
* delete all documents
* submit all documents
* commit
you should also keep in mind that Solr will do an autocommit after a certain number of documents ... so if the process takes a couple of minutes/hours, you might end up with an empty index and no results for the users!

cheers, m

Walter Underwood [EMAIL PROTECTED] 21/03/2007 17:32
Please respond to solr-user@lucene.apache.org
To solr-user@lucene.apache.org
cc
Subject Re: How to assure a permanent index.

On 3/21/07 1:33 AM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
note that you don't have to delete all documents. you can just upload new documents with the same uniqueKey and Solr will delete the old documents automatically ... this way you are guaranteed not to have an empty index

That works if you keep track of all documents that have disappeared since the last index run. Otherwise, you end up with orphans in the search index: documents that exist in search, but not in the real world, also known as serving 404's in results.

wunder
--
Walter Underwood
Search Guru, Netflix
Re: Problems with special characters
nice one!

Thierry Collogne [EMAIL PROTECTED] 22/03/2007 09:00
Please respond to solr-user@lucene.apache.org
To solr-user@lucene.apache.org
cc
Subject Re: Problems with special characters

Thanks. I made some modifications to SolrQuery.java to allow highlighting. I will post the code on http://issues.apache.org/jira/browse/SOLR-20

On 22/03/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
we didn't use it, but i took a quick look: you need to implement the hl=on attribute in the getQueryString() method of the SolrQueryImpl. the result docs already contain highlighting; that's why you found processHighlighting in the ResultParser. good luck! m

Thierry Collogne [EMAIL PROTECTED] 21/03/2007 17:04
Please respond to solr-user@lucene.apache.org
To solr-user@lucene.apache.org
cc
Subject Re: Problems with special characters

Thank you. When I add the code you described, the Solr Java Client works. One more question about the Solr Java Client: does it allow the use of highlighting? I found a processHighlighting method in ResultsParser.java, but I can't find a way of enabling it. Did you use highlighting?

On 21/03/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
hey, we had the same problem with the Solr Java Client ... they forgot to put UTF-8 encoding on the stream ... i posted our fix on http://issues.apache.org/jira/browse/SOLR-20
it's this post: http://issues.apache.org/jira/browse/SOLR-20#action_12478810 Frederic Hennequin [07/Mar/07 08:27 AM]
grts,m

Bertrand Delacretaz [EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 21/03/2007 11:19
Please respond to solr-user@lucene.apache.org
To solr-user@lucene.apache.org
cc
Subject Re: Problems with special characters

On 3/21/07, Thierry Collogne [EMAIL PROTECTED] wrote:
I used the new jar file and removed -Dfile.encoding=UTF-8 from my jar call and the problem isn't there anymore...

ok, thanks for the feedback! -Bertrand
Re: How to assure a permanent index.
the documents are only deleted when you do a commit ... so you should never have an empty index (or at least not for more than a couple of seconds).

note that you don't have to delete all documents. you can just upload new documents with the same uniqueKey and Solr will delete the old documents automatically ... this way you are guaranteed not to have an empty index.

grts,m

Thierry Collogne [EMAIL PROTECTED] 21/03/2007 09:22
Please respond to solr-user@lucene.apache.org
To solr-user@lucene.apache.org
cc
Subject Re: How to assure a permanent index.

Sorry. Did a send by accident. This is the next part of the mail. I mean if I do the following:
- delete all documents from the index
- add all documents
- do a commit.
Will this result in a temporary empty index, or will I always have results?
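The overwrite-by-uniqueKey approach needs no delete step at all: re-posting a document whose id already exists replaces the old version when the commit happens. A sketch of the update messages (field names are illustrative):

```xml
<!-- POST to /update: a doc whose uniqueKey already exists replaces
     the old one; until commit, searchers keep seeing the old version -->
<add>
  <doc>
    <field name="id">DOC-42</field>
    <field name="title">refreshed content</field>
  </doc>
</add>

<!-- then, in a separate POST, make the changes visible to searchers -->
<commit/>
```

This is why the index never appears empty with this strategy, and also why (as Walter points out) documents removed from the source system linger as orphans unless you delete them explicitly.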
Re: Bug? unique id
thanks for your reply... it kind of solved our problem! we were in fact using tokenizers that produce multiple tokens ... so i guess there is no other way for us than to use the copyField workaround.

it would maybe be a good idea to have Lucene check the *stored* value for duplicate keys ... that seems so much more logical to me! (imho, it makes no sense to check the *indexed* value for duplicate keys, but maybe there is a reason?) or maybe give us the option to choose whether Lucene should check the *stored* or *indexed* value for duplicate keys. it is really confusing to get duplicate unique key *stored* values back from the server (and kind of frustrating).

since we now use a copyField to perform searches on the IDs, there is no more reason to index our unique key field. what would happen if I set indexed=false on my unique id field??

Maarten :-)

Chris Hostetter [EMAIL PROTECTED] 16/03/2007 19:14
Please respond to solr-user@lucene.apache.org
To solr-user@lucene.apache.org
cc
Subject Re: Bug? unique id

: but can someone please answer my question :'(
: is it illegal to put filters on the unique id ?
: or is it a bug that we get duplicate id's?
: or is this a known issue (since everybody is using copyFields?)

there's nothing illegal about using an Analyzer on your uniqueKey, but you have to ensure that your Analyzer:
1) never produces multiple tokens (ie: KeywordTokenizer is fine)
2) never produces duplicate output for different (legal) input.

...so if your dataset can legally contain two different documents whose keys are foo bar and Foo Bar, you certainly wouldn't want to use a Whitespace or StandardTokenizer -- but you also wouldn't ever want to use the LowerCaseFilter. If however you really wanted to ignore all punctuation in keys when clients upload documents to you, and trust that doc 1234-56-7890 is the same as doc 1234567890, then something like the pattern stripping filter would be fine.

the thing to understand is that it's the *indexed* value of the uniqueKey that must be unique in order for Solr to do things properly ... it has to be able to search on that uniqueKey term to delete/replace a doc properly.

-Hoss
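A schema sketch of the single-token id type Hoss describes (the type name is made up, and element spellings vary slightly across Solr versions; the factory class is the standard Solr one):

```xml
<!-- schema.xml (fragment): uniqueKey analyzed as exactly one token -->
<fieldtype name="string_id" class="solr.TextField">
  <analyzer>
    <!-- emits the whole field value as a single token, so the indexed
         term stays unique per document and delete/replace works -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldtype>

<field name="id" type="string_id" indexed="true" stored="true"/>
<uniqueKey>id</uniqueKey>
```

Adding case-folding or character-mapping filters to this chain would reintroduce the duplicate-key problem, since two legally distinct stored ids could then collapse to one indexed term.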
Re: Bug? unique id
because we want to be able to search our unique id's :) and we would like to use the Latin character filter and the Lowercase filter so our searches don't have to be case sensitive and stuff. thanks for the quick response! grts,m

Erik Hatcher [EMAIL PROTECTED] 16/03/2007 12:09
Please respond to solr-user@lucene.apache.org
To solr-user@lucene.apache.org
cc
Subject Re: Bug? unique id

Why in the world would you want to analyze your unique id?

Erik

On Mar 16, 2007, at 6:07 AM, [EMAIL PROTECTED] wrote:
Hello, we have been using Solr for a month now and we are running into a lot of trouble. one of the issues is a problem with the unique id field. can this field have analyzers, filters and tokenizers on it?? because when we use filters or tokenizers on our unique id field, we get duplicate id's. thanks in advance, maarten
Re: Bug? unique id
yes, that is exactly what we are doing now ... copyField with the filters ... we figured that much :) but we are talking about a couple of million records, so the less data we copy the better ...

but can someone please answer my question :'(
is it illegal to put filters on the unique id?
or is it a bug that we get duplicate id's?
or is this a known issue (since everybody is using copyFields?)

thanks for all your replies! grts,m

Paul Borgermans [EMAIL PROTECTED] 16/03/2007 16:12
Please respond to solr-user@lucene.apache.org
To solr-user@lucene.apache.org
cc
Subject Re: Bug? unique id

Hi Maarten,

Why not copy your unique id into another field with the required filters and use that for search?

Regards
Paul

On 3/16/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
because we want to be able to search our unique id's :) and we would like to use the Latin character filter and the Lowercase filter so our searches don't have to be case sensitive and stuff. thanks for the quick response! grts,m

Erik Hatcher [EMAIL PROTECTED] 16/03/2007 12:09
Please respond to solr-user@lucene.apache.org
To solr-user@lucene.apache.org
cc
Subject Re: Bug? unique id

Why in the world would you want to analyze your unique id?

Erik

On Mar 16, 2007, at 6:07 AM, [EMAIL PROTECTED] wrote:
Hello, we have been using Solr for a month now and we are running into a lot of trouble. one of the issues is a problem with the unique id field. can this field have analyzers, filters and tokenizers on it?? because when we use filters or tokenizers on our unique id field, we get duplicate id's. thanks in advance, maarten
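For completeness, the copyField workaround Paul suggests looks roughly like this in schema.xml (field and type names are illustrative; "text_lower" stands in for whatever analyzed type carries your Latin-character and lowercase filters):

```xml
<!-- keep the uniqueKey unanalyzed; search the analyzed copy instead -->
<field name="id"        type="string"     indexed="true" stored="true"/>
<field name="id_search" type="text_lower" indexed="true" stored="false"/>
<copyField source="id" dest="id_search"/>
```

Note that copyField duplicates only index terms, not stored data (the copy is stored="false"), so even with millions of records the overhead is extra terms in the index rather than a second stored copy of every id.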