Re: Filter to cut out all zeors?
won't this replace *all* 0s? ie, 1024 will become 124?

_
{Beto|Norberto|Numard} Meijome
"The only people that never change are the stupid and the dead" Jorge Luis Borges.
I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.

On 11 March 2010 03:24, Sebastian F wrote:
> yes, thank you. That was exactly what I was looking for! Great help!
>
> From: Ahmet Arslan
> To: solr-user@lucene.apache.org
> Sent: Tue, March 9, 2010 7:26:46 PM
> Subject: Re: Filter to cut out all zeors?
>
> > I'm trying to figure out the best way to cut out all zeros
> > of an input string like "01.10." or "022.300"...
> > Is there such a filter in Solr or anything similar that I
> > can adapt to do the task?
>
> With solr.MappingCharFilterFactory [1] you can replace all zeros with "" before the tokenizer.
>
> The SolrHome/conf/mapping.txt file will contain this line:
>
> "0" => ""
>
> So "01.10." will become "1.1." and "022.300" will become "22.3". Is that what you want?
>
> [1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.MappingCharFilterFactory
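To illustrate the caveat, here is a minimal Python sketch (not Solr code - MappingCharFilter runs inside the analyzer chain) of what the single rule `"0" => ""` does to input text: it behaves like a plain character deletion, stripping *every* zero, not just leading or trailing ones.

```python
def apply_mapping(text: str) -> str:
    """Mimic a MappingCharFilter configured with the single rule "0" => ""."""
    return text.replace("0", "")

# The transformations Ahmet describes:
print(apply_mapping("01.10."))   # 1.1.
print(apply_mapping("022.300"))  # 22.3

# ...and the caveat raised in the reply:
print(apply_mapping("1024"))     # 124
```

So if values like "1024" can appear in the field, a rule this broad will mangle them; the mapping approach only fits when every zero really should go.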
Re: weird problem with letters S and T
On Wed, 28 Oct 2009 19:20:37 -0400 Joel Nylund wrote:
> Well I tried removing those 2 letters from stopwords, didn't seem to
> help, I also tried changing the field type to "text_ws", didn't seem to
> work. Any other ideas?

Hi Joel,

if your stop word filter was applied at index time, you will have to reindex (at least those documents with S and T). If your stop filter was *only* applied at query time, then it should work after you reload your app.

b
Re: 99.9% uptime requirement
On Mon, 3 Aug 2009 13:15:44 -0700 "Robert Petersen" wrote:
> Thanks all, I figured there would be more talk about daemontools if there
> were really a need. I appreciate the input and for starters we'll put two
> slaves behind a load balancer and grow it from there.

Robert, not taking away from daemontools, but daemontools won't help you if your whole server goes down. Don't put all your eggs in one basket - several servers, a load balancer (hardware load balancers x 2, haproxy, etc.), and sure, use daemontools to keep your services running within each server...

B
Re: Updating Solr index from XML files
On Tue, 7 Jul 2009 22:16:04 -0700 Francis Yakin wrote:
> I have the following "curl" cmd to update and doing commit to Solr ( I have
> 10 xml files just for testing) [...]

hello,

DIH supports XML, right? Not sure if it works with n files... but it's worth looking at.

Alternatively, you can write a relatively simple Java app that will pick each file up and post it for you using SolrJ.

b
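For the DIH route, a hedged sketch of what a data-config.xml for a directory of files might look like. The directory path, file pattern, and field XPaths are assumptions for illustration - they would have to match the actual feed format:

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- FileListEntityProcessor walks the directory; the inner entity
         parses each file it finds with XPathEntityProcessor -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/opt/xmlfeeds" fileName=".*\.xml" rootEntity="false">
      <entity name="doc" processor="XPathEntityProcessor"
              url="${files.fileAbsolutePath}" forEach="/add/doc">
        <field column="id"   xpath="/add/doc/field[@name='id']"/>
        <field column="name" xpath="/add/doc/field[@name='name']"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

That covers the "n files" case: the outer entity iterates the file list, so new files dropped into the directory get picked up on the next import.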
Re: Is there any other way to load the index beside using "http" connection?
On Tue, 7 Jul 2009 13:54:07 -0700 Francis Yakin wrote:
[...]
> much on our setup.
>
> Like said we have file name "test.xml" which come from SQL output, we put it
> locally on the solr server under "/opt/test.xml"
>
> So, I need to execute the commands from solr system to add and update this to
> the solr data/indexes.
>
> What commands do I have to use, for example the xml file
> named "/opt/test.xml"?

Francis, as much as we can tell you the answer, have you tried reading the documentation in the wiki, and the example setup bundled with SOLR? Most, if not all, your questions are answered there.

Good luck,
B
Re: Is there any other way to load the index beside using "http" connection?
On Mon, 6 Jul 2009 09:56:03 -0700 Francis Yakin wrote:
> Norberto,
>
> Thanks, I think my questions is:
>
> >> why not generate your SQL output directly into your oracle server as a file
>
> What type of file is this?

a file in a format that you can then import into SOLR.
Re: Is there any other way to load the index beside using "http" connection?
On Sun, 5 Jul 2009 10:28:16 -0700 Francis Yakin wrote:
[...]
> > upload the file to your SOLR server? Then the data file is local to your SOLR
> > server, you will bypass any WAN and firewall you may be having. (or some
> > variation of it, sql -> SOLR server as file, etc..)
>
> How we upload the file? Do we need to convert the data file to Lucene Index
> first? And Documentation how we do this?

pick your poison... rsync? ftp? scp?

B
Re: Is there any other way to load the index beside using "http" connection?
On Sun, 5 Jul 2009 21:36:35 +0200 Marcus Herou wrote:
> Sharing some of our exports from DB to solr. Note: many of the statements
> below might not work due to clip-clip.

thx Marcus - but that's a DIH config, right? :)

b
Re: Is there any other way to load the index beside using "http" connection?
On Thu, 2 Jul 2009 11:02:28 -0700 Francis Yakin wrote:
> Norberto, Thanks for your input.
>
> What do you mean with "Have you tried connecting to SOLR over HTTP from
> localhost, therefore avoiding any firewall issues and network latency ? it
> should work a LOT faster than from a remote site." ?
>
> Here is how our servers are laid out:
>
> 1) Database ( Oracle ) is running on a separate machine
> 2) Solr master is running on a separate machine by itself
> 3) 6 solr slaves ( these 6 pull the index from master using rsync)
>
> We have a SQL(Oracle) script to post the data/index from the Oracle Database
> machine to the Solr Master over http. Someone in Oracle Database administration wrote it.

You said in your other email you are having issues with slow transfers between 1) and 2). Your subject relates to the data transfer between 1) and 2) - the link between 2) and 3) is irrelevant to this part. My question (what you quoted above) relates to the point you made about it being slow (WHY is it slow?), and the issues with opening so many connections through the firewall. So I'll rephrase my question (see below...)

[...]
> We can not do localhost since solr is not running on the Oracle machine.

Why not generate your SQL output directly on your Oracle server as a file, and upload the file to your SOLR server? Then the data file is local to your SOLR server, and you bypass whatever WAN and firewall issues you may be having (or some variation of it: sql -> SOLR server as file, etc.).

Any speed issues that are rooted in the fact that you are posting via HTTP (vs embedded solr or DIH) aren't going to go away, but it's the simpler approach without changing too much of your current setup.

> Another alternative that we think of is to transform XML into CSV and
> import/export it.
>
> How about LuSql, which someone mentioned? Is it a free (open source)
> application? Do you have any experience with it?

Not I, sorry. Have you looked into DIH? It's designed for this kind of work.
B
Re: Is there any other way to load the index beside using "http" connection?
On Thu, 2 Jul 2009 11:28:51 -0700 Francis Yakin wrote:
> Norberto,

Hi Francis,

Please reply to the list, or keep it in CC.

> You said:
>
> "Other alternatives are to transform the XML into csv and import it that way"
>
> How do you transfer that CSV file to Solr?

http://wiki.apache.org/solr/UpdateCSV

There actually is a LOT of information in the wiki, as well as in the mailing list archives.

good luck,
B
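As a rough sketch of the XML-to-CSV transformation suggested in this thread, assuming the input is standard Solr `<add><doc>` XML (the field names here are purely illustrative), something like this flattens it into a file the CSV update handler can take:

```python
import csv
import io
import xml.etree.ElementTree as ET

def solr_xml_to_csv(xml_text, fields):
    """Flatten Solr <add><doc> XML into CSV suitable for the CSV update handler."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)  # header row names the Solr fields
    for doc in root.iter("doc"):
        values = {f.get("name"): (f.text or "") for f in doc.iter("field")}
        writer.writerow([values.get(name, "") for name in fields])
    return out.getvalue()

xml_text = """<add>
  <doc><field name="id">1</field><field name="title">first</field></doc>
  <doc><field name="id">2</field><field name="title">second</field></doc>
</add>"""

print(solr_xml_to_csv(xml_text, ["id", "title"]))
```

The resulting CSV can then be posted to Solr as described on the UpdateCSV wiki page.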
Re: Is it problem? I use solr to search and index is made by lucene. (not EmbeddedSolrServer(wiki is old))
On Thu, 2 Jul 2009 16:12:58 +0800 James liu wrote:
> I use solr to search and index is made by lucene. (not
> EmbeddedSolrServer(wiki is old))
>
> Is it problem when i use solr to search?
>
> which the difference between Index(made by lucene and solr)?

Hi James,

make sure the version of Lucene used to create your index is the same as the libraries included in your version of SOLR; then it should work. It may be that an older Lucene index works with the newer Lucene libs provided in Solr, but after using it you may not be able to go back - I am not sure of the details, though. Probably an FAQ by now - check the archives :)

good luck,
B
Re: Is there any other way to load the index beside using "http" connection?
On Wed, 1 Jul 2009 15:07:12 -0700 Francis Yakin wrote:
> We have several thousands of xml files in a database that we load to the solr
> master. The Database uses an "http" connection to transfer those files to the solr
> master. Solr then translates the xml files into its index.
>
> We are experiencing issues with close/open connections in the firewall and it is
> very very slow.
>
> Is there any other way to load the data/index from the Database to the solr master
> besides using an http connection? That is, we just scp/ftp the xml files from the
> Database system to the solr master and let solr convert those to lucene indexes?

Francis, after reading the whole thread, it seems you have:

- Data source: Oracle DB, in a separate location to your SOLR.
- Data format: XML output.

DIH is definitely a great option, but since you are on 1.2 it is not available to you (you should look into upgrading if you can!).

Have you tried connecting to SOLR over HTTP from localhost, therefore avoiding any firewall issues and network latency? It should work a LOT faster than from a remote site. Also make sure not to commit until you really need to.

Other alternatives are to transform the XML into CSV and import it that way, or to write a simple app that will parse the XML and post it directly using the embedded solr method.

Plenty of options, all of them documented @ solr's site.

good luck,
b
Re: Excluding Characters and SubStrings in a Faceted Wildcard Query
On Mon, 29 Jun 2009 15:10:59 +0100 Ben wrote:
> Hi Erik,
>
> I'm not sure exactly how much context you need here, so I'll try to keep
> it short and expand as needed.
>
> The column I am faceting contains a comma-delineated set of vectors.
> Each vector is made up of {Make,Year,Model} e.g.
> ford_1996_focus,mercedes_1996_clk,ford_2000_focus
>
> I have a custom request handler, where if I want to find all the cars
> from 1996 I pass in a facet query for the Year (1996) which is
> transformed to a wildcard facet query:
>
> *_1996_*
>
> In other words, it'll match any records whose vector column contains a
> string which somewhere has a car from 1996.
>
> Why not put the Make, Year and Model in separate columns and do a facet
> query of multiple columns? ... because once we've selected 1996, we
> should (in the above example) then be offering "ford and mercedes" as
> further facet choices, and nothing more. If the parts were in their own
> columns, there would be no way to tie the Makes and Models to specific
> years, for example.
> [...]

Hi,

It must be late and I probably need more $coffee... but isn't what you just described (search for 1996, show 'ford', 'mercedes') how facets DO work? Once you have a facet on the make field, and solr has told you that both 'ford' and 'mercedes' are available in that field, it is up to you to search for "make=ford AND date=1996" if you ONLY want Fords from 1996...

cheers,
B
Re: Solr document security
On Wed, 24 Jun 2009 23:20:26 -0700 (PDT) pof wrote:
> Hi, I am wanting to add document-level security that works as follows: An
> external process makes a query to the index and, depending on its security
> allowances based on a login id, a list of hits is returned minus any documents the
> user isn't meant to know even exist. I was thinking maybe a custom filter with
> a JDBC connection to check the security of the user vs. the document. I'm not
> sure how I would add the filter, how to write the filter, or how to get the
> login id from a GET parameter. Any suggestions, comments etc.?

Hi Brett,

(keeping in mind that I've been away from SOLR for 8 months, but I don't think this was added of late)

The standard approach is to manage security @ your application layer, not @ SOLR. ie, search, return documents (which should contain some kind of data to identify their ACL) and then you can decide whether to show them or not.

HIH
Re: How can i indexing MS-Outlook files?
On Sun, 14 Dec 2008 19:22:00 -0800 (PST) Otis Gospodnetic wrote:
> Perhaps an easier alternative is to index not the MS-Outlook files
> themselves, but email messages pulled from the IMAP or POP servers, if that's
> where the original emails live.

PST files ('outlook files') are local to the end user, and quite possibly their contents aren't available on the server anymore.

Another alternative could be to access, from Exchange's "file system" itself, the files that represent each object... I don't know whether this is still possible in Exchange 2007, or whether it is 'sanctioned' by MS... Possibly some kind of object interface with Exchange itself would be most desirable.
Re: port of Nutch CommonGrams to Solr for help with slow phrase queries
On Wed, 26 Nov 2008 10:08:03 +1100 Norberto Meijome <[EMAIL PROTECTED]> wrote:
> We didn't notice any severe performance hit but :
> - data set isn't huge ( ca 1 MM docs).
> - reindexed nightly via DIH from MS-SQL, so we can use a separate cache layer
> to lower the number of hits to SOLR.

To make this clear - there was a noticeable hit when we removed stop words, but the nature of the beast forced our hand.

b
Re: port of Nutch CommonGrams to Solr for help with slow phrase queries
On Mon, 24 Nov 2008 13:31:39 -0500 "Burton-West, Tom" <[EMAIL PROTECTED]> wrote:
> The approach to this problem used by Nutch looks promising. Has anyone
> ported the Nutch CommonGrams filter to Solr?
>
> "Construct n-grams for frequently occuring terms and phrases while
> indexing. Optimize phrase queries to use the n-grams. Single terms are
> still indexed too, with n-grams overlaid."
> http://lucene.apache.org/nutch/apidocs-0.8.x/org/apache/nutch/analysis/CommonGrams.html

Tom,

I haven't used Nutch's implementation, but I used the current implementation (1.3) of ngrams and shingles to address exactly the same issue (a database of music albums and tracks). We didn't notice any severe performance hit, but:

- the data set isn't huge (ca 1 MM docs).
- we reindex nightly via DIH from MS-SQL, so we can use a separate cache layer to lower the number of hits to SOLR.

B
Re: Using Solr for indexing emails
On Tue, 25 Nov 2008 03:59:31 +0200 Timo Sirainen <[EMAIL PROTECTED]> wrote:
> > would it be faster to say q=user: AND highestuid:[* TO *] ?
>
> Now that I read again what fq really did, yes, sounds like you're right.

you may want to compare them both to see which one is better... I just went from memory :P

> > (and I guess you'd sort DESC and return 1 record only).
>
> No, I'd use the above for getting the highestuid value for all mailboxes
> (there should be only one record per mailbox (each mailbox has separate
> uid values -> separate highestuid value)) so I can look at the returned
> highestuid values to see which mailboxes aren't fully indexed yet.

gotcha. It is an interesting use of SOLR, I must say... I for one am not used to having to deal with up-to-the-second update needs.

good luck,
B
Re: Using Solr for indexing emails
On Mon, 24 Nov 2008 20:21:17 +0200 Timo Sirainen <[EMAIL PROTECTED]> wrote:
> I think I gave enough reasons above for why I don't like this
> solution. :) I also don't like adding new shared global state databases
> just for Solr. Solr should be the one shared global state database..

fair enough - it makes more sense to me now :)

[...]
> Store the per-mailbox highest indexed UID in a new unique field created
> like "//". Always update it by deleting the
> old one first and then adding the new one.

you mean delete, commit, add, commit? if you replace the record, simply submitting the new document and committing would do (of course, you must ensure the value of the uniqueKey field matches, so SOLR replaces the old doc).

> So to find out the highest
> indexed UID for a mailbox just look it up using its unique field. For
> finding the highest indexed UID for a user's all mailboxes do a single
> query:
>
> - fl=highestuid
> - q=highestuid:[* TO *]
> - fq=user:

would it be faster to say q=user: AND highestuid:[* TO *]? (and I guess you'd sort DESC and return 1 record only).

> If messages are being simultaneously indexed by multiple processes the
> highest-uid value may sometimes (rarely) be set too low, but that
> doesn't matter. The next search will try to re-add some of the messages
> that were already in the index, but because they'll have the same unique IDs
> as what already exists they won't get added again. The highest-uid
> gets updated and all is well.

B
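The overwrite-by-uniqueKey behaviour discussed here - no explicit delete/commit cycle needed before re-adding a document - can be modelled with a toy in-memory index. This is only an illustration of the semantics, not Solr code, and the field names are made up:

```python
# Toy model of Solr's uniqueKey overwrite semantics: adding a document
# whose uniqueKey matches an existing one replaces it, rather than
# creating a duplicate, so delete-commit-add-commit is unnecessary.
class ToyIndex:
    def __init__(self, unique_key="id"):
        self.unique_key = unique_key
        self.docs = {}

    def add(self, doc):
        # same key -> old document is replaced, not duplicated
        self.docs[doc[self.unique_key]] = doc

index = ToyIndex()
index.add({"id": "user1/box1", "highestuid": 10})
index.add({"id": "user1/box1", "highestuid": 42})  # overwrites the first
print(len(index.docs))                         # 1
print(index.docs["user1/box1"]["highestuid"])  # 42
```

This is also why concurrent indexers re-adding the same messages is harmless: the duplicates collapse onto the same key.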
Re: Using Solr for indexing emails
On Sun, 23 Nov 2008 16:02:16 +0200 Timo Sirainen <[EMAIL PROTECTED]> wrote:
> Hi,

Hi Timo,

[...]
> The main problem is that before doing the search, I first have to check
> if there are any unindexed messages and then add them to Solr. This is
> done using a query like:
> - fl=uid
> - rows=1
> - sort=uid desc
> - q=uidv: box: user:

So, if I understand correctly, the process is:

1. user sends search query Q to search interface
2. interface checks the highest indexed uidv in SOLR
3. interface checks in the IMAP store whether the mailbox has any objects ('emails') newer than the uidv from 2.
4. anything found in 3. is processed, submitted to SOLR, committed.
5. interface submits search query Q to the index, gets results
6. results are presented / returned to user

It strikes me that this may work ok in some situations but may not scale. I would decouple the {find new documents / submit / commit} process from the {search / presentation} layer - SPECIALLY if you plan to have several mailboxes in play.

> So it returns the highest IMAP UID field (which is an always-ascending
> integer) for the given mailbox (you can ignore the uidvalidity). I can
> then add all messages with higher UIDs to Solr before doing the actual
> search.
>
> When searching multiple mailboxes the above query would have to be sent
> to every mailbox separately.

hmm... not sure what you mean by "query would have to be sent to every MAILBOX"...

> That really doesn't seem like the best
> solution, especially when there are a lot of mailboxes. But I don't
> think Solr has a way to return "highest uid field for each
> box:"?

hmmm... maybe you can use facets on 'box'...? though you'd still have to query for each box, i think...

> Is that above query even efficient for a single mailbox?

i don't think so.

> I did consider
> using separate documents for storing the highest UID for each mailbox,
> but that causes annoying desynchronization possibilities. Especially
> because currently I can just keep sending documents to Solr without
> locking and let it drop duplicates automatically (should be rare). With
> per-mailbox highest-uid documents I can't really see a way to do this
> without locking or allowing duplicate fields to be added and later some
> garbage collection deleting all but the one highest value (annoyingly
> complex).

I have a feeling the issues arise from serialising the whole process (as I described above). It makes more sense (to me) to implement something similar to DIH, where you load data as needed (even a 'delta query', which would only return new data). I am not sure whether you could use DIH here (RSS feed from IMAP store?)

> I could of course also keep track of what's indexed on Dovecot's side,
> but that could also lead to desynchronization issues and I'd like to
> avoid them.
>
> I guess the ideal solution would be if it was somehow possible to create
> a SQL-like trigger that updates the per-mailbox highest-uid document
> whenever adding a new document with a higher UID value.

I am not sure how much effort you want to put into this... but I would think that writing a lean app that periodically (for a period that makes sense for your hardware and users' expectations... 5 minutes? 10? 1?) crawls the IMAP stores for UIDs, processes them and submits to SOLR, and keeps its own state (dbm or sqlite) may be a more flexible approach. Or, if dovecot supports this, a 'plugin / hook' that sends a msg to your indexing app every time a new document is created.

I am interested to hear what you decide to go with, and why.

cheers,
B
Re: [VOTE] Community Logo Preferences
On Sun, 23 Nov 2008 11:59:50 -0500 Ryan McKinley <[EMAIL PROTECTED]> wrote:
> Please submit your preferences for the solr logo.

https://issues.apache.org/jira/secure/attachment/12394267/apache_solr_c_blue.jpg
https://issues.apache.org/jira/secure/attachment/12394263/apache_solr_a_blue.jpg
https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png
https://issues.apache.org/jira/secure/attachment/12394376/solr_sp.png
https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg

thanks!!
B
Re: How can i protect the SOLR Cores?
On Wed, 19 Nov 2008 22:58:52 -0800 (PST) RaghavPrabhu <[EMAIL PROTECTED]> wrote:
> Im using multiple cores and all i need to do is to make each core
> secure. If i am accessing a particular core via url, it should ask for
> and validate credentials, say Username & Password, for each core.

You should be able to handle this @ the servlet container level. What I did, using Jetty + starting from the example app, was:

1) Modify web.xml (part of the sources of solr.war, which you'll have to rebuild) to define the authentication constraints you want. In my case:

- a "Default / AllowedQueries" constraint covering /core1/select/*, /core2/select/* and /core3/select/*;
- an "Admin" constraint covering /admin/*, /core1/admin/*, /core2/admin/*, /core3/admin/* and /_test_/*, restricted to the Admin-role and FullAccess-role roles;
- an "RW" constraint covering /core1/dataimport, /core2/dataimport, /core3/dataimport, /core1/update/*, /core2/update/* and /core3/update/*, restricted to the RW-role and FullAccess-role roles;
- BASIC auth against the "SearchSvc" realm, with the Admin-role, FullAccess-role and RW-role roles declared.

2) In Jetty's jetty.xml (or in a context... I just used jetty.xml), define where to get the AUTH details from: a "SearchSvc" realm backed by /etc/searchsvc_access.properties.

3) Read in Jetty's documentation how to create the .properties file with the auth info...

I am not sure if this is the BEST way to do it (I didn't have access to any stronger auth method than basic at the time), but it works exactly as intended.

b
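The archive stripped the actual XML out of this message; here is a hedged sketch of what the web.xml fragment described above probably looked like. The constraint names, URL patterns and role names come from the message itself, but the element ordering and exact structure are assumptions based on standard servlet-spec syntax:

```xml
<!-- Inside the <web-app> element of solr.war's web.xml -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Default / AllowedQueries</web-resource-name>
    <url-pattern>/core1/select/*</url-pattern>
    <url-pattern>/core2/select/*</url-pattern>
    <url-pattern>/core3/select/*</url-pattern>
  </web-resource-collection>
  <!-- no <auth-constraint>: queries stay open to everyone -->
</security-constraint>

<security-constraint>
  <web-resource-collection>
    <web-resource-name>Admin</web-resource-name>
    <url-pattern>/admin/*</url-pattern>
    <url-pattern>/core1/admin/*</url-pattern>
    <url-pattern>/core2/admin/*</url-pattern>
    <url-pattern>/core3/admin/*</url-pattern>
    <url-pattern>/_test_/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>Admin-role</role-name>
    <role-name>FullAccess-role</role-name>
  </auth-constraint>
</security-constraint>

<security-constraint>
  <web-resource-collection>
    <web-resource-name>RW</web-resource-name>
    <url-pattern>/core1/dataimport</url-pattern>
    <url-pattern>/core2/dataimport</url-pattern>
    <url-pattern>/core3/dataimport</url-pattern>
    <url-pattern>/core1/update/*</url-pattern>
    <url-pattern>/core2/update/*</url-pattern>
    <url-pattern>/core3/update/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>RW-role</role-name>
    <role-name>FullAccess-role</role-name>
  </auth-constraint>
</security-constraint>

<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>SearchSvc</realm-name>
</login-config>

<security-role><role-name>Admin-role</role-name></security-role>
<security-role><role-name>FullAccess-role</role-name></security-role>
<security-role><role-name>RW-role</role-name></security-role>
```

The jetty.xml side then just has to declare a realm named "SearchSvc" pointing at the properties file, so the realm name in <login-config> matches.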
Re: Use SOLR like the "MySQL LIKE"
On Tue, 18 Nov 2008 14:26:02 +0100 "Aleksander M. Stensby" <[EMAIL PROTECTED]> wrote:
> Well, then I suggest you index the field in two different ways if you want
> both possible ways of searching. One, where you treat the entire name as
> one token (in lowercase) (then you can search for avera* and match on for
> instance "average joe" etc.) And then another field where you tokenize on
> whitespace for instance, if you want/need that possibility as well. Look at
> the solr copy fields and try it out, it works like a charm :)

You should also make extensive use of analysis.jsp to see how the data in your field (1) is tokenized, filtered and indexed, and how your search terms are tokenized, filtered and matched against (1).

Hint 1: check all the checkboxes ;)
Hint 2: you don't need to reindex all your data, just enter test data in the form and give it a go. You will of course have to tweak schema.xml and restart your service when you do this.

good luck,
B
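A sketch of the two-field setup Aleksander describes, in schema.xml terms. The type and field names here are made up for illustration; the point is one type that keeps the whole value as a single lowercase token, another that splits on whitespace, and a copyField so both get populated from one input:

```xml
<!-- whole value as one lowercase token: avera* matches "average joe" -->
<fieldType name="name_exactish" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- split on whitespace: "joe" alone matches too -->
<fieldType name="name_tokens" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="name"       type="name_exactish" indexed="true" stored="true"/>
<field name="name_split" type="name_tokens"   indexed="true" stored="false"/>

<!-- populate the second field automatically at index time -->
<copyField source="name" dest="name_split"/>
```

With this in place, analysis.jsp will show both analysis chains side by side for the same test input.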
Re: Solr Core Size limit
On Tue, 11 Nov 2008 20:39:32 -0800 (PST) Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> With Distributed Search you are limited to # of shards * Integer.MAX_VALUE.

yeah, makes sense. And since this is PER INDEX, I would suspect it applies to each core only (so you could have n cores in m shards for n * m * Integer.MAX_VALUE docs).
Re: Solr Core Size limit
On Tue, 11 Nov 2008 10:25:07 -0800 (PST) Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> Doc ID gaps are zapped during segment merges and index optimization.

thanks Otis :)

b
Re: Solr Core Size limit
On Mon, 10 Nov 2008 10:24:47 -0800 (PST) Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> I don't think there is a limit other than your hardware and the internal Doc
> ID which limits you to 2B docs on 32-bit machines.

Hi Otis, just curious - is this internal doc ID reused when an optimise happens? Or are gaps left and re-filled when 2B is reached?

cheers,
b
Re: How to use multicore feature in JBOSS
On Tue, 4 Nov 2008 23:45:40 -0800 (PST) con <[EMAIL PROTECTED]> wrote:
> But for the first question, I am still not clear.
> I think to use the multicore feature we should inform the server. In the
> Jetty server, we are starting the server using: java
> -Dsolr.solr.home=multicore -jar start.jar
> Once the server is started I think it will take the parameters from
> multicore/solr.xml.
>
> But I am confused on how and where to pass this argument to JBOSS.

Con,

Sorry, I don't have a jboss available to test... What happens if you use the standard configuration (with solr.xml @ the top level of your solr directory, NOT in multicore/)? Launch it, look @ the debug messages, and see which cores are picked up (from the admin page).

FWIW, by having {solr_installation_directory}/solr.xml, I never had to tell jetty where solr.xml was. IIRC, multicore/solr.xml is the layout in the example app because the default config is 1-core only.

b
Re: How to use multicore feature in JBOSS
On Tue, 4 Nov 2008 09:55:38 -0800 (PST) con <[EMAIL PROTECTED]> wrote: > 1) Which all files do I need to edit to use the multicore feature? > 2) Also, where can I specify the index directly so that we can point the > indexed documents to a custom folder instead of jboss/bin? Con, please check the wiki - the answers should be there: 1) solr.xml (previously multicore.xml); 2) look in solrconfig.xml for each core. _ {Beto|Norberto|Numard} Meijome Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
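For reference, a minimal multicore solr.xml has roughly this shape (core names and paths below are placeholders, not from the original thread):

```xml
<!-- solr.xml at the top of the Solr home directory; names are examples -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0" />
    <core name="core1" instanceDir="core1" />
  </cores>
</solr>
```

and each core's index location can be moved out of jboss/bin with a `<dataDir>` element in that core's solrconfig.xml, e.g. `<dataDir>/var/data/solr/core0</dataDir>` (path again an example).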
Re: DIH and rss feeds
On Thu, 30 Oct 2008 20:46:16 -0700 "Lance Norskog" <[EMAIL PROTECTED]> wrote: > Now: a few hours later there are a different 100 "lastest" documents. How do > I add those to the index so I will have 200 documents? 'full-import' throws > away the first 100. 'delta-import' is not implemented. What is the special > trick here? I'm using the Solr-1.3.0 release. > Lance, 1) DIH has a "clean" parameter that, when set to true (the default, I think), deletes all existing docs in the index before importing. 2) Ensure your new documents have different values in the field defined as the key (schema.xml). Let us know how it goes, B _ {Beto|Norberto|Numard} Meijome Lack of planning on your part does not constitute an emergency on ours.
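In practice that means passing clean=false on the second run so the first 100 documents are kept. A sketch, assuming the example app's default host/port and handler path:

```shell
# First run: clean defaults to true, so the index is wiped, then imported.
curl 'http://localhost:8983/solr/dataimport?command=full-import'

# A few hours later: keep existing docs; new/changed docs are added or
# overwritten by their uniqueKey value.
curl 'http://localhost:8983/solr/dataimport?command=full-import&clean=false&commit=true'
```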
Re: Solr Searching on other fields which are not in query
On Thu, 30 Oct 2008 15:50:58 -0300 "Jorge Solari" <[EMAIL PROTECTED]> wrote: > > > in the schema file. Or use the Dismax query handler. b _ {Beto|Norberto|Numard} Meijome Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
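The schema markup in the reply above was stripped by the list; the usual shape of that answer - copy the extra fields into one catch-all field and make it the default search field - looks roughly like this (field names are examples only, not from the original mail):

```xml
<!-- schema.xml: funnel the searchable fields into one catch-all field -->
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="name" dest="text"/>
<copyField source="description" dest="text"/>

<!-- queries with no explicit field then search the catch-all -->
<defaultSearchField>text</defaultSearchField>
```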
Re: solr1.3 - testing language ?
On Mon, 20 Oct 2008 08:16:50 -0700 (PDT) sunnyfr <[EMAIL PROTECTED]> wrote: > ok so straight by the admin part ! Hi Johanna - I'm not sure what you mean by 'the admin part'. > it should work .. So it doesn't? If you tell us what you did (what URL you called), what you expected to receive back (a sample of your indexed data) and what you got instead, we may be able to offer better answers... b _ {Beto|Norberto|Numard} Meijome Two things have come out of Berkeley, Unix and LSD. It is uncertain which caused the other.
Re: Advice on analysis/filtering?
On Thu, 16 Oct 2008 16:09:17 +0200 Jarek Zgoda <[EMAIL PROTECTED]> wrote: > They came to such expectations seeing Solr's own Spellcheck at work - > if it can suggest correct versions, it should be able to sanitize > broken words in documents and search them using sanitized input. For > me, this seemed reasonable request (of course, if this can be achieved > reasonably abusing solr's spellcheck component). Don't forget that the Solr spellchecker finds its suggestions based on your corpus, so if you don't have a correctly spelt version of wordA, you won't receive wordA back as a 'spellchecked' version of that word. I think that's how it works by default (which is all I've needed so far). I *think* there is a way to use an external spellchecker (component or word list) - so you could have your full list of Polish words in a file, I guess. I agree that playing with analysis.jsp is the best approach to solving these problems (tick all the boxes and see how the changes to your terms take place). Good luck - let us know what you come up with :) B _ {Beto|Norberto|Numard} Meijome "You can discover what your enemy fears most by observing the means he uses to frighten you." Eric Hoffer (1902 - 1983)
Re: solr1.3 - testing language ?
On Mon, 20 Oct 2008 06:25:09 -0700 (PDT) sunnyfr <[EMAIL PROTECTED]> wrote: > I implemented multi-language search, but I haven't finished the website in > PHP; how can I check it works properly? Maybe by sending Solr the queries you plan your PHP frontend to generate? _ {Beto|Norberto|Numard} Meijome "Always do right. This will gratify some and astonish the rest." Mark Twain
Re: query parsing issue + behavior as OR (solr 1.4-dev)
On Mon, 20 Oct 2008 06:21:06 -0700 (PDT) Sunil Sarje <[EMAIL PROTECTED]> wrote: > I am working with nightly build of Oct 17, 2008 and found the issue that > something wrong with LuceneQParserPlugin; It takes + as OR Sunil, please do not hijack the thread : http://en.wikipedia.org/wiki/Thread_hijacking thanks, B _ {Beto|Norberto|Numard} Meijome He could be a poster child for retroactive birth control.
Re: Sorting performance
On Mon, 20 Oct 2008 16:28:23 +0300 christophe <[EMAIL PROTECTED]> wrote: > Hmm, this means I have to wait before I index new documents and avoid > indexing when they are created (I have about 50,000 new documents > created each day and I was planning to make those searchable ASAP). You can always index + optimise out of band on a 'master' / read-write server, and then send the updated index to your slave (the one actually serving the requests). This *will NOT* remove the need to refresh your cache, but it will remove any delay introduced by commit/indexing + optimise. > Too bad there is no way to have a centralized cache that can be shared > AND updated when new documents are created. Hmm, not sure it makes sense quite like that... but maybe along the lines of having an active cache that serves queries, with new ones being prepared and then swapped in when ready. Speaking of which (or not :P), has anyone thought about / done any work on using memcached for these internal Solr caches? I guess it would make sense for setups with several slaves (or even a master updating memcached too)... though for a setup with shards it would be slightly more involved (although it *could* be used to support several slaves per 'data shard'). All the best, B _ {Beto|Norberto|Numard} Meijome RTFM and STFW before anything bad happens.
Re: Synonym format not working
On Mon, 20 Oct 2008 00:08:07 -0700 (PDT) prerna07 <[EMAIL PROTECTED]> wrote: > > > The issue with synonyms arises when I have a number in the synonym definition: > > ccc =>1,2 gives following result in debugQuery=true : > MultiPhraseQuery(all:" (1 ) (2 ccc ) > 3") > all:" (1 ) (2 ccc ) 3" > > However fooaaa=> fooaaa, baraaa,bazaaa gives correct synonym results: > > all:fooaaa all:baraaa all:bazaaa > all:fooaaa all:baraaa all:bazaaa > > Any pointers to solve the issue with numbers in synonyms? Prerna, in your first email you show your field type has: [...] [..] generateNumberParts=1 will, AFAIK, generate a separate token at a number boundary, so ccc1 will be indexed as "ccc", "1". If you use admin/analysis.jsp you can see the step-by-step process applied by the tokenizer + filters for your data type - you can then tweak it as necessary until you are happy with the results. b _ {Beto|Norberto|Numard} Meijome Immediate success shouldn't be necessary as a motivation to do the right thing.
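If the number splitting is the culprit, the filter can be told not to emit number parts. A hedged sketch of the relevant schema.xml line (the other attributes are illustrative; the original config was stripped by the list):

```xml
<!-- inside the fieldType's analyzer: stop splitting out number tokens -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        generateNumberParts="0"
        catenateAll="0"/>
```

As the reply says, verify the effect on admin/analysis.jsp before and after the change.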
Re: search not working correctly
On Mon, 20 Oct 2008 03:24:36 -0700 (PDT) prerna07 <[EMAIL PROTECTED]> wrote: > Yes, we want to search on these incomplete words. Look into the NGram token factory. It works a treat - I don't think it's explained much in the wiki, but it has been discussed on this list in the past, and you also have the JavaDoc and the source itself. FWIW, I had problems getting it to work properly with minGramSize != maxGramSize - analysis.jsp showed a match, but it didn't work in the query handler. It could *definitely* have been me or the code at the time I tested it (pre-1.3 release)... I'll test again to see if it still happens and log a bug if needed. B _ {Beto|Norberto|Numard} Meijome "There are two kinds of stupid people. One kind says,'This is old and therefore good'. The other kind says, 'This is new, and therefore better.'" John Brunner, 'The Shockwave Rider'.
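A minimal sketch of the kind of field type meant here, applying the n-gram filter at index time so substrings of indexed terms become searchable (name, gram sizes and analyzer choices are assumptions, not from the thread):

```xml
<fieldType name="text_ngram" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- fixed-size grams; as noted above, min != max may need testing -->
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="3"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```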
Re: SolrJ + HTTP caching
On Wed, 15 Oct 2008 11:11:07 -0700 Matthew Runo <[EMAIL PROTECTED]> wrote: > We've been using Varnish (http://varnish.projects.linpro.no/) in front > of our Solr servers, and have been seeing about a 70% hit rate for the > queries. We're using SolrJ, and have seen no bad effects of the cache. FWIW: we also use Varnish in front of Solr - we refresh the index daily, so we have a fairly long TTL, but clear the cache at the end of the script which calls DIH. The web app also caches rendered results (webpages :P) in memcached. B _ {Beto|Norberto|Numard} Meijome "Build a system that even a fool can use, and only a fool will want to use it." George Bernard Shaw
Re: Problem in using Unique key
On Wed, 8 Oct 2008 03:45:20 -0700 (PDT) con <[EMAIL PROTECTED]> wrote: > But in that case, while doing a full-import I am getting the following > error: > > org.apache.solr.common.SolrException: QueryElevationComponent requires the > schema to have a uniqueKeyField Con, if you don't use the QueryElevationComponent, you can disable it in solrconfig.xml. Not sure why a uniqueKeyField is needed for it, though. b _ {Beto|Norberto|Numard} Meijome "First they ignore you, then they laugh at you, then they fight you, then you win." Mahatma Gandhi.
Re: dismax and long phrases
On Tue, 07 Oct 2008 09:27:30 -0700 Jon Drukman <[EMAIL PROTECTED]> wrote: > > Yep, you can "fake" it by only using fieldsets (qf) that have a > > consistent set of stopwords. > > does that mean changing the query or changing the schema? Jon, - you change schema.xml to define each field's type; the fieldType determines whether stopwords are removed or not. - You change solrconfig.xml to define which fields dismax will query on. I don't think you should have to change your query. b _ {Beto|Norberto|Numard} Meijome "Mix a little foolishness with your serious plans; it's lovely to be silly at the right moment." Horace
Re: Dismax , "query phrases"
On Tue, 30 Sep 2008 11:43:57 -0700 (PDT) Chris Hostetter <[EMAIL PROTECTED]> wrote: > > : That's why I was wondering how Dismax breaks it all apart. It makes > sense... I : suppose what I'd like to have is a way to tell dismax which > fields NOT to : tokenize the input for. For these fields, it would pass the > full q instead of : each part of it. Does this make sense? would it be useful > at all? > > the *goal* makes sense, but the implementation would be ... problematic. > > you have to remember the DisMax parser's whole way of working is to make > each "chunk" of input match against any qf field, and find the highest > scoring field for each chunk, with this input... > > q = some phrase & qf = a b c > > ...you get... > > ( (a:some | b:some | c:some) (a:phrase | b:phrase | c:phrase) ) > > ...even if dismax could tell that "c" was a field that should only support > exact matches, thanks Hoss, it would be a configuration option. > how would it fit c:"some phrase" into that structure? Does this make sense? ( (a:some | b:some ) (a:phrase | b:phrase) ( c:"some phrase") ) > I've already kinda forgotten how this thread started ... trying to get *exact* matches to always score higher using dismax - keeping in mind that I have multiple exact fields, with different boosts... > but would it make > sense to just use your "exact" fields in the pf, and have inexact versions > of them in the qf? then docs that match your input exactly should score > at the top, but less exact matches will also still match. Aha! Right, I think that makes sense... I obviously haven't got my head properly around all the different functionality of dismax. I will try it when I'm back at work... Right now, I seem to have solved the problem by using shingles - the fields are artists, songs & album titles, so high matching on shingles is quite approximate to exact matching - except that I had to remove stopwords, so that impacts performance.
Thanks again :) B _ {Beto|Norberto|Numard} Meijome Which is worse: ignorance or apathy? Don't know. Don't care.
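Hoss's suggestion - exact fields in pf, inexact versions in qf - would look roughly like this inside the dismax handler definition in solrconfig.xml (boost values below are made up, though the field names follow this thread):

```xml
<!-- dismax handler: loose matching in qf, exact-match fields in pf -->
<str name="qf">artist^4.0 title^6.0</str>
<str name="pf">artist_exact^100.0 title_exact^200.0</str>
```

Documents matching the whole query exactly then pick up the large pf phrase boost, while partial matches still qualify through qf.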
Re: Create Indexes
On Fri, 26 Sep 2008 18:58:14 +0530 Dinesh Gupta <[EMAIL PROTECTED]> wrote: > Please tell me where to upload the files. Anywhere you have access to... your own website - somewhere anyone on the list can access the files *you* want to share to address your problems :) b _ {Beto|Norberto|Numard} Meijome "Science Fiction...the only genuine consciousness expanding drug" Arthur C. Clarke
Re: Dismax , "query phrases"
On Fri, 26 Sep 2008 10:42:42 -0700 (PDT) Chris Hostetter <[EMAIL PROTECTED]> wrote: > : : class="solr.KeywordTokenizerFactory" />
Re: How to select one entity at a time?
On Fri, 26 Sep 2008 02:35:18 -0700 (PDT) con <[EMAIL PROTECTED]> wrote: > What you meant is correct only. Please excuse for that I am new to solr. :-( Con, have a read here: http://www.ibm.com/developerworks/java/library/j-solr1/ - it helped me pick up the basics a while back. It refers to 1.2, but the core concepts are relevant to 1.3 too. b _ {Beto|Norberto|Numard} Meijome Hildebrant's Principle: If you don't know where you are going, any road will get you there.
Re: How to select one entity at a time?
On Fri, 26 Sep 2008 02:35:18 -0700 (PDT) con <[EMAIL PROTECTED]> wrote: > What you meant is correct only. Please excuse for that I am new to solr. :-( Hi Con, nothing to be excused for... but you may want to read the wiki, as it provides quite a lot of information that should answer your questions. DIH is great, but I wouldn't go near it until you understand how to create your own schema.xml and solrconfig.xml. http://wiki.apache.org/solr/FrontPage is the wiki. (Everyone else... is there a guide on getting started with Solr? Step by step, taking the example and changing it for your own use?) > I want to index all the query results. (I think this will be done by the > data-config.xml) Hmm... terminology :-) You index documents (similar to records in a database); when you send a query to Solr, you get results if your query matches indexed documents. > Now while accessing this indexed data, i need this filtering. ie. Either > user or manager. > I tried your suggestion: > http://localhost:8983/solr/select/?q=user:bob&version=2.2&start=0&rows=10&indent=on&wt=json The URL looks OK. Do you have any document in your index with the field user containing 'bob'? Try this to get all results (XML format, first 3 results only): http://localhost:8983/solr/select/?q=*:*&rows=3 Then find a field with a value, search for that value and see if you get that document back - it should work... (with lots of caveats, yes). If you send us the result we can help you understand better why it isn't working as you intend. b _ {Beto|Norberto|Numard} Meijome "First they ignore you, then they laugh at you, then they fight you, then you win." Mahatma Gandhi.
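The two checks suggested above, as curl commands (assuming the example app's default host and port):

```shell
# Sanity check: fetch the first 3 documents of any kind, in XML.
curl 'http://localhost:8983/solr/select/?q=*:*&rows=3'

# Then search for a value you actually saw in those results,
# e.g. in the "user" field.
curl 'http://localhost:8983/solr/select/?q=user:bob&start=0&rows=10&indent=on'
```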
Re: Create Indexes
On Fri, 26 Sep 2008 16:32:05 +0530 Dinesh Gupta <[EMAIL PROTECTED]> wrote: > Is it OK to create the whole index via the Solr web-app? > If not, how can I create the index? > > I have attached some files that create the index now. > Dinesh, you sent the same email 2 1/2 hours ago. Sending it again will not get you more answers. If you have a file you want to share, you should upload it to a webserver and share the URL - most mailing lists drop file attachments. _ {Beto|Norberto|Numard} Meijome Never take Life too seriously, no one gets out alive anyway.
Re: How to select one entity at a time?
On Fri, 26 Sep 2008 00:46:07 -0700 (PDT) con <[EMAIL PROTECTED]> wrote: > To be more specific: > I have the data-config.xml just like: > > > > > > > > > > > > > Con, I may be confused here... are you asking how to load only data from your USERS SQL table into Solr, or how to search your Solr index for data about 'USERS'? data-config.xml is only relevant to the Data Import Handler... but your following question: > > I have 3 search conditions. when the client wants to search all the users, > only the entity, 'user' must be executed. And if he wants to search all > managers, the entity, 'manager' must be executed. > > How can i accomplish this through url? *seems* to indicate you want to search on this. If you want to search on a particular field from your Solr schema, DIH is not involved; with the standard query handler, you say ?q=user:Bob . If I misunderstood your question, please explain... cheers, b _ {Beto|Norberto|Numard} Meijome "Everything is interesting if you go into it deeply enough" Richard Feynman
Shingles , min size?
Hi guys, I may have missed it, but is it possible to tell solr.ShingleFilterFactory the minimum number of grams to generate per shingle, similar to NGramTokenizerFactory's minGramSize="3" maxGramSize="3"? thanks! B _ {Beto|Norberto|Numard} Meijome "Ask not what's inside your head, but what your head's inside of." J. J. Gibson
Re: Dismax , "query phrases"
On Wed, 24 Sep 2008 08:34:57 -0700 (PDT) Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > What happens if you change ps from 100 to 1 and comment out that ord function? > > Otis, I think what I am after is what Hoss described in the last paragraph of his reply to your email last year: http://www.nabble.com/DisMax-and-REQUIRED-OR-REQUIRED-query-rewrite-td13395349.html#a13395349 i.e., I want everything that dismax does, BUT on certain fields I want it to search for all the terms in my q=, as a phrase. I am thinking of modifying dismax to allow this to be passed as configuration (e.g. fieldsSearchExact=artist_exact,title_exact), but if I can avoid that it'd be great :) Any other ideas, anyone? thanks! B _ {Beto|Norberto|Numard} Meijome "Nature doesn't care how smart you are. You can still be wrong." Richard Feynman
Re: Defining custom schema
On Wed, 24 Sep 2008 04:42:42 -0700 (PDT) con <[EMAIL PROTECTED]> wrote: > In the table we will be having various column names like CUSTOMER_NAME, > CUSTOMER_PHONE etc. If we use the default schema.xml, we have to map these > values to some the default values like cat, features etc. this will cause > difficulty when we need to process the output. > Instead can we set the column name and column type dynamically to the > schema.xml so that the output will show something like, Con, the "default" schema you refer to is from the example application. You should definitely edit it and define your own fields. b _ {Beto|Norberto|Numard} Meijome "In my opinion, we don't devote nearly enough scientific research to finding a cure for jerks." Calvin
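Concretely, rather than mapping onto the example fields (cat, features, ...), the schema can declare the real column names directly. A sketch, where the types and flags are assumptions about the data:

```xml
<!-- schema.xml: fields named after the actual table columns -->
<field name="CUSTOMER_NAME"  type="text"   indexed="true" stored="true"/>
<field name="CUSTOMER_PHONE" type="string" indexed="true" stored="true"/>
```

Query responses then carry these names as-is, so no re-mapping is needed when processing the output.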
Re: help required: how to design a large scale solr system
On Wed, 24 Sep 2008 11:45:34 -0400 Mark Miller <[EMAIL PROTECTED]> wrote: > Nothing to stop you from breaking up the tsv/csv files into multiple > tsv/csv files. Absolutely agreeing with you... in one system where I implemented Solr, I have a process run through the file system and lazily pick up new files as they come in. If something breaks (and it will, as the files are user-generated in many cases...), report it / leave it for later... move on. b _ {Beto|Norberto|Numard} Meijome I used to hate weddings; all the Grandmas would poke me and say, "You're next sonny!" They stopped doing that when i started to do it to them at funerals.
Re: Dismax , "query phrases"
On Wed, 24 Sep 2008 08:34:57 -0700 (PDT) Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > What happens if you change ps from 100 to 1 and comment out that ord function? > > > Otis Hi Otis, no luck - without " " : smashing pumpkins smashing pumpkins +((DisjunctionMaxQuery((genre:smash^0.2 | title_ngram2:"sm ma as sh hi in ng"^0.1 | artist_ngram2:"sm ma as sh hi in ng"^0.1 | title_ngram3:"sma mas ash shi hin ing"^4.5 | title:smash^6.0 | artist_ngram3:"sma mas ash shi hin ing"^3.5 | artist:smash^4.0 | artist_exact:smashing^100.0 | title_exact:smashing^200.0)~0.01) DisjunctionMaxQuery((genre:pumpkin^0.2 | title_ngram2:"pu um mp pk ki in ns"^0.1 | artist_ngram2:"pu um mp pk ki in ns"^0.1 | title_ngram3:"pum ump mpk pki kin ins"^4.5 | title:pumpkin^6.0 | artist_ngram3:"pum ump mpk pki kin ins"^3.5 | artist:pumpkin^4.0 | artist_exact:pumpkins^100.0 | title_exact:pumpkins^200.0)~0.01))~2) DisjunctionMaxQuery((title:"smash pumpkin"~1^2.0 | artist:"smash pumpkin"~1^0.8)~0.01) ___ +(((genre:smash^0.2 | title_ngram2:"sm ma as sh hi in ng"^0.1 | artist_ngram2:"sm ma as sh hi in ng"^0.1 | title_ngram3:"sma mas ash shi hin ing"^4.5 | title:smash^6.0 | artist_ngram3:"sma mas ash shi hin ing"^3.5 | artist:smash^4.0 | artist_exact:smashing^100.0 | title_exact:smashing^200.0)~0.01 (genre:pumpkin^0.2 | title_ngram2:"pu um mp pk ki in ns"^0.1 | artist_ngram2:"pu um mp pk ki in ns"^0.1 | title_ngram3:"pum ump mpk pki kin ins"^4.5 | title:pumpkin^6.0 | artist_ngram3:"pum ump mpk pki kin ins"^3.5 | artist:pumpkin^4.0 | artist_exact:pumpkins^100.0 | title_exact:pumpkins^200.0)~0.01)~2) (title:"smash pumpkin"~1^2.0 | artist:"smash pumpkin"~1^0.8)~0.01 Still OK if I include " "... I am trying on another setup, with same data, to work with shingles rather than on 'exact' ... dismax seems to handle it much better...but it may be that I haven't added to that config all the ngram3 &ngram3 fields for substring matching... 
the resulting params were : 2<-1 5<-2 6<90% true true 0.01 store_albums.xsl ___ title_exact^200.0 artist_exact^100.0 title^6.0 title_ngram3^4.5 artist^4.0 artist_ngram3^3.5 title_ngram2^0.1 artist_ngram2^0.1 genre^0.2 *:* true xml dismax 10 true title^2.0 artist^0.8 all *,score 1 1 true all xml smashing pumpkins thanks, B _ {Beto|Norberto|Numard} Meijome "Don't remember what you can infer." Harry Tennant
Re: help required: how to design a large scale solr system
On Wed, 24 Sep 2008 07:46:57 -0400 Mark Miller <[EMAIL PROTECTED]> wrote: > Yes. You will def see a speed increase by avoiding http (especially > doc-at-a-time http) and using the direct csv loader. > > http://wiki.apache.org/solr/UpdateCSV And the obvious reason: if, for whatever reason, something breaks while you are indexing directly from memory, can you restart the import? It may just be easier to keep the data on disk and keep track of how far you have got adding it to the index... B _ {Beto|Norberto|Numard} Meijome Sysadmins can't be sued for malpractice, but surgeons don't have to deal with patients who install new versions of their own innards.
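For reference, the CSV handler from the wiki page above can stream a file straight from disk, which also makes restarting after a failure straightforward (the path is an example; %2C is a URL-encoded comma):

```shell
# Stream a CSV file into Solr without posting it document-by-document.
curl 'http://localhost:8983/solr/update/csv?stream.file=/data/batch-0001.csv&separator=%2C&commit=true'
```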
Dismax , "query phrases"
Hello, I've seen references to this in the list, but not completely explained...my apologies if this is FAQ (and for the length of the email). I am using dismax across a number of fields on an index with data about music albums & songs - the fields are quite full of stop words. I am trying to boost 'exact' matches - ie, if you search for 'The Doors', those documents with 'The Doors' should be first. I've created the following fieldType and I use it for fields artist_exact and title_exact: I then give artist_exact and title_exact pretty high boosts ( title_exact^200.0 artist_exact^100.0 ) Now, when I search with ?q=the doors , all the terms in my q= aren't used together to build the dismaxQuery , so I never get a match on the _exact fields: (there are a few other fields involved...pretty self explanatory) the doors the doors ___ +((DisjunctionMaxQuery((title_ngram2:"th he"^0.1 | artist_ngram2:"th he"^0.1 | title_ngram3:the^4.5 | artist_ngram3:the^3.5 | artist_exact:the^100.0 | title_exact:the^200.0)~0.01) DisjunctionMaxQuery((genre:door^0.2 | title_ngram2:"do oo or rs"^0.1 | artist_ngram2:"do oo or rs"^0.1 | title_ngram3:"doo oor ors"^4.5 | title:door^6.0 | artist_ngram3:"doo oor ors"^3.5 | artist:door^4.0 | artist_exact:doors^100.0 | title_exact:doors^200.0)~0.01))~2) DisjunctionMaxQuery((title:door^2.0 | artist:door^0.8)~0.01) FunctionQuery((ord(release_year))^0.5) +(((title_ngram2:"th he"^0.1 | artist_ngram2:"th he"^0.1 | title_ngram3:the^4.5 | artist_ngram3:the^3.5 | artist_exact:the^100.0 | title_exact:the^200.0)~0.01 (genre:door^0.2 | title_ngram2:"do oo or rs"^0.1 | artist_ngram2:"do oo or rs"^0.1 | title_ngram3:"doo oor ors"^4.5 | title:door^6.0 | artist_ngram3:"doo oor ors"^3.5 | artist:door^4.0 | artist_exact:doors^100.0 | title_exact:doors^200.0)~0.01)~2) (title:door^2.0 | artist:door^0.8)~0.01 (ord(release_year))^0.5 but, if I build my search as ?q="the doors" +DisjunctionMaxQuery((genre:door^0.2 | title_ngram2:"th he e d do oo or rs"^0.1 | 
artist_ngram2:"th he e d do oo or rs"^0.1 | title_ngram3:"the he e d do doo oor ors"^4.5 | title:door^6.0 | artist_ngram3:"the he e d do doo oor ors"^3.5 | artist:door^4.0 | artist_exact:the doors^100.0 | title_exact:the doors^200.0)~0.01) DisjunctionMaxQuery((title:door^2.0 | artist:door^0.8)~0.01) FunctionQuery((ord(release_year))^0.5) +(genre:door^0.2 | title_ngram2:"th he e d do oo or rs"^0.1 | artist_ngram2:"th he e d do oo or rs"^0.1 | title_ngram3:"the he e d do doo oor ors"^4.5 | title:door^6.0 | artist_ngram3:"the he e d do doo oor ors"^3.5 | artist:door^4.0 | artist_exact:the doors^100.0 | title_exact:the doors^200.0)~0.01 (title:door^2.0 | artist:door^0.8)~0.01 (ord(release_year))^0.5 I've tried with other queries that don't include stopwords (smashing pumpkins, for example), and in all cases, if I don't use " ", only the LAST word is used with my _exact fields ( tried with 1, 2 and 3 words, always the same against my _exact fields..) What is the reason for this behaviour? my full dismax config is : 2<-1 5<-2 6<90% true true 0.01 title_exact^200.0 artist_exact^100.0 title^6.0 title_ngram3^4.5 artist^4.0 artist_ngram3^3.5 title_ngram2^0.1 artist_ngram2^0.1 genre^0.2 *:* true dismax true 10 title^2.0 artist^0.8 all *,score ord(release_year)^0.5 1 100 TIA! B _ {Beto|Norberto|Numard} Meijome "Never offend people with style when you can offend them with substance." Sam Brown I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
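For completeness: the markup of the '_exact' field type described above was stripped by the list. A common shape for such a type - one token for the whole field value, lowercased - is sketched below (not necessarily identical to the original; the earlier thread confirms only that solr.KeywordTokenizerFactory was used):

```xml
<fieldType name="text_exact" class="solr.TextField">
  <analyzer>
    <!-- keep the entire field value as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```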
Re: Using Shingles to Increase Phrase Search Performance
On Sat, 16 Aug 2008 15:39:44 -0700 "Chris Harris" <[EMAIL PROTECTED]> wrote: [...] > So finally I modified the Lucene ShingleFilter class to add an > "outputUnigramIfNoNgram option". Basically, if you set that option, > and also set outputUnigrams=false, then the filter will tokenize just > as in Exhibit B, except that if the query is only one word long, it > will return a corresponding single token, rather than zero tokens. In > other words, > > [Exhibit C] > "please" -> > "please" > > Things were still zippy. And, so far, I think I have seriously > improved my phrase search performance without ruining anything. Hi Chris, is this change part of 1.3? I've tried it, but analysis.jsp shows no tokens generated when there is only one word. thanks! B _ {Beto|Norberto|Numard} Meijome I sense much NT in you. NT leads to Bluescreen. Bluescreen leads to downtime. Downtime leads to suffering. NT is the path to the darkside. Powerful Unix is.
Re: Any way to extract most used keywords from an index (or a random set)
On Mon, 22 Sep 2008 15:46:54 +0530 "Jacob Singh" <[EMAIL PROTECTED]> wrote: > Hi, > > I'm trying to write a testing suite to gauge the performance of solr > searches. To do so, I'd like to be able to find out what keywords > will get me search results. Is there anyway to programaticaly do this > with luke? I'm trying to figure out what all it exposes, but I'm not > seeing this. > Hi Jacob, are you after something that the following URL doesn't provide? http://host/solr/core/admin/luke?wt=xslt&tr=luke.xsl But I actually prefer the schema browser (1.3) for seeing the top n terms per field... b _ {Beto|Norberto|Numard} Meijome If it's there, and you can see it, it's real. If it's not there, and you can see it, it's virtual. If it's there, and you can't see it, it's transparent. If it's not there, and you can't see it, you erased it.
Re: Special character matching 'x' ?
On Thu, 18 Sep 2008 10:53:39 +0530 "Sanjay Suri" <[EMAIL PROTECTED]> wrote: > One of my field values has the name "R__ikk__nen" which contains special > characters. > > Strangely, as I see it anyway, it matches on the search query 'x' ? > > Can someone explain or point me to the solution/documentation? Hi Sanjay, Akshay should have given you an answer for this. More generally, if you want to know WHY something is matching the way it is, run the query with debugQuery=true. There are a few pages in the wiki which explain other debugging techniques. b _ {Beto|Norberto|Numard} Meijome "Ask not what's inside your head, but what your head's inside of." J. J. Gibson
Re: about boost weight
On Sat, 13 Sep 2008 16:17:12 + zzh <[EMAIL PROTECTED]> wrote: >I think this is a stupid method, because the search conditions are too > long and the search efficiency will be low; we hope you can help me solve > this problem. Hi, IMHO, a long set of conditions doesn't make it stupid. You may not be going the best way about it though. You may find http://wiki.apache.org/solr/DisMaxRequestHandler an interesting and useful read :) B _ {Beto|Norberto|Numard} Meijome "Quality is never an accident, it is always the result of intelligent effort." John Ruskin (1819-1900) I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Regarding Indexing
On Fri, 29 Aug 2008 02:37:10 -0700 (PDT) sanraj25 <[EMAIL PROTECTED]> wrote: > I want to store two independent sets of data in the solr index, so I decided to create > two indexes, but that's not possible, so I went for the multicore concept in solr. > Can you give me a step-by-step procedure to set up multicore in solr? Hi, without specific questions, I doubt I or others can give you any more information than the documentation, which can be found at: http://wiki.apache.org/solr/CoreAdmin Please make sure you are using (a recent version of) 1.3. B _ {Beto|Norberto|Numard} Meijome Your reasoning is excellent -- it's only your basic assumptions that are wrong. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Regarding Indexing
On Fri, 29 Aug 2008 00:31:13 -0700 (PDT) sanraj25 <[EMAIL PROTECTED]> wrote: > But I still can't maintain two indexes. > Please help me create two cores in solr. What specific problem do you have? B _ {Beto|Norberto|Numard} Meijome "Always listen to experts. They'll tell you what can't be done, and why. Then do it." Robert A. Heinlein I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Storing two different files
On Thu, 28 Aug 2008 02:01:05 -0700 (PDT) sanraj25 <[EMAIL PROTECTED]> wrote: > I want to index two different files in solr.(for ex) I want to store > two tables like, job_post and job_profile in solr. But now both are stored > in same place in solr.when i get data from job_post, data come from > job_profile also.So i want to maintain the data of job_post and job_profile > separately. hi :) you need to have 2 separate schemas, and therefore 2 separate indexes. You should read about MultiCore in the wiki. B _ {Beto|Norberto|Numard} Meijome "Unix is very simple, but it takes a genius to understand the simplicity." Dennis Ritchie I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Question about search suggestion
On Tue, 26 Aug 2008 15:15:21 +0300 Aleksey Gogolev <[EMAIL PROTECTED]> wrote: > > Hello. > > I'm new to solr and I need to make a search suggest (like google > suggestions). > Hi Aleksey, please search the archives of this list for subjects containing 'autocomplete' or 'auto-suggest'. that should give you a few ideas and starting points. best, B _ {Beto|Norberto|Numard} Meijome "The more I see the less I know for sure." John Lennon I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: dataimporthandler and mysql connector jar
On Mon, 25 Aug 2008 17:11:47 +0200 Walter Ferrara <[EMAIL PROTECTED]> wrote: > Launching a multicore solr with dataimporthandler using a mysql driver, > (driver="com.mysql.jdbc.Driver") works fine if the mysql connector jar > (mysql-connector-java-5.0.7-bin.jar) is in the classpath, either the jdk > classpath or inside the solr.war lib dir. > But putting the mysql-connector-java-5.0.7-bin.jar in the core0/lib > directory, or in the multicore shared lib dir (specified in the sharedLib > attribute in solr.xml), results in an exception, even if the jar is correctly > loaded by the classloader: Hi Walter, As of the nightly build of August 19th, the DIH failing to connect to the data source on SOLR's startup does *not* kill SOLR anymore. I haven't tested yesterday's... it could be a regression bug, but I doubt it - the error used to be different to yours (about connectivity, not a failure in the document). For what it's worth, I only have 1 copy of the JDBC jar (MS SQL in my case), in SOLR's lib directory, used by several cores' own DIH. You can check whether it's picked up by SOLR's classpath in the Java Info page under admin/. You may also want to try with a valid but empty document definition in data-config.xml to rule out syntax issues. B _ {Beto|Norberto|Numard} Meijome "Any society that would give up a little liberty to gain a little security will deserve neither and lose both." Benjamin Franklin I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
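For reference, the sharedLib arrangement discussed above can be sketched in solr.xml roughly as follows (the lib path and core names are made up for illustration; jars in the sharedLib directory are visible to every core, which is one place a JDBC driver used by DIH can live):

```xml
<!-- solr.xml sketch - sharedLib path and core names are hypothetical.
     A JDBC driver jar placed under SOLR_HOME/lib would be shared by
     both cores' DataImportHandlers. -->
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0" />
    <core name="core1" instanceDir="core1" />
  </cores>
</solr>
```

Whether a jar in this directory is actually on the classpath can be confirmed on the Java Info admin page, as mentioned above.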
Re: "Multicore" and snapshooter / snappuller
On Fri, 22 Aug 2008 12:21:53 -0700 "Lance Norskog" <[EMAIL PROTECTED]> wrote: > Apparently the ZFS (Silicon Graphics > originally) is great for really huge files. hi Lance, You may be confusing Sun's ZFS with SGI's XFS. The OP referred, i think, to ZFS. B _ {Beto|Norberto|Numard} Meijome "The greatest dangers to liberty lurk in insidious encroachment by men of zeal, well-meaning but without understanding." Justice Louis D. Brandeis I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Querying Question
On Thu, 21 Aug 2008 18:09:11 -0700 "Jake Conk" <[EMAIL PROTECTED]> wrote: > I thought if I used copyField to copy my string field to a text > field then I could search for words within it and not be limited to the > entire content. Did I misunderstand that? but you need to search on the fields that are defined with fieldType=text... it seems you are searching on the string fields. B _ {Beto|Norberto|Numard} Meijome "He has the attention span of a lightning bolt." Robert Redford I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: hello, a question about solr.
On Wed, 20 Aug 2008 10:58:50 -0300 "Alexander Ramos Jardim" <[EMAIL PROTECTED]> wrote: > A tiny but real explanation can be found here > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters thanks Alexander - indeed, quite short, and focused on shingles... which, if I understand correctly, are groups of n terms... the NGramTokenizer creates tokens of n characters from your input. Searching for ngram or n-gram in the archives should bring up more relevant information, which isn't in the wiki yet. B _ {Beto|Norberto|Numard} Meijome "All that is necessary for the triumph of evil is that good men do nothing." Edmund Burke I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
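To illustrate the distinction drawn above, a character-level n-gram tokenizer could be configured in schema.xml along these lines (the type name is made up; this is solr.NGramTokenizerFactory, not the term-level ShingleFilter the wiki page focuses on):

```xml
<!-- schema.xml sketch - the fieldType name is hypothetical.
     Emits character n-grams of length 2 to 3, e.g.
     "solr" -> "so", "ol", "lr", "sol", "olr" -->
<fieldType name="text_ngram" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="3"/>
  </analyzer>
</fieldType>
```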
Re: Clarification on facets
On Tue, 19 Aug 2008 10:18:12 +1200 "Gene Campbell" <[EMAIL PROTECTED]> wrote: > Is this interpreted as meaning, there are 10 documents that will match > with 'car' in the title, and likewise 6 'boat' and 2 'bike'? Correct. > If so, is there any way to get counts for the *number times* a value > is found in a document. I'm looking for a way to determine the number > of times 'car' is repeated in the title, for example Not sure - i would suggest that a field with a term repeated several times would receive a higher score when searching for that term, but not sure how you could get the information you seek...maybe with the Luke handler ? ( but on a per-document basis...slow... ? ) B _ {Beto|Norberto|Numard} Meijome Computers are like air conditioners; they can't do their job properly if you open windows. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: .wsdl for example....
On Tue, 19 Aug 2008 11:23:48 +1000 Norberto Meijome <[EMAIL PROTECTED]> wrote: > On Mon, 18 Aug 2008 19:08:24 -0300 > "Alexander Ramos Jardim" <[EMAIL PROTECTED]> wrote: > > > Do you wanna a full web service for SOLR example? How a .wsdl will help you? > > Why don't you use the HTTP interface SOLR provides? > > > > Anyways, if you need to develop a web service (SOAP compliant) to access > > SOLR, just remember to use an embedded core on your webservice. > > On Mon, 18 Aug 2008 15:37:24 -0400 > Erik Hatcher <[EMAIL PROTECTED]> wrote: > > > WSDL? surely you jest. > > > > Erik > > :D I obviously said something terribly stupid, oh well, not the first time > and most likely wont be the last one either. > > Anyway, the reason for my asking is : > - I've put together a SOLR search service with a few cores. Nothing fancy, > it works great as is. > - the .NET developer I am working with on this asked for a .wsdl (or > .asmx) file to import into Visual Studio ... yes, he can access the service > directly, but he seems to prefer a more 'well defined' interface (haven't > really decided whether it is worth the effort, but that is another question > altogether) > > The way I see it, SOLR is a RESTful service. I am not looking into wrapping > the whole thing behind SOAP ( I actually much prefer REST than SOAP, but that > is entering into quasi-religious grounds...) - which should be able to be > defined with a .wsdl ( v 1.1 should suffice as only GET + POST are supported > in SOLR anyway). > > Am I missing anything here ? > > thanks in advance for your time + thoughts , > B To be clear, i don't suggest we should have a .wsdl for example, simply asking if there would be any use in having one. but given the responses I got, I'm curious now to understand what I have gotten wrong :) Best, B _ {Beto|Norberto|Numard} Meijome I sense much NT in you. NT leads to Bluescreen. Bluescreen leads to downtime. Downtime leads to suffering. NT is the path to the darkside. Powerful Unix is. 
I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: .wsdl for example....
On Mon, 18 Aug 2008 19:08:24 -0300 "Alexander Ramos Jardim" <[EMAIL PROTECTED]> wrote: > Do you wanna a full web service for SOLR example? How a .wsdl will help you? > Why don't you use the HTTP interface SOLR provides? > > Anyways, if you need to develop a web service (SOAP compliant) to access > SOLR, just remember to use an embedded core on your webservice. On Mon, 18 Aug 2008 15:37:24 -0400 Erik Hatcher <[EMAIL PROTECTED]> wrote: > WSDL? surely you jest. > > Erik :D I obviously said something terribly stupid, oh well, not the first time and most likely wont be the last one either. Anyway, the reason for my asking is : - I've put together a SOLR search service with a few cores. Nothing fancy, it works great as is. - the .NET developer I am working with on this asked for a .wsdl (or .asmx) file to import into Visual Studio ... yes, he can access the service directly, but he seems to prefer a more 'well defined' interface (haven't really decided whether it is worth the effort, but that is another question altogether) The way I see it, SOLR is a RESTful service. I am not looking into wrapping the whole thing behind SOAP ( I actually much prefer REST than SOAP, but that is entering into quasi-religious grounds...) - which should be able to be defined with a .wsdl ( v 1.1 should suffice as only GET + POST are supported in SOLR anyway). Am I missing anything here ? thanks in advance for your time + thoughts , B _ {Beto|Norberto|Numard} Meijome "He has no enemies, but is intensely disliked by his friends." Oscar Wilde I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: hello, a question about solr.
On Mon, 18 Aug 2008 23:07:19 +0800 "finy finy" <[EMAIL PROTECTED]> wrote: > because i use chinese characters, for example "ibm___" > solr will parse it into a term "ibm" and a phrase "_ __" > can i use solr to query with a term "ibm" and a term "_" and a term > "__"? Hi finy, you should look into n-gram tokenizers. Not sure if it is documented in the wiki, but it has been discussed on the mailing list quite a few times. In short, an n-gram tokenizer breaks your input into blocks of n characters, which are then used for matching in the index. I think for Chinese, bi-grams are the favoured approach. good luck, B _ {Beto|Norberto|Numard} Meijome I used to hate weddings; all the Grandmas would poke me and say, "You're next sonny!" They stopped doing that when i started to do it to them at funerals. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
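The bi-gram approach suggested above could be sketched in schema.xml like this (the type name is made up; fixing both gram sizes to 2 yields overlapping two-character tokens, so mixed input such as "ibm" plus Chinese characters is broken into uniform bi-grams that can be matched individually):

```xml
<!-- schema.xml sketch - the fieldType name is hypothetical.
     minGramSize = maxGramSize = 2 produces overlapping character bi-grams,
     the approach commonly favoured for Chinese text. -->
<fieldType name="text_cjk_bigram" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="2"/>
  </analyzer>
</fieldType>
```

If memory serves, Solr also ships a CJK tokenizer factory that effectively bi-grams CJK runs while leaving Latin words whole; the archives discuss both options.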
.wsdl for example....
hi :) does anyone have a .wsdl definition for the example bundled with SOLR? if nobody has it, would it be useful to have one ? cheers, B _ {Beto|Norberto|Numard} Meijome Intelligence: Finding an error in a Knuth text. Stupidity: Cashing that $2.56 check you got. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: hello, a question about solr.
On Mon, 18 Aug 2008 15:33:02 +0800 "finy finy" <[EMAIL PROTECTED]> wrote: > the name field is text,which is analysed, i use the query > "name:ibmT63notebook" why do you search with no spaces? is this free text entered by a user, or is it part of a link which you control ? PS: please dont top-post _ {Beto|Norberto|Numard} Meijome Commitment is active, not passive. Commitment is doing whatever you can to bring about the desired result. Anything less is half-hearted. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: DIH - commit / optimize
On Mon, 18 Aug 2008 09:34:56 +0530 "Shalin Shekhar Mangar" <[EMAIL PROTECTED]> wrote: > Actually we have commit and optimize as separate request parameters > defaulting to true for both full-import and delta-import. You can add a > request parameter optimize=false for delta-import if you want to commit but > not to optimize the index. ah , now it makes perfect sense :) sorry, i should have checked the src myself. thanks so much again :) B _ {Beto|Norberto|Numard} Meijome What you are afraid to do is a clear indicator of the next thing you need to do. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: DIH - commit / optimize
On Mon, 18 Aug 2008 10:14:32 +0800 "finy finy" <[EMAIL PROTECTED]> wrote: > i use solr for 3 months, and i find some question follow: Please do not hijack mail threads. http://en.wikipedia.org/wiki/Thread_hijacking _ {Beto|Norberto|Numard} Meijome "Ask not what's inside your head, but what your head's inside of." J. J. Gibson I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
DIH - commit / optimize
Hi again, I see in the DIH wiki page : [...] full-import [..] commit: (default 'true'). Tells whether to commit+optimize after the operation [...] but nothing for delta-import... I think it would be useful to have a 'commit' (default=true) and an 'optimize' (default=false) option for delta-import - these should most probably be separate options. - For full-import, wouldn't it make sense to split commit + optimize into 2 different options? Granted, if I do a clean=true, I'd probably want (need!) an optimize... even then, optimize may be too slow / use too much memory at that point in time...? (not too sure about this argument..) cheers, B _ {Beto|Norberto|Numard} Meijome Never take Life too seriously, no one gets out alive anyway. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: DIH - calling spellchecker rebuild...
On Sun, 17 Aug 2008 20:22:26 +0530 "Shalin Shekhar Mangar" <[EMAIL PROTECTED]> wrote: > If it is only SpellCheckComponent that you are interested in, then see > SOLR-622. > > You can add this to your SCC config to rebuild SCC after every commit: > <str name="buildOnCommit">true</str> ah, great stuff, thanks Shalin. B _ {Beto|Norberto|Numard} Meijome "Truth has no special time of its own. Its hour is now -- always." Albert Schweitzer I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
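In context, the buildOnCommit option from SOLR-622 sits inside the spellcheck component's configuration in solrconfig.xml; a sketch (the spellchecker name, source field, and index directory below are illustrative, not taken from the thread):

```xml
<!-- solrconfig.xml sketch - field and path values are hypothetical. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <!-- rebuild the spelling index automatically after every commit -->
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```

With this in place, a DIH run that ends in a commit would also refresh the spelling index, which answers the original question in this thread.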
DIH - calling spellchecker rebuild...
Guys + gals, just a question of form - would DIH itself be the right place to implement a "URLS to call after successfully completing a DIH full or partial load" - for example, to rebuild spellchecker when new items have been added? Or should that be part of my external process (cron -> shell script, for example ) that calls DIH in the first place ? cheers B _ {Beto|Norberto|Numard} Meijome If you find a solution and become attached to it, the solution may become your next problem. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
DataImportHandler : more forgiving initialisation possible?
hi guys, First of all, thanks for DIH - it's great :) One thing I noticed during my tests ( nightly, 2008-08-16) is that, if the DB is not available during SOLR startup time, the whole core won't initialise .- the error is shown below. I was wondering, 1) would it be possible to have DIH bomb out in this situation, but not bring down the whole core from running? I think it would be desirable , with a big warning , possibly... thoughts ? 2) How hard would it be to handle this more gracefully - for example, in case of error, leave the handler in an non-init state, and when being accessed, repeat the whole init process (and bomb out if it fails again ,of course)... Thanks for your time on this email + DIH + all other features :) B [...] Aug 17, 2008 11:25:48 PM org.apache.solr.handler.dataimport.DataImportHandler processConfiguration INFO: Processing configuration from solrconfig.xml: {config=data-config.xml} Aug 17, 2008 11:25:48 PM org.apache.solr.handler.dataimport.DataImporter loadDataConfig INFO: Data Configuration loaded successfully Aug 17, 2008 11:25:48 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity an_artist with URL: jdbc:sqlserver://a.b.c.d:1433;databaseName=DBNAME;user=usrname;password=magicpassword;responseBuffering=adaptive; Aug 17, 2008 11:25:48 PM org.apache.solr.handler.dataimport.DataImportHandler inform SEVERE: Exception while loading DataImporter org.apache.solr.handler.dataimport.DataImportHandlerException: Failed to initialize DataSource: null Processing Documemt # at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:306) at org.apache.solr.handler.dataimport.DataImporter.addDataSource(DataImporter.java:273) at org.apache.solr.handler.dataimport.DataImporter.initEntity(DataImporter.java:228) at org.apache.solr.handler.dataimport.DataImporter.(DataImporter.java:98) at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:106) at 
org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:294) at org.apache.solr.core.SolrCore.(SolrCore.java:473) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:295) at org.apache.solr.core.CoreContainer.load(CoreContainer.java:207) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:107) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69) at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:39) at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:593) at org.mortbay.jetty.servlet.Context.startContext(Context.java:140) at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1220) at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:513) at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:39) at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152) at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:39) at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:39) at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130) at org.mortbay.jetty.Server.doStart(Server.java:222) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:39) at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:977) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at 
java.lang.reflect.Method.invoke(Method.java:597) at org.mortbay.start.Main.invokeMain(Main.java:183) at org.mortbay.start.Main.start(Main.java:497) at org.mortbay.start.Main.main(Main.java:115) Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to create database connection Processing Documemt # at org.apache.solr.handler.dataimport.JdbcDataSource.init(JdbcDataSource.java:67) at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:303) ... 34 more Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host has failed. java.net.ConnectException: Connection refused at com.micros
[SOLVED...]Re: Problems using saxon for XSLT transforms
On Tue, 12 Aug 2008 23:36:32 +1000 Norberto Meijome <[EMAIL PROTECTED]> wrote: > hi :) > I'm trying to use SAXON instead of the default XSLT parser. I was pretty sure > i > had it running fine on 1.2, but when I repeated the same steps (as per the > wiki) on latest nightly build, i cannot see any sign of it being loaded or > use, > although the classpath seems to be pointing to them (see below) > [...] well, although no explicit information is present about whether it IS using saxon, it obviously dies when saxon isn't present- I moved lib/saxon* out of the way, and any transformation dies with : HTTP ERROR: 500 Provider net.sf.saxon.TransformerFactoryImpl not found javax.xml.transform.TransformerFactoryConfigurationError: Provider net.sf.saxon.TransformerFactoryImpl not found at javax.xml.transform.TransformerFactory.newInstance(TransformerFactory.java:108) at org.apache.solr.util.xslt.TransformerProvider.(TransformerProvider.java:45) at org.apache.solr.util.xslt.TransformerProvider.(TransformerProvider.java:43) at org.apache.solr.request.XSLTResponseWriter.getTransformer(XSLTResponseWriter.java:117) at org.apache.solr.request.XSLTResponseWriter.getContentType(XSLTResponseWriter.java:65) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:250) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1088) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:360) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:729) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:206) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:324) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:505) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:829) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:211) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:380) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:450) RequestURI=/solr/tracks/select/ I guess not as clear as what I'd had hoped for, but should do for now :) cheers, B _ {Beto|Norberto|Numard} Meijome Computers are like air conditioners; they can't do their job properly if you open windows. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Best way to index without diacritics
On Thu, 14 Aug 2008 11:34:47 -0400 "Steven A Rowe" <[EMAIL PROTECTED]> wrote: [...] > The kind of filter Walter is talking about - a generalized language-aware > character normalization Solr/Lucene filter - does not yet exist. My guess is > that if/when it does materialize, both the Solr and the Lucene projects will > want to have it. Historically, most functionality shared by Solr and Lucene > is eventually hosted by Lucene, since Solr has a Lucene dependency, but not > vice-versa. > > So, yes, Solr would be responsible for hosting configuration for such a > filter, but the responsibility for doing something with the configuration > would be Lucene's responsibility, assuming that Lucene would (eventually) > host the filter and Solr would host a factory over the filter. > > Steve thanks for the thorough explanation ,Steve . B _ {Beto|Norberto|Numard} Meijome "Throughout the centuries there were [people] who took first steps down new paths armed only with their own vision." Ayn Rand I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Best way to index without diacritics
( 2 in 1 reply) On Wed, 13 Aug 2008 09:59:21 -0700 Walter Underwood <[EMAIL PROTECTED]> wrote: > Stripping accents doesn't quite work. The correct translation > is language-dependent. In German, o-dieresis should turn into > "oe", but in English, it should be "o" (as in "coöperate" or > "Mötley Crüe"). In Swedish, it should not be converted at all. Hi Walter, understood. This goes back to the question of language-specific field definitions / parsers... more on this below. > > There are other character-to-string conversions: ae-ligature > to "ae", "ß" to "ss", and so on. Luckily, those are independent > of language. > > wunder > > On 8/13/08 9:16 AM, "Steven A Rowe" <[EMAIL PROTECTED]> wrote: > > > Hi Norberto, > > > > https://issues.apache.org/jira/browse/LUCENE-1343 hi Steve, thanks for the pointer. this is a Lucene entry... I thought the Latin-filter was a SOLR feature? I, for one, definitely meant a SOLR filter. Given what Walter rightly pointed out about differences in language, I suspect it would be a SOLR-level thing - fieldType name="textDE" language="DE" would apply the filter of unicode chars to {ascii?} with the appropriate mapping for German, etc. Or is this something that Lucene would / should take care of? B _ {Beto|Norberto|Numard} Meijome "I've dirtied my hands writing poetry, for the sake of seduction; that is, for the sake of a useful cause." Dostoevsky I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
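Short of the generalized language-aware filter discussed above, the per-language behaviour can be approximated today (in Solr versions that ship solr.MappingCharFilterFactory) with one field type and one mapping file per language - a sketch, with made-up type and file names:

```xml
<!-- schema.xml sketch - type and mapping-file names are hypothetical.
     A German-specific type maps o-dieresis to "oe"; an English type
     would use a different mapping file that maps it to "o". -->
<fieldType name="textDE" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-de.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
<!-- mapping-de.txt would contain lines such as:
     "ö" => "oe"
     "ß" => "ss" -->
```

This pushes the language choice into schema configuration rather than into the filter itself, which matches the "language-specific field definitions" idea raised in the reply.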
Re: Spellcheker and Dismax both
On Thu, 14 Aug 2008 12:21:13 +0530 "Shalin Shekhar Mangar" <[EMAIL PROTECTED]> wrote: > The SpellCheckerRequestHandler is now deprecated with Solr 1.3 and it has > been replaced by SpellCheckComponent. > > http://wiki.apache.org/solr/SpellCheckComponent which works quite well with dismax. B _ {Beto|Norberto|Numard} Meijome Never attribute to malice what can adequately be explained by incompetence. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Searching Questions
On Tue, 12 Aug 2008 13:26:26 -0700 "Jake Conk" <[EMAIL PROTECTED]> wrote: > 1) I want to search only within a specific field, for instance > `category`. Is there a way to do this? of course. Please see http://wiki.apache.org/solr/SolrQuerySyntax (in particular, follow the link to Lucene syntax..) > > 2) When searching for multiple results are the following identical > since "*_facet" and "*_facet_mv" have their types both set to string? > > /select?q=tag_facet:%22John+McCain%22+OR+tag_facet:%22Barack+Obama%22 > /select?q=tag_facet_mv:%22John+McCain%22+OR+tag_facet_mv:%22Barack+Obama%22 Erik H. already answered this question in another of your emails. Check your mailbox or the list's archives. > 3) If I'm searching for something that is in a text field but I > specify it as a facet string rather than a text type would it still > search within text fields or would it just limit the search to string > fields? I am not sure what you mean by 'a facet string'. You facet on fields; SOLR automatically creates facets on those fields based on the results of your query. > 4) Is there a page that will show me different querying combinations > or can someone post some more examples? Have you checked the wiki? Which page do you suggest needs more examples? > 5) Anyone else notice returning back the data in php (&wt=phps) > doesn't unserialize? I am using PHP 5.3 w/ a nightly copy of Solr from > last week. sorry, I haven't used PHP + SOLR cheers, B _ {Beto|Norberto|Numard} Meijome "All that is necessary for the triumph of evil is that good men do nothing." Edmund Burke I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Best way to index without diacritics
On Tue, 12 Aug 2008 11:44:42 -0400 "Steven A Rowe" <[EMAIL PROTECTED]> wrote: > Solr is Unicode aware. The ISOLatin1AccentFilterFactory handles diacritics > for the ISO Latin-1 section of the Unicode character set. UTF (do you mean > UTF-8?) is a (set of) Unicode serialization(s), and once Solr has > deserialized it, it is just Unicode characters (Java's in-memory UTF-16 > representation). > > So as long as you're only concerned about removing diacritics from the set of > Unicode characters that overlaps ISO Latin-1, and not about other Unicode > characters, then ISOLatin1AccentFilterFactory should work for you. hi, do you know if anyone has implemented a similar filter using icu and mapping (a lot more of) UTF-8 to ascii ? B _ {Beto|Norberto|Numard} Meijome "He has the attention span of a lightning bolt." Robert Redford I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: adds / delete within same 'transaction'..
On Tue, 12 Aug 2008 20:53:12 -0400 "Yonik Seeley" <[EMAIL PROTECTED]> wrote: > On Tue, Aug 12, 2008 at 1:48 AM, Norberto Meijome <[EMAIL PROTECTED]> wrote: > > What happens if I issue: > > > > 1 > > 1new > > > > > > will delete happen first, and then the add, or could it be that the add > > happens before delete > > Doesn't matter... it's an implementation detail. Solr used to buffer > deletes, and if it crashed at the right time one could get duplicates. > Now, Lucene does the buffering of deletes (internally lucene does the > adds first and buffers the deletes until a segment flush) and it > should be impossible to see more than one "1" or no "1" at all. Thanks Yonik. I wasn't asking about the specific details, but of the consequence. I seem to remember (incorrectly , or v1.2 only maybe) , that if one wanted assurances that the case above happened in the right order, one had to commit after the deletes, and once more after the adds. This not being the case, I am happy :) Thanks again, B _ {Beto|Norberto|Numard} Meijome "He has Van Gogh's ear for music." Billy Wilder I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
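The scenario being discussed - a delete and an add for the same unique key sent in sequence on one connection - would look roughly like this as Solr update messages (the second field name is illustrative; the point is that, per Yonik's answer, after a commit exactly one document with id 1 remains):

```xml
<!-- Two update messages posted in order. Lucene buffers the delete
     until segment flush, so the pair behaves atomically: you never
     end up with zero or two copies of id 1. -->
<delete><id>1</id></delete>
<add>
  <doc>
    <field name="id">1</field>
    <field name="name">new</field>
  </doc>
</add>
```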
Re: adds / delete within same 'transaction'..
On Tue, 12 Aug 2008 11:21:50 -0700 Mike Klaas <[EMAIL PROTECTED]> wrote: > > will delete happen first, and then the add, or could it be that the > > add happens before delete, in which case i end up with no more doc > > id=1 ? > > As long as you are sending these requests on the same thread, they > will occur in order. > > -Mike right, that is GREAT to know then :) cheers, b _ {Beto|Norberto|Numard} Meijome Life is not measured by the number of breaths we take, but by the moments that take our breath away. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Problems using saxon for XSLT transforms
hi :) I'm trying to use SAXON instead of the default XSLT parser. I was pretty sure I had it running fine on 1.2, but when I repeated the same steps (as per the wiki) on the latest nightly build, I cannot see any sign of it being loaded or used, although the classpath seems to be pointing to them (see below) In my logs, I see : INFO: created xslt: org.apache.solr.request.XSLTResponseWriter Aug 12, 2008 11:20:07 PM org.apache.solr.request.XSLTResponseWriter init INFO: xsltCacheLifetimeSeconds=5 which is the RH itself, then, on a hit that triggers the transform : Aug 12, 2008 11:21:25 PM org.apache.solr.util.xslt.TransformerProvider WARNING: The TransformerProvider's simplistic XSLT caching mechanism is not appropriate for high load scenarios, unless a single XSLT transform is used and xsltCacheLifetimeSeconds is set to a sufficiently high value. This is where I would expect to see saxon...right? I'm running SOLR 1.3, nightly from 2008-08-11, under FreeBSD 7 (stable), JDK 1.6. I have 4 cores defined in this test environment. I start my service with : java -Xms64m -Xmx1024m -server -Djavax.xml.transform.TransformerFactory=net.sf.saxon.TransformerFactoryImpl -jar start.jar the /admin/get-properties.jsp shows [] javax.xml.transform.TransformerFactory = net.sf.saxon.TransformerFactoryImpl java.specification.version = 1.6 [...] 
java.class.path = /solrhome:/solrhome/lib/saxon9-s9api.jar:/solrhome/lib/jetty-6.1.11.jar:/solrhome/lib/saxon9-jdom.jar:/solrhome/lib/saxon9-sql.jar:/solrhome/lib/servlet-api-2.5-6.1.11.jar:/solrhome/lib/saxon9-xqj.jar:/solrhome/lib/saxon9.jar:/solrhome/lib/jetty-util-6.1.11.jar:/solrhome/lib/saxon9-xom.jar:/solrhome/lib/saxon9-dom4j.jar:/solrhome/lib/saxon9-xpath.jar:/solrhome/lib/saxon9-dom.jar:/solrhome/lib/jsp-2.1/core-3.1.1.jar:/solrhome/lib/jsp-2.1/ant-1.6.5.jar:/solrhome/lib/jsp-2.1/jsp-2.1.jar:/solrhome/lib/jsp-2.1/jsp-api-2.1.jar:/solrhome/lib/management/jetty-management-6.1.11.jar:/solrhome/lib/naming/jetty-naming-6.1.11.jar:/solrhome/lib/naming/activation-1.1.jar:/solrhome/lib/naming/mail-1.4.jar:/solrhome/lib/plus/jetty-plus-6.1.11.jar:/solrhome/lib/xbean/jetty-xbean-6.1.11.jar:/solrhome/lib/annotations/geronimo-annotation_1.0_spec-1.0.jar:/solrhome/lib/annotations/jetty-annotations-6.1.11.jar:/solrhome/lib/ext/jetty-java5-threadpool-6.1.11.jar:/solrhome/lib/ext/jetty-sslengine-6.1.11.jar:/solrhome/lib/ext/jetty-servlet-tester-6.1.11.jar:/solrhome/lib/ext/jetty-ajp-6.1.11.jar:/solrhome/lib/ext/jetty-setuid-6.1.11.jar:/solrhome/lib/ext/jetty-client-6.1.11.jar:/solrhome/lib/ext/jetty-html-6.1.11.jar [...] Any pointers to where I should check to confirm saxon is being used, or to address the problem will be greatly appreciated. TIA, B _ {Beto|Norberto|Numard} Meijome "Nature doesn't care how smart you are. You can still be wrong." Richard Feynman I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
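For reference, the XSLTResponseWriter whose log lines appear above is registered in solrconfig.xml roughly like this (a sketch; the cache-lifetime value matches the one logged, everything else follows the stock Solr 1.3 example config):

```xml
<!-- Sketch of registering the XSLT response writer in solrconfig.xml.
     Stylesheets are looked up under the core's conf/xslt/ directory. -->
<queryResponseWriter name="xslt" class="org.apache.solr.request.XSLTResponseWriter">
  <!-- Matches "xsltCacheLifetimeSeconds=5" in the log output above -->
  <int name="xsltCacheLifetimeSeconds">5</int>
</queryResponseWriter>
```

A request then selects it with `wt=xslt&tr=example.xsl`. Note that the writer itself only logs its own class, so the factory actually used (Saxon vs. the JDK default) would have to be confirmed elsewhere, e.g. from the `javax.xml.transform.TransformerFactory` system property shown above.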
adds / delete within same 'transaction'..
Hello :) I *think* I know the answer, but I'd like to confirm : Say I have doc id=1 (value "old") already indexed and committed (ie, 'live' ) What happens if I issue a delete of id=1 followed by an add of doc id=1 (value "new")? Will the delete happen first, and then the add, or could it be that the add happens before the delete, in which case I end up with no more doc id=1 ? thanks!! B _ {Beto|Norberto|Numard} Meijome Anyone who isn't confused here doesn't really understand what's going on. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
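The markup in the question was stripped by the list archive; in standard Solr XML update syntax the sequence being asked about would look something like this (the `payload` field name is made up for illustration):

```xml
<!-- Already indexed and committed ('live'); field names are hypothetical -->
<add>
  <doc>
    <field name="id">1</field>
    <field name="payload">old</field>
  </doc>
</add>
<commit/>

<!-- The question: a delete of id 1 followed by a re-add of id 1, before any commit -->
<delete><id>1</id></delete>
<add>
  <doc>
    <field name="id">1</field>
    <field name="payload">new</field>
  </doc>
</add>
<commit/>
```

Per Yonik's reply in this thread, Lucene now buffers the deletes internally, so after the commit exactly one doc with id=1 ("new") should be visible regardless of internal ordering.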
Re: Can't Delete Record
On Mon, 11 Aug 2008 06:48:05 -0700 (PDT) Vj Ali <[EMAIL PROTECTED]> wrote: > i also sends tag as well. maybe you need instead of ? _ {Beto|Norberto|Numard} Meijome "With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead." [RFC1925 - section 2, subsection 3] I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
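The tags in the reply above were stripped by the archive, so it is unclear which form was being suggested over which; the two standard Solr delete messages presumably being contrasted are these (the key value is made up):

```xml
<!-- Delete by unique key: the value must match the uniqueKey field exactly -->
<delete><id>DOC123</id></delete>

<!-- Delete by query: removes every document matching the query -->
<delete><query>id:DOC123</query></delete>
```

Delete-by-id only works when the schema defines a uniqueKey; delete-by-query works against any indexed field.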
Re: unique key
On Wed, 6 Aug 2008 12:25:34 +1000 Norberto Meijome <[EMAIL PROTECTED]> wrote: > On Tue, 5 Aug 2008 14:41:08 -0300 > "Scott Swan" <[EMAIL PROTECTED]> wrote: > > > I currently have multiple documents that i would like to index but i would > > like to combine two fields to produce the unique key. > > > > the documents either have 1 or the other fields so by combining the two > > fields i will get a unique result. > > > > is this possible in the solr schema? > > > > Hi Scott, > you can't do that by the schema - you need to do it when you generate your > document, before posting it to SOLR. Hi again, after reading the DataImportHandler documentation, you could do this too with specific configuration in DIH itself. Of course, you have to be using DIH to load data into your SOLR ;) B _ {Beto|Norberto|Numard} Meijome "Intellectual: 'Someone who has been educated beyond his/her intelligence'" Arthur C. Clarke, from "3001, The Final Odyssey", Sources. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
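A sketch of the DIH approach mentioned above, using DataImportHandler's TemplateTransformer to build the unique key from two source columns. The entity name, query, and column names here are all assumptions for illustration:

```xml
<!-- data-config.xml sketch: concatenate two columns into the uniqueKey field.
     Since each row has one or the other populated, the combination is unique. -->
<entity name="item" transformer="TemplateTransformer"
        query="select field_a, field_b, title from items">
  <field column="id" template="${item.field_a}${item.field_b}"/>
  <field column="title"/>
</entity>
```

This only helps if the data is loaded through DIH; documents posted directly via the update handler still need the key built by the client.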
Re: Still no results after removing from stopwords
On Sun, 10 Aug 2008 19:58:24 -0700 (PDT) SoupErman <[EMAIL PROTECTED]> wrote: > I needed to run a search with a query containing the word "not", so I removed > "not" from the stopwords.txt file. Which seemed to work, at least as far as > parsing the query. It was now successfully searching for that keyword, as > noted in the query debugger. However it isn't returning any results where > "not" is in the query, which suggests "not" hasn't been indexed. However > looking at the listing for a particular item, "not" is listed as one of the > keywords, so it should be finding it? Hi Michael, did you reindex your documents after 1) changing your settings and 2) restarting SOLR (to allow your settings to come into effect)? B _ {Beto|Norberto|Numard} Meijome Real Programmers don't comment their code. If it was hard to write, it should be hard to understand and even harder to modify. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
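The reason a reindex is needed, as suggested above, is that the stop filter usually sits in the index-time analyzer chain as well as the query-time one, so words removed before the edit are simply absent from the index. A typical (assumed, not Michael's actual) configuration looks like:

```xml
<!-- Sketch: stopwords.txt is consulted at both index and query time.
     Removing "not" from the file only affects documents indexed afterwards. -->
<fieldType name="text" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```
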
Re: HTML Standard Strip filter word boundary bug
On Thu, 7 Aug 2008 00:50:59 -0700 (PDT) matt connolly <[EMAIL PROTECTED]> wrote: > Where do I file a bug report? https://issues.apache.org/jira thanks! B _ {Beto|Norberto|Numard} Meijome Contrary to popular belief, Unix is user friendly. It just happens to be very selective about who it decides to make friends with. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: case preserving for data but not for indexing
On Wed, 6 Aug 2008 21:35:47 -0700 (PDT) Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > > > > > 2 Tokenizers? i wondered about that too, but didn't have the time to test... B _ {Beto|Norberto|Numard} Meijome "Always listen to experts. They'll tell you what can't be done, and why. Then do it." Robert A. Heinlein I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: case preserving for data but not for indexing
On Wed, 6 Aug 2008 20:21:28 -0400 "Ian Connor" <[EMAIL PROTECTED]> wrote: > In order to preserve case for the data, but not for indexing, I have > created two fields. One is type Author that is defined as: > > sortMissingLast="true" omitNorms="true"> > > > > > > > > and the other is just string: > > sortMissingLast="true" omitNorms="true"/> Hi Ian, the analyzers + filters apply to the data indexed (and to queries on the field, of course), NOT to what is stored. IOW, you don't have to do anything to have SOLR return the data in your fields untouched. > this is used then for the author lists: > omitNorms="true" multiValued="true"/> > stored="true" omitNorms="true" multiValued="true"/> > > Is there any other way than to have two fields like this? One for > searching and one for displaying. Of course, you can do this but, for the reason you explained, it isn't needed. As a matter of fact, you will be indexing and storing both... If you wanted to have one field for indexing/searching on and the other for retrieving, you'd have to set the values of the indexed and stored properties accordingly. > People's names can be very case > sensitive for display purposes (eg McDonald, DeBros) but I don't want > people to miss results because they search for "lee" instead of "Lee". your definition of field type author: > sortMissingLast="true" omitNorms="true"> > > > > > > should do that - it is telling SOLR (lucene?), for each piece of data stored in a field of this type, to tokenize it, and then to change it to lower case - both at indexing and query time. > > Also, can anyone see danger in using StandardTokenizerFactory for > people's names? I don't know, give it a try :) you can use the analysis page in /admin/ to see how your data would be treated both at index and query time... good luck, B _ {Beto|Norberto|Numard} Meijome "As far as the laws of mathematics refer to reality, they are not certain, and as far as they are certain, they do not refer to reality." 
Albert Einstein I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
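The schema fragments in the message above were stripped by the archive. Going by the description (tokenize, then lowercase, at both index and query time), the field type and fields would look something like the sketch below; the tokenizer/filter classes and field names are assumptions, not Ian's actual schema:

```xml
<!-- Lowercasing author type: "Lee" and "lee" index/search identically -->
<fieldType name="author" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- If two fields are really wanted: one for searching, one purely for display.
     A single indexed+stored "author" field would also do, since stored values
     are returned verbatim regardless of analysis. -->
<field name="author" type="author" indexed="true" stored="false"
       omitNorms="true" multiValued="true"/>
<field name="author_display" type="string" indexed="false" stored="true"
       omitNorms="true" multiValued="true"/>
```
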
Re: Solr Logo thought
On Tue, 05 Aug 2008 16:02:51 -0400 Stephen Weiss <[EMAIL PROTECTED]> wrote: > My issue with the logos presented was they made solr look like a > school project instead of the powerful tool that it is. The tricked > out font or whatever just usually doesn't play well with the business > types... they want serious-looking software. First impressions are > everything. While the fiery colors are appropriate for something > named Solr, you can play with that without getting silly - take a look > at: couldn't agree more. current logo needs improvement, but I think it can be done much better... In particular thinking of small icons, print,etc... > http://www.ascsolar.com/images/asc_solar_splash_logo.gif > http://www.logostick.com/images/EOS_InvestmentingLogo_lg.gif > > (Luckily there are many businesses that do solar energy!) > > They have the same elements but with a certain simplicity and elegance. > > I know probably some people don't care if it makes the boss or client > happy, but, these are the kinds of seemingly insignificant things that Indeed - the way I see it, if you don't care either way, then you should be happy to have a professional looking one :P B _ {Beto|Norberto|Numard} Meijome "Caminante no hay camino, se hace camino al andar" Antonio Machado I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Sum of one field
On Tue, 05 Aug 2008 18:58:42 -0300 Leonardo Dias <[EMAIL PROTECTED]> wrote: > So I'm looking for a Ferrari. CarStore says that there are 5 ads for > Ferrari, but one ad has 2 Ferraris being sold, the other ad has 3 > Ferraris and all the others have 1 Ferrari each, meaning that there are > 5 ads and 8 Ferraris. And yes, I'm doing an example with Fibonacci > numbers. ;) why not create one separate document per car? It'll make it easier (for the client) to manage too when one of the cars is sold but not the other 4 B _ {Beto|Norberto|Numard} Meijome "With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead." [RFC1925 - section 2, subsection 3] I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
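A sketch of the one-document-per-car layout suggested above (field names are made up): an ad selling two Ferraris becomes two documents that share an ad identifier, so the total result count is the car count, and grouping back by ad is a matter of filtering on `ad_id`.

```xml
<add>
  <!-- The ad with 2 Ferraris becomes two documents sharing ad_id -->
  <doc>
    <field name="id">ad42-1</field>
    <field name="ad_id">ad42</field>
    <field name="make">Ferrari</field>
  </doc>
  <doc>
    <field name="id">ad42-2</field>
    <field name="ad_id">ad42</field>
    <field name="make">Ferrari</field>
  </doc>
</add>
```

With this layout, `q=make:Ferrari` returns 8 hits in the example given, and deleting a single sold car is a one-document delete instead of an update to a multi-car ad.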
Re: unique key
On Tue, 5 Aug 2008 14:41:08 -0300 "Scott Swan" <[EMAIL PROTECTED]> wrote: > I currently have multiple documents that i would like to index but i would > like to combine two fields to produce the unique key. > > the documents either have 1 or the other fields so by combining the two > fields i will get a unique result. > > is this possible in the solr schema? > Hi Scott, you can't do that by the schema - you need to do it when you generate your document, before posting it to SOLR. btw, please don't hijack topic threads. http://en.wikipedia.org/wiki/Thread_hijacking thanks!! B _ {Beto|Norberto|Numard} Meijome Law of Conservation of Perversity: we can't make something simpler without making something else more complex I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Diagnostic tools
On Tue, 5 Aug 2008 11:43:44 -0500 "Kashyap, Raghu" <[EMAIL PROTECTED]> wrote: > Hi, Hi Kashyap, please don't hijack topic threads. http://en.wikipedia.org/wiki/Thread_hijacking thanks!! B _ {Beto|Norberto|Numard} Meijome Software QA is like cleaning my cat's litter box: Sift out the big chunks. Stir in the rest. Hope it doesn't stink. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: solr 1.3 ??
On Mon, 4 Aug 2008 21:13:09 -0700 (PDT) Vicky_Dev <[EMAIL PROTECTED]> wrote: > Can we get solr 1.3 release as soon as possible? Otherwise some interim > release (1.2.x) containing DataImportHandler will also a good option. > > Any Thoughts? have you tried one of the nightly builds? I've been following it every so often...sometimes there is a problem, but hardly ever... you can find a build you are comfortable with, and it'll be far closer to the actual 1.3 when released than 1.2 . B _ {Beto|Norberto|Numard} Meijome Quantum Logic Chicken: The chicken is distributed probabalistically on all sides of the road until you observe it on the side of your course. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Solr Logo thought
On Mon, 4 Aug 2008 09:29:30 -0700 Ryan McKinley <[EMAIL PROTECTED]> wrote: > > > > If there is a still room for new log design for Solr and the > > community is > > open for it then I can try to come up with some proposal. Doing logo > > for > > Mahout was really interesting experience. > > > > In my opinion, yes I'd love to see more effort put towards the > logo. I have stayed out of this discussion since I don't really think > any of the logos under consideration are complete. (I begged some > friends to do two of the three logos under consideration) I would > love to refine them, but time... oooh time. +1 If we are going to change what we have, i'd love to see some more options , or better quality - no offence meant , but those "logos" aren't really a huge improvement or departure from the current one. I think whatever we change to we'll be wanting to use it for a long time. B _ {Beto|Norberto|Numard} Meijome If you find a solution and become attached to it, the solution may become your next problem. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.