Re: Using DIH's special commands....Help needed
The accepted logLevel values are error, debug, warn, trace, info

2009/10/18 Noble Paul നോബിള് नोब्ळ् :
> On Sun, Oct 18, 2009 at 4:16 AM, Lance Norskog wrote:
>> I had this problem also, but I was using the Jetty example. I fail at
>> logging configurations about 90% of the time, so I assumed it was my
>> fault.
> did you set the logLevel attribute also in the entity? if you set
> logLevel="severe" it should definitely be printed
>>
>> 2009/10/17 Noble Paul നോബിള് नोब्ळ् :
>>> It is strange that LogTransformer did not log the data.
>>>
>>> On Fri, Oct 16, 2009 at 5:54 PM, William Pierce
>>> wrote:

Folks: Continuing my saga with DIH and the use of its special commands. I have verified that the script functionality is indeed working. I have also verified that '$skipRow' is working. But I don't think that '$deleteDocById' is working. My script now looks as follows: The theory is that rows whose 'IndexingStatus' value is 4 should be deleted from the Solr index. Just to be sure that the JavaScript syntax was correct and checked out, I intentionally overwrote a field called 'Col1' in my schema with the primary key of the document to be deleted. On a clean and empty index, I import 47 rows from my dummy db. Everything checks out correctly, since IndexingStatus for each row is 1. There are no rows to delete. I then go into the db and set one row with IndexingStatus = 4. When I execute the dataimport, I find that all 47 documents are imported correctly. For the row for which 'IndexingStatus' was set to 4, the Col1 value is set correctly by the script transformer to be the primary key value for that row/document. However, I should not be seeing that document at all, since '$deleteDocById' should have deleted it from Solr. Could this be a bug in Solr? Or am I misunderstanding how $deleteDocById works? By the way, Noble, I tried to set the LogTransformer and add logging per your suggestion. That did not work either. I set logLevel="debug", and also turned on Solr logging in the admin console to the max value (finest), and still no output.

Thanks,

- Bill

--
From: "Noble Paul നോബിള് नोब्ळ्"
Sent: Thursday, October 15, 2009 10:05 PM
To:
Subject: Re: Using DIH's special commands....Help needed

> use LogTransformer to see if the value is indeed set
>
> logTemplate="${post}"
> query=" select Id, a, b, c, IndexingStatus from prod_table
> where (IndexingStatus = 1 or IndexingStatus = 4) ">
>
> this should print out the entire row after the transformations
>
> On Fri, Oct 16, 2009 at 3:04 AM, William Pierce
> wrote:
>>
>> Thanks for your reply! I tried your suggestion. No luck. I have verified
>> that I have version 1.6.0_05-b13 of Java installed. I am running with the
>> nightly bits of October 7. I am pretty much out of ideas at the present
>> time... I'd appreciate any tips/pointers.
>>
>> Thanks,
>>
>> - Bill
>>
>> --
>> From: "Shalin Shekhar Mangar"
>> Sent: Thursday, October 15, 2009 1:42 PM
>> To:
>> Subject: Re: Using DIH's special commands....Help needed
>>
>>> On Fri, Oct 16, 2009 at 12:46 AM, William Pierce
>>> wrote:
>>> Thanks for your help. Here is my DIH config file... I'd appreciate any help/pointers you may give me. No matter what I do, the documents are not getting deleted from the index. My db has rows whose 'IndexingStatus' field has values of either 1 (which means add it to Solr) or 4 (which means delete the document with that primary key from the Solr index). I have two transformers running. Not sure what I am doing wrong.
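[The script itself did not survive the archive. For context, a data-config along the following lines is the kind of setup being debugged in this thread. Everything except $deleteDocById, $skipRow, IndexingStatus, and the SQL query is a hypothetical sketch, and the dataSource element is omitted; note also the later suggestion in this thread to compare against the string '4' rather than the number 4, depending on the SQL type.]

<dataConfig>
  <script><![CDATA[
    function checkDelete(row) {
      // Rows flagged with IndexingStatus = 4 should be deleted, not added.
      if (row.get('IndexingStatus') == 4) {
        // Ask DIH to delete the document with this primary key...
        row.put('$deleteDocById', row.get('Id'));
        // ...and skip adding the row itself.
        row.put('$skipRow', 'true');
      }
      return row;
    }
  ]]></script>
  <document>
    <entity name="prod" transformer="script:checkDelete" logLevel="debug"
            query="select Id, a, b, c, IndexingStatus from prod_table
                   where (IndexingStatus = 1 or IndexingStatus = 4)">
    </entity>
  </document>
</dataConfig>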
Is Relational Mapping (foreign key) possible in solr ??
Hi, I browsed through the Solr docs and user forums, and what I infer is that we can't use Solr to store a relational mapping (foreign key).

But I just want to know if there is any chance of doing the same.

I have two tables: a User table (with 100,000 entries) and a Project table (with 200 entries).
User table columns: userid, name, country, location, etc.
Project table columns: project name, description, business unit, project type.
Here User Location, Country, Project Name, Project business unit and project type are faceted.
A user can be mapped to multiple projects.
In Solr I store the details like this:

[
{
userId:1234;
userName:ABC;
Country:US;
Location:NY;
Project Name:Project1,Project2;
Project Description:Project1,Project2;
Project business unit:unit1,unit2;
Project type:Type1,Type2
}
]

With this structure I can get faceted details about both user data and project data.

But here I face two problems:

1. A project can be mapped to many users, say 10,000 users. So if I change a project name then I end up re-indexing 10,000 records, which is very time-consuming work.

2. For fields like Project Description I cannot find a proper delimiter. For the other fields a comma (,) is okay, but being a description, I cannot use any specific delimiter. This field is not faceted, but in the search results I still need to take it out and show the project details in tabular format, and I use a delimiter to split it. For the other project fields like Project Name and Type I can do this, but not for the Project Description field.

So what I want to know is: is there any way of storing the data as relational records, i.e. in the user details we would have a field called project Id whose data would be 1,2, referring to the project records' primary keys in Solr, while still preserving the faceted approach?

To my knowledge my guess is that it can't be done???
Am I correct???
If so, then how can we achieve solutions to my problem?
Please, if someone could share some ideas, it would be useful.
Re: Is Relational Mapping (foreign key) possible in solr ??
Hi, here's what you could do:

* Use multivalued fields instead of comma-separated values, so you won't need a separator.
* Store project identifiers in the user index. Denormalising project information into user entries inevitably means re-indexing a lot of user entries when project info changes.
* You could have a mixed index with user and project entries in the same index, so if you search for a name, you'd find users and projects matching that name.

Jerome.

2009/10/19 ashokcz :
>
> Hi, I browsed through the Solr docs and user forums, and what I infer is
> that we can't use Solr to store a relational mapping (foreign key).
>
> But I just want to know if there is any chance of doing the same.
>
> I have two tables: a User table (with 100,000 entries) and a Project table
> (with 200 entries).
> User table columns: userid, name, country, location, etc.
> Project table columns: project name, description, business unit, project type.
> Here User Location, Country, Project Name, Project business unit and
> project type are faceted.
> A user can be mapped to multiple projects.
> In Solr I store the details like this:
> [
> {
> userId:1234;
> userName:ABC;
> Country:US;
> Location:NY;
> Project Name:Project1,Project2;
> Project Description:Project1,Project2;
> Project business unit:unit1,unit2;
> Project type:Type1,Type2
> }
> ]
>
> With this structure I can get faceted details about both user data and
> project data.
>
> But here I face two problems:
>
> 1. A project can be mapped to many users, say 10,000 users. So if I change
> a project name then I end up re-indexing 10,000 records, which is very
> time-consuming work.
>
> 2. For fields like Project Description I cannot find a proper delimiter.
> For the other fields a comma (,) is okay, but being a description, I cannot
> use any specific delimiter. This field is not faceted, but in the search
> results I still need to take it out and show the project details in tabular
> format, and I use a delimiter to split it. For the other project fields
> like Project Name and Type I can do this, but not for the Project
> Description field.
>
> So what I want to know is: is there any way of storing the data as
> relational records, i.e. in the user details we would have a field called
> project Id whose data would be 1,2, referring to the project records'
> primary keys in Solr, while still preserving the faceted approach?
>
> To my knowledge my guess is that it can't be done???
> Am I correct???
> If so, then how can we achieve solutions to my problem?
> Please, if someone could share some ideas, it would be useful.
>

--
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net
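[To illustrate the first two suggestions concretely: a schema.xml fragment along these lines lets one user document carry any number of project ids and names with no delimiter tricks. The field names here are just examples, not from the thread.]

<!-- schema.xml sketch: one value per project, so no separators are needed -->
<field name="projectId"   type="string" indexed="true" stored="true" multiValued="true"/>
<field name="projectName" type="text"   indexed="true" stored="true" multiValued="true"/>

[A document then repeats the field once per value in the update XML:]

<add>
  <doc>
    <field name="userId">1234</field>
    <field name="projectId">1</field>
    <field name="projectId">2</field>
  </doc>
</add>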
Terms truncation
Hi,

I'm using the terms component for an autosuggest feature and it works well, but I've hit an issue with truncation. Take the following query:

http://localhost:8983/solr/terms?terms.fl=meta_name_t&terms.prefix=switch

This is the response: 0 1 35 7

In this case the word 'switchov' is returned where I expected 'switchover'. The word 'switchov' doesn't exist by itself. I'm puzzled by the truncation. The handlers are all standard for 1.4. Are other factors affecting the response? I couldn't see an appropriate option on the query to adjust the length of the returned string...

Thanks in advance,

Paul Forsyth
Re: Terms truncation
On Oct 19, 2009, at 6:23 AM, Paul Forsyth wrote:

Hi,

I'm using the terms component for an autosuggest feature and it works well, but I've hit an issue with truncation. Take the following query:

http://localhost:8983/solr/terms?terms.fl=meta_name_t&terms.prefix=switch

This is the response: 0 1 35 7

In this case the word 'switchov' is returned where I expected 'switchover'. The word 'switchov' doesn't exist by itself.

I'm guessing you are asking for terms on a field that is stemmed.

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Concatenating two fields
Hello, firstly sorry for my English :)

Since last Friday I have been trying to define in schema.xml a new field that is the concatenation of two other fields. So in schema.xml I have these fields: field3

In my .csv file data are stored like this:

field1 ; field2
toto ; titi

In my mind field3 should store the string "toto titi". When I make the query "toto titi" I want Solr to return the correct result, but Solr returns nothing. Please could you help me find what is incorrect.

Thanks in advance
Re: Using DIH's special commands....Help needed
Lance, Noble:

I set logLevel="debug" in my dihconfig.xml at the entity level. Got no output! I then gave up digging into this further because I was pressed for time to dig into how to increase the speed of importing into Solr with DIH...

Cheers,

- Bill

--
From: "Noble Paul നോബിള് नोब्ळ्"
Sent: Monday, October 19, 2009 1:05 AM
To:
Subject: Re: Using DIH's special commands....Help needed

The accepted logLevel values are error, debug, warn, trace, info

2009/10/18 Noble Paul നോബിള് नोब्ळ् :
On Sun, Oct 18, 2009 at 4:16 AM, Lance Norskog wrote:
I had this problem also, but I was using the Jetty example. I fail at logging configurations about 90% of the time, so I assumed it was my fault.
did you set the logLevel attribute also in the entity? if you set logLevel="severe" it should definitely be printed
2009/10/17 Noble Paul നോബിള് नोब्ळ् :
It is strange that LogTransformer did not log the data.
On Fri, Oct 16, 2009 at 5:54 PM, William Pierce wrote:

Folks: Continuing my saga with DIH and the use of its special commands. I have verified that the script functionality is indeed working. I have also verified that '$skipRow' is working. But I don't think that '$deleteDocById' is working. My script now looks as follows: The theory is that rows whose 'IndexingStatus' value is 4 should be deleted from the Solr index. Just to be sure that the JavaScript syntax was correct and checked out, I intentionally overwrote a field called 'Col1' in my schema with the primary key of the document to be deleted. On a clean and empty index, I import 47 rows from my dummy db. Everything checks out correctly, since IndexingStatus for each row is 1. There are no rows to delete. I then go into the db and set one row with IndexingStatus = 4. When I execute the dataimport, I find that all 47 documents are imported correctly. For the row for which 'IndexingStatus' was set to 4, the Col1 value is set correctly by the script transformer to be the primary key value for that row/document. However, I should not be seeing that document at all, since '$deleteDocById' should have deleted it from Solr. Could this be a bug in Solr? Or am I misunderstanding how $deleteDocById works? By the way, Noble, I tried to set the LogTransformer and add logging per your suggestion. That did not work either. I set logLevel="debug", and also turned on Solr logging in the admin console to the max value (finest), and still no output.

Thanks,

- Bill

--
From: "Noble Paul നോബിള് नोब्ळ्"
Sent: Thursday, October 15, 2009 10:05 PM
To:
Subject: Re: Using DIH's special commands....Help needed

use LogTransformer to see if the value is indeed set
this should print out the entire row after the transformations

On Fri, Oct 16, 2009 at 3:04 AM, William Pierce wrote:

Thanks for your reply! I tried your suggestion. No luck. I have verified that I have version 1.6.0_05-b13 of Java installed. I am running with the nightly bits of October 7. I am pretty much out of ideas at the present time... I'd appreciate any tips/pointers.

Thanks,

- Bill

--
From: "Shalin Shekhar Mangar"
Sent: Thursday, October 15, 2009 1:42 PM
To:
Subject: Re: Using DIH's special commands....Help needed

On Fri, Oct 16, 2009 at 12:46 AM, William Pierce wrote:

Thanks for your help. Here is my DIH config file... I'd appreciate any help/pointers you may give me. No matter what I do, the documents are not getting deleted from the index. My db has rows whose 'IndexingStatus' field has values of either 1 (which means add it to Solr) or 4 (which means delete the document with that primary key from the Solr index). I have two transformers running.
Not sure what I am doing wrong.

query=" select Id, a, b, c, IndexingStatus from prod_table where (IndexingStatus = 1 or IndexingStatus = 4) ">

One thing I'd try is to use '4' for comparison rather than the number 4 (the type would depend on the SQL type). Also, for JavaScript transformers to work, you mu
MoreLikeThis support Dismax parameters
From what I've read/found, MoreLikeThis doesn't support the dismax parameters that are available in the StandardRequestHandler (such as bq). Is it possible that we might get support for those parameters some time? What are the issues with the MLT handler inheriting from StandardRequestHandler instead of RequestHandlerBase?

Nick Spacek
Re: Terms truncation
Thanks Grant,

I'm still a bit of a newbie with Solr :) I was able to add a new non-stemming field along with a copyField, and that seems to have done the trick :) Until I tried this I didn't quite realise what copyFields did...

Thanks again,

Paul

On 19 Oct 2009, at 11:23, Paul Forsyth wrote:

Hi,

I'm using the terms component for an autosuggest feature and it works well, but I've hit an issue with truncation. Take the following query:

http://localhost:8983/solr/terms?terms.fl=meta_name_t&terms.prefix=switch

This is the response: 0 1 35 7

In this case the word 'switchov' is returned where I expected 'switchover'. The word 'switchov' doesn't exist by itself. I'm puzzled by the truncation. The handlers are all standard for 1.4. Are other factors affecting the response? I couldn't see an appropriate option on the query to adjust the length of the returned string...

Thanks in advance,

Paul Forsyth
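[For anyone hitting the same problem: the fix Paul describes would look roughly like this in schema.xml. The field and type names are hypothetical; the key point is that the autosuggest field uses an analyzer without a stemming filter, while meta_name_t is the stemmed field from the thread.]

<!-- a type with no stemming filter, so terms are kept whole -->
<fieldType name="text_unstemmed" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="meta_name_suggest" type="text_unstemmed" indexed="true" stored="false"/>

<!-- feed the suggest field from the same source as the stemmed field -->
<copyField source="meta_name_t" dest="meta_name_suggest"/>

[The terms component request then points at the new field: terms.fl=meta_name_suggest.]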
Boost with wildcard.
The boost (index time) does not work when I am searching for a word with a wildcard appended to the end. I stumbled onto this "feature" and it's pretty much a show-stopper for me. I am implementing a live search feature where I always have a wildcard on the last word that is currently being written by the user. Will this be fixed anytime soon, or does anyone have a workaround?

Example:
"playstation*" gives a result with unboosted items, but "playstation" gives the correct one.
Re: how can I use debugQuery if I have extended QParserPlugin?
Awesome. Thanks for figuring this out, guys.

wojtekpia wrote:
>
> Good catch. I was testing on a nightly build from mid-July. I just tested
> on a similar deployment with nightly code from Oct 5th and everything
> seems to work.
>
> My mid-July deployment breaks on sints, integers, sdoubles, doubles, slongs
> and longs. My more recent deployment works with tints, sints, integers,
> tdoubles, sdoubles, doubles, tlongs, slongs, and longs. (I don't have any
> floats in my schema so I didn't test those.) Sounds like another reason to
> upgrade to 1.4.
>
> Wojtek
>
Re: Concatenating two fields
On Oct 19, 2009, at 7:21 AM, sophSophie wrote:

Hello, firstly sorry for my English :) Since last Friday I have been trying to define in schema.xml a new field that is the concatenation of two other fields. So in schema.xml I have these fields: field3 In my .csv file data are stored like this: field1 ; field2 toto ; titi In my mind field3 should store the string "toto titi". When I make the query "toto titi" I want Solr to return the correct result, but Solr returns nothing.

How is Field3 defined? And is that a phrase query, or are you just using quotes for separation/emphasis?

-Grant
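[If field3 is meant to be built from field1 and field2, the standard Solr mechanism is copyField. A minimal schema.xml sketch (the field types here are assumptions, since the original definitions were stripped from the archive):]

<field name="field1" type="text" indexed="true" stored="true"/>
<field name="field2" type="text" indexed="true" stored="true"/>
<field name="field3" type="text" indexed="true" stored="true" multiValued="true"/>

<copyField source="field1" dest="field3"/>
<copyField source="field2" dest="field3"/>

[field3 must be multiValued because two sources feed it, and the two copied values remain separate values. A phrase query like "toto titi" will only match across them if the field type's positionIncrementGap permits it - one possible explanation for the empty result.]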
Solr commits before documents are added
Hi,

My application indexes a huge number of documents (in the millions). Below is a snapshot of my code where I add all documents to Solr and then, at the end, issue a commit command. I use Solrj. I find that the last few documents are not committed to Solr. Is this because adding documents to Solr took longer and it reached the commit command even before it finished adding documents? Is there a way to ensure that Solr waits for all documents to be added and then commits? Please advise me how to solve this issue.

For loop
    solrServer.add(doc); // Add document to Solr
End for loop

solrServer.commit(); // Commit to Solr

Thanks,

Sharmila
RE: Solr commits before documents are added
A few questions to help the troubleshooting.

Solr version #?

Is there just 1 commit through Solrj for the millions of documents? Or do you do it on a regular interval (every 100k documents, for example) and then one at the end to be sure?

How are you observing that the last few didn't make it in? Are you looking at a slave or a master?

-Todd

-----Original Message-----
From: Ranganathan, Sharmila [mailto:sranganat...@library.rochester.edu]
Sent: Monday, October 19, 2009 9:19 AM
To: solr-user@lucene.apache.org
Subject: Solr commits before documents are added

Hi,

My application indexes a huge number of documents (in the millions). Below is a snapshot of my code where I add all documents to Solr and then, at the end, issue a commit command. I use Solrj. I find that the last few documents are not committed to Solr. Is this because adding documents to Solr took longer and it reached the commit command even before it finished adding documents? Is there a way to ensure that Solr waits for all documents to be added and then commits? Please advise me how to solve this issue.

For loop
    solrServer.add(doc); // Add document to Solr
End for loop

solrServer.commit(); // Commit to Solr

Thanks,

Sharmila
Shards param accepts spaces between commas?
It seems like no, and should be an easy change. I'm putting newlines after the commas so the large shards list doesn't scroll off the screen.
Filter query optimization
If a filter query matches nothing, then no additional query should be performed and no results returned? I don't think we have this today?
Wordnet dictionary integration with Solr - help
I have been trying to integrate the WordNet dictionary with Solr. I used the link below to generate indexes using the Prolog package from WordNet:

http://chencer.com/techno/java/lucene/wordnet.html

And here are the changes I did in Solr. Schema.xml changes:

word dict solr.IndexBasedSpellChecker word UTF-8 ./syn_index ./spellchekerFile1

But with the above changes the WordNet dictionary doesn't seem to be working.

1. Does anybody know what's wrong in my configuration? Is any other change required in the solrconfig?
2. Is there any other way to import WordNet data into Solr and use it?
3. If there is another way to import WordNet as simple text, then I could just as well use it in my existing (default) synonym dictionary.

Appreciate your help in answering this. Thanks.
ArrayIndexOutOfBoundsException during indexing
I was wondering if anyone might have any insight on the following problem. I'm using the latest Solr code from SVN and indexing around 17m XML records via DIH. With perfect replicability, the following exception is thrown on the same aggregate file (#236; each XML file has ~50k records), although not necessarily on the same exact record. Oddly, it doesn't appear to be due to anything in the file - if I change the ordering or just index the file alone, it works fine.

java.lang.ArrayIndexOutOfBoundsException: -65536
        at org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:479)
        at org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:502)
        at org.apache.lucene.index.FreqProxTermsWriterPerField.addTerm(FreqProxTermsWriterPerField.java:130)
        at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:467)
        at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
        at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:244)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:755)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2611)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2583)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
        at org.apache.solr.ask_geo.update.GeoUpdateProcessor.processAdd(GeoUpdateProcessor.java:75)
        at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:75)
        at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:292)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:392)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)

The related Lucene code is a bit thick and I'm having a hard time figuring out what could be going on here. I've added a bit of debug output to some of the intermediary classes, and it looks like the exception is generally being thrown while processing one of my dynamic fields (type=tdouble, indexed=t, stored=f). The GeoUpdateProcessor code referenced above is my own, but essentially is the same as the LocalSolr update processor; it just contains a few lines of code that calculates a double value from two document fields and then stores that value in one of these dynamic fields. It hasn't caused any previous problems, only interacts with the underlying framework via cmd.getSolrInputDocument(), doc.getFieldValue(string), doc.addField(string, double), and next.processAdd(cmd), and I've generated a number of indexes with it in the past, so I don't -think- that's a likely culprit.
I've tried a run without the update processor and the problem seemed to go away (it made it past the above file, at least), but then this changes so many other factors that I don't know how much that really tells me (reduces field count by ~13 fields, eliminates all dynamic fields, etc.). The only other thing worth mentioning is that I've replaced the Solr trunk Lucene jars with my own compiled versions, based off 2.9.0. The only thing different versus the 'stable' release is that it includes a few additional libraries (no core or contrib classes were modified). I haven't heard of any check-ins between 2.9.0 and 2.9.1-dev that should affect this... Has anyone else run into a problem like this before? Thanks, Aaron
Re: Filter query optimization
On Mon, Oct 19, 2009 at 2:55 PM, Jason Rutherglen wrote: > If a filter query matches nothing, then no additional query should be > performed and no results returned? I don't think we have this today? No, but this is a fast operation anyway (In Solr 1.4 at least). Another thing to watch out for is to not try this with filters that you don't know the size of (or else you may force a popcount on a BitDocSet that would not otherwise have been needed). It could also potentially complicate warming queries - need to be careful that the combination of filters you are warming with matches something, or it would cause the fieldCache entries to not be populated. -Yonik http://www.lucidimagination.com
Re: ArrayIndexOutOfBoundsException during indexing
Thanks for the report Aaron, this definitely looks like a Lucene bug, and I've opened https://issues.apache.org/jira/browse/LUCENE-1995 Can you follow up there (I asked about your index settings). -Yonik http://www.lucidimagination.com On Mon, Oct 19, 2009 at 3:04 PM, Aaron McKee wrote: > I was wondering if anyone might have any insight on the following problem. > I'm using the latest Solr code from SVN and indexing around 17m XML records > via DIH. With perfect replicability, the following exception is thrown on > the same aggregate file (#236, and each XML file has ~50k records), although > not necessarily the same exact record. Oddly, it doesn't appear to be due to > anything in the file - if I change the ordering or just index the file > alone, it works fine. > > java.lang.ArrayIndexOutOfBoundsException: -65536 > at > org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:479) > at > org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:502) > at > org.apache.lucene.index.FreqProxTermsWriterPerField.addTerm(FreqProxTermsWriterPerField.java:130) > at > org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:467) > at > org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174) > at > org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:244) > at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772) > at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:755) > at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2611) > at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2583) > at > org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241) > at > org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61) > at > org.apache.solr.ask_geo.update.GeoUpdateProcessor.processAdd(GeoUpdateProcessor.java:75) > at > org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:75) > at > org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:292) > at > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:392) > at > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383) > at > org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242) > at > org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180) > at > org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331) > at > org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389) > at > org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370) > > The related Lucene code is a bit thick and I'm having a hard time figuring > out what could be going on here. I've added a bit of debug output to some of > the intermediary classes and it looks like the exception is generally being > thrown while processing one of my dynamic fields (type=tdouble, indexed=t, > stored=f). The GeoUpdateProcessor code referenced above is my own, but > essentially is the same as the LocalSolr update processor; it just contains > a few lines of code that calculates a double value from two document fields > and then stores that value in one of these dynamic fields. 
It hasn't caused > any previous problems, only interacts with the underlying framework via > cmd.getSolrInputDocument(), doc.getFieldValue(string), doc.addField(string, > double), and next.processAdd(cmd), and I've generated a number of indexes > with it in the past, so I don't -think- that's a likely culprit. I've tried > a run without the update processor and the problem seemed to go away (it > made it past the above file, at least), but then this changes so many other > factors that I don't know how much that really tells me (reduces field count > by ~13 fields, eliminates all dynamic fields, etc.). > > The only other thing worth mentioning is that I've replaced the Solr trunk > Lucene jars with my own compiled versions, based off 2.9.0. The only thing > different versus the 'stable' release is that it includes a few additional > libraries (no core or contrib classes were modified). I haven't heard of any > check-ins between 2.9.0 and 2.9.1-dev that should affect this... > > Has anyone else run into a problem like this before? > > Thanks, > Aaron > >
Re: stats page slow in latest nightly
: I won't have access to the code until monday, but i'm pretty sure this
: should be a fairly trivial change (just un-set the estimator on the
: CacheEntry objects)

Done, see notes in SOLR-1292

-Hoss
RE: Solr commits before documents are added
Solr version is 1.3.

I am indexing a total of 1.4 million documents. Yes, I commit (waitFlush="true", waitSearcher="true") every 100k documents and then once at the end.

I have a counter next to the addDoc(SolrDocument) statement to keep track of the number of documents added. When I query Solr after the commit, the total number of documents returned does not match the number of documents added. This happens only when I index millions of documents, not when I index something like 500 documents. In this case, I know it's the last 20 documents which are not committed, because each document has a field 'RECORD_ID' which is assigned a sequential number (in Java code). When I query Solr using the Solr admin interface, the documents with the last 20 RECORD_IDs are missing (for example, the last id is 999,980 instead of 1,000,000).

- Sharmila

Feak, Todd wrote:
>
> A few questions to help the troubleshooting.
>
> Solr version #?
>
> Is there just 1 commit through Solrj for the millions of documents?
> Or do you do it on a regular interval (every 100k documents, for example)
> and then one at the end to be sure?
>
> How are you observing that the last few didn't make it in? Are you looking
> at a slave or a master?
>
> -Todd
>

-----Original Message-----
From: Ranganathan, Sharmila [mailto:sranganat...@library.rochester.edu]
Sent: Monday, October 19, 2009 9:19 AM
To: solr-user@lucene.apache.org
Subject: Solr commits before documents are added

Hi,

My application indexes a huge number of documents (in the millions). Below is a snapshot of my code where I add all documents to Solr and then, at the end, issue a commit command. I use Solrj. I find that the last few documents are not committed to Solr. Is this because adding documents to Solr took longer and it reached the commit command even before it finished adding documents? Is there a way to ensure that Solr waits for all documents to be added and then commits? Please advise me how to solve this issue.

For loop
    solrServer.add(doc); // Add document to Solr
End for loop

solrServer.commit(); // Commit to Solr

Thanks,

Sharmila
Re: Filter query optimization
Yonik,

> this is a fast operation anyway

Can you elaborate on why this is a fast operation? Basically there's a distributed query with a filter, where on a number of the servers, the filter query isn't matching anything, however I'm seeing load on those servers (where nothing matches), so I'm assuming the filter is generated (and cached) which is fine, then the user query is being performed on a filter where no documents match. I could be misinterpreting the data; however, I want to find out about this use case regardless, as it will likely crop up again for us.

-J

On Mon, Oct 19, 2009 at 12:07 PM, Yonik Seeley wrote:
> On Mon, Oct 19, 2009 at 2:55 PM, Jason Rutherglen wrote:
>> If a filter query matches nothing, then no additional query should be
>> performed and no results returned? I don't think we have this today?
>
> No, but this is a fast operation anyway (In Solr 1.4 at least).
>
> Another thing to watch out for is to not try this with filters that
> you don't know the size of (or else you may force a popcount on a
> BitDocSet that would not otherwise have been needed).
>
> It could also potentially complicate warming queries - need to be
> careful that the combination of filters you are warming with matches
> something, or it would cause the fieldCache entries to not be
> populated.
>
> -Yonik
> http://www.lucidimagination.com
>
Version 0.9.3 of the PECL extension for solr has just been released
Version 0.9.3 of the PECL extension for solr has just been released. Some of the methods have been updated and more get* methods have been added to the Query builder classes. The user level documentation was also updated to make the installation instructions a lot clearer. The latest documentation and source code are available from the project home page http://pecl.php.net/package/solr -- "Good Enough" is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.
Re: Filter query optimization
On Mon, Oct 19, 2009 at 4:45 PM, Jason Rutherglen wrote:
> Yonik,
>
>> this is a fast operation anyway
>
> Can you elaborate on why this is a fast operation?

The scorers will never really be used. The query will be weighted and scorers will be created, but the filter will be checked first and return NO_MORE_DOCS.

-Yonik
http://www.lucidimagination.com

> Basically there's a distributed query with a filter, where on a
> number of the servers, the filter query isn't matching anything,
> however I'm seeing load on those servers (where nothing
> matches), so I'm assuming the filter is generated (and cached)
> which is fine, then the user query is being performed on a
> filter where no documents match. I could be misinterpreting the
> data; however, I want to find out about this use case regardless,
> as it will likely crop up again for us.
>
> -J
>
> On Mon, Oct 19, 2009 at 12:07 PM, Yonik Seeley wrote:
>> On Mon, Oct 19, 2009 at 2:55 PM, Jason Rutherglen wrote:
>>> If a filter query matches nothing, then no additional query should be
>>> performed and no results returned? I don't think we have this today?
>>
>> No, but this is a fast operation anyway (In Solr 1.4 at least).
>>
>> Another thing to watch out for is to not try this with filters that
>> you don't know the size of (or else you may force a popcount on a
>> BitDocSet that would not otherwise have been needed).
>>
>> It could also potentially complicate warming queries - need to be
>> careful that the combination of filters you are warming with matches
>> something, or it would cause the fieldCache entries to not be
>> populated.
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
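[For the curious, the mechanism Yonik describes can be sketched roughly against the Lucene 2.9 DocIdSetIterator API. This is an illustration of the filter-first leapfrog idea, not the actual Solr code path.]

import java.io.IOException;
import org.apache.lucene.search.DocIdSetIterator;

public class FilterFirst {
  static final int NO_MORE_DOCS = DocIdSetIterator.NO_MORE_DOCS;

  // Leapfrog between a filter iterator and a query scorer's iterator.
  static void score(DocIdSetIterator filter, DocIdSetIterator scorer) throws IOException {
    int filterDoc = filter.nextDoc();
    if (filterDoc == NO_MORE_DOCS) {
      // Empty filter: the scorer is never advanced at all, so the only
      // cost of the query was creating the weight/scorer objects.
      return;
    }
    int scorerDoc = scorer.advance(filterDoc);
    while (scorerDoc != NO_MORE_DOCS && filterDoc != NO_MORE_DOCS) {
      if (scorerDoc == filterDoc) {
        // collect(scorerDoc) would happen here
        filterDoc = filter.nextDoc();
      } else if (scorerDoc > filterDoc) {
        filterDoc = filter.advance(scorerDoc);
      } else {
        scorerDoc = scorer.advance(filterDoc);
      }
    }
  }
}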
Core/shard preference
I have a small core performing deltas quickly (core00), and a large core performing deltas slowly (core01), both on the same set of documents. The delta core is cleaned nightly. As you can imagine, at times there are two versions of a document, one in each core. When I execute a query that matches this document, sometimes it will come from the delta core, and sometimes it will come from the large core. It almost seems random. Here is my query:

http://porsche:8181/worldip5/core00/select?shards=porsche:8181/worldip5/core00/,porsche:8181/worldip5/core01/&start=0&rows=20&q=hazard+gas+countrycode:JP

When the delta documents from core00 are returned as desired, the access logs show:

10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core00/select HTTP/1.1 200 293 1
10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select HTTP/1.1 200 506 1
10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core00/select HTTP/1.1 200 1151 1
10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select HTTP/1.1 200 2597 1
10.36.34.151 - - [19/Oct/2009:15:22:37 -0700] GET /worldip5/core00/select?shards=porsche:8181/worldip5/core00/,porsche:8181/worldip5/core01/&start=0&rows=20&q=hazard+gas+countrycode:JP HTTP/1.1 200 11881 9

When the documents are returned from core01, the access logs show:

10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core00/select HTTP/1.1 200 289 1
10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select HTTP/1.1 200 506 1
10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select HTTP/1.1 200 3390 1
10.36.34.151 - - [19/Oct/2009:15:22:37 -0700] GET /worldip5/core00/select?shards=porsche:8181/worldip5/core00/,porsche:8181/worldip5/core01/&start=0&rows=20&q=hazard+gas+countrycode:JP HTTP/1.1 200 11873 9

Any ideas on why there is a difference in the requests made? Is there a way I can tell Solr to prefer the documents in core00?

Mark
Re: Boost with wildcard.
> The boost (index time) does not work when I am searching for a word
> with a wildcard appended to the end. I stumbled onto this "feature"
> and it's pretty much a show-stopper for me. I am implementing a live
> search feature where I always have a wildcard on the last word that
> is currently being written by the user. Will this be fixed anytime
> soon, or does anyone have a workaround?
>
> Example:
> "playstation*" gives a result with unboosted items, but "playstation"
> gives the correct one.

The Javadoc of SolrQueryParser says:

 * This class also deviates from the Lucene QueryParser by using
 * ConstantScore versions of RangeQuery and PrefixQuery to prevent
 * TooManyClauses exceptions.

If you want to disable this behavior, you can modify the protected Query getPrefixQuery(String field, String termStr) method of SolrQueryParser. But for this to work you also need to write a class that extends QParserPlugin and uses your new SolrQueryParser. You need to define your new QParserPlugin in solrconfig.xml as described here [1].

[1] http://wiki.apache.org/solr/SolrPlugins#QParserPlugin

But prefix queries can easily cause TooManyBooleanClauses if the prefix is short, like a*, since it is going to OR all terms starting with a. Performance can be a killer in such cases. I think that's why Solr uses the ConstantScore versions. If you want, you can increase this limit (the default is 1024) in solrconfig.xml:

<maxBooleanClauses>1024</maxBooleanClauses>

I do not know how to solve your problem without writing custom code. Hope this helps.
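[To make the suggestion concrete, here is a minimal sketch at the plain Lucene 2.9 level of the kind of override being described. The class name is hypothetical, and the QParserPlugin wiring described above is not shown.]

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.MultiTermQuery;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

// Hypothetical parser whose prefix queries keep normal scoring, so
// index-time boosts show up in "playstation*" results.
public class ScoringPrefixQueryParser extends QueryParser {

  public ScoringPrefixQueryParser(String defaultField) {
    super(Version.LUCENE_29, defaultField, new StandardAnalyzer(Version.LUCENE_29));
  }

  @Override
  protected Query getPrefixQuery(String field, String termStr) throws ParseException {
    PrefixQuery pq = new PrefixQuery(new Term(field, termStr));
    // Expand to a scoring BooleanQuery instead of a constant-score query.
    // This honors boosts, but can throw TooManyClauses for short prefixes.
    pq.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
    return pq;
  }
}

[Inside Solr, the same override would live in a SolrQueryParser subclass returned by a custom QParser, registered with a <queryParser name="..." class="..."/> element in solrconfig.xml, as the wiki page above describes.]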
Re: Core/shard preference
Distributed Search is designed only for disjoint cores.

The document list from each core is returned sorted by relevance score. The distributed searcher merges these sorted lists. Solr does not implement "distributed IDF", which essentially means distributed coordinated scoring. All scoring happens inside each core, relative to that core's contents. The resulting score numbers are not coordinated with each other, and you will get random results.

There is no way to say "use this core's results" because the searches are not compared all at once. Only the page of results fetched is compared, so there's no way to suppress a result in the second page if it was already found in the first.

On Mon, Oct 19, 2009 at 3:30 PM, markwaddle wrote:
>
> I have a small core performing deltas quickly (core00), and a large core
> performing deltas slowly (core01), both on the same set of documents. The
> delta core is cleaned nightly. As you can imagine, at times there are two
> versions of a document, one in each core. When I execute a query that
> matches this document, sometimes it will come from the delta core, and
> sometimes it will come from the large core. It almost seems random. Here
> is my query:
>
> http://porsche:8181/worldip5/core00/select?shards=porsche:8181/worldip5/core00/,porsche:8181/worldip5/core01/&start=0&rows=20&q=hazard+gas+countrycode:JP
>
> When the delta documents from core00 are returned as desired, the access logs
> show:
>
> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core00/select HTTP/1.1 200 293 1
> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select HTTP/1.1 200 506 1
> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core00/select HTTP/1.1 200 1151 1
> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select HTTP/1.1 200 2597 1
> 10.36.34.151 - - [19/Oct/2009:15:22:37 -0700] GET /worldip5/core00/select?shards=porsche:8181/worldip5/core00/,porsche:8181/worldip5/core01/&start=0&rows=20&q=hazard+gas+countrycode:JP HTTP/1.1 200 11881 9
>
> When the documents are returned from core01 the access logs show:
>
> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core00/select HTTP/1.1 200 289 1
> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select HTTP/1.1 200 506 1
> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select HTTP/1.1 200 3390 1
> 10.36.34.151 - - [19/Oct/2009:15:22:37 -0700] GET /worldip5/core00/select?shards=porsche:8181/worldip5/core00/,porsche:8181/worldip5/core01/&start=0&rows=20&q=hazard+gas+countrycode:JP HTTP/1.1 200 11873 9
>
> Any ideas on why there is a difference in the requests made? Is there a way
> I can tell Solr to prefer the documents in core00?
>
> Mark
>

--
Lance Norskog
goks...@gmail.com
Re: Solr commits before documents are added
commit(waitFlush="true", waitSearcher="true") waits for the entire operation and when it finishes, all 1 million documents should be searchable. Please try this same test with Solr 1.4 and post your results. To make it easier, here is the first release candidate: http://people.apache.org/~gsingers/solr/1.4.0-RC/ On Mon, Oct 19, 2009 at 1:06 PM, SharmilaR wrote: > > Solr version is 1.3 > I am indexing total of 1.4 million documents. Yes, I commit(waitFlush="true" > waitSearcher="true") every 100k documents and then one at the end. > I have a counter next to addDoc(SolrDocument) statement to keep track of > number of documents added. When I query Solr after commit, the total number > of documents returned does not match the number of documents added. This > happens only when I index millions of documents and not when I index like > 500 documents. In this case, I know its the last 20 documents which are not > committed because each document has a field 'RECORD_ID' which is assigned > sequential number(in java code). When I query Solr using Solr admin > interface, the documents with last 20 RECORD_ID are missing.(example the > last id is 999,980 instead of 1,000,000) > > - Sharmila > > > Feak, Todd wrote: >> >> A few questions to help the troubleshooting. >> >> Solr version #? >> >> Is there just 1 commit through Solrj for the millions of documents? >> >> Or do you do it on a regular interval (every 100k documents for example) >> and then one at the end to be sure? >> >> How are you observing that the last few didn't make it in? Are you looking >> at a slave or master? >> >> -Todd >> >> > -Original Message- > From: Ranganathan, Sharmila [mailto:sranganat...@library.rochester.edu] > Sent: Monday, October 19, 2009 9:19 AM > To: solr-user@lucene.apache.org > Subject: Solr commits before documents are added > > Hi, > > My application indexes huge number of documents(like in millions). Below > is the snapshot of my code where I add all documents to Solr, and then > at last issue commit command. I use Solrj. I find that last few > documents are not committed to Solr. Is this because adding documents > to Solr took longer time and it reached commit command even before it > finished adding documents? Is there are way to ensure that solr waits > for all documents to be added and then commits? Please advise me how to > solve this issue. > > > > For loop > > solrServer.add(doc); // Add document to Solr > > End for loop > > solrServer.commit(); // Commit to Solr > > > > > > Thanks, > > Sharmila > > > > > > -- > View this message in context: > http://www.nabble.com/Solr-commits-before-documents-are-added-tp25961191p25964770.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- Lance Norskog goks...@gmail.com
Index Corruption (possibly during commit)
We have an indexing script which has been running for a couple of weeks now without problems. It indexes documents and then periodically commits (which is a tad redundant, I suppose), both via the HTTP interface. All documents are indexed to a master, and a slave rsyncs them off using the standard 1.3.0 replication.

Recently the indexing script got into problems when the commit was taking longer than the request timeout. I killed the script, did a commit by hand (using bin/commit) and then started to index again, and it still wouldn't commit. We then tried to go to the stats page and got the error:

org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _mib: fieldsReader shows 1 but segmentInfo shows 718
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:960)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:470)
        at ...

This is a stock 1.3.0 running off Tomcat 6.0.20 with

java version "1.6.0_13"
Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed mode)

Linux solr.local 2.6.18-128.1.10.el5 #1 SMP Thu May 7 10:35:59 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

Plenty of RAM and disk space (usage is 31% - 353G used of 534G).

CheckIndex says:

Opening index @ index/

Segments file=segments_c8z numSegments=28 version=FORMAT_HAS_PROX [Lucene 2.4]
Checking only these segments: _mib:
  22 of 28: name=_mib docCount=718
    compound=false
    hasProx=true
    numFiles=9
    size (MB)=0.029
    has deletions [delFileName=_mib_1.del]
    test: open reader.FAILED
    WARNING: fixIndex() would remove reference to this segment; full exception:
org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _mib: fieldsReader shows 1 but segmentInfo shows 718
        at org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:282)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:640)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:591)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:491)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)

WARNING: 1 broken segments (containing 718 documents) detected
WARNING: would write new segments file, and 718 documents would be lost, if -fix were specified

Any ideas? We can restore from backups and backfill, but really we'd love to know what caused this so we can avoid a repetition.

Simon
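[For anyone in the same spot: the repair pass that the CheckIndex output alludes to is run like this. The jar name and path are illustrative; back up the index first, since -fix permanently drops the broken segment and, here, its 718 documents.]

java -cp lucene-core-2.4.0.jar org.apache.lucene.index.CheckIndex /path/to/index -fix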
Re: Filter query optimization
Ok, thanks - new Lucene 2.9 features.

On Mon, Oct 19, 2009 at 2:33 PM, Yonik Seeley wrote:
> On Mon, Oct 19, 2009 at 4:45 PM, Jason Rutherglen wrote:
>> Yonik,
>>
>>> this is a fast operation anyway
>>
>> Can you elaborate on why this is a fast operation?
>
> The scorers will never really be used.
> The query will be weighted and scorers will be created, but the filter
> will be checked first and return NO_MORE_DOCS.
>
> -Yonik
> http://www.lucidimagination.com
>
>> Basically there's a distributed query with a filter, where on a
>> number of the servers, the filter query isn't matching anything,
>> however I'm seeing load on those servers (where nothing
>> matches), so I'm assuming the filter is generated (and cached)
>> which is fine, then the user query is being performed on a
>> filter where no documents match. I could be misinterpreting the
>> data; however, I want to find out about this use case regardless,
>> as it will likely crop up again for us.
>>
>> -J
>>
>> On Mon, Oct 19, 2009 at 12:07 PM, Yonik Seeley wrote:
>>> On Mon, Oct 19, 2009 at 2:55 PM, Jason Rutherglen wrote:
>>>> If a filter query matches nothing, then no additional query should be
>>>> performed and no results returned? I don't think we have this today?
>>>
>>> No, but this is a fast operation anyway (In Solr 1.4 at least).
>>>
>>> Another thing to watch out for is to not try this with filters that
>>> you don't know the size of (or else you may force a popcount on a
>>> BitDocSet that would not otherwise have been needed).
>>>
>>> It could also potentially complicate warming queries - need to be
>>> careful that the combination of filters you are warming with matches
>>> something, or it would cause the fieldCache entries to not be
>>> populated.
>>>
>>> -Yonik
>>> http://www.lucidimagination.com
>>>
>>
Retrieve Matching Term
Hi,

Is it possible to get the matching terms from your query for each document returned, without using highlighting?

For example, if you have the query "aaa bbb ccc" and one of the documents has the term "aaa" and another document has the terms "bbb" and "ccc", to have Solr return:

Document 1: "aaa"
Document 2: "bbb ccc"

I was told this is possible using Term Vectors, but I have not been able to find a way to do this using Term Vectors. The only reason I am against using highlighting is for performance reasons.

Thanks.
Re: Solr commits before documents are added
On Mon, Oct 19, 2009 at 7:39 PM, Lance Norskog wrote:
> commit(waitFlush="true", waitSearcher="true") waits for the entire
> operation, and when it finishes, all 1 million documents should be
> searchable.

That waits for the commit to complete, but not any adds that may be happening in parallel (that's pretty much impossible). If the client uses multiple threads to do adds, it's currently the responsibility of the client to wait for all of the pending adds to complete before calling commit.

-Yonik
http://www.lucidimagination.com

> Please try this same test with Solr 1.4 and post your results. To make
> it easier, here is the first release candidate:
>
> http://people.apache.org/~gsingers/solr/1.4.0-RC/
>
> On Mon, Oct 19, 2009 at 1:06 PM, SharmilaR wrote:
>>
>> Solr version is 1.3.
>> I am indexing a total of 1.4 million documents. Yes, I commit (waitFlush="true",
>> waitSearcher="true") every 100k documents and then once at the end.
>> I have a counter next to the addDoc(SolrDocument) statement to keep track of
>> the number of documents added. When I query Solr after the commit, the total
>> number of documents returned does not match the number of documents added.
>> This happens only when I index millions of documents, not when I index
>> something like 500 documents. In this case, I know it's the last 20 documents
>> which are not committed, because each document has a field 'RECORD_ID' which
>> is assigned a sequential number (in Java code). When I query Solr using the
>> Solr admin interface, the documents with the last 20 RECORD_IDs are missing
>> (for example, the last id is 999,980 instead of 1,000,000).
>>
>> - Sharmila
>>
>> Feak, Todd wrote:
>>>
>>> A few questions to help the troubleshooting.
>>>
>>> Solr version #?
>>>
>>> Is there just 1 commit through Solrj for the millions of documents?
>>>
>>> Or do you do it on a regular interval (every 100k documents, for example)
>>> and then one at the end to be sure?
>>>
>>> How are you observing that the last few didn't make it in? Are you looking
>>> at a slave or a master?
>>>
>>> -Todd
>>>
>> -----Original Message-----
>> From: Ranganathan, Sharmila [mailto:sranganat...@library.rochester.edu]
>> Sent: Monday, October 19, 2009 9:19 AM
>> To: solr-user@lucene.apache.org
>> Subject: Solr commits before documents are added
>>
>> Hi,
>>
>> My application indexes a huge number of documents (in the millions). Below
>> is a snapshot of my code where I add all documents to Solr and then,
>> at the end, issue a commit command. I use Solrj. I find that the last few
>> documents are not committed to Solr. Is this because adding documents
>> to Solr took longer and it reached the commit command even before it
>> finished adding documents? Is there a way to ensure that Solr waits
>> for all documents to be added and then commits? Please advise me how to
>> solve this issue.
>>
>> For loop
>>     solrServer.add(doc); // Add document to Solr
>> End for loop
>>
>> solrServer.commit(); // Commit to Solr
>>
>> Thanks,
>>
>> Sharmila
>>
>
> --
> Lance Norskog
> goks...@gmail.com
>
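[A sketch of what "wait for pending adds before committing" can look like in a multi-threaded SolrJ client. The class name, thread count, and loop bounds are illustrative; the SolrJ calls are the standard CommonsHttpSolrServer API of that era.]

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
  public static void main(String[] args) throws Exception {
    // CommonsHttpSolrServer is safe to share across threads.
    final SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
    ExecutorService pool = Executors.newFixedThreadPool(4);

    for (int i = 1; i <= 1000000; i++) {
      final int id = i;
      pool.execute(new Runnable() {
        public void run() {
          try {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("RECORD_ID", id);
            solr.add(doc);
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      });
    }

    // Crucial step: wait for every pending add to finish...
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);

    // ...and only then commit, so the commit covers all adds.
    solr.commit();
  }
}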
Re: Core/shard preference
Although shards should be disjoint, Solr "tolerates" duplication (it won't return duplicates in the main results list, but doesn't make any effort to correct facet counts, etc).

Currently, whichever shard responds first wins. The relevant code is around line 420 in QueryComponent.java:

String prevShard = uniqueDoc.put(id, srsp.getShard());
if (prevShard != null) {
  // duplicate detected
  numFound--;

  // For now, just always use the first encountered since we can't currently
  // remove the previous one added to the priority queue. If we switched
  // to the Java5 PriorityQueue, this would be easier.
  continue;

  // make which duplicate is used deterministic based on shard
  // if (prevShard.compareTo(srsp.shard) >= 0) {
  //   // TODO: remove previous from priority queue
  //   continue;
  // }
}

So it's certainly possible to make it deterministic, we just haven't done it yet.

-Yonik
http://www.lucidimagination.com

On Mon, Oct 19, 2009 at 7:30 PM, Lance Norskog wrote:
> Distributed Search is designed only for disjoint cores.
>
> The document list from each core is returned sorted by relevance score.
> The distributed searcher merges these sorted lists. Solr does not
> implement "distributed IDF", which essentially means distributed
> coordinated scoring. All scoring happens inside each core, relative to
> that core's contents. The resulting score numbers are not coordinated
> with each other, and you will get random results.
>
> There is no way to say "use this core's results" because the searches
> are not compared all at once. Only the page of results fetched is
> compared, so there's no way to suppress a result in the second page if
> it was already found in the first.
>
> On Mon, Oct 19, 2009 at 3:30 PM, markwaddle wrote:
>>
>> I have a small core performing deltas quickly (core00), and a large core
>> performing deltas slowly (core01), both on the same set of documents. The
>> delta core is cleaned nightly. As you can imagine, at times there are two
>> versions of a document, one in each core. When I execute a query that
>> matches this document, sometimes it will come from the delta core, and
>> sometimes it will come from the large core. It almost seems random.
>> Here is my query:
>>
>> http://porsche:8181/worldip5/core00/select?shards=porsche:8181/worldip5/core00/,porsche:8181/worldip5/core01/&start=0&rows=20&q=hazard+gas+countrycode:JP
>>
>> When the delta documents from core00 are returned as desired, the access logs
>> show:
>>
>> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core00/select HTTP/1.1 200 293 1
>> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select HTTP/1.1 200 506 1
>> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core00/select HTTP/1.1 200 1151 1
>> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select HTTP/1.1 200 2597 1
>> 10.36.34.151 - - [19/Oct/2009:15:22:37 -0700] GET /worldip5/core00/select?shards=porsche:8181/worldip5/core00/,porsche:8181/worldip5/core01/&start=0&rows=20&q=hazard+gas+countrycode:JP HTTP/1.1 200 11881 9
>>
>> When the documents are returned from core01 the access logs show:
>>
>> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core00/select HTTP/1.1 200 289 1
>> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select HTTP/1.1 200 506 1
>> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select HTTP/1.1 200 3390 1
>> 10.36.34.151 - - [19/Oct/2009:15:22:37 -0700] GET /worldip5/core00/select?shards=porsche:8181/worldip5/core00/,porsche:8181/worldip5/core01/&start=0&rows=20&q=hazard+gas+countrycode:JP HTTP/1.1 200 11873 9
>>
>> Any ideas on why there is a difference in the requests made? Is there a way
>> I can tell Solr to prefer the documents in core00?
>>
>> Mark
>>
>
> --
> Lance Norskog
> goks...@gmail.com
>
Re: Retrieve Matching Term
If your query looks like this -

q=(myField:aaa myField:bbb myField:ccc)

you would get the desired results for any tokenized field (e.g. text) called myField.

Cheers
Avlesh

On Tue, Oct 20, 2009 at 6:28 AM, angry127 wrote:
>
> Hi,
>
> Is it possible to get the matching terms from your query for each document
> returned, without using highlighting?
>
> For example, if you have the query "aaa bbb ccc" and one of the documents
> has the term "aaa" and another document has the terms "bbb" and "ccc",
> to have Solr return:
>
> Document 1: "aaa"
> Document 2: "bbb ccc"
>
> I was told this is possible using Term Vectors, but I have not been able to
> find a way to do this using Term Vectors. The only reason I am against using
> highlighting is for performance reasons.
>
> Thanks.
>
Re: Is Relational Mapping (foreign key) possible in solr ??
Hi Jerome, thanks for your response. I never knew about multivalued fields. I will give them a try and see if they suit my need.

But I don't understand this:

* You could have a mixed index with user and project entries in the same index, so if you search for a name, you'd find users and projects matching that name.

Could you please tell me in detail how I can do that?

Jérôme Etévé wrote:
>
> Hi, here's what you could do:
>
> * Use multivalued fields instead of comma-separated values, so you
> won't need a separator.
> * Store project identifiers in the user index.
>
> Denormalising project information into user entries inevitably means
> re-indexing a lot of user entries when project info changes.
>
> * You could have a mixed index with user and project entries in the
> same index, so if you search for a name, you'd find users and projects
> matching that name.
>
> Jerome.
>
> 2009/10/19 ashokcz :
>>
>> Hi, I browsed through the Solr docs and user forums, and what I infer is
>> that we can't use Solr to store a relational mapping (foreign key).
>>
>> But I just want to know if there is any chance of doing the same.
>>
>> I have two tables: a User table (with 100,000 entries) and a Project table
>> (with 200 entries).
>> User table columns: userid, name, country, location, etc.
>> Project table columns: project name, description, business unit,
>> project type.
>> Here User Location, Country, Project Name, Project business unit and
>> project type are faceted.
>> A user can be mapped to multiple projects.
>> In Solr I store the details like this:
>> [
>> {
>> userId:1234;
>> userName:ABC;
>> Country:US;
>> Location:NY;
>> Project Name:Project1,Project2;
>> Project Description:Project1,Project2;
>> Project business unit:unit1,unit2;
>> Project type:Type1,Type2
>> }
>> ]
>>
>> With this structure I can get faceted details about both user data and
>> project data.
>>
>> But here I face two problems:
>>
>> 1. A project can be mapped to many users, say 10,000 users. So if I change
>> a project name then I end up re-indexing 10,000 records, which is very
>> time-consuming work.
>>
>> 2. For fields like Project Description I cannot find a proper delimiter.
>> For the other fields a comma (,) is okay, but being a description, I
>> cannot use any specific delimiter. This field is not faceted, but in the
>> search results I still need to take it out and show the project details
>> in tabular format, and I use a delimiter to split it. For the other
>> project fields like Project Name and Type I can do this, but not for
>> the Project Description field.
>>
>> So what I want to know is: is there any way of storing the data as
>> relational records, i.e. in the user details we would have a field called
>> project Id whose data would be 1,2, referring to the project records'
>> primary keys in Solr, while still preserving the faceted approach?
>>
>> To my knowledge my guess is that it can't be done???
>> Am I correct???
>> If so, then how can we achieve solutions to my problem?
>> Please, if someone could share some ideas, it would be useful.
>>
>
> --
> Jerome Eteve.
> http://www.eteve.net
> jer...@eteve.net
>