Re: delete changed?
: curl http://192.168.7.6:8080/solr0/update --data-binary : '<delete><query>nodeid:20</query></delete>' : : i remember it is ok when i use solr 1.1 ... : HTTP Status 400 - missing content stream

please note the "Upgrading from Solr 1.1" section of the 1.2 CHANGES.txt file, which states...

The Solr Request Handler framework has been updated in two key ways: First, if a Request Handler is registered in solrconfig.xml with a name starting with "/" then it can be accessed using a path-based URL, instead of using the legacy /select?qt=name URL structure. Second, the Request Handler framework has been extended, making it possible to write Request Handlers that process streams of data for doing updates, and there is a new-style Request Handler for XML updates given the name of "/update" in the example solrconfig.xml. Existing installations without this "/update" handler will continue to use the old update servlet and should see no changes in behavior. For new-style update handlers, errors are now reflected in the HTTP status code, Content-type checking is more strict, and the response format has changed and is controllable via the wt parameter.

-Hoss
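Given the "Content-type checking is more strict" note in CHANGES.txt, the likely fix is to declare the Content-type and send a well-formed XML delete body. A minimal sketch — the host, port and core path are taken from the original post, and the explicit charset header is an assumption, not something the thread confirms:

```shell
# Build the delete-by-query body; nodeid:20 is the query from the original post.
BODY='<delete><query>nodeid:20</query></delete>'

# The request itself (commented out here since it needs a live Solr server):
#   curl http://192.168.7.6:8080/solr0/update \
#     -H 'Content-type: text/xml; charset=utf-8' --data-binary "$BODY"

# Sanity-check that the body has the shape the XML update handler expects:
echo "$BODY" | grep -q '^<delete><query>.*</query></delete>$' && echo "body looks ok"
```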
Re: Multi-language indexing and searching
Hi Hoss. I tried that yesterday using the same approach you just described (I created the base fields for each language with basic analyzers) and it worked alright. Thanks again for your time. Regards, Daniel

On 20/6/07 21:00, Chris Hostetter [EMAIL PROTECTED] wrote: : So far it sounds good for my needs, now I'm going to try if my other : features still work (I'm worried about highlighting as I'm going to return a : different field)... i'm not really a highlighting guy so i'm not sure ... but if you're okay with *simple* highlighting you can probably just highlight your title field (using a whitespace analyzer or something) and get decent results without needing to worry about the fact that you are using different languages. -Hoss
Re: Multiple doc types in schema
SOLR-215 supports multiple indices on a single Solr instance. It does *not* support searching of multiple indices at once (e.g. parallel search) and merging of results. This has nothing to do with NFS, though. Otis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simpy -- http://www.simpy.com/ - Tag - Search - Share

-- Original Message -- From: James liu [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Thursday, June 21, 2007 3:45:06 AM Subject: Re: Multiple doc types in schema

I see SOLR-215 from this mail. Does it now really support multiple indexes, and will search return merged data? For example: I want to search "aaa", and I have index1, index2, index3, index4. It should return the result from index1, index2, index3, index4 and merge results by score, datetime, or other things. Does it support NFS, and how is its performance?

2007/6/21, Otis Gospodnetic [EMAIL PROTECTED]: This sounds like a potentially good use-case for SOLR-215! See https://issues.apache.org/jira/browse/SOLR-215 Otis

-- Original Message -- From: Chris Hostetter [EMAIL PROTECTED] To: solr-user@lucene.apache.org; Jack L [EMAIL PROTECTED] Sent: Wednesday, June 6, 2007 6:58:10 AM Subject: Re: Multiple doc types in schema

: This is based on my understanding that solr/lucene does not : have the concept of document type. It only sees fields. : : Is my understanding correct? it is.

: It seems a bit unclean to mix fields of all document types : in the same schema though. Or, is there a way to allow multiple : document types in the schema, and specify what type to use : when indexing and searching? it's really just an issue of semantics ... the schema.xml is where you list all of the fields you need in your index, any notion of doctype is entirely artificial ...
you could group all of the fields relating to doctypeA in one section of the schema.xml, then have a big <!-- ##...## --> line and then list the fields in doctypeB, etc... but what if there are fields you use in both doctypes? .. how much you mix them is entirely up to you. -Hoss -- regards jl
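One common way to make Hoss's "artificial doctype" notion concrete — a convention of my own for illustration, not something the schema requires — is to give every document a doctype field and filter on it at query time:

```shell
# Hypothetical doc carrying an artificial "doctype" field; the field names
# and values here are illustration only, nothing in Solr mandates them.
DOC='<add><doc>
  <field name="doctype">product</field>
  <field name="id">p1</field>
  <field name="title">Example product</field>
</doc></add>'

# Indexing (needs a live server, hence commented out):
#   curl http://localhost:8983/solr/update -H 'Content-type: text/xml' --data-binary "$DOC"
# Searching a single doctype then becomes a filter query:
#   .../select?q=title:example&fq=doctype:product

echo "$DOC" | grep -q 'name="doctype"' && echo "doctype field present"
```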
Re: problems getting data into solr index
Hi Mike, Brian Thanks for helping with this, and for clearing up my misunderstanding. Solr the python module and Solr the package being two different things, I've got you. The issues I have are compounded by the fact that we're hovering between using the Unicode branch of Django and the older branch that has newforms, both of which have an impact on what I'm trying to do. It's getting closer to being resolved, and it's down to your advice, so thanks again. -- View this message in context: http://www.nabble.com/problems-getting-data-into-solr-index-tf3915542.html#a11230922 Sent from the Solr - User mailing list archive at Nabble.com.
Re: All facet.fields for a given facet.query?
: Faceting on manufacturers and categories first and then presenting the : corresponding facets might be used under some circumstances, but in my case : the category structure is quite deep, detailed and complex. So when : the user enters a query I'd like to say to him "Look, here are the : manufacturers and categories with matches to your query, choose one if you : want, but maybe there is another one with products that better fit your : needs or products that you didn't even know about. So maybe you'd like to : filter based on the following attributes." Something like this ;o)

categories was just an example i used because it tends to be a common use case ... my point is the decision about which facet qualifies for the "maybe there is another one with products that better fit your needs" part of the response either requires computing counts for *every* facet constraint and then looking at them to see which ones provide good distribution, or knowing something more about your metadata (ie: having stats that show the majority of people who search on the word "canon" want to facet on megapixels) .. this is where custom biz logic comes in, because in a lot of situations computing counts for every possible facet may not be practical (even if the syntax to request it was easier)

I get your point, but how do I know where additional metadata is of value if not just by trying? Currently I start with a generic approach to see what really is in the product data, to get an overview of its quality and what happens if I use it in the new search solution. Then I can decide what to do to optimize the system, i.e. try to reduce the number of attributes, get marketing to split somewhat generic attributes into more detailed ones, find a way to display the most relevant facets for the current query first and so on... Tom
Re: Multiple doc types in schema
I used Solr with indexes on NFS and I do not recommend it. It was either 100 or 1000 times slower than local disk for indexing, I forget which. Unusable. This is not a problem with Solr/Lucene; I have seen the same NFS performance cost with other search engines. wunder

On 6/21/07 3:22 AM, Otis Gospodnetic [EMAIL PROTECTED] wrote: SOLR-215 supports multiple indices on a single Solr instance. It does *not* support searching of multiple indices at once (e.g. parallel search) and merging of results. This has nothing to do with NFS, though. Otis -- Original Message -- From: James liu [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Thursday, June 21, 2007 3:45:06 AM Subject: Re: Multiple doc types in schema I see SOLR-215 from this mail. Does it now really support multiple indexes, and will search return merged data? For example: I want to search "aaa", and I have index1, index2, index3, index4. It should return the result from index1, index2, index3, index4 and merge results by score, datetime, or other things. Does it support NFS, and how is its performance? 2007/6/21, Otis Gospodnetic [EMAIL PROTECTED]: This sounds like a potentially good use-case for SOLR-215! See https://issues.apache.org/jira/browse/SOLR-215 Otis -- Original Message -- From: Chris Hostetter [EMAIL PROTECTED] To: solr-user@lucene.apache.org; Jack L [EMAIL PROTECTED] Sent: Wednesday, June 6, 2007 6:58:10 AM Subject: Re: Multiple doc types in schema : This is based on my understanding that solr/lucene does not : have the concept of document type. It only sees fields. : : Is my understanding correct? it is. : It seems a bit unclean to mix fields of all document types : in the same schema though.
Or, is there a way to allow multiple : document types in the schema, and specify what type to use : when indexing and searching? it's really just an issue of semantics ... the schema.xml is where you list all of the fields you need in your index, any notion of doctype is entirely artificial ... you could group all of the fields relating to doctypeA in one section of the schema.xml, then have a big <!-- ##...## --> line and then list the fields in doctypeB, etc... but what if there are fields you use in both doctypes? .. how much you mix them is entirely up to you. -Hoss
Re: Multiple doc types in schema
Otis, Thanks for the link and the work! Maybe around September I will need this patch, if it's not already committed to the Solr sources. I will also need multiple-index searches, but I understand that there is no simple, fast and generic solution in the Solr context. Maybe I would lose Solr caching, but it seems not impossible to design my own custom request handler to query different indexes, as Lucene allows.

SOLR-215 supports multiple indices on a single Solr instance. It does *not* support searching of multiple indices at once (e.g. parallel search) and merging of results.

-- Frédéric Glorieux École nationale des chartes direction des nouvelles technologies et de l'informatique
Re: Multiple doc types in schema
On 6/21/07, Frédéric Glorieux [EMAIL PROTECTED] wrote: I will also need multiple indexes searches, Do you mean: 1) Multiple unrelated indexes with different schemas, that you will search separately... but you just want them in the same JVM for some reason. 2) Multiple indexes with different schemas, search will search across all or some subset and combine the results (federated search) 3) Multiple indexes with the same schema, each index is a shard that contains part of the total collection. Search will merge results across all shards to give appearance of a single large collection (distributed search). -Yonik
Re: Recent updates to Solrsharp
great, thanks Yonik. On 6/20/07, Yonik Seeley [EMAIL PROTECTED] wrote: On 6/21/07, Jeff Rodenburg [EMAIL PROTECTED] wrote: As an aside, it would be nice to record these issues more granularly in JIRA. Could we get a component created for our client library, similar to java/php/ruby? Done. -Yonik
Re: DismaxRequestHandler reports sort by score as invalid
Because "score desc" is the default Lucene/Solr behavior when no explicit sort is specified, QueryParsing.parseSort() returns a null sort so that the non-sort versions of the query execution routines get called. However the caller SolrPluginUtils.parseSort issues that warning whenever it gets a null sort. Perhaps that interaction should be altered, or perhaps it should be left in as a sort of "are you sure you want to tell me what I already know?", er, warning. But as it stands you can simply ignore it, or else leave the sort off entirely when it is "score desc"; if the behavior were different in those two cases it would certainly be a bug, but as you noted that's not the case. - J.J.

At 10:50 AM -0400 6/21/07, gerard sychay wrote: Hello all, This is a minor issue and does not affect Solr operation, but I could not find it in the issue tracking. To reproduce: - I set up a Solr server with the example docs indexed by following the Solr tutorial. - I clicked on the following example search under the Sorting section: http://localhost:8983/solr/select/?indent=on&q=video&sort=score+desc - I added a qt parameter to try out the DisMax Request Handler: http://localhost:8983/solr/select/?indent=on&q=video&sort=score+desc&qt=dismax - In the Solr output, I get: WARNING: Invalid sort "score desc" was specified, ignoring Jun 21, 2007 10:33:37 AM org.apache.solr.core.SolrCore execute INFO: /select/ sort=score+desc&indent=on&qt=dismax&q=video 0 131 The WARNING line is the issue. It does not seem that it should be there. But as I said, it does not appear to affect operation as the results are sorted by score descending anyway (because that is the default?).
Re: DismaxRequestHandler reports sort by score as invalid
A little background: I originally conceived of query operation chains (based on some of my previous hacking in mechanical investing stock screens: select all stocks; take top 10% lowest PE; then take the top 20 highest growth rate; then sort descending by 13 week relative strength). So, I thought that the next thing after a query *might* be a sort, so getSort() shouldn't throw an exception if it wasn't. I think this idea is now outdated (we know when we have a sort spec) and an exception should just be thrown on a syntax error. -Yonik On 6/21/07, J.J. Larrea [EMAIL PROTECTED] wrote: Because score desc is the default Lucene Solr behavior when no explicit sort is specified, QueryParsing.parseSort() returns a null sort so that the non-sort versions of the query execution routines get called. However the caller SolrPluginUtils.parseSort issues that warning whenever it gets a null sort. Perhaps that interaction should be altered, or perhaps it should be left in as a sort of are you sure you want to tell me what I already know?, er, warning. But as it stands you can simply ignore it, or else leave the sort off entirely when it is score desc; if the behavior were different in those two cases it would certainly be a bug, but as you noted that's not the case. - J.J. At 10:50 AM -0400 6/21/07, gerard sychay wrote: Hello all, This is a minor issue and does not affect Solr operation, but I could not find it in the issue tracking. To reproduce: - I set up a Solr server with the example docs indexed by following the Solr tutorial. 
- I clicked on the following example search under the Sorting section: http://localhost:8983/solr/select/?indent=on&q=video&sort=score+desc - I added a qt parameter to try out the DisMax Request Handler: http://localhost:8983/solr/select/?indent=on&q=video&sort=score+desc&qt=dismax - In the Solr output, I get: WARNING: Invalid sort "score desc" was specified, ignoring Jun 21, 2007 10:33:37 AM org.apache.solr.core.SolrCore execute INFO: /select/ sort=score+desc&indent=on&qt=dismax&q=video 0 131 The WARNING line is the issue. It does not seem that it should be there. But as I said, it does not appear to affect operation as the results are sorted by score descending anyway (because that is the default?).
Re: Multiple doc types in schema
Hi Yonik,

I will also need multiple indexes searches, Do you mean: 2) Multiple indexes with different schemas, search will search across all or some subset and combine the results (federated search)

Exactly that. I'm coming from a quite old Lucene-based project called SDX: http://www.nongnu.org/sdx/docs/html/doc-sdx2/en/presentation/bases.html. Sorry for the link, the project is mainly documented in French. The framework is Cocoon-based, maybe heavy by now. It allows hosting multiple applications, with multiple "bases"; a base is a kind of Solr schema, circa 2000. From this experience, I can say cross-searching between different schemas is possible, and users may find it important. Take for example a library. They have different collections, let's say: CSV records obtained from digitized photos, a light model with no writes expected; and a complex librarian model updated every day. These collections share at least a title and author field, and should be searchable behind the same public form; but each one should also have its own application, according to its information model. With the SDX framework above, I know of real-life applications with 30 Lucene indexes. It's possible, because Lucene allows it (MultiReader): http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/index/MultiReader.html. -- Frédéric Glorieux École nationale des chartes direction des nouvelles technologies et de l'informatique

1) Multiple unrelated indexes with different schemas, that you will search separately... but you just want them in the same JVM for some reason. 3) Multiple indexes with the same schema, each index is a shard that contains part of the total collection. Search will merge results across all shards to give appearance of a single large collection (distributed search). -Yonik
Re: Multiple doc types in schema
On 6/21/07, Frédéric Glorieux [EMAIL PROTECTED] wrote: I will also need multiple indexes searches, Do you mean: 2) Multiple indexes with different schemas, search will search across all or some subset and combine the results (federated search) Exactly that. I'm coming from a quite old Lucene-based project called SDX: http://www.nongnu.org/sdx/docs/html/doc-sdx2/en/presentation/bases.html. Sorry for the link, the project is mainly documented in French. The framework is Cocoon-based, maybe heavy by now. It allows hosting multiple applications, with multiple "bases"; a base is a kind of Solr schema, circa 2000. From this experience, I can say cross-searching between different schemas is possible, and users may find it important. Take for example a library. They have different collections, let's say: CSV records obtained from digitized photos, a light model with no writes expected; and a complex librarian model updated every day. These collections share at least a title and author field, and should be searchable behind the same public form; but each one should also have its own application, according to its information model.

This doesn't sound like true federated search, since you have a number of fields that are the same in each index that you search across, and you treat them all the same. This is functionally equivalent to having a single schema and a single index. You can still have multiple applications that query the single collection differently. Depending on update patterns and index sizes, you can probably get better efficiency with multiple indexes, but not really more functionality (in your case), right? -Yonik
commit script with solr 1.2 response format
I just started running the scripts, and the commit script seems to run fine, but it says there was an error. I looked into it, and the scripts expect the 1.1 style response:

<result status="0"></result>

1.2 /update returns:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">44</int></lst>
</response>

ryan
Re: Multiple doc types in schema
Thanks Yonik for sharing your thoughts.

This doesn't sound like true federated search,

I'm afraid I don't understand "federated search"; you seem to have a precise idea in mind.

since you have a number of fields that are the same in each index that you search across, and you treat them all the same. This is functionally equivalent to having a single schema and a single index. You can still have multiple applications that query the single collection differently.

Short of a pointer or a web example from you, what you describe seems to me like implementing a complete database with a single table (not easy to understand and maintain, but possible). In my experience, a collection is a schema with thousands or millions of XML documents, possibly 10, 20 or more fields, and the search configuration is generated from a kind of data schema (there's no real standard for explaining, for example, that a title or a subject needs one field for exact match and another for word search). If an index were too big (hopefully I never hit this limit with Lucene), I guess there are solutions. My problem is to maintain different collections, each with their own intellectual logic, some shared field names, like Dublin Core, or at least fulltext, but also fields specific to each one.

Depending on update patterns and index sizes, you can probably get better efficiency with multiple indexes, but not really more functionality (in your case), right?

Maybe keeping it understandable could be accepted as a feature? Perhaps less so now, but there was a time when a Lucene index could become corrupted, so separating them was important. I guess these specific problems will not be Solr priorities, but until I'm corrected, I still feel that multiple indexes are useful. -- Frédéric Glorieux École nationale des chartes direction des nouvelles technologies et de l'informatique
Re: Multiple doc types in schema
After further reading, especially http://people.apache.org/~hossman/apachecon2006us/faceted-searching-with-solr.pdf (Thanks Hoss)

Depending on update patterns and index sizes, you can probably get better efficiency with multiple indexes, but not really more functionality (in your case), right?

Maybe I'm approaching your point of view: "Loose Schema with Dynamic Fields" is probably my solution. There's something strange, to me, in treating a Lucene index as a blob, but if it works for people bigger than me, I should follow. So, it means one fieldtype per analyzer, and the data-model logic lives only on the collection side. I think I have my idea for September, but I would be very glad if you have something to add. -- Frédéric Glorieux École nationale des chartes direction des nouvelles technologies et de l'informatique
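The "loose schema with dynamic fields" idea can be sketched like this: the example schema.xml declares dynamicField patterns keyed by suffix, so each collection can invent field names freely as long as the suffix picks the analyzer. The *_t / *_dt suffixes below follow the example schema's conventions (an assumption worth checking against your copy); the field names themselves are hypothetical:

```shell
# Hypothetical collection fields; the suffix decides which dynamicField
# (and thus which fieldtype/analyzer) each one maps to.
FIELDS='title_t author_t issued_dt'
for f in $FIELDS; do
  case "$f" in
    *_dt) echo "$f -> date fieldtype" ;;
    *_t)  echo "$f -> text fieldtype" ;;
  esac
done
```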
Re: commit script with solr 1.2 response format
Aha, same question I found a few days ago. I'm sorry I forgot to submit it.

2007/6/22, Yonik Seeley [EMAIL PROTECTED]: On 6/21/07, Ryan McKinley [EMAIL PROTECTED] wrote: I just started running the scripts, and the commit script seems to run fine, but it says there was an error. I looked into it, and the scripts expect the 1.1 style response: <result status="0"></result> 1.2 /update returns: <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"><int name="status">0</int><int name="QTime">44</int></lst> </response> I guess we should look for 'status=0'? Or, if you get a response code of 200, it's a success unless you see status=nonzero -Yonik -- regards jl
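Yonik's check could look something like this in the shell the commit scripts use. The response text is a hard-coded stand-in for the 1.2 /update output quoted above, and the exact substring to match is precisely the open question of this thread:

```shell
# Stand-in for the body a 1.2 /update commit returns (from the quoted post).
RESPONSE='<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">44</int></lst>
</response>'

# Treat the commit as successful only when the status int is 0.
if echo "$RESPONSE" | grep -q 'name="status">0<'; then
  echo "commit ok"
else
  echo "commit failed"
fi
```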
Re: commit script with solr 1.2 response format
: I guess we should look for 'status=0' ? that wouldn't quite work. : Or, if you get a response code of 200, it's a success unless : you see status=nonzero we could always make it an option in the scripts.conf file -- what substring to match on ... just in case people want to write their own crazy commit handler and still use the script ... but that may be overkill. -Hoss
RE: Faceted Search!
: generating XML feed file and feeding to the Solr server. However, I was : also looking into implementing sub-categories within the : categories, if that makes sense. For example, in shopper.com we have : the categories of price, manufacturers and so on, and within them there : are sub-categories (price is sub-categorized into $100, 100-200, 200-300 etc). : I don't have constraints in terms of technology. If I have to implement a : db server I won't mind implementing it. Anyway, please shine a light on : how you would handle this issue. Any suggestion will be appreciated.

the shopper.com solution is very VERY specialized and specific to the datamodel used to manage the category metadata ... if i had to do it over again i would do it a lot differently. way way back there was a thread about "complex faceting" where i included some ideas on a possible facet configuration xml syntax which could then be parsed by a request handler, with different types of faceting (simple query, ranges, based on terms, prefix) delegated to helper classes. there was also the idea of being able to group facets or make facets depend on other facets (ie: don't show the author facet until a value has been picked from the author_initial facet) nothing ever really came of it, but it's how i'd probably approach trying to tackle something like the shopper.com functionality if CNET threw away our product metadata data model and started from scratch. http://www.nabble.com/metadata-about-result-sets--t1243321.html#a3334244 -Hoss
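For the price sub-categories specifically, stock Solr can already produce range buckets with repeated facet.query parameters, no custom handler needed. A sketch — the field name "price" and the buckets come from the poster's example, and the params would need URL-encoding before hitting /select:

```shell
# Assemble the facet params; spaces in the range syntax must be
# URL-encoded (e.g. %20 or +) before sending.
Q='q=camera&facet=true'
Q="$Q&facet.query=price:[* TO 100]"
Q="$Q&facet.query=price:[100 TO 200]"
Q="$Q&facet.query=price:[200 TO 300]"

# The request itself (needs a live server, hence commented out):
#   curl "http://localhost:8983/solr/select?$(echo "$Q" | sed 's/ /%20/g')"

# One bucket per facet.query param:
echo "$Q" | tr '&' '\n' | grep -c 'facet.query'   # prints 3
```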