Re: any difference between using collection vs. shard in URL?
Do keep one thing in mind though. If you are already doing the work of figuring out the right shard leader (through SolrJ or otherwise), using that location with just the collection name might be suboptimal if there are multiple shard leaders present in the same instance -- the collection name just goes to *some* shard leader and not necessarily to the one where your document is destined. If it chooses the wrong one, it will lead to an extra HTTP request, even back to itself.

On 5 Nov 2014 15:33, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

There's no difference between the two. Even if you send updates to a shard URL, they will still be forwarded to the right shard leader according to the hash of the id (assuming you're using the default compositeId router). Of course, if you happen to hit the right shard leader then it is just an internal forward and not an extra network hop. The advantage of using the collection name is that you can hit any SolrCloud node (even ones not hosting this collection) and it will still work. So for a non-Java client, a load balancer can be set up in front of the entire cluster and things will just work.

On Wed, Nov 5, 2014 at 8:50 PM, Ian Rose ianr...@fullstory.com wrote:

If I add some documents to a SolrCloud shard in a collection alpha, I can post them to /solr/alpha/update. However I notice that you can also post them using the shard name, e.g. /solr/alpha_shard4_replica1/update -- in fact this is what Solr seems to do internally (like if you send documents to the wrong node, so Solr needs to forward them over to the leader of the correct shard). Assuming you *do* always post your documents to the correct shard, is there any difference between these two, performance or otherwise? Thanks! - Ian

--
Regards,
Shalin Shekhar Mangar.
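The routing decision the thread describes can be sketched as follows. Note this is a simplified stand-in: the real compositeId router hashes the id with MurmurHash3 and compares it against each shard's hash range, whereas here MD5 modulo the shard count just illustrates why posting to the wrong leader costs one extra forward.

```python
import hashlib

def pick_shard(doc_id: str, num_shards: int) -> int:
    """Deterministically map a document id to a shard index.

    Simplified stand-in for Solr's compositeId router (which uses
    MurmurHash3 over hash ranges); MD5 mod num_shards illustrates
    that the same id always routes to the same shard.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def extra_hops(doc_id: str, num_shards: int, posted_to_shard: int) -> int:
    """0 extra hops if the update hit the leader that owns the id,
    otherwise 1 forward to the correct leader."""
    return 0 if pick_shard(doc_id, num_shards) == posted_to_shard else 1
```

Sending updates to the collection URL lets any node do this computation for you; doing it client-side (as SolrJ's CloudSolrServer does) saves the forward only if you then hit the owning leader's core directly.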
solr.xml coreRootDirectory relative to solr home
Hi, I'm trying to configure a different core discovery root directory in solr.xml with the coreRootDirectory setting as described in https://cwiki.apache.org/confluence/display/solr/Format+of+solr.xml I'd like to just set it to a subdirectory of solr home (a cores directory, to avoid confusion with configsets and other directories). I tried

<str name="coreRootDirectory">cores</str>

but that's interpreted relative to the current working directory. Other paths such as sharedLib are interpreted relative to solr home, and I had expected that here too. I do not set solr home via system property but via JNDI, so I don't think I can use ${solr.home}/cores or something like that? It would be nice if solr home were available for property substitution even if set via JNDI. Is there another way to set a path relative to solr home here? Regards, Andreas
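As a workaround, an absolute path sidesteps the working-directory ambiguity entirely (a sketch; the path is made up and would be whatever your deployment uses for solr home):

```xml
<solr>
  <!-- absolute path: unaffected by the servlet container's working directory -->
  <str name="coreRootDirectory">/opt/solr-home/cores</str>
</solr>
```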
RE: recovery process - node with stale data elected leader
Hi all, Any idea on my issue below? Thanks, Francois

-----Original Message-----
From: Grollier, Francois: IT (PRG)
Sent: Tuesday, November 04, 2014 6:19 PM
To: solr-user@lucene.apache.org
Subject: recovery process - node with stale data elected leader

Hi, I'm running SolrCloud 4.6.0 and I have a question/issue regarding the recovery process. My cluster is made of 2 shards with 2 replicas each. Nodes A1 and B1 are leaders, A2 and B2 followers. I start indexing docs and kill A2. I keep indexing for a while and then kill A1. At this point, the cluster stops serving queries as one shard is completely unavailable. Then I restart A2 first, then A1. A2 gets elected leader, waits a bit for more replicas to be up, and once it sees A1 it starts the recovery process.

My understanding of the recovery process was that at this point A2 would notice that A1 has a more up-to-date state and it would sync with A1. It seems to happen like this, but then I get:

INFO - 2014-11-04 11:50:43.068; org.apache.solr.cloud.RecoveryStrategy; Attempting to PeerSync from http://a1:8111/solr/executions/ core=executions - recoveringAfterStartup=false
INFO - 2014-11-04 11:50:43.069; org.apache.solr.update.PeerSync; PeerSync: core=executions url=http://a2:8211/solr START replicas=[http://a1:8111/solr/executions/] nUpdates=100
INFO - 2014-11-04 11:50:43.076; org.apache.solr.update.PeerSync; PeerSync: core=executions url=http://a2:8211/solr Received 98 versions from a1:8111/solr/executions/
INFO - 2014-11-04 11:50:43.076; org.apache.solr.update.PeerSync; PeerSync: core=executions url=http://a2:8211/solr Our versions are newer. ourLowThreshold=1483859630192852992 otherHigh=1483859633446584320
INFO - 2014-11-04 11:50:43.077; org.apache.solr.update.PeerSync; PeerSync: core=executions url=http://a2:8211/solr DONE. sync succeeded

And I end up with a different set of documents in each node (actually A1 has all the documents but A2 misses some).
Is my understanding wrong, and is it complete nonsense to start A2 before A1? If my understanding is right, what could cause the desync? (I can provide more logs.) And is there a way to force A2 to index the missing documents? I have tried the FORCERECOVERY command, but it generates the same result as shown above. Thanks, Francois

_______________________________________________
This message is for information purposes only, it is not a recommendation, advice, offer or solicitation to buy or sell a product or service nor an official confirmation of any transaction. It is directed at persons who are professionals and is not intended for retail customer use. Intended for recipient only. This message is subject to the terms at: www.barclays.com/emaildisclaimer. For important disclosures, please see: www.barclays.com/salesandtradingdisclaimer regarding market commentary from Barclays Sales and/or Trading, who are active market participants; and in respect of Barclays Research, including disclosures relating to specific issuers, please see http://publicresearch.barclays.com.
_______________________________________________
grouping finds result name=doclist numFound=0
Sorry for the basic question.

q=*:*&fq=-sku:2471834&fq=FiltroDispo:1&fq=has_image:1&rows=100&fl=descCat3,IDCat3,ranking2&group=true&group.field=IDCat3&group.sort=ranking2+desc&group.ngroups=true

returns some groups with no results. I'm using Solr 4.8.0, and the collection has 3 shards. Am I missing some parameters?

<lst name="grouped">
  <lst name="IDCat3">
    <int name="matches">297254</int>
    <int name="ngroups">49</int>
    <arr name="groups">
      <lst>
        <int name="groupValue">0</int>
        <result name="doclist" numFound="0" start="0"/>
      </lst>
      ...
      <lst>
        <int name="groupValue">12043</int>
        <result name="doclist" numFound="2" start="0">
          <doc>
            <int name="IDCat3">12043</int>
            <str name="descCat3">SSD</str>
            <int name="ranking2">498</int>
          </doc>
        </result>
      </lst>
    </arr>
  </lst>
</lst>
Re: EarlyTerminatingCollectorException
https://issues.apache.org/jira/browse/SOLR-6710

2014-11-05 21:56 GMT+01:00 Mikhail Khludnev mkhlud...@griddynamics.com:

I wondered too, but it seems it warms up the queryResultCache https://github.com/apache/lucene-solr/blob/20f9303f5e2378e2238a5381291414881ddb8172/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L522 -- at least these ERRORs break nothing, see https://github.com/apache/lucene-solr/blob/20f9303f5e2378e2238a5381291414881ddb8172/solr/core/src/java/org/apache/solr/search/FastLRUCache.java#L165

Anyway, here are two usability issues:
- "of key:org.apache.solr.search.QueryResultKey@62340b01" -- lack of a readable toString()
- I don't think regeneration exceptions are ERRORs; they seem like WARNs to me, or even lower. Also, as a courtesy, EarlyTerminatingCollectorExceptions in particular could be recognized, and even ignored, by SolrIndexSearcher.java#L522.

Would you mind raising a ticket?

On Wed, Nov 5, 2014 at 6:51 PM, Dirk Högemann dhoeg...@gmail.com wrote:

Our production Solr slave cores (we have about 40 cores, each of a moderate size, about 10K to 90K documents) produce many exceptions of this type:

2014-11-05 15:06:06.247 [searcherExecutor-158-thread-1] ERROR org.apache.solr.search.SolrCache: Error during auto-warming of key:org.apache.solr.search.QueryResultKey@62340b01 :org.apache.solr.search.EarlyTerminatingCollectorException

The relevant part of our solrconfig is:

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>18</maxTime> <!-- in ms -->
  </autoCommit>
</updateHandler>

<query>
  <maxWarmingSearchers>2</maxWarmingSearchers>
  <filterCache class="solr.FastLRUCache" size="8192" initialSize="8192" autowarmCount="4096"/>
  <!-- queryResultCache caches results of searches - ordered lists of document ids (DocList) based on a query, a sort, and the range of documents requested. -->
  <queryResultCache class="solr.FastLRUCache" size="8192" initialSize="8192" autowarmCount="4096"/>
  <!-- documentCache caches Lucene Document objects (the stored fields for each document). Since Lucene internal document ids are transient, this cache will not be autowarmed. -->
  <documentCache class="solr.FastLRUCache" size="8192" initialSize="8192" autowarmCount="4096"/>
</query>

What exactly does the exception mean? Thank you!

-- Dirk

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
Re: Schemaless configuration using 4.10.2/API returning 404
Hi, it might be a silly question, but are you sure that a Solr core collection1 exists? Or does it have a different name? You would get a 404 if no such core exists. Regards, Andreas

nbosecker wrote on 11/05/2014 09:12 PM:

Hi all, I'm working on updating a legacy Solr to 4.10.2 to use schemaless configuration. As such, I have added this snippet to solrconfig.xml per the docs:

<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>

I see that schema.xml is renamed to schema-xml.bak and a managed-schema file is present on Solr restart. My Solr Dashboard is accessible via https://myserver:9943/solr/#/. However, I still cannot access the schema via the API -- I keep receiving a 404 [The requested resource (/solr/schema/fields) is not available] error: https://myserver:9943/solr/collection1/schema/fields What am I missing to access the Schema API? Much thanks!

--
View this message in context: http://lucene.472066.n3.nabble.com/Schemaless-configuration-using-4-10-2-API-returning-404-tp4167869.html
Sent from the Solr - User mailing list archive at Nabble.com.
Delete data from stored documents
Hi, is it possible to remove stored data from an index by deleting the unwanted fields from schema.xml and then doing an optimize over the index? Thanks, /yago

-
Best regards
AW: grouping finds result name=doclist numFound=0
Hi Giovanni, AFAIK grouping is not completely working with SolrCloud. You could check https://issues.apache.org/jira/browse/SOLR-5046. In addition, documents that should be grouped need to be in the same shard (you can use router.field=IDCat3 to place all of your documents with the same IDCat3 in the same shard). Maybe somebody else can give some more insight; I am also interested in the topic. Cheers, Timo

From: Giovanni Bricconi [giovanni.bricc...@banzai.it]
Sent: Thursday, 6 November 2014 11:43
To: solr-user
Subject: grouping finds result name=doclist numFound=0

Sorry for the basic question.

q=*:*&fq=-sku:2471834&fq=FiltroDispo:1&fq=has_image:1&rows=100&fl=descCat3,IDCat3,ranking2&group=true&group.field=IDCat3&group.sort=ranking2+desc&group.ngroups=true

returns some groups with no results. I'm using Solr 4.8.0, and the collection has 3 shards. Am I missing some parameters?

<lst name="grouped">
  <lst name="IDCat3">
    <int name="matches">297254</int>
    <int name="ngroups">49</int>
    <arr name="groups">
      <lst>
        <int name="groupValue">0</int>
        <result name="doclist" numFound="0" start="0"/>
      </lst>
      ...
      <lst>
        <int name="groupValue">12043</int>
        <result name="doclist" numFound="2" start="0">
          <doc>
            <int name="IDCat3">12043</int>
            <str name="descCat3">SSD</str>
            <int name="ranking2">498</int>
          </doc>
        </result>
      </lst>
    </arr>
  </lst>
</lst>
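Timo's router.field suggestion is set when the collection is created. A sketch of such a CREATE call (host, collection, and config names here are made up, and this assumes a Solr version whose Collections API accepts router.field):

```
curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=products&numShards=3&replicationFactor=1&router.field=IDCat3&collection.configName=myconfig'
```

With this, documents sharing an IDCat3 value land on the same shard, so distributed grouping on that field returns correct counts.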
How to dynamically create Solr cores with schema
Hi, I have a use case where Java applications need to create Solr indexes dynamically. The schema fields of these indexes differ and should be defined by the Java application upon creation. So I'm trying to use the Core Admin API [1] to create new cores and the Schema API [2] to define fields. When creating a core, I have to specify solrconfig.xml (with ManagedIndexSchemaFactory enabled) and the schema to start with. I thought it would be a good idea to use named config sets [3] for this purpose:

curl 'http://localhost:8082/solr/admin/cores?action=CREATE&name=m1&instanceDir=cores/m1&configSet=myconfig&dataDir=data'

But when I add a field to the core m1, the field actually gets added to the config set. Is this a bug or a feature?

curl http://localhost:8082/solr/m1/schema/fields -X POST -H 'Content-type:application/json' --data-binary '[{ "name":"foo", "type":"tdate", "stored":true }]'

All cores created from the config set myconfig will get the new field foo in their schema. So this obviously does not work to create cores with different schemas. I also tried the config/schema parameters of the CREATE core command (instead of config sets) to specify some existing solrconfig.xml/schema.xml. I tried relative paths here (e.g. some levels upwards) but I could not get it to work. The documentation [1] tells me that relative paths are allowed. Should this work?

The next thing that comes to mind is to use dynamic fields instead of a proper managed schema, but that does not sound as nice. Or maybe I should implement a custom CoreAdminHandler which takes a list of field definitions, if that's possible somehow...? I don't know. What's your recommended approach? We're using Solr 4.10.1 non-SolrCloud. Would this be simpler or different with SolrCloud?
Thank you, Andreas

[1] https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API#CoreAdminAPI-CREATE
[2] https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-Modifytheschema
[3] https://cwiki.apache.org/confluence/display/solr/Config+Sets
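One workaround for the shared-schema problem (a sketch; all paths and names are made up) is to give each core a private instanceDir seeded from a template directory instead of a configSet, so the managed schema is per-core:

```
# copy a template config into the new core's instanceDir (paths are hypothetical)
cp -r /var/solr/home/template-conf /var/solr/home/cores/m1/conf
# create the core without a configSet so it uses its private conf/ directory
curl 'http://localhost:8082/solr/admin/cores?action=CREATE&name=m1&instanceDir=cores/m1'
```

Schema API changes then affect only that core's managed-schema, at the cost of one config copy per core.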
Re: Delete data from stored documents
nope.

On Thu, Nov 6, 2014 at 5:19 PM, yriveiro yago.rive...@gmail.com wrote:

Hi, is it possible to remove stored data from an index by deleting the unwanted fields from schema.xml and then doing an optimize over the index? Thanks, /yago
Re: What's the most efficient way to sort by number of terms matched?
Hi Trey, Not exactly the same, but we did something similar with (e)dismax's mm parameter, by auto-relaxing it: in your example, try with mm=3; if numFound < 20, then try with mm=2, etc. Ahmet

On Thursday, November 6, 2014 8:41 AM, Trey Grainger solrt...@gmail.com wrote:

Just curious if there are some suggestions here. The use case is fairly simple: given a query like python OR solr OR hadoop, I want to sort results by number of keywords matched first, and by relevancy separately. I can think of ways to do this, but not efficiently. For example, I could do:

q=python OR solr OR hadoop
p1=python
p2=solr
p3=hadoop
sort=sum(if(query($p1,0),1,0),if(query($p2,0),1,0),if(query($p3,0),1,0)) desc, score desc

Other than the obvious downside that this requires me to pre-parse the user's query, it's also somewhat inefficient to run the query function once for each term in the original query, since it is re-executing multiple queries and looping through every document in the index during scoring. Ideally, I would be able to do something like the below, which could just pull the count of unique matched terms from the main query (q parameter) execution:

q=python OR solr OR hadoop&sort=uniquematchedterms() desc,score desc

I don't think anything like this exists, but would love some suggestions if anyone else has solved this before. Thanks, -Trey
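Ahmet's auto-relax idea can be sketched client-side. In this sketch, `search` is a placeholder for your actual Solr request function, and the result-count threshold comes from the example in the thread:

```python
def relax_mm(search, query, max_mm, min_results=20):
    """Try the strictest mm first and relax it until enough results return.

    `search(query, mm)` is a stand-in for a real Solr call returning a
    list of hits; any callable with that shape works. Returns the mm
    value that produced enough hits, plus those hits.
    """
    for mm in range(max_mm, 0, -1):
        hits = search(query, mm)
        if len(hits) >= min_results:
            return mm, hits
    return 1, hits  # fully relaxed fallback
```

Since results at a higher mm are by construction a subset of those at a lower mm, this effectively ranks by "number of terms matched" at the cost of up to max_mm round trips.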
Re: SolrCloud shard distribution with Collections API
I've had a bad enough experience with the default shard placement that I create a collection with one shard, add the shards where I want them, then use add/delete replica to move the first one to the right machine/port. Typically this is in a SolrCloud of dozens or hundreds of shards. Our shards are all partitioned by time, so there are big performance advantages to optimal placement across JVMs and machines. In what sort of situation do you not have trouble with default shard placement?

On Wed, Nov 5, 2014 at 5:10 PM, Erick Erickson erickerick...@gmail.com wrote:

They should be pretty well distributed by default, but if you want to take manual control, you can use the createNodeSet param on CREATE (with a replication factor of 1) and then ADDREPLICA with the node param to put replicas for shards exactly where you want. Best, Erick

On Wed, Nov 5, 2014 at 2:12 PM, CTO직속 Isabelle Phan ip...@coupang.com wrote:

Hello, I am testing a small SolrCloud cluster on 2 servers. I started 2 nodes on each server, so that each collection can have 2 shards with a replication factor of 2. I am using the below command from the Collections API to create a collection:

curl 'http://serveraddress/solr/admin/collections?action=CREATE&name=cp_collection&numShards=2&replicationFactor=2&collection.configName=cp_config'

Is there a way to ensure that for each shard, leader and replica are on a different server? This command sometimes puts them on 2 nodes from the same server. Thanks a lot for your help, Isabelle
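Erick's manual-placement recipe looks roughly like this as commands (a sketch; host names, ports, and the collection name are made up):

```
# create with replicationFactor=1, pinning the two shard leaders to chosen nodes
curl 'http://host1:8983/solr/admin/collections?action=CREATE&name=cp_collection&numShards=2&replicationFactor=1&createNodeSet=host1:8983_solr,host2:8983_solr&collection.configName=cp_config'
# then place each replica explicitly on the other server
curl 'http://host1:8983/solr/admin/collections?action=ADDREPLICA&collection=cp_collection&shard=shard1&node=host2:8983_solr'
curl 'http://host1:8983/solr/admin/collections?action=ADDREPLICA&collection=cp_collection&shard=shard2&node=host1:8983_solr'
```

This guarantees leader and replica of each shard sit on different servers, at the cost of scripting the placement yourself.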
Updating an index
Hello, I have [mistakenly] created a SOLR index in which the document IDs contain URIs such as file:///Z:/1933/01/1933_01.png. In a single SOLR update command, how can I:

- copy the contents of each document's id field to a new field called 'url', after replacing 'Z:' by 'Y:'
- make SOLR generate a new random id for each document

Many thanks. Philippe
Re: Schemaless configuration using 4.10.2/API returning 404
Thanks for the reply! My Solr has 2 cores (collection1/collection2), and I can access them via the Solr dashboard with no problem:

https://myserver:9943/solr/#/collection1
https://myserver:9943/solr/#/collection2

I can also verify the solrconfig.xml for them contains the schemaless config: https://myserver:9943/solr/collection1/admin/file?file=solrconfig.xml&contentType=text/xml;charset=utf-8

I'm perplexed, as the managed-schema file has been created and seems to be active, yet the API continues to give 404. Is this the correct format to access? https://myserver:9943/solr/collection1/schema/fields (I've also tried other variations, removing the collection name etc. ... always 404).
Re: create new core based on named config set using the admin page
Yeah, please create a JIRA. There are a couple of umbrella JIRAs that you might want to link it to. I'm not sure it quite fits either; if not, just let it hang out there bare:
https://issues.apache.org/jira/browse/SOLR-6703
https://issues.apache.org/jira/browse/SOLR-6084

On Wed, Nov 5, 2014 at 11:57 PM, Andreas Hubold andreas.hub...@coremedia.com wrote:

Hi, Solr 4.8 introduced named config sets with https://issues.apache.org/jira/browse/SOLR-4478. You can create a new core based on a config set with the CoreAdmin API as described in https://cwiki.apache.org/confluence/display/solr/Config+Sets The Solr Admin page allows the creation of new cores as well. There's an Add Core button in the Core Admin tab. This opens a dialog where you can enter the name, instanceDir, dataDir, and the names of solrconfig.xml / schema.xml. It would be cool and consistent if one could create a core based on a named config set here as well. I'm asking because I might have overlooked something, or maybe somebody is already working on this. But probably I should just create a JIRA issue, right? Regards, Andreas

Ramzi Alqrainy wrote on 11/05/2014 08:24 PM:

Sorry, I did not get your point. Can you please elaborate more?

--
Andreas Hubold
Software Architect
tel +49.40.325587.519
fax +49.40.325587.999
andreas.hub...@coremedia.com

CoreMedia AG
content | context | conversion
Ludwig-Erhard-Str. 18
20459 Hamburg, Germany
www.coremedia.com

Executive Board: Gerrit Kolb (CEO), Dr. Klemens Kleiminger (CFO)
Supervisory Board: Prof. Dr. Florian Matthes (Chairman)
Trade Register: Amtsgericht Hamburg, HR B 76277
Re: solr.xml coreRootDirectory relative to solr home
An oversight, I think. If you create a patch, let me know and we can get it committed. Hmmm, not sure though -- this will change current behavior that people might be counting on.

On Thu, Nov 6, 2014 at 1:02 AM, Andreas Hubold andreas.hub...@coremedia.com wrote:

Hi, I'm trying to configure a different core discovery root directory in solr.xml with the coreRootDirectory setting as described in https://cwiki.apache.org/confluence/display/solr/Format+of+solr.xml I'd like to just set it to a subdirectory of solr home (a cores directory, to avoid confusion with configsets and other directories). I tried

<str name="coreRootDirectory">cores</str>

but that's interpreted relative to the current working directory. Other paths such as sharedLib are interpreted relative to solr home, and I had expected that here too. I do not set solr home via system property but via JNDI, so I don't think I can use ${solr.home}/cores or something like that? It would be nice if solr home were available for property substitution even if set via JNDI. Is there another way to set a path relative to solr home here? Regards, Andreas
Re: Schemaless configuration using 4.10.2/API returning 404
Ok, I just booted a fresh Solr 4.10.2, started example-schemaless and hit http://localhost:8983/solr/collection1/schema/fields -- and it worked. So I suspect the problem is not with Solr but with your setup around it. For example, is your Solr listening on port 9943 directly (and not 8983), or do you have a proxy in between? Maybe the proxy is not configured to forward that URL. Do you have logs? Can you see if that URL is actually being called on Solr's side? If you see other URLs (like generic admin stuff) but not this one, then it may not be making it there.

Regards, Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On 6 November 2014 13:27, nbosecker nbosec...@gmail.com wrote:

Thanks for the reply! My Solr has 2 cores (collection1/collection2), and I can access them via the Solr dashboard with no problem: https://myserver:9943/solr/#/collection1 https://myserver:9943/solr/#/collection2 I can also verify the solrconfig.xml for them contains the schemaless config: https://myserver:9943/solr/collection1/admin/file?file=solrconfig.xml&contentType=text/xml;charset=utf-8 I'm perplexed, as the managed-schema file has been created and seems to be active, yet the API continues to give 404. Is this the correct format to access? https://myserver:9943/solr/collection1/schema/fields (I've also tried other variations, removing the collection name etc. ... always 404).
Re: Updating an index
No way that I know of, re-indexing is in order. Solr does not update in place, you have to re-add the document. Well, AtomicUpdates work but iff all fields are stored. And it still wouldn't be a single Solr command. Best, Erick On Thu, Nov 6, 2014 at 8:20 AM, phi...@free.fr wrote: Hello, I have [mistakenly] created a SOLR index in which the document IDs contain URIs such as file:///Z:/1933/01/1933_01.png . In a single SOLR update command, how can I: - copy the contents of each document's id field to a new field called 'url', after replacing 'Z:' by 'Y:' - make SOLR generate a new random Id for each document Many thanks. Philippe
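Since re-indexing is required, the per-document rewrite Philippe wants has to happen client-side before re-adding. A minimal sketch of that transform, assuming documents are fetched with all stored fields and that a UUID is an acceptable "random" id:

```python
import uuid

def transform(doc: dict) -> dict:
    """Prepare a fetched document for re-indexing: move the old URI id
    into a new 'url' field (with the first 'Z:' replaced by 'Y:') and
    mint a fresh random id."""
    new_doc = dict(doc)  # keep all other stored fields as-is
    new_doc["url"] = doc["id"].replace("Z:", "Y:", 1)
    new_doc["id"] = str(uuid.uuid4())
    return new_doc
```

The transformed documents would then be posted back via a normal update request; the old documents need a deleteByQuery (or the index can be rebuilt from scratch).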
Re: What's the most efficient way to sort by number of terms matched?
Hi Trey, In an application I built a few years ago, I had a component that rewrote the input query into a Lucene BooleanQuery, and we would set the minimumNumberShouldMatch value for the query. It worked well, but lately we are trying to move away from writing our own custom components, since maintaining them across releases becomes a bit of a chore. So lately we simulate this behavior in the client by constructing progressively smaller n-grams and OR'ing them together before sending to Solr. For your example, it becomes something like this: (python AND solr AND hadoop) OR (python AND solr) OR (solr AND hadoop) OR (python AND hadoop) OR (python) OR (solr) OR (hadoop). -sujit

On Thu, Nov 6, 2014 at 7:25 AM, Ahmet Arslan iori...@yahoo.com.invalid wrote:

Hi Trey, Not exactly the same, but we did something similar with (e)dismax's mm parameter, by auto-relaxing it: in your example, try with mm=3; if numFound < 20, then try with mm=2, etc. Ahmet

On Thursday, November 6, 2014 8:41 AM, Trey Grainger solrt...@gmail.com wrote:

Just curious if there are some suggestions here. The use case is fairly simple: given a query like python OR solr OR hadoop, I want to sort results by number of keywords matched first, and by relevancy separately. I can think of ways to do this, but not efficiently. For example, I could do:

q=python OR solr OR hadoop
p1=python
p2=solr
p3=hadoop
sort=sum(if(query($p1,0),1,0),if(query($p2,0),1,0),if(query($p3,0),1,0)) desc, score desc

Other than the obvious downside that this requires me to pre-parse the user's query, it's also somewhat inefficient to run the query function once for each term in the original query, since it is re-executing multiple queries and looping through every document in the index during scoring. Ideally, I would be able to do something like the below, which could just pull the count of unique matched terms from the main query (q parameter) execution:

q=python OR solr OR hadoop&sort=uniquematchedterms() desc,score desc
I don't think anything like this exists, but would love some suggestions if anyone else has solved this before. Thanks, -Trey
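Sujit's rewrite can be generated mechanically from the term list. A sketch (the AND/OR syntax assumes a standard Lucene query parser; how you tokenize the user's input is up to you):

```python
from itertools import combinations

def ngram_or_query(terms):
    """Rewrite [t1, t2, t3] into progressively smaller AND-groups OR'ed
    together, e.g. (t1 AND t2 AND t3) OR (t1 AND t2) OR ... OR (t3).
    Documents matching more terms match more clauses and score higher."""
    clauses = []
    for n in range(len(terms), 0, -1):          # largest groups first
        for combo in combinations(terms, n):
            clauses.append("(" + " AND ".join(combo) + ")")
    return " OR ".join(clauses)
```

Note the clause count grows as 2^n - 1, so this is only practical for short queries.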
Re: solr.xml coreRootDirectory relative to solr home
On 11/6/2014 12:02 PM, Erick Erickson wrote: An oversight I think. If you create a patch, let me know and we can get it committed. Hmmm, not sure though, this'll change the current behavior that people might be counting on Relative to the solr home sounds like the best option to me. It's what I would expect, since most of the rest of Solr uses directories relative to other directories that may or may not be explicitly defined. I haven't researched in-depth, but I think that the solr home itself is the only thing in Solr that defaults to something relative to the current working directory ... and that seems like a very good policy to keep. Thanks, Shawn
Re: What's the most efficient way to sort by number of terms matched?
Sadly, it seems it hasn't been done so far. It's either a custom similarity or a function query.

On Thu, Nov 6, 2014 at 9:40 AM, Trey Grainger solrt...@gmail.com wrote:

Just curious if there are some suggestions here. The use case is fairly simple: given a query like python OR solr OR hadoop, I want to sort results by number of keywords matched first, and by relevancy separately. I can think of ways to do this, but not efficiently. For example, I could do:

q=python OR solr OR hadoop
p1=python
p2=solr
p3=hadoop
sort=sum(if(query($p1,0),1,0),if(query($p2,0),1,0),if(query($p3,0),1,0)) desc, score desc

Other than the obvious downside that this requires me to pre-parse the user's query, it's also somewhat inefficient to run the query function once for each term in the original query, since it is re-executing multiple queries and looping through every document in the index during scoring. Ideally, I would be able to do something like the below, which could just pull the count of unique matched terms from the main query (q parameter) execution:

q=python OR solr OR hadoop&sort=uniquematchedterms() desc,score desc

I don't think anything like this exists, but would love some suggestions if anyone else has solved this before. Thanks, -Trey
Re: Schemaless configuration using 4.10.2/API returning 404
I have some level of logging in Tomcat, and I can see that SolrDispatchFilter is being invoked:

2014-11-06 17:23:19,016 [catalina-exec-3] DEBUG SolrDispatchFilter - Closing out SolrRequest: {}

But that really isn't terribly helpful. Is there more logging that I could enable to get more info from the Solr side? Some other logs from admin-type requests look like this:

2014-11-06 17:23:16,547 [catalina-exec-7] INFO SolrDispatchFilter - [admin] webapp=null path=/admin/info/logging params={set=com.scitegic.web.catalog:ALL&wt=json} status=0 QTime=4
2014-11-06 17:23:16,551 [catalina-exec-7] DEBUG SolrDispatchFilter - Closing out SolrRequest: {set=com.scitegic.web.catalog:ALL&wt=json}

I don't have a proxy in between.
Re: SolrCloud shard distribution with Collections API
When using Collections API CREATE action, I found that sometimes default shard placement is correct (leader and replica on different servers) and sometimes not. So I was looking for a simple and reliable way to ensure better placement. It seems like I will have to do it manually for best control, as recommended by Erick and you. Thanks, Isabelle PS: I deleted emails from thread history, because my reply keeps being rejected by apache server as spam... On Thu, Nov 6, 2014 at 8:13 AM, ralph tice ralph.t...@gmail.com wrote: I've had a bad enough experience with the default shard placement that I create a collection with one shard, add the shards where I want them, then use add/delete replica to move the first one to the right machine/port. Typically this is in a SolrCloud of dozens or hundreds of shards. Our shards are all partitioned by time so there are big performance advantages to optimal placement across JVMs and machines. What sort of situation do you not have trouble with default shard placement? On Wed, Nov 5, 2014 at 5:10 PM, Erick Erickson erickerick...@gmail.com wrote: They should be pretty well distributed by default, but if you want to take manual control, you can use the createNodeSet param on CREATE (with replication factor of 1) and then ADDREPLICA with the node param to put replicas for shards exactly where you want. Best, Erick
Re: Best practice to setup schemas for documents having different structures
Thanks for the response guys! Appreciate it.

Vishal Sharma
Team Lead, Grazitti Interactive
T: +1 650 641 1754
E: vish...@grazitti.com
www.grazitti.com

On Wed, Nov 5, 2014 at 11:09 PM, Ryan Cooke r...@docurated.com wrote:

We define all fields as wildcard fields with a suffix indicating field type. Then we can use something like Java annotations to map pojo variables to field types to append the correct suffix. This allows us to use one very generic schema among all of our collections, and we rarely need to update it. Our inspiration for this method comes from the Ruby library Sunspot. - Ryan

---
Ryan Cooke
VP of Engineering
Docurated
(646) 535-4595

On Wed, Nov 5, 2014 at 9:59 AM, Erick Erickson erickerick...@gmail.com wrote:

It Depends (tm). You have a lot of options, and it all depends on your data and use case. In general, there is very little cost involved when a doc does _not_ use a field you've defined in a schema. That is, if you have 100's of fields defined and only use 10, the other 90 don't take up space in each doc. There is some overhead with many, many fields, but probably not so you'd notice.

1. You could have a single schema that contains all your fields and use it amongst a bunch of indexes (cores). This is particularly easy in the new configset pattern.
2. You could have a single schema that contains all your fields and use it in a single index. That index could contain all your different docs with, say, a type field to let you search subsets easily.
3. You could have a different schema for each index and put each kind of doc in its own index.

Option 1 I don't really like at all. If you're going to have different indexes, I think it's far easier to maintain if there are individual schemas. Between 2 and 3 it's a tossup.
Option 2 will skew the relevance calculations because all the terms are in a single index. So your relevance calculations for students will be influenced by the terms in courses docs and vice versa. That said, you may not notice, as it's subtle. I generally prefer 3, but I've seen 2 serve as well. Best, Erick

On Tue, Nov 4, 2014 at 9:34 PM, Vishal Sharma vish...@grazitti.com wrote:

This is something I have been thinking about for a long time now. What is the best practice for setting up schemas for documents having different fields? Should we just create one schema with a lot of fields, or multiple schemas for different data structures? Here is an example. I have two objects, students and courses:

Student:
- Student Name
- Student Registration number
- Course Enrolled for

Course:
- Course ID
- Course Name
- Course duration

What should the ideal schema setup look like? Any guidance is strongly appreciated.
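Ryan's suffix convention can be sketched as a small client-side mapping. The suffixes `_s`/`_i`/`_f`/`_b` mirror the dynamic-field examples that ship in Solr's stock schema, but your schema's wildcard patterns may differ, so treat them as assumptions:

```python
# map plain field names plus value types onto suffixed dynamic-field names,
# so one generic schema (with *_s, *_i, ... wildcard fields) serves all docs
SUFFIX_BY_TYPE = {str: "_s", int: "_i", float: "_f", bool: "_b"}

def to_dynamic_fields(doc: dict) -> dict:
    out = {}
    for name, value in doc.items():
        if name == "id":          # the unique key usually stays unsuffixed
            out[name] = value
            continue
        suffix = SUFFIX_BY_TYPE.get(type(value))
        if suffix is None:
            raise TypeError(f"no dynamic-field suffix for {type(value)!r}")
        out[name + suffix] = value
    return out
```

A Student and a Course document then share one schema: only the suffixed field names differ, which is exactly the trade-off Erick's option 2 describes (one index, shared term statistics).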