Sorting on a multivalued field in Solr
Is there any way to sort on a multivalued field in Solr? I have two documents with the field custom_code, with values as below:

Doc 1 : 11, 78, 45, 22
Doc 2 : 56, 74, 62, 10

When I sort in ascending order, the order should be:

Doc 2 : 56, 74, 62, 10
Doc 1 : 11, 78, 45, 22

Here Doc 2 comes first because its smallest element, 10, is smaller than the smallest element of Doc 1, which is 11. How can we achieve this in Solr? What is the easiest way?

--
View this message in context: http://lucene.472066.n3.nabble.com/Sorting-on-multivalues-field-in-Solr-tp4204996.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLR 4.10.4 - error creating document
Hi Erik, thanks for your concerns and thoughts. There is no XY problem because we decouple the input (storing) from searching, faceting, etc. What you see is just the input for storing and the output of the original text in the results; there is no need to do any analysis on it. So don't worry, it has worked like a charm for years now ;-) With the upgrade from 4.6.1 to 4.10.4 it merely turned out that we had never noticed we were missing 3 or 4 documents out of over 70 million, because they were silently dropped; a behavior changed by LUCENE-5472. Regards, Bernd

On 12.05.2015 00:29, Erick Erickson wrote: I've got to ask _how_ you intend to search this field. On the surface, this feels like an XY problem. It's a string type. Therefore, if this is the input: 102, 111, 114, 32, 97, 32, 114, 101, 118, 105, 101, 119, 32, 115, 101, 101, 32, 66, 114 you'll only ever get a match if you search exactly: 102, 111, 114, 32, 97, 32, 114, 101, 118, 105, 101, 119, 32, 115, 101, 101, 32, 66, 114. None of these will match: 102 / 102, 32 / 32, 119, 32, 115 / etc. The idea of doing a match on a single _token_ that's over 32K long is pretty far out there, thus the check. The entire multiValued discussion is _probably_ a red herring and won't help you; multiValued has nothing to do with multiple terms, that's all up to your field type. So back up and tell us _how_ you intend to search this field. I'm guessing you really want to make it a text-based type instead. But that's just a guess. Best, Erick.

On Mon, May 11, 2015 at 8:43 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: It turned out that I didn't realize that dcdescription is not indexed, only stored. The next field in the chain is f_dcperson, where dccreator and dcdescription are combined and indexed. And this is why the error shows up on f_dcperson (a delayed error). Thanks for your help, regards.
Bernd

On 11.05.2015 15:35, Shawn Heisey wrote: On 5/11/2015 7:19 AM, Bernd Fehling wrote: After reading https://issues.apache.org/jira/browse/LUCENE-5472 one question still remains. Why is it complaining about f_dcperson, which is a copyField, when the field with the original problem is dcdescription, which is definitely much larger than 32766? I would assume it would complain about the dcdescription field. Or not?

If the value resulting in the error does come from a copyField source that also uses a string type, then my guess here is that Solr has some prioritization that causes the copyField destination to be indexed before the sources. This ordering might make things go a little faster, because if it happens right after copying, all or most of the data for the destination field would already be sitting in one or more of the CPU caches. Cache hits are wonderful things for performance. Thanks, Shawn

--
Bernd Fehling, Bielefeld University Library
Dipl.-Inform. (FH), LibTec - Library Technology and Knowledge Management
Universitätsstr. 25, 33615 Bielefeld
Tel. +49 521 106-4060, bernd.fehling(at)uni-bielefeld.de
BASE - Bielefeld Academic Search Engine - www.base-search.net
Re: Sorting on a multivalued field in Solr
The easiest way is to have a separate field for sorting. Make it a docValues field as well, for faster sorting performance. Then set up an Update Request Processor (URP) chain in which you clone the field and keep only the most appropriate value (the smallest). There are URPs for exactly that, e.g. http://www.solr-start.com/info/update-request-processors/#MinFieldValueUpdateProcessorFactory Regards, Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/

On 12 May 2015 at 16:22, nutchsolruser nutchsolru...@gmail.com wrote: Is there any way to sort on a multivalued field in Solr? [...]
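As an illustration of Alex's suggestion, a solrconfig.xml chain along these lines would clone the multivalued custom_code into a single-valued companion field and reduce it to its minimum. The chain and field names (custom_code_min) are made up for this example, and custom_code_min would need to be declared single-valued with docValues="true" in the schema:

```xml
<updateRequestProcessorChain name="min-for-sort">
  <!-- Copy all values of custom_code into custom_code_min... -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">custom_code</str>
    <str name="dest">custom_code_min</str>
  </processor>
  <!-- ...then keep only the smallest value. -->
  <processor class="solr.MinFieldValueUpdateProcessorFactory">
    <str name="fieldName">custom_code_min</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Queries would then sort with sort=custom_code_min asc while still returning the original multivalued custom_code field.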
Transactional Behavior
Hello, I have a business case in which I need to be able to roll back. When I tried add/commit, I was not able to prevent other threads writing to the same Solr core from committing everything. I also tried using the IndexWriter directly, but Solr did not see the changes until we restarted it. -- Regards, Amr Ali City stars capital 8 - 3rd floor, Nasr city, Cairo, Egypt Ext: 278
Re: SolrCloud indexing
Thanks for the reply. In our case we actually want the timestamp to be populated locally on each node in the SolrCloud cluster. We want to see if there is any delay in the document being distributed within the cluster. Just want to confirm that the timestamp can be used for that purpose. Bill

On Sat, May 9, 2015 at 11:37 PM, Shawn Heisey apa...@elyograg.org wrote: On 5/9/2015 8:41 PM, Bill Au wrote: Is the behavior of documents being indexed independently on each node in a SolrCloud cluster new in 5.x, or is that true in 4.x also? If the document is indexed independently on each node, then if I query the document from each node directly, the timestamp could hold different values, right?

<field name="timestamp" type="date" indexed="true" stored="true" default="NOW" />

SolrCloud has had that behavior from day one, when it was released in version 4.0. You are correct that it can result in a different timestamp on each replica if the default comes from schema.xml. I am pretty sure that the solution for this problem is to set up an update processor chain that includes TimestampUpdateProcessorFactory to populate the timestamp field before the document is distributed to each replica. https://cwiki.apache.org/confluence/display/solr/Update+Request+Processors Thanks, Shawn
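Shawn's suggestion, sketched as config: the crucial detail is that the timestamp processor runs before the distributed update processor, so the value is assigned once on the node that receives the update and then replicated verbatim. The chain name is illustrative:

```xml
<updateRequestProcessorChain name="stamp-once">
  <!-- Assigns NOW on the node that first receives the document... -->
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">timestamp</str>
  </processor>
  <!-- ...before the document is forwarded to the other replicas. -->
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Conversely, for Bill's goal of per-node timestamps, placing TimestampUpdateProcessorFactory after DistributedUpdateProcessorFactory should stamp each replica locally as the update arrives, much like the schema default=NOW does.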
Re: How is the most relevant document of each group chosen when group.truncate is used?
Forgot to mention that I'm using Solr 5.0.
JARs needed to run SolrJ
Hi Everyone, I am trying to use SolrJ to add docs to Solr. The following line:

HttpSolrClient solrServer = new HttpSolrClient("http://localhost:8983/solr");

is failing with this exception:

Exception in thread "main" java.lang.NoClassDefFoundError: org.apache.commons.logging.LogFactory
    at org.apache.http.impl.client.CloseableHttpClient.<init>(CloseableHttpClient.java:60)
    at org.apache.http.impl.client.AbstractHttpClient.<init>(AbstractHttpClient.java:271)
    at org.apache.http.impl.client.DefaultHttpClient.<init>(DefaultHttpClient.java:127)
    ...
Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory
    at java.net.URLClassLoader.findClass(URLClassLoader.java:665)
    at java.lang.ClassLoader.loadClassHelper(ClassLoader.java:942)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:851)
    ...

I pulled in everything from \solr-5.1.0\dist\solrj-lib and I included solr-solrj-5.1.0.jar from \solr-5.1.0\dist. Why am I getting the above error? Is there an external JAR I need? I want to pull in required JARs only. Googling the issue suggests I need to include org-apache-commons-logging.jar (and a few other JARs), but this JAR is not part of Solr's distribution, so I'm not willing to add it blindly. Thanks, Steve
Re: SolrJ vs. plain old HTTP post
Thanks Shalin and all for helping with this question. It is much appreciated. Steve

On Tue, May 12, 2015 at 1:24 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Mon, May 11, 2015 at 8:20 PM, Steven White swhite4...@gmail.com wrote: Thanks Erik and Emir. <snip/> To close the loop on this question, I will need to enable Jetty's SSL (the Jetty that comes with Solr 5.1). If I do so, will SolrJ still work? Can I assume that SolrJ supports SSL?

Yes, SolrJ can work with SSL enabled on the server as long as you pass the same JVM parameters on the client side to enable SSL, e.g. -Djavax.net.ssl.keyStore= -Djavax.net.ssl.keyStorePassword= -Djavax.net.ssl.trustStore= -Djavax.net.ssl.trustStorePassword= See https://cwiki.apache.org/confluence/display/solr/Enabling+SSL#EnablingSSL-IndexadocumentusingCloudSolrClient

I Googled but could not find the answer. Thanks again. Steve

On Mon, May 11, 2015 at 8:39 AM, Erik Hatcher erik.hatc...@gmail.com wrote: Another advantage to SolrJ is its SolrCloud (ZK) awareness, taking advantage of some client-side routing optimizations so the cluster has fewer hops to make. -- Erik Hatcher, Senior Solutions Architect http://www.lucidworks.com

On May 11, 2015, at 8:21 AM, Steven White swhite4...@gmail.com wrote: Hi Everyone, If all I need to do is send data to Solr to add / delete a Solr document, which tool is better for the job: SolrJ or plain old HTTP POST? In other words, what are the advantages of using SolrJ when the need is to push data to Solr for indexing? Thanks, Steve -- Regards, Shalin Shekhar Mangar.
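For the record, a sketch of what launching an SSL-enabled SolrJ client looks like with those system properties filled in. Every path, password, and class name below is a placeholder, not something from the thread:

```text
java -Djavax.net.ssl.keyStore=/path/to/client-keystore.jks \
     -Djavax.net.ssl.keyStorePassword=changeit \
     -Djavax.net.ssl.trustStore=/path/to/client-truststore.jks \
     -Djavax.net.ssl.trustStorePassword=changeit \
     -cp myapp.jar:lib/* com.example.MyIndexer
```

The point is that no SolrJ code changes are needed; the JSSE properties configure the client-side HttpClient transparently.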
RE: Retrieving list of synonyms and facet field values
Thanks Alessandro, managed resources were exactly what I needed.

-----Original Message----- From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com] Sent: Tuesday, May 12, 2015 10:12 AM To: solr-user@lucene.apache.org Subject: Re: Retrieving list of synonyms and facet field values

Hi Siamak, 1) You can do that with managed resources; take a look at the synonyms section: https://cwiki.apache.org/confluence/display/solr/Managed+Resources Specifically: to determine the synonyms for a specific term, you send a GET request for the child resource; for example /schema/analysis/synonyms/english/mad would return [angry,upset]. Lastly, you can delete a mapping by sending a DELETE request to the managed endpoint.

2) You can use the Terms Component (https://cwiki.apache.org/confluence/display/solr/The+Terms+Component). It's quite straightforward to use. If you are talking about facets: when you send a query to Solr with facets enabled, you simply need to parse the resulting JSON (or XML). If you are doing it programmatically, SolrJ gives great support for facets. Cheers

2015-05-12 14:43 GMT+01:00 Siamak Rowshan siamak.rows...@softmart.com: Hi all, I'm new to Solr and would appreciate any help with this question. Is there a way to retrieve the list of synonyms via the API? I also need to retrieve the values of each facet field via the API. For example, the list for the Cat facet includes: fiction, non-fiction, etc. Thanks, Siamak

--
Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
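The managed-synonyms calls Alessandro describes look like this in practice. The collection name collection1 and resource name english are placeholders for whatever is configured in the schema's managed synonym filter:

```text
# Read the synonyms mapped for one term:
GET /solr/collection1/schema/analysis/synonyms/english/mad

# Add (or replace) mappings with a PUT of a JSON object:
PUT /solr/collection1/schema/analysis/synonyms/english
{"mad":["angry","upset"]}

# Remove a mapping:
DELETE /solr/collection1/schema/analysis/synonyms/english/mad
```

Note that changes to a managed resource only take effect in the analysis chain after the core or collection is reloaded.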
RE: Trying to get AnalyzingInfixSuggester to work in Solr?
Fwiw, we ended up preferring the 4.x spellcheck approach. For starters, it is supported by SolrJ ... :-) But more importantly, we wanted a mix of both terms and field values in our suggestions, and we found the Suggester component doesn't do that. We also weren't interested in matching in the middle of words; partial prefix matching was better, and thus we used an ngram query. In addition, we liked the Amazon-style "xyz in Dept X", "xyz in Dept Y" suggestions, which we produced by using facets in combination with the ngram query. Finally, we needed to make a minor patch to get document frequency information about terms (and collations) provided by the SpellCheckComponent: https://issues.apache.org/jira/browse/SOLR-7144

So, to summarize, we ended up with a 2-pass suggestion approach:

pass 1: spellcheck with document frequency and collation, using WFSTLookupFactory and org.apache.solr.spelling.suggest.Suggester.

pass 2: if spellcheck has corrections, use the 1st correction instead of the original term as the query against an ngram field (using copyField to populate it from the fields we care about). This query also has a field facet. The facet values are used as "${queryTerm} in ${facet}" suggestions. Specified fields from matching docs are used as suggestions (like the suggester component).

Please don't take this to mean you should be doing anything like what we are doing. Rather, I'm urging you to dig deeper into your suggestion functionality and think hard about what really makes sense for your application. It's a major usability issue for search apps.

-----Original Message----- From: O. Olson [mailto:olson_...@yahoo.it] Sent: Thursday, May 07, 2015 4:19 PM To: solr-user@lucene.apache.org Subject: Re: Trying to get AnalyzingInfixSuggester to work in Solr?

Thank you Erick. I'm sorry I did not mention this earlier, but I am still on Solr 4.10.3. Once I upgrade to Solr 5.0+, I will consider the suggestion in your blog post. O. O.
Erick Erickson wrote: Uh, you mean because I forgot to paste in the URL? Sigh... Anyway, the URL is irrelevant now that you've solved your problem, but in case you're interested: http://lucidworks.com/blog/solr-suggester/ Sorry for the confusion. Erick

--
View this message in context: http://lucene.472066.n3.nabble.com/Trying-to-get-AnalyzingInfixSuggester-to-work-in-Solr-tp4204163p4204392.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: JARs needed to run SolrJ
Hi Steve, You can find the list of dependencies in its POM: http://central.maven.org/maven2/org/apache/solr/solr-solrj/5.1.0/solr-solrj-5.1.0.pom It would be best if you used a dependency management tool. You can use one in a separate project to create an all-in-one jar and then include that jar in your project, but there is always a chance it will collide with other jars in your project. Thanks, Emir -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com

On 12.05.2015 20:33, Steven White wrote: Hi Everyone, I am trying to use SolrJ to add docs to Solr. [...] Why am I getting the above error? Is there an external JAR I need? I want to pull in required JARs only.
Re: JARs needed to run SolrJ
On 5/12/2015 12:33 PM, Steven White wrote: I am trying to use SolrJ to add docs to Solr. [...] Exception in thread "main" java.lang.NoClassDefFoundError: org.apache.commons.logging.LogFactory [...] I pulled in everything from \solr-5.1.0\dist\solrj-lib and I included solr-solrj-5.1.0.jar from \solr-5.1.0\dist.

You need to make a decision about how to do your logging. This decision is intentionally NOT made for you, so you can do whatever you wish. SolrJ uses the slf4j logging API, but slf4j doesn't actually do any logging itself; you must include jars that decide which logging framework will actually do the logging.

In the server/lib/ext directory of the Solr binary download, you will find a set of jars. These jars set up the logging intercepts that Solr (and SolrJ) need for third-party libraries, and bind the actual logging to log4j. While SolrJ itself uses slf4j directly, some of the third-party libraries use other logging frameworks, which must be intercepted by slf4j for a consistent logging experience.

As for your error message above: it is the HttpClient library that is trying to load the Apache Commons Logging class. The jcl-over-slf4j jar provides an implementation of the Commons Logging classes and directs those logs through slf4j.
You will also need the server/resources/log4j.properties file somewhere on your classpath, or specified in a system property on your program startup, in order to configure log4j. You'll probably want to customize that properties file. If you want to use a different logging framework other than log4j for the actual logging, you will need to research how to set up your slf4j jars to accomplish your goal. Some limited information can be found here: http://wiki.apache.org/solr/SolrLogging More comprehensive information, not specific to Solr, can be found here: http://slf4j.org/ Thanks, Shawn
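Concretely, the jars Shawn refers to in server/lib/ext of the 5.1 download are along these lines (the exact version numbers may differ):

```text
slf4j-api-1.7.x.jar        # the logging facade SolrJ codes against
slf4j-log4j12-1.7.x.jar    # binds slf4j to log4j
log4j-1.2.x.jar            # the framework that actually writes the logs
jcl-over-slf4j-1.7.x.jar   # routes Commons Logging (used by HttpClient) into slf4j
jul-to-slf4j-1.7.x.jar     # routes java.util.logging into slf4j
```

Putting these five on the client classpath alongside solr-solrj and the solrj-lib jars resolves the NoClassDefFoundError, because jcl-over-slf4j supplies the org.apache.commons.logging.LogFactory class HttpClient is looking for.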
Re: Transactional Behavior
Solr does have a <rollback/> command, but it is an expert feature and it is not so clear how it works in SolrCloud. See: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers and https://wiki.apache.org/solr/UpdateXmlMessages#A.22rollback.22 -- Jack Krupansky

On Tue, May 12, 2015 at 12:58 PM, Amr Ali amr_...@siliconexpert.com wrote: Hello, I have a business case in which I need to be able to roll back. [...] -- Regards, Amr Ali City stars capital 8 - 3rd floor, Nasr city, Cairo, Egypt Ext: 278
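For reference, the rollback is issued as a plain XML update message; a minimal example (the target core is whichever one received the uncommitted updates):

```xml
<!-- POSTed to /solr/<core>/update; undoes all uncommitted changes on that
     core since the last commit, regardless of which client sent them. -->
<rollback/>
```

That last point is exactly the limitation discussed in this thread: the rollback is core-wide, so uncommitted adds from other threads or clients are discarded along with your own.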
Re: Transactional Behavior
Hi Amr, One option is to include a transaction id in your documents and delete by it in case of a failed transaction. It is not a cheap option; it requires an additional field if you don't already have something you can use to identify the transaction. But assuming rollbacks will not happen too often, deleting is not that big an issue. Thanks, Emir -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/

On 12.05.2015 22:37, Amr Ali wrote: Please check this https://lucene.apache.org/solr/4_1_0/solr-solrj/org/apache/solr/client/solrj/SolrServer.html#rollback() Note that this is not a true rollback as in databases. [...]
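Emir's compensation approach can be sketched as a plain delete-by-query update message. The field name txn_id and the id value below are illustrative, not from the thread:

```xml
<!-- Every document of a batch is indexed carrying that batch's transaction id.
     If the batch fails, remove everything it managed to add: -->
<delete>
  <query>txn_id:tx-12345</query>
</delete>
```

This is compensation rather than a true rollback: documents from the failed batch may be briefly visible to searchers between an intervening commit and the delete.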
Re: Why are these two queries different?
Thanks for your help. I figured it out, just as you said. Appreciate your help. Somehow I forgot to reply to your post.

On Wed, Apr 29, 2015 at 9:24 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : We did two Solr queries and they were supposed to return the same results but : did not: the short answer is: if you want those queries to return the same results, then you need to adjust the query-time analyzer for the all_text field to not split intra-numeric tokens on ','. I don't know *why* exactly it's doing that, because you didn't give us the full details of your field/fieldtypes (or other really important info: the full request params -- echoParams=all -- and the documents matched by your second query, etc... https://wiki.apache.org/solr/UsingMailingLists ) ... but that's the reason the queries are different, as is evident from the parsedquery output. : Query 1: all_text:(US 4,568,649 A) : : parsedquery: (+((all_text:us ((all_text:4 all_text:568 all_text:649 : all_text:4568649)~4))~2))/no_coord, : : Result: numFound: 0, : : Query 2: all_text:(US 4568649) : : parsedquery: (+((all_text:us all_text:4568649)~2))/no_coord, : : : Result: numFound: 2, : : : We assumed the two would return the same result. Our default operator is AND. -Hoss http://www.lucidworks.com/
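The parsed query above (4, 568, 649 plus the catenated 4568649) is the classic signature of WordDelimiterFilter at query time. Purely as an illustration, a query analyzer along these lines on all_text would produce exactly that split; the actual schema was never posted, so this is an assumption:

```xml
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- "4,568,649" -> 4 / 568 / 649 (generateNumberParts="1")
       plus the joined token 4568649 (catenateNumbers="1") -->
  <filter class="solr.WordDelimiterFilterFactory"
          generateNumberParts="1" catenateNumbers="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

With AND as the default operator, the three number parts become required terms that the index (which holds only 4568649) cannot satisfy, which is why Query 1 found nothing.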
RE: Transactional Behavior
Please check this: https://lucene.apache.org/solr/4_1_0/solr-solrj/org/apache/solr/client/solrj/SolrServer.html#rollback()

Note that this is not a true rollback as in databases. Content you have previously added may already have been committed due to autoCommit, a full buffer, another client performing a commit, etc. It is not a real rollback if you have two threads T1 and T2 that are adding. If T1 is adding 500 documents and T2 is adding 3, then T2 will commit its 3 documents PLUS the documents already added by T1 (because T2 will finish its add/commit before T1, due to the document counts). Solr transactions are server side only. -- Regards, Amr Ali City stars capital 8 - 3rd floor, Nasr city, Cairo, Egypt Ext: 278

-----Original Message----- From: Jack Krupansky [mailto:jack.krupan...@gmail.com] Sent: Tuesday, May 12, 2015 10:24 PM To: solr-user@lucene.apache.org Subject: Re: Transactional Behavior

Solr does have a <rollback/> command, but it is an expert feature and it is not so clear how it works in SolrCloud. See: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers and https://wiki.apache.org/solr/UpdateXmlMessages#A.22rollback.22 -- Jack Krupansky

On Tue, May 12, 2015 at 12:58 PM, Amr Ali amr_...@siliconexpert.com wrote: Hello, I have a business case in which I need to be able to roll back. When I tried add/commit, I was not able to prevent other threads writing to the same Solr core from committing everything. I also tried using the IndexWriter directly, but Solr did not see the changes until we restarted it. -- Regards, Amr Ali City stars capital 8 - 3rd floor, Nasr city, Cairo, Egypt Ext: 278
How is the most relevant document of each group chosen when group.truncate is used?
Hi all, When I use group.truncate together with filtering, I'm getting strange faceting results. If I use just grouping without filtering:

group=true&group.field=parent_sku&group.ngroups=true&group.truncate=true&facet=true&facet.field=color

then I get:

facet_fields: { color: [ "white", 19742, ... ] }

i.e. 19742 white items. However, if I filter by white items:

group=true&group.field=parent_sku&group.ngroups=true&group.truncate=true&facet=true&facet.field=color&fq=color:white

I'm getting 20543 items. The same happens when I use the collapse query parser instead of grouping. I would expect those two numbers to be equal, so I assume the most relevant document of each group is chosen differently when filtering is used. How can this be explained? Best regards, Andrii
Re: schema modification issue
Hi Steve, Thanks for paying attention to this. Here is the JIRA issue I reported: https://issues.apache.org/jira/browse/SOLR-7536. Sorry for any inconvenience caused by my unfamiliarity with JIRA.

2015-05-12 0:22 GMT+08:00 Steve Rowe sar...@gmail.com: Hi, Thanks for reporting, I'm working on a test to reproduce. Can you please create a Solr JIRA issue for this?: https://issues.apache.org/jira/browse/SOLR/ Thanks, Steve

On May 7, 2015, at 5:40 AM, User Zolr zolr.u...@gmail.com wrote: Hi there, I have come across a problem: when using a managed schema in SolrCloud, adding fields into the schema would SOMETIMES end up prompting "Can't find resource 'schema.xml' in classpath or '/configs/collectionName', cwd=/export/solr/solr-5.1.0/server". There is of course no schema.xml in configs, only 'schema.xml.bak' and 'managed-schema'. I use SolrJ to create a collection:

Path tempPath = getConfigPath();
client.uploadConfig(tempPath, name); // customized configs with solrconfig.xml using ManagedIndexSchemaFactory
if (numShards == 0) {
    numShards = getNumNodes(client);
}
Create request = new CollectionAdminRequest.Create();
request.setCollectionName(name);
request.setNumShards(numShards);
replicationFactor = (replicationFactor == 0 ? DEFAULT_REPLICA_FACTOR : replicationFactor);
request.setReplicationFactor(replicationFactor);
request.setMaxShardsPerNode(maxShardsPerNode == 0 ? replicationFactor : maxShardsPerNode);
CollectionAdminResponse response = request.process(client);

Adding fields to the schema, either by curl or by HttpClient, would sometimes yield the following error, but the error can be fixed by RELOADING the newly created collection once or several times:

INFO - [{
  responseHeader: {status: 500, QTime: 5},
  errors: ["Error reading input String Can't find resource 'schema.xml' in classpath or '/configs/collectionName', cwd=/export/solr/solr-5.1.0/server"],
  error: {
    msg: "Can't find resource 'schema.xml' in classpath or '/configs/collectionName', cwd=/export/solr/solr-5.1.0/server",
    trace: java.io.IOException: Can't find resource 'schema.xml' in classpath or '/configs/collectionName', cwd=/export/solr/solr-5.1.0/server
      at org.apache.solr.cloud.ZkSolrResourceLoader.openResource(ZkSolrResourceLoader.java:98)
      at org.apache.solr.schema.SchemaManager.getFreshManagedSchema(SchemaManager.java:421)
      at org.apache.solr.schema.SchemaManager.doOperations(SchemaManager.java:104)
      at org.apache.solr.schema.SchemaManager.performOperations(SchemaManager.java:94)
      at org.apache.solr.handler.SchemaHandler.handleRequestBody(SchemaHandler.java:57)
      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
      at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)
      at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
      at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
      at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
      at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
      at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
      at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
      at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
      at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
      at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
      at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
      at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
      at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
      at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
      at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
      at org.eclipse.jetty.server.Server.handle(Server.java:368)
      at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
      at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
      at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
      at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
      at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
      at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
      at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
  }
}]
Re: Solr Multilingual Indexing with one field- Guidance
On 5/7/15, 11:23 AM, Kuntal Ganguly wrote: 1) Is this a correct approach? Or am I missing something?

Does the user want to see documents that he/she doesn't understand? Words such as doctor, taxi, etc. are common among many languages in Europe. Would the Spanish user want to see English documents? Of course this issue can be worked around by having a separate language field.

How do you handle word collisions among languages? kind in German means child in English. If a German user searches for articles about children, they will find lots of unrelated English articles about someone being kind. This too can be worked around by having a language field.

By default, Solr/Lucene hits are sorted by relevancy score, and the score calculation uses IDF. If a search term appears in many documents, its score contribution is low. Because virtually all German documents contain die, the article, the score of the English word die will be low as well.

2) Can you give me an example where there will be a problem with this new field type? A use case/scenario with an example would be very helpful.

If you have lots of Japanese documents indexed, try searching for 京都 (Kyoto). You will find many documents about Tokyo (東京), because the government of the metropolitan Tokyo area is spelled 東京都 (Tokyo Capital), which generates two bigrams, 東京 and 京都. Kuro
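Kuro's 東京都 example comes from bigram-based CJK analysis. A text_cjk-style field type along the lines of the stock Solr examples looks roughly like this, and the comment shows where the collision arises:

```xml
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- 東京都 is indexed as the overlapping bigrams 東京 and 京都,
         so a search for 京都 (Kyoto) also matches documents about 東京都. -->
    <filter class="solr.CJKBigramFilterFactory"/>
  </analyzer>
</fieldType>
```

This is one of the trade-offs of throwing all languages into one field: bigram tokenization is what makes unsegmented CJK text searchable at all, but it also creates cross-word false matches like this one.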
scoreMode ToParentBlockJoinQuery
Hi, Is it possible to configure the scoreMode of the parent block join query parser (ToParentBlockJoinQuery)? It seems to be set to None, while I would need Max in this case. What I want is to filter on child documents, but still use the relevance/boost of those child documents in the final score. Gr.

--
View this message in context: http://lucene.472066.n3.nabble.com/scoreMode-ToParentBlockJoinQuery-tp4205020.html
Sent from the Solr - User mailing list archive at Nabble.com.
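For readers finding this thread later: in the Solr release current at the time (5.1) the {!parent} parser does appear to use ScoreMode.None, but later Solr releases added a score local parameter to it. Under that newer syntax the request would look roughly like this; the field names are illustrative:

```text
q={!parent which="doc_type:parent" score=max}child_field:foo
```

With score=max, each parent is scored with the maximum score of its matching children, which is exactly the "filter on children but keep their relevance" behavior asked for here.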
Creating a new collection via solrj
Hi, I would like to programmatically create a new collection with a given schema (the schema.xml file is in my Java project, under a folder configuration/, for example). However, I did not find a SolrJ example describing these steps. If one of you could help... thanks! Benjamin
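A sketch of the SolrJ 5.x calls involved: upload the local config directory to ZooKeeper, then create the collection against it. This is an untested sketch, not a verified program; the ZooKeeper address, config name, and collection name are placeholders, and it needs the solr-solrj dependency and a running SolrCloud cluster:

```java
CloudSolrClient client = new CloudSolrClient("localhost:2181");

// Upload the directory containing schema.xml / solrconfig.xml to ZooKeeper
// under the config name "myconf":
client.uploadConfig(Paths.get("configuration"), "myconf");

// Create a collection that uses the uploaded config:
CollectionAdminRequest.Create create = new CollectionAdminRequest.Create();
create.setCollectionName("mycollection");
create.setConfigName("myconf");
create.setNumShards(1);
create.setReplicationFactor(1);
create.process(client);

client.close();
```

The same two steps can also be done without SolrJ, using the zkcli.sh upconfig command and the Collections API CREATE action.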
Beginner problems with solr.ICUCollationField
Hello, I am trying to understand Solr 5.1 (trying to overcome some problems I have with Solr 3.6) by experimenting with the distributed package, but I am having problems using a solr.ICUCollationField field. Trying to create the collection using bin/create_collection aborts, because the classloader fails to find solr.ICUCollationField.

The documentation says: ## QUOTE ### solr.ICUCollationField is included in the Solr analysis-extras contrib - see solr/contrib/analysis-extras/README.txt for instructions on which jars you need to add to your SOLR_HOME/lib in order to use it.

The mentioned README.txt file says: ## QUOTE ### ICU relies upon lucene-libs/lucene-analyzers-icu-X.Y.jar and lib/icu4j-X.Y.jar

Well, that's a bit odd insofar as there is no SOLR_HOME/lib directory. When I start Solr with the verbose flag it says: SOLR_HOME = (...)/solr-5.1.0/server/solr

However, what I did was first symlink and then copy the respective libraries to solr-5.1.0/server/lib/ext. That leads to the server not starting at all. I also tried solr-5.1.0/server/solr-webapp/webapp/WEB-INF/lib, but that does not make any difference at all. Then I had a look at how it's done in the example configurations and noticed the lib tags in the respective solrconfig.xml files. But the process still fails, even though the logs indicate that the .jar files *did* load.
There is a message:

201946 [qtp1055930828-14] INFO org.apache.solr.core.SolrConfig [booklooker shard2 booklooker_shard2_replica1] – Adding specified lib dirs to ClassLoader
201947 [qtp1055930828-14] INFO org.apache.solr.core.SolrResourceLoader [booklooker shard2 booklooker_shard2_replica1] – Adding 'file:/home/bjoern/solr-5.1.0/contrib/analysis-extras/lib/icu4j-54.1.jar' to classloader
201948 [qtp1055930828-14] INFO org.apache.solr.core.SolrResourceLoader [booklooker shard2 booklooker_shard2_replica1] – Adding 'file:/home/bjoern/solr-5.1.0/contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-5.1.0.jar' to classloader

and later:

202810 [qtp1055930828-14] ERROR org.apache.solr.core.CoreContainer [booklooker shard2 booklooker_shard2_replica1] – Error creating core [booklooker_shard2_replica1]: (...) Caused by: java.lang.ClassNotFoundException: solr.ICUCollationField

So the respective jar files have been added to the class loader, but it does not find the field? Well, the class solr.ICUCollationField itself should be found somewhere in the org.apache.solr tree, not in the org.apache.lucene tree, and certainly not under com.ibm.icu4j. But I can't find the proper jar file for it... It would be nice if someone could tell me what's wrong here.
Re: Creating a new collection via solrj
See the CollectionAdminRequest.createCollection etc. Best, Erick On Tue, May 12, 2015 at 3:53 AM, Sznajder ForMailingList bs4mailingl...@gmail.com wrote: Hi, I would like to create programmatically a new collection with a given Schema (the schema.xml file is in my java project under a folder configuration/, for example) However, I did not find a solrj example describing these steps. If one of you could help.. thanks! Benjamin
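A rough SolrJ sketch of what Erick is pointing at (method names follow the 5.x Create request builder, but signatures vary between SolrJ versions; the ZooKeeper address, collection name, and config name are placeholders):

```java
// Sketch only: assumes a running SolrCloud cluster; names below are placeholders.
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;
import org.apache.solr.client.solrj.response.CollectionAdminResponse;

public class CreateCollectionDemo {
    public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient("localhost:9983")) {
            // The config set (schema.xml + solrconfig.xml) must already be in
            // ZooKeeper; the Collections API only references it by name.
            CollectionAdminRequest.Create create = new CollectionAdminRequest.Create();
            create.setCollectionName("mycollection");
            create.setConfigName("myconf");
            create.setNumShards(2);
            create.setReplicationFactor(1);
            CollectionAdminResponse resp = create.process(client);
            System.out.println("success: " + resp.isSuccess());
        }
    }
}
```

Note that the Collections API does not take a local schema.xml directly: the config directory from your project first has to be uploaded to ZooKeeper, e.g. with the zkcli.sh upconfig command shipped under server/scripts/cloud-scripts.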
utility methods to get field values from index
Hi All, Was wondering if there is any class in Solr that provides utility methods to fetch indexed field values for documents using docId. Something simple like getMultiLong(String field, int docId) or getLong(String field, int docId). We have written a Solr component to return group-level stats like avg score, max score, etc. over a large number of documents (say 5000+) against a query executed using edismax. We need to get the group id field's value to do that; this is a single-valued long field. This component also looks at one more field, a multivalued long field, for each document and computes a score based on frequency + document score for each value. Currently we are using stored fields and were wondering if this approach would be faster. Apologies if this is too much to ask for. Parvesh Garg,
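Assuming the two fields are indexed with docValues enabled, the Lucene 4.x reader API exposes exactly this kind of per-docId lookup; a hypothetical sketch of the two helpers (class name and error handling are illustrative, and SortedNumericDocValues requires Lucene 4.9+):

```java
// Sketch only: requires Lucene 4.9/4.10 on the classpath and fields indexed
// with docValues="true"; without docValues you would fall back to stored
// fields or the FieldCache.
import java.io.IOException;
import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.index.SortedNumericDocValues;

public class DocValuesUtil {
    // Single-valued long field, e.g. the group id.
    static long getLong(AtomicReader reader, String field, int docId) throws IOException {
        NumericDocValues dv = reader.getNumericDocValues(field);
        return dv.get(docId);
    }

    // Multivalued long field: collect all values for one document.
    static long[] getMultiLong(AtomicReader reader, String field, int docId) throws IOException {
        SortedNumericDocValues dv = reader.getSortedNumericDocValues(field);
        dv.setDocument(docId);
        long[] values = new long[dv.count()];
        for (int i = 0; i < values.length; i++) {
            values[i] = dv.valueAt(i);
        }
        return values;
    }
}
```

Reading docValues this way is generally much cheaper per document than loading stored fields, since it avoids decompressing whole stored documents just to read two long fields.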
Re: Retrieving list of synonyms and facet field values
Hi Siamak, 1) You can do that with the managed resources: take a look at the synonym section. https://cwiki.apache.org/confluence/display/solr/Managed+Resources Specifically: to determine the synonyms for a specific term, you send a GET request for the child resource; for example, /schema/analysis/synonyms/english/mad would return [angry,upset]. Lastly, you can delete a mapping by sending a DELETE request to the managed endpoint. 2) You can use the Terms Component (https://cwiki.apache.org/confluence/display/solr/The+Terms+Component). It's quite straightforward to use. If you are talking about the facets, when you send a query to Solr with facets enabled, you simply need to parse the resulting JSON (or XML). If you are doing it programmatically, SolrJ gives great support for facets. Cheers 2015-05-12 14:43 GMT+01:00 Siamak Rowshan siamak.rows...@softmart.com: Hi all, I'm new to Solr and would appreciate any help with this question. Is there a way to retrieve the list of synonyms via the API? I also need to retrieve the values of each facet field via API. For example the list of Cat facet includes: fiction, non-fiction, etc. Thanks, Siamak -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
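As a concrete illustration of the managed-resource GET described above, a minimal stdlib-only Java sketch (the host, the core name "techproducts", and the synonym set name "english" are placeholders; it assumes a running Solr instance with managed synonyms configured in the schema):

```java
// Sketch only: needs a live Solr server; URL components below are placeholders.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class ManagedSynonymsDemo {
    public static void main(String[] args) throws Exception {
        URL url = new URL(
            "http://localhost:8983/solr/techproducts/schema/analysis/synonyms/english");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // JSON body listing the synonym mappings
            }
        }
    }
}
```

Appending a specific term to the path (e.g. .../english/mad) narrows the response to that term's synonyms, and the same endpoint accepts PUT and DELETE for updates.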
Re: Sorting on multivalues field in Solr
Thanks Alex, that was really useful. - Nutch Solr User The ultimate search engine would basically understand everything in the world, and it would always give you the right thing. -- View this message in context: http://lucene.472066.n3.nabble.com/Sorting-on-multivalued-field-in-Solr-tp4204996p4205017.html Sent from the Solr - User mailing list archive at Nabble.com.
Retrieving list of synonyms and facet field values
Hi all, I'm new to Solr and would appreciate any help with this question. Is there a way to retrieve the list of synonyms via the API? I also need to retrieve the values of each facet field via API. For example the list of Cat facet includes: fiction, non-fiction, etc. Thanks, Siamak
Re: scoreMode ToParentBlockJoinQuery
Hi, One year ago or something it was not possible in Solr to have the results of the join sorted (it was not using the Lucene sorting); in Solr it was only a filter query with no scoring. I should verify if we are currently in the same scenario. For sure it should not be a big deal to port the Lucene feature to Solr. Cheers 2015-05-12 11:11 GMT+01:00 StrW_dev r.j.bamb...@structweb.nl: Hi Is it possible to configure the scoreMode of the Parent block join query parser (ToParentBlockJoinQuery)? It seems it's set to none, while I would require max in this case. What I want is to filter on child documents, but still use the relevance/boost of these child documents in the final score. Gr. -- View this message in context: http://lucene.472066.n3.nabble.com/scoreMode-ToParentBlockJoinQuery-tp4205020.html Sent from the Solr - User mailing list archive at Nabble.com. -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti
Re: Beginner problems with solr.ICUCollationField
On 5/12/2015 7:03 AM, Björn Keil wrote: Well, that's a bit odd insofar as there is no SOLR_HOME/lib directory. When I start Solr with the verbose flag it says: SOLR_HOME = (...)/solr-5.1.0/server/solr However, what I did was first symlink and then copy the respective libraries to solr-5.1.0/server/lib/ext. That leads to the server not starting at all. I also tried solr-5.1.0/server/solr-webapp/webapp/WEB-INF/lib, but that does not make any difference at all.

The ${solr.solr.home}/lib directory does not exist by default in the Solr example, but if you create it and use it for all the contrib/user jars that your Solr config needs, it will work. You should completely remove all lib config elements from solrconfig.xml at the same time, and make sure that any jar you need is in that lib directory. All the jars will be loaded once and be available to all cores. There seems to be some kind of problem with the classloader when certain jars (the ICU jars being the one example I'm sure about) are loaded more than once by the same classloader. https://issues.apache.org/jira/browse/SOLR-4852 Thanks, Shawn
Re: scoreMode ToParentBlockJoinQuery
I actually did some digging and changed the default ScoreMode in the source code, which actually allowed me to do what I want. So now I use the parent block join query which propagates the score. With the new child transformer for the return field I can even get the child info in the result :). -- View this message in context: http://lucene.472066.n3.nabble.com/scoreMode-ToParentBlockJoinQuery-tp4205020p4205074.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: scoreMode ToParentBlockJoinQuery
So, have you customised your Solr with a plugin? Do you have additional info or documentation? What is the new child transformer? I never used it! Cheers 2015-05-12 16:12 GMT+01:00 StrW_dev r.j.bamb...@structweb.nl: I actually did some digging and changed the default ScoreMode in the source code, which actually allowed me to do what I want. So now I use the parent block join query which propagates the score. With the new child transformer for the return field I can even get the child info in the result :). -- View this message in context: http://lucene.472066.n3.nabble.com/scoreMode-ToParentBlockJoinQuery-tp4205020p4205074.html Sent from the Solr - User mailing list archive at Nabble.com. -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti