[Parent] doc transformer
Hi,

Is there in Solr a [parent] doc transformer (analogous to the [child] doc transformer) that can be used to embed parent fields in the response of a query that uses the block join children query parser?

Thank you,

Aurélien MAZOYER
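For reference, a sketch of the two directions, using a hypothetical doc_type field to mark parents and a hypothetical collection name. If no [parent] transformer is available in your version, a common workaround is to query from the parent side and embed the children with the [child] transformer instead:

```python
from urllib.parse import urlencode

# Hypothetical collection and field names, for illustration only.
base = "http://localhost:8983/solr/mycollection/select"

# Block join children query parser: returns the matching child documents,
# with no built-in way to embed each child's parent fields.
children = urlencode({
    "q": '{!child of="doc_type:parent"}color:red',
    "fl": "*",
})

# Workaround: query from the parent side instead, and embed the
# matching children with the [child] doc transformer.
parents = urlencode({
    "q": '{!parent which="doc_type:parent"}color:red',
    "fl": '*,[child parentFilter="doc_type:parent"]',
})

print(base + "?" + children)
print(base + "?" + parents)
```

Whether the parent-side query fits depends on which side the filtering conditions apply to.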
RE: Issue with scoreNodes stream expression
Hi,

Thank you for your advice. It helped me notice that the exception seems to be thrown when no data is gathered by the gatherNodes expression (not a very explicit error message). I modified the expression and it works well now.

Thank you,

Aurélien

-----Original Message-----
From: Joel Bernstein [mailto:joels...@gmail.com]
Sent: Wednesday, September 20, 2017 04:11
To: solr-user@lucene.apache.org
Subject: Re: Issue with scoreNodes stream expression

Have you tried running a very simple expression first? For example, does this run:

random(gettingstarted, q="*:*", fl="id", rows="200")

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Sep 19, 2017 at 4:56 PM, Aurélien MAZOYER <aurelien.mazo...@francelabs.com> wrote:

> Hi,
>
> I wanted to try the new scoreNodes stream expression that is used to make
> recommendations:
> https://cwiki.apache.org/confluence/display/solr/Graph+Traversal#GraphTraversal-UsingthescoreNodesFunctiontoMakeaRecommendation
> but encountered an issue with it.
>
> The following steps easily reproduce the problem. I started Solr (6.6.1) in cloud mode:
>
> solr -e cloud -noprompt
>
> then ran the following command in exampledocs to index the sample data:
>
> java -Dc=gettingstarted -jar post.jar *.xml
>
> and finally pasted the following expression in the stream tab:
>
> scoreNodes(top(n=25,
>                sort="count(*) desc",
>                nodes(gettingstarted,
>                      random(gettingstarted, q="*:*", fl="id", rows="200"),
>                      walk="id->id",
>                      gather="id",
>                      count(*
>
> (yes, I know that my stream expression does nothing useful :-P).
>
> Anyway, I got the following exception when I ran the query:
>
> "EXCEPTION": "org.apache.solr.client.solrj.SolrServerException: No
> collection param specified on request and no default collection has
> been set.",
>
> Any idea of what I did wrong?
>
> Thank you,
>
> Regards,
>
> Aurélien
Issue with scoreNodes stream expression
Hi,

I wanted to try the new scoreNodes stream expression that is used to make recommendations:
https://cwiki.apache.org/confluence/display/solr/Graph+Traversal#GraphTraversal-UsingthescoreNodesFunctiontoMakeaRecommendation
but encountered an issue with it.

The following steps easily reproduce the problem. I started Solr (6.6.1) in cloud mode:

solr -e cloud -noprompt

then ran the following command in exampledocs to index the sample data:

java -Dc=gettingstarted -jar post.jar *.xml

and finally pasted the following expression in the stream tab:

scoreNodes(top(n=25,
               sort="count(*) desc",
               nodes(gettingstarted,
                     random(gettingstarted, q="*:*", fl="id", rows="200"),
                     walk="id->id",
                     gather="id",
                     count(*

(yes, I know that my stream expression does nothing useful :-P).

Anyway, I got the following exception when I ran the query:

"EXCEPTION": "org.apache.solr.client.solrj.SolrServerException: No collection param specified on request and no default collection has been set.",

Any idea of what I did wrong?

Thank you,

Regards,

Aurélien
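For what it's worth, here is the expression from the mail with its parentheses balanced (the posted version is truncated). This is only a syntactic sketch, not a tested recommendation query:

```python
from urllib.parse import urlencode

# The expression from the mail, parentheses balanced (sketch only).
expr = '''scoreNodes(top(n=25,
                         sort="count(*) desc",
                         nodes(gettingstarted,
                               random(gettingstarted, q="*:*", fl="id", rows="200"),
                               walk="id->id",
                               gather="id",
                               count(*))))'''

# Note the collection name in the URL path: the /stream handler needs a
# target collection, which relates to the "no default collection has
# been set" exception discussed in this thread.
url = "http://localhost:8983/solr/gettingstarted/stream?" + urlencode({"expr": expr})
print(url)
```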
Re: [E] Re: Stemming
No problem :-)

Aurélien

On 16/06/2016 22:36, Jamal, Sarfaraz wrote:

Oh, is this what you meant? content_stemming

I changed it to content_stemming and now it seems to work :) - it was _text_ before.

Thanks! I will update if I discover anything amiss. Thanks again so much =)

Sas

-----Original Message-----
From: Aurélien MAZOYER [mailto:aurelien.mazo...@francelabs.com]
Sent: Thursday, June 16, 2016 4:36 PM
To: solr-user@lucene.apache.org
Subject: Re: [E] Re: Stemming

Hi,

I was just wondering whether you are sure that you query only that field (or fields that use your text_stem analyzer) and not other fields (in your qf, for example, if you use edismax), which could give you incorrect results.

Regards,

Aurélien

On 16/06/2016 22:29, Jamal, Sarfaraz wrote:

Hello =)

Just to be safe and make sure it's happening at indexing time AS WELL as querying time, I modified it to be like so:

I am re-indexing the files.

And what do you mean about only querying one field? I am not entirely sure I understand.

Sas

-----Original Message-----
From: Aurélien MAZOYER [mailto:aurelien.mazo...@francelabs.com]
Sent: Thursday, June 16, 2016 4:20 PM
To: solr-user@lucene.apache.org
Subject: [E] Re: Stemming

Hi,

Yes, you should get the same resultset. Are you sure that you reindexed all the data after changing your schema? Are you sure that your analyzer is applied at both indexing and query time? Are you sure you query only one field?

Regards,

Aurélien

On 16/06/2016 21:13, Jamal, Sarfaraz wrote:

Hi Guys,

I have enabled stemming. In the Admin Analysis page, I type in "running" or "runs" and they both break down to "run". However, when I search for run, runs, or running with an actual query, it brings back three different sets of results. Is that correct? I would imagine that all three would bring back the exact same resultset?

Sas
Re: [E] Re: Stemming
Hi,

I was just wondering whether you are sure that you query only that field (or fields that use your text_stem analyzer) and not other fields (in your qf, for example, if you use edismax), which could give you incorrect results.

Regards,

Aurélien

On 16/06/2016 22:29, Jamal, Sarfaraz wrote:

Hello =)

Just to be safe and make sure it's happening at indexing time AS WELL as querying time, I modified it to be like so:

I am re-indexing the files.

And what do you mean about only querying one field? I am not entirely sure I understand.

Sas

-----Original Message-----
From: Aurélien MAZOYER [mailto:aurelien.mazo...@francelabs.com]
Sent: Thursday, June 16, 2016 4:20 PM
To: solr-user@lucene.apache.org
Subject: [E] Re: Stemming

Hi,

Yes, you should get the same resultset. Are you sure that you reindexed all the data after changing your schema? Are you sure that your analyzer is applied at both indexing and query time? Are you sure you query only one field?

Regards,

Aurélien

On 16/06/2016 21:13, Jamal, Sarfaraz wrote:

Hi Guys,

I have enabled stemming. In the Admin Analysis page, I type in "running" or "runs" and they both break down to "run". However, when I search for run, runs, or running with an actual query, it brings back three different sets of results. Is that correct? I would imagine that all three would bring back the exact same resultset?

Sas
Re: Stemming
Hi,

Yes, you should get the same resultset. Are you sure that you reindexed all the data after changing your schema? Are you sure that your analyzer is applied at both indexing and query time? Are you sure you query only one field?

Regards,

Aurélien

On 16/06/2016 21:13, Jamal, Sarfaraz wrote:

Hi Guys,

I have enabled stemming. In the Admin Analysis page, I type in "running" or "runs" and they both break down to "run". However, when I search for run, runs, or running with an actual query, it brings back three different sets of results. Is that correct? I would imagine that all three would bring back the exact same resultset?

Sas
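A minimal sketch of what such a stemmed field setup can look like (the names text_stem and content_stemming come from this thread; the exact tokenizer and filters are an assumption). A single `<analyzer>` element with no type attribute applies to both indexing and querying, which is the point being checked here:

```xml
<fieldType name="text_stem" class="solr.TextField" positionIncrementGap="100">
  <!-- One analyzer, used at both index and query time -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
<field name="content_stemming" type="text_stem" indexed="true" stored="true"/>
```

After any analyzer change, the data must be fully reindexed for the index-time side to match.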
Re: Is it different? q=(field1:value1 OR field2:value2) and q=field1:value1 OR field2:value2
Hi,

I think both queries are rewritten to the same query. You can use the debugQuery=on parameter to see how each query is rewritten, and then compare whether you get the same parsed query for both.

Regards,

Aurélien

On 26/02/2016 14:27, vitaly bulgakov wrote:

Is there a difference when we put the query in brackets?

--
View this message in context: http://lucene.472066.n3.nabble.com/Is-it-different-q-field1-value1-OR-field2-value2-and-q-field1-value1-OR-field2-value2-tp4259976.html
Sent from the Solr - User mailing list archive at Nabble.com.
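A sketch of the comparison (the collection name is hypothetical): send both forms with debugQuery=on and diff the "parsedquery" entry in the debug section of each response.

```python
from urllib.parse import urlencode

base = "http://localhost:8983/solr/mycollection/select"

# Same query with and without brackets; debugQuery=on adds a "debug"
# section whose "parsedquery" entry shows the rewritten query.
with_brackets = urlencode({"q": "(field1:value1 OR field2:value2)", "debugQuery": "on"})
without_brackets = urlencode({"q": "field1:value1 OR field2:value2", "debugQuery": "on"})

print(base + "?" + with_brackets)
print(base + "?" + without_brackets)
```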
Re: Deletion Policy in Solr Cloud
Thank you for your answer Mark,

Aurélien

On 15/06/2015 23:19, Mark Miller wrote:

SolrCloud does not really support any form of rollback.

On Mon, Jun 15, 2015 at 5:05 PM Aurélien MAZOYER <aurelien.mazo...@francelabs.com> wrote:

Hi all,

Is DeletionPolicy customization still available in SolrCloud? Is there a way to roll back to a previous commit point in SolrCloud thanks to a specific deletion policy?

Thanks,

Aurélien

--
Aurélien MAZOYER
Expert en technologies de recherche
France Labs *** Découvrez Datafari v1.0 sur datafari.com ***
CEEI Nice Premium 1 boulevard Maître Maurice Slama 06200 Nice
Tel : +33 (0) 683366620
www.francelabs.com
Deletion Policy in Solr Cloud
Hi all,

Is DeletionPolicy customization still available in SolrCloud? Is there a way to roll back to a previous commit point in SolrCloud thanks to a specific deletion policy?

Thanks,

Aurélien
Re: Order synonyms
Hi,

I am afraid you are not using the right component. In your example, you will match the apple, darty, and boulanger documents, sorted by the default Solr scoring mechanism (TF-IDF), which won't take the order you specified in your synonyms.txt file into account for scoring. If you want to override the Solr scoring mechanism for a query, you can have a look at the Query Elevation Component:
https://cwiki.apache.org/confluence/display/solr/The+Query+Elevation+Component

Regards,

Aurélien

On 20/01/2015 17:28, Antoine REBOUL wrote:

Hello (sorry for my English, I use a translator),

I use synonyms in Solr. My question is the following: how can I order the results list according to the order of the synonyms? My synonyms are written as follows in my synonyms.txt file:

ipad => apple, Darty, Boulanger

I want that when one searches for "ipad", the results appear in the following order:
1/ Apple
2/ Darty
3/ Boulanger

Except that Apple is not returned first. Do you have an idea to offer me? Thank you in advance.

Antoine Reboul
Responsable Comparateurs / Plateforme emailing Plebicom - eBuyClub - Cashstore - Checkdeal
PLEBICOM – 29 avenue Joannes Masset – 69009 Lyon
Tel : 04 72 85 81 49 Fax : 04 78 83 39 74

--
Aurélien MAZOYER
Expert en technologies de recherche
France Labs
CEEI Nice Premium 1 boulevard Maître Maurice Slama 06200 Nice
Tel : +33 (0) 683366620
www.francelabs.com
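A sketch of an elevate.xml entry for this case (the document ids are hypothetical; elevation pins specific documents to the top for a query text, it does not read synonyms.txt):

```xml
<elevate>
  <query text="ipad">
    <doc id="apple-ipad" />
    <doc id="darty-ipad" />
    <doc id="boulanger-ipad" />
  </query>
</elevate>
```

Elevated documents are returned in the order they are listed, ahead of the normally scored results.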
Re: How do I get index size and datasize
Hi,

Have a look at the 'data' directory in your solr_home. The .fdt and .fdx files are used to store the data of stored fields. You can consider the size of the other files as the size Solr uses for its index. Have a look at http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/codecs/lucene49/package-summary.html#file-names for more information.

Regards,

On 25/08/2014 09:40, Ramprasad Padmanabhan wrote:

I have Solr working for my stats pages. When I run the index, I need to know how much of the size occupied by Solr is used for the index and how much is used for storing non-indexed data.
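As a rough illustration, a small script (not part of Solr) that splits an index directory's size into stored-field files (.fdt/.fdx) and everything else; the example path is an assumption:

```python
import os

def split_index_size(index_dir):
    """Return (stored_bytes, other_bytes) for a Lucene index directory.

    .fdt/.fdx files hold stored-field data; the remaining files are the
    search index structures (terms, postings, norms, ...).
    """
    stored = other = 0
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if not os.path.isfile(path):
            continue
        if name.endswith((".fdt", ".fdx")):
            stored += os.path.getsize(path)
        else:
            other += os.path.getsize(path)
    return stored, other

# Example (path is hypothetical):
# stored, other = split_index_size("/var/solr/collection1/data/index")
# print("stored fields: %d bytes, index: %d bytes" % (stored, other))
```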
Re: Query regarding URL Analysers
Hi,

Maybe I am wrong, but I am not sure that you can find such a tokenizer in Solr out of the box. I can suggest having a look at PatternTokenizer and PathHierarchyTokenizer. Note that you can also implement your own tokenizer and add it to Solr as a plugin.

Regards,

Aurélien MAZOYER

On 21/08/2014 14:35, Sathyam wrote:

Hi,

I need to generate tokens out of a URL such that I get hierarchical units of the URL as well as each individual entity as tokens. For example, given the URL:

http://www.google.com/abcd/efgh/ijkl/mnop.php?a=10&b=20&c=30#xyz

the tokens that I need are:

Hierarchical subsets of the URL:
1. http://
2. http://www.google.com/
3. http://www.google.com/abcd/
4. http://www.google.com/abcd/efgh/
5. http://www.google.com/abcd/efgh/ijkl/
6. http://www.google.com/abcd/efgh/ijkl/mnop.php

Individual elements in the path to the resource:
7. abcd
8. efgh
9. ijkl
10. mnop.php

Query terms:
11. a=10
12. b=20
13. c=30

Fragment:
14. xyz

This comes to a total of 14 tokens for the given URL. Basically, I need a URL analyzer that creates tokens based on the categories mentioned above, plus a separate token for the port (if mentioned). I would like to know how this can be achieved using a single analyzer that uses a combination of the tokenizers and filters provided by Solr. I am also curious why there is a restriction of only one tokenizer per analyzer. Looking forward to a response telling the best possible way to achieve the closest to what I need. Thanks.
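To make the token spec concrete, here is the same walk over the URL parts in plain code (outside Solr); a custom Tokenizer plugin would implement essentially this logic:

```python
from urllib.parse import urlsplit

def url_tokens(url):
    """Emit the hierarchical, path, query, port and fragment tokens described above."""
    parts = urlsplit(url)
    tokens = [parts.scheme + "://"]                # scheme
    prefix = parts.scheme + "://" + parts.netloc + "/"
    tokens.append(prefix)                          # scheme + host
    segments = [s for s in parts.path.split("/") if s]
    for seg in segments[:-1]:                      # growing path prefixes
        prefix += seg + "/"
        tokens.append(prefix)
    if segments:                                   # full URL up to the resource
        tokens.append(prefix + segments[-1])
    tokens.extend(segments)                        # individual path elements
    if parts.query:
        tokens.extend(parts.query.split("&"))      # query terms
    if parts.port:
        tokens.append(str(parts.port))             # port, when present
    if parts.fragment:
        tokens.append(parts.fragment)              # fragment
    return tokens

print(url_tokens("http://www.google.com/abcd/efgh/ijkl/mnop.php?a=10&b=20&c=30#xyz"))
```

For the example URL this yields exactly the 14 tokens listed in the question.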
Re: sample Cell schema question
"indexed" means you can search it; "stored" means you can return the value to the user or highlight it. Both consume disk space. A copyField is not a kind of special field: it is a directive that copies one field's values to another field. There are many use cases for copy fields. In the example, we use a specific field, "text", as a default field where users will perform their searches. That is why we copy all the fields that we want to search into that "text" field (note that there are other ways to search multiple fields: have a look at http://wiki.apache.org/solr/ExtendedDisMax). For example, the field "content" is copied to the "text" field (which is indexed) for searching. As we will use the field "text" to perform our searches, we don't need to index the "content" field too, and since we don't, we save some disk space.

Regards,

Aurélien

On 19/08/2014 13:05, Aman Tandon wrote:

I have a question: does storing the data in copyFields save space?

With Regards
Aman Tandon

On Tue, Aug 19, 2014 at 3:02 PM, jmlucjav <jmluc...@gmail.com> wrote:

OK, I had not noticed "text" also contains the other metadata like keywords, description, etc. Never mind!

On Tue, Aug 19, 2014 at 11:28 AM, jmlucjav <jmluc...@gmail.com> wrote:

In the sample schema.xml I can see this:

<!-- Main body of document extracted by SolrCell.
     NOTE: This field is not indexed by default, since it is also copied to "text"
     using copyField below. This is to save space. Use this field for returning and
     highlighting document content. Use the "text" field to search the content. -->
<field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>

I am wondering, how does having this split in two fields text/content save space?
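Schematically, the pattern under discussion looks like this (the field definitions follow the stock example schema; exact attributes may differ in a given version):

```xml
<!-- Stored but not indexed: used for returning/highlighting only -->
<field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>
<!-- Indexed but not stored: the default search field -->
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
<!-- The copyField directive: index content's values under text -->
<copyField source="content" dest="text"/>
```

Each field's values are written once for storage (in content) and once for the index (in text), instead of twice each.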
Re: logging in solr
Hi,

Are you using Tomcat or Jetty? If you use the default Jetty, have a look at:
http://wiki.apache.org/solr/LoggingInDefaultJettySetup

Regards,

Aurélien

On 18/08/2014 22:43, M, Arjun (NSN - IN/Bangalore) wrote:

Hi,

Currently in my component, Solr is logging to catalina.out. What is the configuration needed to redirect those logs to some custom logfile, e.g. solr.log?

Thanks...
--Arjun
Re: logging in solr
Sorry, outdated link. And I suppose you use Tomcat if you are talking about catalina.out. The correct link is:
http://wiki.apache.org/solr/SolrLogging#Solr_4.3_and_above

On 18/08/2014 23:06, Aurélien MAZOYER wrote:

Hi,

Are you using Tomcat or Jetty? If you use the default Jetty, have a look at:
http://wiki.apache.org/solr/LoggingInDefaultJettySetup

Regards,

Aurélien

On 18/08/2014 22:43, M, Arjun (NSN - IN/Bangalore) wrote:

Hi,

Currently in my component, Solr is logging to catalina.out. What is the configuration needed to redirect those logs to some custom logfile, e.g. solr.log?

Thanks...
--Arjun
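For Solr 4.3+ with the standard SLF4J/log4j binding, a log4j.properties along these lines redirects logging to its own file (the path, sizes, and pattern are illustrative, not from the thread):

```properties
# Send everything at INFO and above to a rolling solr.log
log4j.rootLogger=INFO, file

log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=logs/solr.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=9
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p (%t) [%c] %m%n
```

The file must be on the classpath (or pointed to with -Dlog4j.configuration) so that it is picked up instead of Tomcat's default logging.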
Re: Selectively setting the number of returned SOLR rows per field based on field value
I am afraid you can't. I think your problem is linked to this issue, which is still unresolved: https://issues.apache.org/jira/browse/SOLR-1093

Aurélien

On 17/08/2014 23:16, talt wrote:

I have a field in my SOLR index, let's call it book_title. A query returns 15 rows with book_title:The Kite Runner, 13 rows with book_title:The Stranger, and 8 rows with book_title:The Ruby Way. Is there a way to return only the first row of The Kite Runner and The Stranger, but all of The Ruby Way rows from the previous query result? This would result in 10 rows altogether. Is this possible at all, using a single query?

--
View this message in context: http://lucene.472066.n3.nabble.com/Selectively-setting-the-number-of-returned-SOLR-rows-per-field-based-on-field-value-tp4153441.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Can I use multiple cores
Hi Paul and Ramprasad,

I follow your discussion with interest, as I will have more or less the same requirement. When you say that you use on-demand core loading, are you talking about the LotsOfCores stuff? Erick told me that it does not work very well in a distributed environment. How do you handle this problem? Do you use multiple single Solr instances? What about failover?

Thanks for your answer,

Aurelien

On 12/08/2014 14:48, Noble Paul wrote:

Hi Ramprasad,

I have used it in a cluster with millions of users (1 user per core) in legacy cloud mode. We used the on-demand core loading feature, where each Solr had 30,000 cores and at a time only 2000 cores were in memory. You are just hitting 400 and I don't see much of a problem. What is your h/w, BTW?

On Tue, Aug 12, 2014 at 12:10 PM, Ramprasad Padmanabhan <ramprasad...@gmail.com> wrote:

I need to store in Solr all the data of my clients' mailing activity. The data contains metadata like From, To, Date, Time, Subject, etc. I would easily have 1000 million records every 2 months. What I am currently doing is creating cores per client, so I have 400 cores already. Is this a good idea? What is the general practice for creating cores?
Re: Passivate core in Solr Cloud
Thank you Erick and Alex for your answers. The LotsOfCores stuff seems to meet my requirement, but it is a problem if it does not work with SolrCloud. Is there an issue opened for this problem? If I understand well, the only solution for me is to use multiple single Solr instances with transient cores and to distribute the cores for my tenants manually (I assume the LRU mechanism will be less effective, as it will be done per Solr instance). When you say it "does NOT play nice in distributed mode", does that also include the standard replication mechanism?

Thanks,

Regards,

Aurelien

On 23/07/2014 17:21, Erick Erickson wrote:

Do note that the lots of cores stuff does NOT play nice in distributed mode (yet).

Best,
Erick

On Wed, Jul 23, 2014 at 6:00 AM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:

Solr has some support for a large number of cores, including transient cores: http://wiki.apache.org/solr/LotsOfCores

Regards,
Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On Wed, Jul 23, 2014 at 7:55 PM, Aurélien MAZOYER <aurelien.mazo...@francelabs.com> wrote:

Hello,

We want to set up a SolrCloud cluster in order to handle a high volume of documents with a multi-tenant architecture. The problem is that application-level isolation for a tenant (using a shared index with a "customer" field) is not enough to fit our requirements. As a result, we need one collection per customer. There are more than a thousand customers, and it seems unreasonable to create thousands of collections in SolrCloud... But as we know that there is less than 1 query/customer/day, we are currently looking for a way to passivate collections when they are not in use. Can this be a good idea? If yes, are there best practices to implement it? What side effects can we expect? Do we need to put some application-level logic on top of the SolrCloud cluster to choose which collection to unload (and maybe there is something smarter (and quicker?) than simply loading/unloading the core when it is not in use)?

Thank you for your answer(s),

Aurelien
Multipart documents with different update cycles
Hello,

I have to index a dataset containing multipart documents. The main part and the user-metadata part have different update cycles: we want to update the user-metadata part frequently without having to re-fetch the main part from the datasource, and without storing every field in order to use atomic updates. As there is no true field-level update in Solr yet, I am afraid that I have to build an index for each part and perform a query-time join, with all the well-known performance limitations. I have also heard of the sidecar index. Is it a solution that can meet my requirements? Is it stable enough to be usable in production? Does the community plan to make it part of the trunk code?

Thanks,

Aurelien
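The query-time join mentioned above looks roughly like this (collection and field names are hypothetical): the main documents are filtered by a condition evaluated against a separate metadata index that shares a key with them.

```python
from urllib.parse import urlencode

# Search the "main" collection, restricted to documents whose id appears
# as doc_id in "metadata" docs matching user_tag:important (names assumed).
params = urlencode({
    "q": '{!join from=doc_id to=id fromIndex=metadata}user_tag:important',
    "fl": "id,title",
})
print("http://localhost:8983/solr/main/select?" + params)
```

The metadata index can then be updated on its own cycle without touching the main documents, at the cost of the join's query-time overhead.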
Passivate core in Solr Cloud
Hello,

We want to set up a SolrCloud cluster in order to handle a high volume of documents with a multi-tenant architecture. The problem is that application-level isolation for a tenant (using a shared index with a "customer" field) is not enough to fit our requirements. As a result, we need one collection per customer. There are more than a thousand customers, and it seems unreasonable to create thousands of collections in SolrCloud... But as we know that there is less than 1 query/customer/day, we are currently looking for a way to passivate collections when they are not in use. Can this be a good idea? If yes, are there best practices to implement it? What side effects can we expect? Do we need to put some application-level logic on top of the SolrCloud cluster to choose which collection to unload (and maybe there is something smarter (and quicker?) than simply loading/unloading the core when it is not in use)?

Thank you for your answer(s),

Aurelien