RE: Transactional Behavior
Hello Emir, But this is not a transaction, because if some of the bulk I need to add is committed, it will be searchable. In a transaction I need to insert a bulk of data (all bulk data becomes searchable at once) or roll it back according to some business scenarios. -- Regards, Amr Ali City stars capital 8 - 3rd floor, Nasr city, Cairo, Egypt Ext: 278 -----Original Message----- From: Emir Arnautovic [mailto:emir.arnauto...@sematext.com] Sent: Tuesday, May 12, 2015 10:46 PM To: solr-user@lucene.apache.org Subject: Re: Transactional Behavior Hi Amr, One option is to include a transaction id in your documents and do a delete in case of a failed transaction. It is not a cheap option - it needs an additional field if you don't have something you can use to identify the transaction. Assuming rollback will not happen too often, deleting is not that big an issue. Thanks, Emir -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On 12.05.2015 22:37, Amr Ali wrote: Please check this https://lucene.apache.org/solr/4_1_0/solr-solrj/org/apache/solr/client/solrj/SolrServer.html#rollback() Note that this is not a true rollback as in databases. Content you have previously added may have been committed due to autoCommit, a full buffer, another client performing a commit, etc. It is not a real rollback if you have two threads T1 and T2 that are adding. If T1 is adding 500 documents and T2 is adding 3, then T2 will commit its 3 documents PLUS the documents added by T1 (because T2 will finish its add/commit before T1 does, due to the number of documents). Solr transactions are server side only. -- Regards, Amr Ali City stars capital 8 - 3rd floor, Nasr city, Cairo, Egypt Ext: 278 -----Original Message----- From: Jack Krupansky [mailto:jack.krupan...@gmail.com] Sent: Tuesday, May 12, 2015 10:24 PM To: solr-user@lucene.apache.org Subject: Re: Transactional Behavior Solr does have a <rollback/> command, but it is an expert feature and it is not so clear how it works in SolrCloud.
See: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers and https://wiki.apache.org/solr/UpdateXmlMessages#A.22rollback.22 -- Jack Krupansky On Tue, May 12, 2015 at 12:58 PM, Amr Ali amr_...@siliconexpert.com wrote: Hello, I have a business case in which I need to be able to roll back. When I tried add/commit I was not able to prevent other threads that write to the same Solr core from committing everything. I also tried IndexWriter directly, but Solr did not see the changes until we restarted it. -- Regards, Amr Ali City stars capital 8 - 3rd floor, Nasr city, Cairo, Egypt Ext: 278 -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/
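Emir's delete-on-failure workaround could be sketched like this (a rough illustration, not a real transaction; the txn_id field, core name, and URL are hypothetical):

```python
from urllib.parse import urlencode

# Every document in the batch carries a hypothetical txn_id field.
txn_id = "txn-42"
docs = [{"id": str(i), "txn_id": txn_id} for i in range(3)]

# If the business logic decides to roll back, "undo" the batch with a
# delete-by-query on that id. Documents already committed were visible
# to searchers until this point, so this is cleanup, not isolation.
delete_msg = "<delete><query>txn_id:%s</query></delete>" % txn_id
update_url = ("http://localhost:8983/solr/collection1/update?"
              + urlencode({"commit": "true"}))
print(delete_msg)
```

The extra field costs index space, as Emir notes, but the delete path is only exercised on the (hopefully rare) rollback.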
Re: Unable to identify why faceting is taking so much time
On Wed, 2015-05-13 at 09:22 +0000, Abhishek Gupta wrote: Yes we have that many documents (exact count: 522664425), but I am not sure why that matters, because what I understood from the documentation is that fc will only work on the documents filtered by the filter query and query. What the documentation does not mention explicitly is the UnInversion that takes place on the first call. If you look in your Solr log for UnInverted, you will see how many milliseconds it took in the time parameter. For example: UnInverted multi-valued field {field=lsubject,memSize=216343445, tindexSize=1037315,time=36620,phase1=35868,nTerms=4440544,bigTerms=1, termInstances=55196823,uses=0} took 36620 milliseconds to UnInvert 4,440,544 terms. The number of references from documents to terms is 55,196,823. If we assume you have approximately 1 reference/document, you will have half a billion references, or about 10 times my number. 10 times 37 seconds is quite close to the 300 seconds you state below. Of course our numbers cannot be compared directly, but it means that your measurements pass the sanity check. For my query there are only 137 documents for fc to work on and to make the FieldCache. The mapping structure from your 522,664,425 documents to the values in your field (also in the higher millions, as I understand it) is independent of your search result. After the structure has been created, it is used to look up the terms used by your 137 hits. Also subsequent calls are not fast: First call time: 297572 Second call time (made within 2 sec): 249287 Are you indexing while searching? Each time the index is changed, the UnInversion will have to be re-done. facet.method=fcs seems a better choice with an often-changing index of your size. - Toke Eskildsen, State and University Library, Denmark
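Toke's facet.method=fcs suggestion is only a request-parameter change; using the field from this thread, the request might look like this (a sketch; host and core name are hypothetical):

```
http://localhost:8983/solr/collection/select?q=*:*&rows=0&facet=true&facet.field=conversationId&facet.method=fcs
```

As I understand it, fcs builds its structures per segment, so after an index change only the new segments need rebuilding, at some extra cost per call.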
Re: Unable to identify why faceting is taking so much time
Toke, thanks for the quick reply. I am still confused; please find the doubts I have inline: On Mon, May 11, 2015 at 1:22 PM Toke Eskildsen t...@statsbiblioteket.dk wrote: On Mon, 2015-05-11 at 05:48 +0000, Abhishek Gupta wrote: According to this there are 137 records. Now I am faceting over these 137 records with facet.method=fc. Ideally it should just iterate over these 137 records and sum up the facets. That is only the ideal method if you are not planning on issuing subsequent calls: facet.method=fc does more work up front to ensure that later calls are fast. http://localhost:9020/search/p1-umShard-1/select?q=*:*&fq=(msgType:38+AND+snCreatedTime:[2015-04-15T00:00:00Z%20TO%20*])&facet.field=conversationId&facet=true&indent=on&wt=json&rows=0&facet.method=fc&debug=timing { - responseHeader: { - status: 0, - QTime: 395103 }, [...] According to this, the faceting is taking 395036 ms. Why is it taking *395 seconds* to just calculate the facets of 137 records? 6½ minutes is a long time, even for a first call. Do you have tens to hundreds of millions of documents in your index? Or do you have a similar amount of unique values in your facet? Yes we have that many documents (exact count: 522664425), but I am not sure why that matters, because what I understood from the documentation https://wiki.apache.org/solr/SimpleFacetParameters#facet.method is that *fc* will only work on the documents filtered by the filter query and query. For my query there are only 137 documents for fc to work on and to make the *FieldCache*. But seeing the faceting result, it seems that faceting is being applied over all the documents, which does not match the documentation: *The facet counts are calculated by iterating over documents that match the query and summing the terms that appear in each document*. I am not able to understand why fc is calculating facets over all the documents?
Just for your information, the cardinality of the field (conversationId) on which I am faceting is very high, but the possible values for this field matching my query and filter query are only about 100. Either way, subsequent faceting calls should be much faster, and a switch to DocValues should lower your first-call time significantly. Also subsequent calls are not fast: First call time: 297572 Second call time (made within 2 sec): 249287 Yeah, I agree docValues will reduce the time. Toke Eskildsen, State and University Library, Denmark
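The DocValues change both posters agree on is a schema.xml attribute; shown here on a hypothetical definition of the field (the index must be rebuilt after the change):

```
<field name="conversationId" type="string" indexed="true" stored="true" docValues="true"/>
```

With docValues, faceting reads a disk-based column structure instead of UnInverting the field into memory on first search, which targets exactly the first-call cost discussed above.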
Re: How is the most relevant document of each group chosen when group.truncate is used?
Ok, figured it out myself. Research has shown that when group.truncate (or the collapsing query parser) is used, only the head of each group is picked. That's why the results are different. However, group.facet gives the facet results that I would want. The only thing is that group.facet is very slow compared to the collapsing query parser. On Tue, May 12, 2015 at 6:15 PM, Andrii Berezhynskyi andrii.berezhyns...@home24.de wrote: Hi all, When I use group.truncate and filtering I'm getting strange faceting results. If I use just grouping without filtering: group=true&group.field=parent_sku&group.ngroups=true&group.truncate=true&facet=true&facet.field=color, then I get: facet_fields: { color: [ white, 19742, i.e. 19742 white items. However if I filter by white items: group=true&group.field=parent_sku&group.ngroups=true&group.truncate=true&facet=true&facet.field=color&fq=color:white, I'm getting 20543 items. The same happens when I use the collapse query parser instead of grouping. I would expect those two numbers to be equal. So I assume the most relevant document of each group is chosen somehow differently when filtering is used. How can this be explained? Best regards, Andrii
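For reference, the group.facet behavior Andrii settled on is requested by swapping group.truncate for group.facet on the same query (a sketch, with the field names from the message above):

```
group=true&group.field=parent_sku&group.ngroups=true&group.facet=true&facet=true&facet.field=color&fq=color:white
```

As I understand it, group.facet counts groups rather than individual documents, which is why its numbers line up with ngroups; per the conclusion above, the trade-off is that it is noticeably slower than the collapsing query parser.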
Re: Beginner problems with solr.ICUCollationField
Thank you for your help. That was only part of the problem, though. You also need ${solr.install.dir}/dist/solr-analysis-extras-X.jar where X is the version. The other two libraries are dependencies, but they do not contain the actual ICUCollationField class. It might be helpful if that was mentioned in the respective spots in the documentation and README.txt file.
Re: Reading an index while it is being updated?
On 5/13/2015 1:03 AM, Guy Thomas wrote: Up to now we’ve been using Lucene without Solr. The Lucene index is being updated, and when the update is finished we notify a Hessian proxy service running on the web server that wants to read the index. When this proxy service is notified, the server knows it can read the updated index. Do we have to use a similar set-up when using Solr, that is: 1. Create/update the index 2. Notify the Solr client In Solr, the Solr server has complete control of the Lucene index and maintains the write lock at all times. Generally you create or update the index via requests to Solr, through the update handler. As soon as you issue a commit with openSearcher=true, and it completes, all clients can see the changes. There is no need to do any kind of notification. Commits may be fully automated within the Solr configuration or they may be explicitly sent by clients. If you are creating the index in some other way, then you generally need to reload the core. Recently (5.x versions) at least one person has been having trouble with loading a new index using RELOAD: https://issues.apache.org/jira/browse/SOLR-7526 Thanks, Shawn
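To illustrate Shawn's point about visibility: the explicit commit is just another update message. POSTing this to http://localhost:8983/solr/corename/update (hypothetical core name) with Content-Type: text/xml makes all changes visible to every client, with no separate notification step:

```
<commit openSearcher="true"/>
```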
Re: Beginner problems with solr.ICUCollationField
On 5/13/2015 4:16 AM, Björn Keil wrote: Thank you for your help. That was only part of the problem, though. You also need ${solr.install.dir}/dist/solr-analysis-extras-X.jar where X is the version. The other two libraries are dependencies, but they do not contain the actual ICUCollationField class. It might be helpful if that was mentioned in the respective spots in the documentation and README.txt file. I have not used that particular class. I have used the ICU tokenizers and filters, which are in the lucene jar. The docs you quoted say this: --- solr.ICUCollationField is included in the Solr analysis-extras contrib; see solr/contrib/analysis-extras/README.txt for instructions on which jars you need to add to your SOLR_HOME/lib in order to use it. --- That sounds to me like an indication that you need the solr analysis-extras jar, which has the lucene-analyzers and icu4j jars as additional dependencies. The referenced README probably should mention that the required jar can be found in the dist/ folder of the binary download. Thanks, Shawn
Re: Wiki new user
Sergio - what is your wiki username? We can add you as an editor once you provide the username. Erik On May 13, 2015, at 10:33, Sergio Velasco ser...@mitula.com wrote: Hi, I would like to become a member of the Solr wiki. I have requested it on the Solr user lists and they have sent me to this list to request access to the wiki. I am the Mitula CTO and we have been using Solr from the very beginning, 6 years ago. I think I can contribute a lot to this wiki. Thank you. www.mitula.com Sergio Velasco | Dpto. de Desarrollo Contact me: ser...@mitula.com | Tel. +34 917 08 21 47 | Fax +34 917 08 21 56 Follow us on: Facebook.com/mitula.es.latam | @mitula_es | Linkedin.com/mitula | Blog
Re: Is copyField a must?
I think with a proper configuration of the Edismax query parser and a proper management of field boosting, it's much more precise to use the list of interesting fields than a big blob copy field. Cheers 2015-05-13 15:54 GMT+01:00 Steven White swhite4...@gmail.com: Hi Everyone, In my search need, I will always be using df to specify the list of fields a search will be done in (the list of fields is group based, which my application defines). Given this, is there any reason to use copyField to copy the data into a single master-field to search against? Am I losing anything by not using copyField? Thanks, Steve -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
Is copyField a must?
Hi Everyone, In my search need, I will always be using df to specify the list of fields a search will be done in (the list of fields is group based, which my application defines). Given this, is there any reason to use copyField to copy the data into a single master-field to search against? Am I losing anything by not using copyField? Thanks, Steve
Wiki new user
Hi, I would like to become a member of the Solr wiki. I have requested it on the Solr user lists and they have sent me to this list to request access to the wiki. I am the Mitula CTO and we have been using Solr from the very beginning, 6 years ago. I think I can contribute a lot to this wiki. Thank you. http://www.mitula.com/ www.mitula.com Sergio Velasco | Dpto. de Desarrollo Contact me: ser...@mitula.com | Tel. +34 917 08 21 47 | Fax +34 917 08 21 56 Follow us on: Facebook.com/mitula.es.latam | @mitula_es | Linkedin.com/mitula | Blog
Re: Setting system property
Clemens - For this particular property, it is only accessed as a system property directly, so it must be set on JVM startup and cannot be set any other way. Erik — Erik Hatcher, Senior Solutions Architect http://www.lucidworks.com/ On May 13, 2015, at 3:49 AM, Clemens Wyss DEV clemens...@mysign.ch wrote: I'd like to make use of solr.allow.unsafe.resourceloading=true. Is the command line -Dsolr.allow.unsafe.resourceloading=true the only way to inject/set this property, or can it be done (e.g.) in solr.xml? Thx Clemens
Re: Is copyField a must?
No, there is no requirement for having a copyField of any kind. — Erik Hatcher, Senior Solutions Architect http://www.lucidworks.com http://www.lucidworks.com/ On May 13, 2015, at 1:50 PM, Steven White swhite4...@gmail.com wrote: I don't have a need for Edismax. That said, do I still have a need for copyField into a default-field? Steve On Wed, May 13, 2015 at 11:13 AM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I think with a proper configuration of the Edismax query parser and a proper management of field boosting, it's much more precise to use the list of interesting fields than a big blob copy field. Cheers 2015-05-13 15:54 GMT+01:00 Steven White swhite4...@gmail.com: Hi Everyone, In my search need, I will always be using df to specify the list of fields a search will be done in (the list of fields is group based which my application defines). Given this, is there any reason to use copyField to copy the data into a single master-field to search against? Am I losing any thing by not using copyField? Thanks, Steve -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
Re: Is copyField a must?
Hmm, looks like I'm missing something here as I cannot get this to work. My need is as follows. From my application, I need to issue a generic search which is limited to a set of fields based on the group the user belongs to. For example, user-1 is in group-A which has default fields of F1, F2, F3. User-2 is in group-B which has default fields of F2, F3, F5, etc. What I tried to do is create multiple request handlers in solrconfig.xml like so:

<requestHandler name="/select_group_a" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">20</int>
    <str name="df">F1,F2,F3</str>
    <str name="fl">id,score</str>
  </lst>
</requestHandler>

And

<requestHandler name="/select_group_b" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">20</int>
    <str name="df">F2,F3,F5</str>
    <str name="fl">id,score</str>
  </lst>
</requestHandler>

However, this isn't working because whatever is in df is being treated as a single field name. How can I achieve my need? Note, I want to avoid a URL-based solution (sending the list of fields over HTTP) because the list of fields could be large (1000+) and thus I would exceed the GET limit quickly (does Solr support POST for searching? If so, then I could use a URL-based solution). Thanks in advance. Steve On Wed, May 13, 2015 at 2:29 PM, Erik Hatcher erik.hatc...@gmail.com wrote: No, there is no requirement for having a copyField of any kind. — Erik Hatcher, Senior Solutions Architect http://www.lucidworks.com/ On May 13, 2015, at 1:50 PM, Steven White swhite4...@gmail.com wrote: I don't have a need for Edismax. That said, do I still have a need for copyField into a default-field? Steve On Wed, May 13, 2015 at 11:13 AM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I think with a proper configuration of the Edismax query parser and a proper management of field boosting, it's much more precise to use the list of interesting fields than a big blob copy field.
Cheers 2015-05-13 15:54 GMT+01:00 Steven White swhite4...@gmail.com: Hi Everyone, In my search need, I will always be using df to specify the list of fields a search will be done in (the list of fields is group based which my application defines). Given this, is there any reason to use copyField to copy the data into a single master-field to search against? Am I losing any thing by not using copyField? Thanks, Steve -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
Re: Is copyField a must?
Looks like I got it working (however I still have an outstanding issue, see the end of my email). Here is what I have done: 1) In my solrconfig.xml, I created:

<requestHandler name="/select_group_a" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">20</int>
    <str name="defType">edismax</str>
    <str name="qf">F1 F2 F3</str>
    <str name="fl">type,id,score</str>
    <str name="wt">xml</str>
    <str name="indent">true</str>
  </lst>
</requestHandler>

And

<requestHandler name="/select_group_b" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">20</int>
    <str name="defType">edismax</str>
    <str name="qf">F2 F3 F5</str>
    <str name="fl">type,id,score</str>
    <str name="wt">xml</str>
    <str name="indent">true</str>
  </lst>
</requestHandler>

2) My search URL is now: http://localhost:8983/solr/db/select_group_a?q.op=OR&q=search+string and http://localhost:8983/solr/db/select_group_b?q.op=OR&q=search+string This all works, BUT when I use q=type:(PDF OR DOC OR TXT) so that I can further narrow down the search to within, for example, file extensions, this doesn't seem to work. Is this because edismax with qf doesn't parse the string the same way as the default defType? Steve On Wed, May 13, 2015 at 6:11 PM, Steven White swhite4...@gmail.com wrote: Thanks for the quick reply Shawn. I will dig into dismax and edismax and come back with questions if I cannot figure it out. I avoided them thinking they were for faceting use only; my need is generic search (all the features I get via solr.SearchHandler) but limited to a set of fields. Steve On Wed, May 13, 2015 at 5:58 PM, Shawn Heisey apa...@elyograg.org wrote: [...] The df parameter is shorthand for default field.
Re: Is copyField a must?
On 5/13/2015 3:36 PM, Steven White wrote:

<requestHandler name="/select_group_b" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">20</int>
    <str name="df">F2,F3,F5</str>
    <str name="fl">id,score</str>
  </lst>
</requestHandler>

However, this isn't working because whatever is in df is being treated as a single field name. The df parameter is shorthand for "default field". It is, by definition, a single field -- it is the field searched by default when you don't specify a field directly in a query handled by the default (lucene) query parser. The default parser doesn't search multiple fields for your search terms. What you're going to want to do here is use a different query parser -- dismax or edismax -- and put your field list in the qf parameter, separated by spaces rather than commas. The qf parameter means "query fields" and is specific to the dismax/edismax parsers. Depending on your exact needs, you may also want to define the pf parameter as well (phrase fields). There is a LOT of detail on these parsers, so I'll give you the documentation links rather than try to explain everything: https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser Thanks, Shawn
Re: Is copyField a must?
Two things: 1) There's really no need to define two request handlers here. The "defaults" section is exactly that: defaults, which can be overridden by the URL. So rather than have select_group_b, use something like .../solr/collection/select_group_a?q=whatever&qf=F2,F3,F5 2) When you add a field qualifier to an edismax query, it overrides all the qf definitions. Consider using an fq (filter query) clause instead, by adding fq=type:(PDF OR DOC OR TXT) Best, Erick On Wed, May 13, 2015 at 6:15 PM, Steven White swhite4...@gmail.com wrote: Looks like I got it working (however I still have an outstanding issue, see the end of my email). [...]
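Putting Erick's two points together, a single request against the first handler might look like this (a sketch with the hypothetical field names from this thread; in a real URL the spaces and parentheses in fq would need to be percent-encoded):

```
/solr/collection/select_group_a?q=search+string&qf=F2 F3 F5&fq=type:(PDF OR DOC OR TXT)
```

The fq clause filters independently of qf, so the type restriction no longer interferes with which fields the main query searches.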
Re: Is copyField a must?
On 5/13/2015 3:36 PM, Steven White wrote: Note, I want to avoid a URL-based solution (sending the list of fields over HTTP) because the list of fields could be large (1000+) and thus I would exceed the GET limit quickly (does Solr support POST for searching? If so, then I could use a URL-based solution). Solr does indeed support a query sent as the body in a POST request. I'm not completely positive, but I think you'd use the same format as you put on the URL: q=foo&rows=1&fq=bar If anyone knows for sure what should be in the POST body, please let me and Steven know. In particular, should the content be URL-escaped, as might be required for a GET? Thanks, Shawn
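To Shawn's question: with Content-Type: application/x-www-form-urlencoded (what HTML forms and, as far as I know, most Solr clients send), the POST body uses exactly the same escaping as a GET query string; only the transport differs. A small sketch of building such a body (field names hypothetical):

```python
from urllib.parse import urlencode

# Build a form-encoded POST body for /select. Values are escaped exactly
# as they would be on a GET URL, but the container's URL length limit no
# longer applies, so a large qf list is not a problem.
params = {
    "q": "foo",
    "rows": "1",
    "fq": "bar",
    "defType": "edismax",
    "qf": " ".join("F%d" % i for i in range(1, 6)),  # hypothetical fields
}
body = urlencode(params)
print(body)  # q=foo&rows=1&fq=bar&defType=edismax&qf=F1+F2+F3+F4+F5
```

The body string is then POSTed to the select URL unchanged.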
Re: Is copyField a must?
Thanks for the quick reply Shawn. I will dig into dismax and edismax and come back with questions if I cannot figure it out. I avoided them thinking they were for faceting use only; my need is generic search (all the features I get via solr.SearchHandler) but limited to a set of fields. Steve On Wed, May 13, 2015 at 5:58 PM, Shawn Heisey apa...@elyograg.org wrote: [...] What you're going to want to do here is use a different query parser -- dismax or edismax -- and put your field list in the qf parameter, separated by spaces rather than commas. [...]
Re: Is copyField a must?
Hi Erick, The fq did the trick. This basically solved my need, and I can call it a day (now that it is late Friday). The reason why I'm using two (and there will be more) handlers vs. qf in the URL is the GET limit. The list of fields will be large (nearing 1000) and each field name can be long (up to 40 characters). If Solr will accept POST, then I can pass the list via qf and call it a day (I might have to worry a bit about the larger-than-normal network traffic per search request, but I can deal with that). So, do you or does anyone know if Solr supports POST requests? If so, what's the body format that I need to send (Shawn asked this question too)? Thanks Steve On Wed, May 13, 2015 at 8:13 PM, Erick Erickson erickerick...@gmail.com wrote: Two things: 1) There's really no need to define two request handlers here. The "defaults" section is exactly that: defaults, which can be overridden by the URL. 2) When you add a field qualifier to an edismax query, it overrides all the qf definitions. Consider using an fq (filter query) clause instead, by adding fq=type:(PDF OR DOC OR TXT) Best, Erick [...]
Here is what I have done: 1) In my solrconfig.xml, I created: requestHandler name=/select_group_a class=solr.SearchHandler lst name=defaults str name=echoParamsexplicit/str int name=rows20/int str name=defTypeedismax/str str name=qfF1 F2 F3/str str name=fltype,id,score/str str name=wtxml/str str name=indenttrue/str /lst /requestHandler And requestHandler name=/select_group_b class=solr.SearchHandler lst name=defaults str name=echoParamsexplicit/str int name=rows20/int str name=defTypeedismax/str str name=qfF2 F3 F5/str str name=fltype,id,score/str str name=wtxml/str str name=indenttrue/str /lst /requestHandler 2) My search URL is now: http://localhost:8983/solr/db/select_group_a?q.op=ORq=search string and http://localhost:8983/solr/db/select_group_b?q.op=ORq=search string This all works, BUT when I use q=type:(PDF OR DOC OR TXT) so that I can further narrow down search to within, for example, file-extensions, this doesn't seem to work. Is this because using qf with edismax ends doesn't parse the string the same way as the default defType? Steve On Wed, May 13, 2015 at 6:11 PM, Steven White swhite4...@gmail.com wrote: Thanks for the quick reply Shawn. I will dig into dismax and edismax and come back with questions if I cannot figure it out. I avoided them thinking they are for faceting use only, my need is generic search (all the features I get via solr.SearchHandler) but limited to a set of fields. Steve On Wed, May 13, 2015 at 5:58 PM, Shawn Heisey apa...@elyograg.org wrote: On 5/13/2015 3:36 PM, Steven White wrote: requestHandler name=/select_group_a class=solr.SearchHandler lst name=defaults str name=echoParamsexplicit/str int name=rows20/int str name=dfF2,F3,F5/str str name=flid,score/str /lst /requestHandler However, this isn't working because whatever is in df is being treated as single field name. The df parameter is shorthand for default field. 
It is, by definition, a single field -- it is the field searched by default when you don't specify a field directly in a query handled by the default (lucene) query parser. The default parser doesn't search multiple fields for your search terms. What you're going to want to do here is use a different query parser -- dismax or edismax -- and put your field list in the qf parameter, separated by spaces rather than commas. The qf parameter means "query fields" and is specific to the dismax/edismax parsers. Depending on your exact needs, you may also want to define the pf parameter as well (phrase fields). There is a LOT of detail on these parsers, so I'll give you the documentation links rather than try to explain everything: https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser Thanks, Shawn
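[Editor's note: Shawn's distinction between df and qf can be illustrated with a toy sketch — this is not Solr code, just a conceptual model of how edismax expands a bare term across its query fields into a disjunction, whereas the default parser searches only the single df field:]

```python
def edismax_expansion(term, qf):
    """Conceptual sketch of edismax behavior: a bare term is expanded
    into one clause per query field, combined disjunctively (shown
    here as a plain string for illustration only)."""
    clauses = ["%s:%s" % (field, term) for field in qf]
    return "(" + " | ".join(clauses) + ")"

def default_parser_query(term, df):
    """The default (lucene) parser searches a single default field."""
    return "%s:%s" % (df, term)

edismax_expansion("apache", ["F2", "F3", "F5"])
default_parser_query("apache", "F2")
```

[So a comma-joined string handed to df is treated as one (nonexistent) field name, while a space-separated qf list fans the term out across all listed fields.]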
Re: QQ on segments during indexing.
Thanks Shawn. In my case the document size is small, so it will for sure reach 50k docs before the 100MB buffer size. Thanks, Manohar On Thu, May 14, 2015 at 10:49 AM, Shawn Heisey apa...@elyograg.org wrote: On 5/13/2015 10:01 PM, Manohar Sripada wrote: I have a question about segment creation on disk during indexing. In my solrconfig.xml, I have commented out maxBufferedDocs and ramBufferSizeMB. I am controlling the flushing of data to disk using autoCommit's maxDocs and maxTime. Here, maxDocs is set to 50000 and will be hit first, so that a commit of data to disk happens every 50000 docs. So, my question here is: will it create a new segment when this commit happens? In the wiki https://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor, it is mentioned that new segment creation is determined by the maxBufferedDocs parameter. As I have commented out this parameter, how is new segment creation determined? In recent Solr versions, the ramBufferSizeMB setting defaults to 100 and maxBufferedDocs defaults to -1. A setting of -1 on maxBufferedDocs means that the number of docs doesn't matter; it will use ramBufferSizeMB unless a commit happens before the buffer fills up. A commit does trigger a segment flush, although if it's a soft commit, the situation might be more complicated. Unless the docs are very small, I would expect a 100MB buffer to fill up before you reach 50000 docs. It's been a while since I watched index segments get created, but if I remember correctly, the amount of space required in the RAM buffer to index documents is more than the size of the segment that eventually gets flushed to disk. Thanks, Shawn
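[Editor's note: the flush rules Shawn describes can be summarized in a toy model. This is a deliberate simplification for illustration, not Lucene's actual code — real flushing also involves explicit commits, soft commits, and per-thread buffers:]

```python
def should_flush(buffered_docs, buffered_ram_mb,
                 max_buffered_docs=-1, ram_buffer_size_mb=100):
    """Toy model of the flush decision described above.

    maxBufferedDocs defaults to -1, meaning 'no doc-count limit', so
    with default settings only the RAM buffer (or an explicit commit,
    not modeled here) triggers a segment flush.
    """
    if max_buffered_docs != -1 and buffered_docs >= max_buffered_docs:
        return True  # doc-count limit reached
    return buffered_ram_mb >= ram_buffer_size_mb  # RAM buffer full
```

[With the defaults (-1 / 100MB), tens of thousands of buffered docs alone never force a flush; only memory use or a commit does.]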
Confusion about zkcli.sh and solr.war
I'm trying to use zkcli.sh to upload configurations to ZooKeeper and Solr 5.1. It's throwing an error because it references webapps/solr.war, which no longer exists. Do I have to build my own solr.war in order to use zkcli.sh? Please forgive me if I'm missing something here. Jim Musil
Re: Is copyField a must?
I don't have a need for Edismax. That said, do I still have a need for copyField into a default-field? Steve On Wed, May 13, 2015 at 11:13 AM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I think with a proper configuration of the Edismax query parser and a proper management of field boosting, it's much more precise to use the list of interesting fields than a big blob copy field. Cheers 2015-05-13 15:54 GMT+01:00 Steven White swhite4...@gmail.com: Hi Everyone, In my search need, I will always be using df to specify the list of fields a search will be done in (the list of fields is group based which my application defines). Given this, is there any reason to use copyField to copy the data into a single master-field to search against? Am I losing any thing by not using copyField? Thanks, Steve -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
QQ on segments during indexing.
I have a question about segment creation on disk during indexing. In my solrconfig.xml, I have commented out maxBufferedDocs and ramBufferSizeMB. I am controlling the flushing of data to disk using autoCommit's maxDocs and maxTime. Here, maxDocs is set to 50000 and will be hit first, so that a commit of data to disk happens every 50000 docs. So, my question here is: will it create a new segment when this commit happens? In the wiki https://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor, it is mentioned that new segment creation is determined by the maxBufferedDocs parameter. As I have commented out this parameter, how is new segment creation determined? Thanks, Manohar
Re: utility methods to get field values from index
In Solr 5.0+ you can use Lucene's DocValues API to read the indexed information. This is a unifying API over the field cache and doc values, so it can be used on all indexed fields, e.g. for a single-valued field use searcher.getLeafReader().getSortedDocValues(fieldName); and for multi-valued fields use searcher.getLeafReader().getSortedSetDocValues(fieldName); On Wed, May 13, 2015 at 11:11 AM, Parvesh Garg parv...@zettata.com wrote: Hi All, I was wondering if there is any class in Solr that provides utility methods to fetch indexed field values for documents using a docId. Something simple like getMultiLong(String field, int docId) and getLong(String field, int docId). We have written a Solr component to return group-level stats like avg score, max score, etc. over a large number of documents (say 5000+) against a query executed using edismax. We need to get the group id field's value to do that; this is a single-valued long field. This component also looks at one more field, a multivalued long field, for each document and computes a score based on frequency + document score for each value. Currently we are using stored fields and were wondering if this approach would be faster. Apologies if this is too much to ask for. Parvesh Garg, -- Regards, Shalin Shekhar Mangar.
Reading an index while it is being updated?
Up to now we've been using Lucene without Solr. The Lucene index is being updated, and when the update is finished we notify a Hessian proxy service running on the web server that wants to read the index. When this proxy service is notified, the server knows it can read the updated index. Do we have to use a similar set-up when using Solr, that is: 1. Create/update the index 2. Notify the Solr client Guy Thomas Analist-Programmeur Provincie Vlaams-Brabant Dienst Projecten en Ontwikkelingen Provincieplein 1 - 3010 Leuven Tel: 016-26 79 45 www.vlaamsbrabant.be [Disclaimer, translated from Dutch:] No rights can be derived from this message. All messages sent to this professional e-mail address may be read by the employer. In fulfilling our task of public interest, we record your relevant personal data in our files. You may inspect and correct this data in accordance with the Personal Data Processing Act of 8 December 1992. The enterprise number of the provincial government is 0253.973.219
Re: utility methods to get field values from index
Hi Shalin, Thanks for your answer. I forgot to mention that we are using Solr 4.10. Also, I tried using docValues and the performance was worse than getting the values from stored fields: the time taken to retrieve data for 2000 docs across 2 fields was 120 ms with stored fields vs. 230 ms with docValues. Maybe there is something wrong in my code. The code used for retrieving docValues is:

public static long getSingleLong(SolrIndexSearcher searcher, int docId, String field) throws IOException {
    NumericDocValues sdv = DocValues.getNumeric(searcher.getAtomicReader(), field);
    return sdv.get(docId);
}

and

public static List<Long> getMultiLong(SolrIndexSearcher searcher, int docId, String field) throws IOException {
    SortedSetDocValues ssdv = DocValues.getSortedSet(searcher.getAtomicReader(), field);
    ssdv.setDocument(docId);
    long l;
    List<Long> retval = new ArrayList<Long>(40);
    while ((l = ssdv.nextOrd()) != SortedSetDocValues.NO_MORE_ORDS) {
        BytesRef bytes = ssdv.lookupOrd(l);
        retval.add(NumericUtils.prefixCodedToLong(bytes));
    }
    return retval;
}

Parvesh Garg On Wed, May 13, 2015 at 11:36 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: In Solr 5.0+ you can use Lucene's DocValues API to read the indexed information. This is a unifying API over the field cache and doc values, so it can be used on all indexed fields, e.g. for a single-valued field use searcher.getLeafReader().getSortedDocValues(fieldName); and for multi-valued fields use searcher.getLeafReader().getSortedSetDocValues(fieldName); On Wed, May 13, 2015 at 11:11 AM, Parvesh Garg parv...@zettata.com wrote: Hi All, Was wondering if there is any class in Solr that provides utility methods to fetch indexed field values for documents using docId.
Something simple like getMultiLong(String field, int docId) getLong(String field, int docId) We have written a solr component to return group level stats like avg score, max score etc over a large number of documents (say 5000+) against a query executed using edismax. Need to get the group id fields value to do that, this is a single valued long field. This component also looks at one more field that is a multivalued long field for each document and compute a score based on frequency + document score for each value. Currently we are using stored fields and was wondering if this approach would be faster. Apologies if this is too much to ask for. Parvesh Garg, -- Regards, Shalin Shekhar Mangar.
Setting system property
I'd like to make use of solr.allow.unsafe.resourceloading=true. Is the command-line flag -Dsolr.allow.unsafe.resourceloading=true the only way to inject/set this property, or can it be done (e.g.) in solr.xml? Thx Clemens
Upgrading from Solr 5.0.0 to Solr 5.1.0
Hi, As this is my first time planning an upgrade between different Solr versions, I'd like to check how we should go about the upgrade so that I can start up Solr 5.1.0 with the config and index built on Solr 5.0.0. What files do I need to copy, and what are the things to take note of? I'm also using an external ZooKeeper 3.4.6, so my config files are loaded in ZooKeeper. Regards, Edwin
Block Join Query update documents, how to do it correctly?
I am using the Block Join Query Parser with success, following the example at: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers As this example shows, each parent document can have a number of documents embedded, and each document, be it a parent or a child, has its own unique identifier. Now I would like to update some of the parent documents, and I read that there are horror stories with duplicate documents, scrambled data, etc.; the two prominent JIRA entries for this are: https://issues.apache.org/jira/browse/SOLR-6700 https://issues.apache.org/jira/browse/SOLR-6096 My question is: how do you usually update such documents, for example to update a value for the parent or a value for one of its children? I tried to repost the whole modified document (the parent and ALL of its children as one file), and it seems to work on a small toy example, but of course I cannot be sure for a larger instance with thousands of documents, and I would like to know if this is the correct way to go or not. To make it clear: if originally I used bin/solr post on the following file:

<add>
  <doc>
    <field name="id">1</field>
    <field name="title">Solr has block join support</field>
    <field name="content_type">parentDocument</field>
    <doc>
      <field name="id">2</field>
      <field name="comments">SolrCloud supports it too!</field>
    </doc>
  </doc>
</add>

now I could do bin/solr post on a file:

<add>
  <doc>
    <field name="id">1</field>
    <field name="title">Updated field: Solr has block join support</field>
    <field name="content_type">parentDocument</field>
    <doc>
      <field name="id">2</field>
      <field name="comments">Updated field: SolrCloud supports it too!</field>
    </doc>
  </doc>
</add>

Will this avoid the inconsistent, scrambled, or duplicate data on Solr instances as discussed in the JIRAs? How do you usually do this? Thanks for any help or hints. Tom
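[Editor's note: one way to script Tom's "repost the whole block" approach is with Solr's JSON update format, where child documents nest under the _childDocuments_ key. This sketch only builds the payload (field names taken from the example above) and assumes you would POST it to /update with Content-Type: application/json:]

```python
import json

def parent_with_children(parent_fields, children):
    """Build one Solr JSON update document with nested children under
    the _childDocuments_ key. Re-sending the whole block like this
    replaces the parent and all of its children together, which keeps
    the parent/child block contiguous in the index."""
    doc = dict(parent_fields)
    doc["_childDocuments_"] = [dict(child) for child in children]
    return doc

# The updated block from the example above, as a JSON update payload.
payload = json.dumps([parent_with_children(
    {"id": "1",
     "title": "Updated field: Solr has block join support",
     "content_type": "parentDocument"},
    [{"id": "2", "comments": "Updated field: SolrCloud supports it too!"}],
)])
```

[The resulting JSON array would be sent to e.g. http://localhost:8983/solr/collection/update?commit=true.]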