Re: Remove operation of partial update doesn't work
In my code, when the operation is "add" it works correctly on a multivalued field. But no multivalued field value can be deleted with the "remove" operation.

The add operation adds a value to a multivalued field. The remove operation removes a value from a multivalued field. If you believe that something is not working, please state clearly why you believe that something is not working. Start by describing the symptom.

-- Jack Krupansky

On Mon, Jul 6, 2015 at 9:22 PM, Mohsen Saboorian mohs...@gmail.com wrote:

I can partially 'add' fields to my Solr index, but the 'remove' operation doesn't seem to work. I'm on Solr 4.10. Here is my SolrJ snippet:

    SolrInputDocument doc = new SolrInputDocument();
    Map<String, Object> partialUpdate = new HashMap<>();
    // value can be an object (string, number, etc.) or a list;
    // operation can be "add", "set" or "remove".
    partialUpdate.put(operation, value);
    doc.addField("id", id); // document id
    doc.addField(fieldName, partialUpdate);
    getSolrServer().add(doc, commitWithin);

Is there anything wrong with my code?
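For reference, with the map filled in as above, the SolrJ client ends up sending an atomic-update request that, in JSON form, would look roughly like this (document id, field name, and value are placeholders taken from the snippet, not verified wire output):

```json
{
  "id": "mydoc",
  "fieldName": { "remove": "valueToRemove" }
}
```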
Re: Too many Soft commits and opening searchers realtime
Summer,

A log excerpt usually helps to troubleshoot any magic. Would you mind providing one?

On Wed, Jul 8, 2015 at 2:30 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote:

So you are saying that no one is triggering any commit, and that auto soft commit is not actually waiting the configured time? I suspect something is not as described, because if auto soft commit were not working I would expect thousands of bugs to have been raised. Let's dig a little bit into the details… What exactly are you using to index content? Maybe some commit is actually hidden there :)

Cheers

2015-07-08 2:21 GMT+01:00 Summer Shire shiresum...@gmail.com:

No, the client lets Solr handle it.

On Jul 7, 2015, at 2:38 PM, Mike Drob mad...@cloudera.com wrote:

Are the clients that are posting updates requesting commits?

On Tue, Jul 7, 2015 at 4:29 PM, Summer Shire shiresum...@gmail.com wrote:

Hi All,

Can someone help me understand the following behavior? I have the following maxTimes on hard and soft commits, yet I see a lot of "Opening Searcher" entries in the log:

    org.apache.solr.search.SolrIndexSearcher - Opening Searcher@1656a258 [main] realtime

I also see a soft commit happening almost every 30 secs:

    org.apache.solr.update.UpdateHandler - start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}

    <autoCommit>
      <maxTime>480000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>180000</maxTime>
    </autoSoftCommit>

I tried disabling softCommit by setting maxTime to -1.
On startup solrCore recognized it and logged "Soft AutoCommit: disabled", but I could still see softCommit=true:

    org.apache.solr.update.UpdateHandler - start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}

    <autoSoftCommit>
      <maxTime>-1</maxTime>
    </autoSoftCommit>

Thanks,
Summer

--
Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
Re: Too many Soft commits and opening searchers realtime
A realtime searcher is necessary for internal bookkeeping / uses if a normal searcher isn't opened on a commit. This searcher doesn't have caches and hence doesn't carry the weight that a normal searcher would. It's also invisible to clients (it doesn't change the view of the index for normal searches). Your hard autocommit at 8 minutes with openSearcher=false will trigger a realtime searcher to open every 8 minutes along with the hard commit.

-Yonik

On Tue, Jul 7, 2015 at 5:29 PM, Summer Shire shiresum...@gmail.com wrote:

Hi All,

Can someone help me understand the following behavior? I have the following maxTimes on hard and soft commits, yet I see a lot of "Opening Searcher" entries in the log:

    org.apache.solr.search.SolrIndexSearcher - Opening Searcher@1656a258 [main] realtime

I also see a soft commit happening almost every 30 secs:

    org.apache.solr.update.UpdateHandler - start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}

    <autoCommit>
      <maxTime>480000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>180000</maxTime>
    </autoSoftCommit>

I tried disabling softCommit by setting maxTime to -1. On startup solrCore recognized it and logged "Soft AutoCommit: disabled", but I could still see softCommit=true:

    org.apache.solr.update.UpdateHandler - start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}

    <autoSoftCommit>
      <maxTime>-1</maxTime>
    </autoSoftCommit>

Thanks,
Summer
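Putting the thread's settings together, the solrconfig.xml fragment under discussion would look like the sketch below: a hard autocommit every 8 minutes that does not open a searcher, and soft autocommit disabled. This is a sketch of the configuration being discussed, not a drop-in file.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- hard commit every 8 minutes; flushes to disk but opens no searcher -->
    <maxTime>480000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <!-- -1 disables automatic soft commits entirely -->
    <maxTime>-1</maxTime>
  </autoSoftCommit>
</updateHandler>
```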
Re: Solr Boost Search word before Specific Content
Thanks Ahmet for the proposed solution. It should work, but it is really hardcoded and coupled with the specific keyword ("with" in the example). I recently read an article from master Doug ( http://opensourceconnections.com/blog/2014/12/08/title-search-when-relevancy-is-only-skin-deep/ ). I do believe this is the point you should start with. In particular, take extra care with the "pantheon" approach, which can be really useful to you:

"We can use our pantheon along with a KeepWordsFilter http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-keep-words-tokenfilter.html to create yet another search field to use in our search. We can create a 'keep words' list that contains the terms in our pantheon. Only terms in our list make it into the search index. We can call this field pantheon_title. For example, when the following title is analyzed to go into the index:

- Who was Socrates

we will strip out all terms other than the ones in our pantheon:

- Socrates

Similarly the title

- Socrates and Plato on Metaphysics

can be boiled down to these three members of our pantheon:

- Socrates Plato Metaphysics"

Hope this can help!

Cheers

2015-07-08 8:09 GMT+01:00 Ahmet Arslan iori...@yahoo.com.invalid:

Hi Jack,

Here is a hypothetical example:

    product_title_1 : dell laptop with laptop bag
    product_title_2 : laptop bag with cover
    product_title_3 : laptop bag and table

You create an artificial/additional field:

    before_field_1 : dell laptop
    before_field_2 : laptop bag
    before_field_3 : laptop bag

You can implement/embed any complex/custom logic (in your indexing code) for obtaining the values of this new boostable before_field. You can even implement it in a custom update processor. Then, at search time, use (e)dismax's field boosting mechanism:

    q=Laptop bag&qf=product_title^0.3 before_field^0.7&defType=edismax

Ahmet

On Wednesday, July 8, 2015 6:56 AM, JACK mfal...@gmail.com wrote:

Hi Ahmet,

Can you elaborate more? Is it possible to solve my problem in Solr 5.0.0?
If yes, can you explain how?

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Boost-Search-word-before-Specific-Content-tp4216072p4216257.html
Sent from the Solr - User mailing list archive at Nabble.com.
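To make the "pantheon" idea above concrete, a schema.xml sketch of such a field type might look like this. The field type name, field name, and keepwords.txt file are made up for illustration; only the KeepWordFilterFactory itself is from the cited article.

```xml
<!-- Hypothetical field type: only terms listed in keepwords.txt survive analysis -->
<fieldType name="pantheon_text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- keepwords.txt holds the "pantheon" terms; everything else is dropped -->
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>

<field name="pantheon_title" type="pantheon_text" indexed="true" stored="false"/>
```

At query time, this field can then be boosted with qf, as in Ahmet's edismax example.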
Re: Remove operation of partial update doesn't work
In scenarios like this, the documentation is key:

    set: Set or replace the field value(s) with the specified value(s), or *remove the values if 'null' or an empty list is specified as the new value.* May be specified as a single value, or as a list for multivalued fields.
    add: Adds the specified values to a multivalued field. May be specified as a single value, or as a list.
    remove: Removes (all occurrences of) the specified values from a multivalued field. May be specified as a single value, or as a list.
    removeregex: Removes all occurrences of the specified regex from a multivalued field. May be specified as a single value, or as a list.
    inc: Increments a numeric value by a specific amount. Must be specified as a single numeric value.

In my opinion set is the right direction to look into. I'm not sure what happens if you use remove on a single-valued field. Can you tell us what you observed? Does an empty value remain for that field? That would be odd; I would expect the field to become null.

Cheers

2015-07-08 10:34 GMT+01:00 Mohsen Saboorian mohs...@gmail.com:

In my code, when the operation is "add" it works correctly on a multivalued field. But no multivalued field value can be deleted with the "remove" operation.

The add operation adds a value to a multivalued field. The remove operation removes a value from a multivalued field. If you believe that something is not working, please state clearly why you believe that something is not working. Start by describing the symptom.

-- Jack Krupansky

On Mon, Jul 6, 2015 at 9:22 PM, Mohsen Saboorian mohs...@gmail.com wrote:

I can partially 'add' fields to my Solr index, but the 'remove' operation doesn't seem to work. I'm on Solr 4.10. Here is my SolrJ snippet:

    SolrInputDocument doc = new SolrInputDocument();
    Map<String, Object> partialUpdate = new HashMap<>();
    // value can be an object (string, number, etc.) or a list;
    // operation can be "add", "set" or "remove".
    partialUpdate.put(operation, value);
    doc.addField("id", id); // document id
    doc.addField(fieldName, partialUpdate);
    getSolrServer().add(doc, commitWithin);

Is there anything wrong with my code?
Re: Sorting documents by child documents
I would like to get a deep understanding of your problem… How do you want to sort a parent document by a normal field of its children? Example:

    Document 1 (id: 5)
      Child 1: id 51, title A
      Child 2: id 52, title Z
    Document 2 (id: 6)
      Child 1: id 61, title C
      Child 2: id 62, title B

How can you sort the parents based on children fields? You can sort a parent based on a value calculated out of the children fields (after you derive a single value from them: max? sum? concat? etc.). Can you explain your problem in more detail?

Cheers

2015-07-08 7:17 GMT+01:00 DorZion dorz...@gmail.com:

Hey,

I'm using Solr 4.10.2 and I have child documents in every parent document. Previously, I used FunctionQuery to sort the documents: http://lucene.472066.n3.nabble.com/Sorting-documents-by-nested-child-docs-with-FunctionQueries-tp4209940.html

Now, I want to sort the documents by normal fields of their child documents. It doesn't work when I use the sort parameter.

Thanks in advance,
Dor

--
View this message in context: http://lucene.472066.n3.nabble.com/Sorting-documents-by-child-documents-tp4216263.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Too many Soft commits and opening searchers realtime
So you are saying that no one is triggering any commit, and that auto soft commit is not actually waiting the configured time? I suspect something is not as described, because if auto soft commit were not working I would expect thousands of bugs to have been raised. Let's dig a little bit into the details… What exactly are you using to index content? Maybe some commit is actually hidden there :)

Cheers

2015-07-08 2:21 GMT+01:00 Summer Shire shiresum...@gmail.com:

No, the client lets Solr handle it.

On Jul 7, 2015, at 2:38 PM, Mike Drob mad...@cloudera.com wrote:

Are the clients that are posting updates requesting commits?

On Tue, Jul 7, 2015 at 4:29 PM, Summer Shire shiresum...@gmail.com wrote:

Hi All,

Can someone help me understand the following behavior? I have the following maxTimes on hard and soft commits, yet I see a lot of "Opening Searcher" entries in the log:

    org.apache.solr.search.SolrIndexSearcher - Opening Searcher@1656a258 [main] realtime

I also see a soft commit happening almost every 30 secs:

    org.apache.solr.update.UpdateHandler - start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}

    <autoCommit>
      <maxTime>480000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>180000</maxTime>
    </autoSoftCommit>

I tried disabling softCommit by setting maxTime to -1. On startup solrCore recognized it and logged "Soft AutoCommit: disabled", but I could still see softCommit=true:

    org.apache.solr.update.UpdateHandler - start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}

    <autoSoftCommit>
      <maxTime>-1</maxTime>
    </autoSoftCommit>

Thanks,
Summer
solr 5 and schema.xml
Had a look at previous postings, but am still thoroughly confused. I installed Solr 5 out of the box, built a core and uploaded some documents using dynamic field types. I can see my uploaded docs using the get method, but when I query those docs, the results seem all over the place. The answer seems to be to alter my schema.xml file, but it doesn't appear to be in the conf directory where everyone seems to be directing me. I've then read that Solr 5 doesn't use the schema.xml file by default, but uses a managed schema instead. Apparently, I can't alter the schema.xml file (which I can't find) but now need to use a REST API. However, since I'm using dynamic fields, I'm not sure if this is still necessary. I've hunted high and low for clear documentation on this, but am still confused. I need to build a single index based upon customer data, searching by email address. Any help, or pointing in the right direction to where this is clearly documented, would be gratefully received.

--
View this message in context: http://lucene.472066.n3.nabble.com/solr-5-and-schema-xml-tp4216290.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: Running Solr 5.2.1 on Windows using NSSM
Answered my own question. :) It seems to work great for me by following this article: http://www.norconex.com/how-to-run-solr5-as-a-service-on-windows/

Regards,
Adrian

-----Original Message-----
From: Adrian Liew [mailto:adrian.l...@avanade.com]
Sent: Wednesday, July 8, 2015 4:43 PM
To: solr-user@lucene.apache.org
Subject: Running Solr 5.2.1 on Windows using NSSM

Hi guys,

I am looking to run Apache Solr v5.2.1 on a Windows machine. I tried to set up a Windows service using NSSM (the Non-Sucking Service Manager), pointing it at the solr.cmd file path and installing the service. After installation, I tried to start the Windows service but it gives back an alert message: "Windows could not start the SolrService service on Local Computer. The service did not return an error. This could be an internal Windows error or an internal service error."

Most examples for older versions of Apache Solr use the "java -jar start.jar" command to run Solr, and that seems to run okay with NSSM. I am not sure if this is a solr.cmd issue or NSSM's issue.

Alternatively, I have tried to use Windows Task Scheduler to configure a task pointing to solr.cmd and run the task whenever the computer starts (regardless of whether a user is logged in or not). The task scheduler reports back 'Task Start Failed' with a level of 'Error'. Additionally, after checking Event Viewer, it returns the nssm error "Failed to open process handle for process with PID 3640 when terminating service Solr Service: The parameter is incorrect." Chances are this points back to the solr.cmd file itself.

Thoughts?

Regards,
Adrian
Re: solr 5 and schema.xml
You have the choice. You can use a dynamic (managed) schema and control it via the API, or use the classic schema and control it explicitly via schema.xml. You control that when you create the core by choosing a different configuration template; it's just that the default one uses a managed schema. Also, dynamic fields are not the same thing as a dynamic schema, but I think you knew that. You can use dynamic fields with either one.

So, try something like this:

    bin/solr create_core -c classic_core -d basic_configs

Regards,
Alex.

P.s. You still get some APIs even with the classic schema, but those are more for overriding solrconfig.xml settings.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/

On 8 July 2015 at 05:11, spleenboy paul.br...@neilltech.com wrote:

Had a look at previous postings, but am still thoroughly confused. I installed Solr 5 out of the box, built a core and uploaded some documents using dynamic field types. I can see my uploaded docs using the get method, but when I query those docs, the results seem all over the place. The answer seems to be to alter my schema.xml file, but it doesn't appear to be in the conf directory where everyone seems to be directing me. I've then read that Solr 5 doesn't use the schema.xml file by default, but uses a managed schema instead. Apparently, I can't alter the schema.xml file (which I can't find) but now need to use a REST API. However, since I'm using dynamic fields, I'm not sure if this is still necessary. I've hunted high and low for clear documentation on this, but am still confused. I need to build a single index based upon customer data, searching by email address. Any help, or pointing in the right direction to where this is clearly documented, would be gratefully received.

--
View this message in context: http://lucene.472066.n3.nabble.com/solr-5-and-schema-xml-tp4216290.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexed field to schema field
I wish to do it in code, so the schema browser is less of an option. The use case is: I wish to boost particular fields while matching, and for that I need to know my field to Solr field mapping, so that I can put it in the query.

Thanks and regards,
Gajendra Dadheech

On Tue, Jul 7, 2015 at 9:23 PM, Erick Erickson erickerick...@gmail.com wrote:

Feels like an XY problem. Why do you want to do this? What's the use-case? Perhaps there's an alternative approach that satisfies the need.

Best,
Erick

On Tue, Jul 7, 2015 at 4:21 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote:

Just an idea: Solr Admin/Schema Browser reports some info like this; hence, you can trace the way in which it does it.

On Tue, Jul 7, 2015 at 10:34 AM, Gajendra Dadheech gajju3...@gmail.com wrote:

Hi,

Can I somehow translate the fields which I read from newSearcher.getAtomicReader().fields() to schema fields? Does Solr expose any method to do this translation? The alternative approach I am thinking of would involve lots of regex computation, as the fields would be _string, _float etc. and I would have to remove those suffixes; this becomes a little tricky when fields are dynamic.

Thanks and regards,
Gajendra Dadheech
Grouping and recip function not working with Sharding
Hi,

I am using sharding (3 shards) with Zookeeper. When I query a collection using the group=true&group.field=NAME&group.ngroups=true parameters, ngroups in the response is incorrect. However, I am getting a correct count in the doclist array. Ex: the response below contains 5 groups (which is correct) but ngroups is 11.

    {
      "responseHeader": {
        "status": 0,
        "QTime": 49,
        "params": {
          "group.ngroups": "true",
          "indent": "true",
          "start": "0",
          "q": "*:*",
          "group.field": "NAME",
          "group": "true",
          "wt": "json",
          "rows": "5"
        }
      },
      "grouped": {
        "NAME": {
          "matches": 18,
          "ngroups": 11,
          "groups": [
            { "groupValue": "A-SERIES",
              "doclist": { "numFound": 5, "start": 0, "maxScore": 1.0,
                "docs": [ { "NAME": "A-SERIES", "_version_": 1505559209034383400 } ] } },
            { "groupValue": "B-SERIES",
              "doclist": { "numFound": 5, "start": 0,
                "docs": [ { "NAME": "B-SERIES", "_version_": 1505559209034383400 } ] } },
            { "groupValue": "C-SERIES",
              "doclist": { "numFound": 1, "start": 0,
                "docs": [ { "NAME": "C-SERIES", "_version_": 1505559209034383400 } ] } },
            { "groupValue": "D-SERIES",
              "doclist": { "numFound": 5, "start": 0,
                "docs": [ { "NAME": "D-SERIES", "_version_": 1505559209034383400 } ] } },
            { "groupValue": "E-SERIES",
              "doclist": { "numFound": 3, "start": 0, "maxScore": 1.0,
                "docs": [ { "NAME": "E-SERIES", "_version_": 1505559209034383400 } ] } }
          ]
        }
      }
    }

I am facing the same problem with the recip function used to get the latest record on a date field when sharding: it returns records in the wrong order.

Note: the same configuration works fine on a single machine without sharding. Please help me find a solution.

Thanks.
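For what it's worth, Solr's grouping documentation notes that group.ngroups is only accurate in distributed mode when all documents of a group live on the same shard. With the default compositeId router you can co-locate them by prefixing the uniqueKey with the grouping value, as in NAME!docid. Below is a minimal illustrative sketch of building such route keys; the helper class itself is made up, only the "prefix!id" convention is Solr's.

```java
// Illustrative helper: build compositeId route keys so that all documents
// sharing the same NAME hash to the same shard. With the compositeId router,
// the part before '!' determines shard placement.
public class RouteKey {
    public static String forGroup(String groupValue, String docId) {
        return groupValue + "!" + docId;
    }

    public static void main(String[] args) {
        // e.g. every A-SERIES document lands on the same shard
        System.out.println(forGroup("A-SERIES", "17")); // A-SERIES!17
    }
}
```

With ids built this way, every document of a group is scored and counted on one shard, so ngroups (and per-group sorting such as the recip trick) behaves as on a single machine.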
Re: Solr 5.2.1 - SolrCloud create collection, core is only loaded after restart
Hi,

there was a problem with zookeeper and IPv6 that could be solved by using -Djava.net.preferIPv4Stack=true. Now the core is correctly created, but I am wondering why I cannot see the core in the web interface, neither on the core admin screen nor in the Core Selector field. Only after restarting Solr does the core show up in the web interface.

Best Regards,
Jens

On 07.07.2015 at 12:49, Jens Brandt bra...@docoloc.de wrote:

Hi Erick,

thanks for your reply. After creating the new collection via the Collections API I can see in the solr log files that the core was created:

    Solr index directory '/var/lib/solr/gagel_shard1_replica1/data/index' doesn't exist. Creating new index...

However, when calling

    curl "http://solrtest:8080/solr/gagel/query?q=*:*"

I get an HTTP 404 error:

    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
    <title>Error 404 Not Found</title>
    </head>
    <body><h2>HTTP ERROR 404</h2>
    <p>Problem accessing /solr/tubs/query. Reason:
    <pre>    Not Found</pre></p><hr/><i><small>Powered by Jetty://</small></i>
    </body>
    </html>

On 06.07.2015 at 19:51, Erick Erickson erickerick...@gmail.com wrote:

bq: However, the named core is created but not loaded in solr.

I'm not quite sure what that means; what is the symptom you see? Because this should be fine. I've sometimes been fooled by looking at the core admin UI screen and not reloading it. What happens if you try querying your new collection directly right after you create it? e.g.

    http://blah blah/solr/gagel/query?q=*:*

You should get back a valid packet. Admittedly with 0 hits, but if the core were truly not loaded you'd get an error.

And please, please, please do NOT use the core admin screen to try to add cores in SolrCloud mode. It's possible to use, but you must know _exactly_ what parameters to set or Bad Things Happen. Continue to use the Collections API; it's safer.

Best,
Erick

On Mon, Jul 6, 2015 at 8:54 AM, Jens Brandt bra...@docoloc.de wrote:

Hi,

I am trying to set up SolrCloud with an external zookeeper.
Solr 5.2.1 is running on host solrtest at port 8080 and zookeeper already contains a config with the name customerSolr. When I create a new collection using the Collections API by calling the following URL:

    http://solrtest:8080/solr/admin/collections?action=CREATE&numShards=1&collection.configName=customerSolr&name=gagel

I get a positive response and the core name gagel_shard1_replica1 is returned. However, the named core is created but not loaded in solr. When I try to manually add the core using the Core Admin web interface, I get the error that the core already exists. After a restart of solr the core is loaded correctly.

Can anyone please advise if I am doing something wrong, or whether this is an issue in Solr 5.2.1?

Best Regards,
Jens
Re: Tlog replay
Hi Summer,

If you take a look at the CommitUpdateCommand class, you will notice that no flags field appears in its toString:

    // this is the toString, for example
    @Override
    public String toString() {
        return super.toString() + ",optimize=" + optimize
            + ",openSearcher=" + openSearcher
            + ",waitSearcher=" + waitSearcher
            + ",expungeDeletes=" + expungeDeletes
            + ",softCommit=" + softCommit
            + ",prepareCommit=" + prepareCommit
            + '}';
    }

If you then access the UpdateCommand object, you find the flags:

    public static int BUFFERING = 0x0001;         // update command is being buffered.
    public static int REPLAY = 0x0002;            // update command is from replaying a log.
    public static int PEER_SYNC = 0x0004;         // update command is a missing update being provided by a peer.
    public static int IGNORE_AUTOCOMMIT = 0x0008; // this update should not count toward triggering of autocommits.
    public static int CLEAR_CACHES = 0x0010;      // clear caches associated with the update log. used when applying reordered DBQ updates when doing an add.

So flags=2 is actually saying that the update command is from replaying a log (which is what you would expect).

Cheers

2015-07-08 3:01 GMT+01:00 Summer Shire shiresum...@gmail.com:

Hi,

When I restart my solr core, the log replay starts and just before it finishes I see the following commit:

    start commit{flags=2,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

What does the "flags=2" param do? When I try to send that param to the updateHandler manually, solr does not like it:

    curl http://localhost:6600/solr/main/update -H "Content-Type: text/xml" --data-binary '<commit openSearcher="true" flags="2" waitSearcher="false"/>'
    <?xml version="1.0" encoding="UTF-8"?>
    <response>
    <lst name="responseHeader"><int name="status">400</int><int name="QTime">0</int></lst>
    <lst name="error"><str name="msg">Unknown commit parameter 'flags'</str><int name="code">400</int></lst>
    </response>

thanks,
Summer
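Since the flag values quoted above are bit masks, flags=2 can be decoded with a simple bitwise check. A small illustrative sketch (the constants are copied from the UpdateCommand excerpt; the class and method around them are made up for demonstration):

```java
// Decode the flags value seen in the tlog-replay commit log line.
public class UpdateFlags {
    // Constants as quoted from UpdateCommand above.
    public static final int BUFFERING = 0x0001;
    public static final int REPLAY = 0x0002;
    public static final int PEER_SYNC = 0x0004;
    public static final int IGNORE_AUTOCOMMIT = 0x0008;
    public static final int CLEAR_CACHES = 0x0010;

    public static boolean isReplay(int flags) {
        // bitwise AND picks out the REPLAY bit regardless of other flags
        return (flags & REPLAY) != 0;
    }

    public static void main(String[] args) {
        System.out.println(isReplay(2)); // true: this commit came from log replay
    }
}
```

This also explains why the update handler rejects flags as a request parameter: it is internal state set by Solr, not something a client is meant to send.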
Re: Indexed field to schema field
I am really sorry Gajendra, but what do your latest mails mean? Why is classic field boosting not an option for you? Are you developing a custom query parser? What are the parameters expected by this query parser? What is the expected behaviour? It is really hard to help with such fragmented information.

Cheers

2015-07-08 11:42 GMT+01:00 Gajendra Dadheech gajju3...@gmail.com:

At the time of forming this request I am not sure which kind of field that would be. So I read the fields in the new searcher.

Thanks and regards,
Gajendra Dadheech

On Wed, Jul 8, 2015 at 2:12 PM, Gajendra Dadheech gajju3...@gmail.com wrote:

I wish to do it in code, so the schema browser is less of an option. The use case is: I wish to boost particular fields while matching, and for that I need to know my field to Solr field mapping, so that I can put it in the query.

Thanks and regards,
Gajendra Dadheech

On Tue, Jul 7, 2015 at 9:23 PM, Erick Erickson erickerick...@gmail.com wrote:

Feels like an XY problem. Why do you want to do this? What's the use-case? Perhaps there's an alternative approach that satisfies the need.

Best,
Erick

On Tue, Jul 7, 2015 at 4:21 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote:

Just an idea: Solr Admin/Schema Browser reports some info like this; hence, you can trace the way in which it does it.

On Tue, Jul 7, 2015 at 10:34 AM, Gajendra Dadheech gajju3...@gmail.com wrote:

Hi,

Can I somehow translate the fields which I read from newSearcher.getAtomicReader().fields() to schema fields? Does Solr expose any method to do this translation? The alternative approach I am thinking of would involve lots of regex computation, as the fields would be _string, _float etc. and I would have to remove those suffixes; this becomes a little tricky when fields are dynamic.
Thanks and regards,
Gajendra Dadheech
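As a sketch of the suffix-stripping approach Gajendra describes, mapping an indexed dynamic-field name back to its "logical" name could look like this. The suffix list here is hypothetical; it must match the dynamic-field patterns actually declared in your schema.xml.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative mapping from an indexed dynamic-field name (e.g. "price_float")
// back to the logical field name ("price") by stripping a known suffix.
public class DynamicFieldMapper {
    // Hypothetical suffixes; use the ones from your own schema.xml dynamicField patterns.
    private static final List<String> SUFFIXES = Arrays.asList("_string", "_float", "_int");

    public static String baseName(String indexedField) {
        for (String suffix : SUFFIXES) {
            if (indexedField.endsWith(suffix)) {
                return indexedField.substring(0, indexedField.length() - suffix.length());
            }
        }
        return indexedField; // not a dynamic field we know about
    }

    public static void main(String[] args) {
        System.out.println(baseName("price_float")); // price
        System.out.println(baseName("title"));       // title
    }
}
```

Driving the suffix list from IndexSchema's own dynamicField definitions (rather than hardcoding it, as here) would avoid the brittleness the thread worries about.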
Re: Remove operation of partial update doesn't work
I use both add and remove on a multivalued field (think of tags on a blog post). For this, set-to-null won't work, because I want only one value (tag) to be removed, and setting null removes neither one nor all of the values (all tags here). So I use some SolrJ code which would translate to something like this:

    { "id": "docId", "tagId": { "remove": "someTagId" } }

After commit, there is still tagId: someTagId in my document. Here is my schema part for tagId:

    <field name="tagId" type="int" indexed="true" stored="true" multiValued="true"/>

Thanks,
Mohsen

On Wed, Jul 8, 2015 at 3:26 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote:

In scenarios like this, the documentation is key:

    set: Set or replace the field value(s) with the specified value(s), or *remove the values if 'null' or an empty list is specified as the new value.* May be specified as a single value, or as a list for multivalued fields.
    add: Adds the specified values to a multivalued field. May be specified as a single value, or as a list.
    remove: Removes (all occurrences of) the specified values from a multivalued field. May be specified as a single value, or as a list.
    removeregex: Removes all occurrences of the specified regex from a multivalued field. May be specified as a single value, or as a list.
    inc: Increments a numeric value by a specific amount. Must be specified as a single numeric value.

In my opinion set is the right direction to look into. I'm not sure what happens if you use remove on a single-valued field. Can you tell us what you observed? Does an empty value remain for that field? That would be odd; I would expect the field to become null.

Cheers

2015-07-08 10:34 GMT+01:00 Mohsen Saboorian mohs...@gmail.com:

In my code, when the operation is "add" it works correctly on a multivalued field. But no multivalued field value can be deleted with the "remove" operation.

The add operation adds a value to a multivalued field. The remove operation removes a value from a multivalued field.
If you believe that something is not working, please state clearly why you believe that something is not working. Start by describing the symptom.

-- Jack Krupansky

On Mon, Jul 6, 2015 at 9:22 PM, Mohsen Saboorian mohs...@gmail.com wrote:

I can partially 'add' fields to my Solr index, but the 'remove' operation doesn't seem to work. I'm on Solr 4.10. Here is my SolrJ snippet:

    SolrInputDocument doc = new SolrInputDocument();
    Map<String, Object> partialUpdate = new HashMap<>();
    // value can be an object (string, number, etc.) or a list;
    // operation can be "add", "set" or "remove".
    partialUpdate.put(operation, value);
    doc.addField("id", id); // document id
    doc.addField(fieldName, partialUpdate);
    getSolrServer().add(doc, commitWithin);

Is there anything wrong with my code?
Synonym with Proximity search in solr 5.1.0
Hi,

We have a synonym file with the below content:

    cell phone,nokia mobile

And we have 3 documents:

    <doc>
      <field name="id">1001</field>
      <field name="name">Doc 1</field>
      <field name="text">I like nokia mobile</field>
    </doc>

    <doc>
      <field name="id">1002</field>
      <field name="name">Doc 2</field>
      <field name="text">I can't leave without cell phone</field>
    </doc>

    <doc>
      <field name="id">1003</field>
      <field name="name">Doc 3</field>
      <field name="text">I work with Nokia inc</field>
    </doc>

When I search for "cell phone", I should get doc1 and doc2 returned, but not doc3. The search syntax is:

    text:"cell phone"~500

How could I achieve this?

Best Regards,
Dinesh Naik
Re: Synonym with Proximity search in solr 5.1.0
Showing your debug query would clarify the situation, but I assume you ran into a classic multi-word synonym problem [1]. Hope the documents I pointed out are good for you. Cheers [1] http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ [2] http://lucidworks.com/blog/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/ 2015-07-08 15:47 GMT+01:00 dinesh naik dineshkumarn...@gmail.com:
Re: Difference in WordDelimiterFilter behavior between 4.7.2 and 4.9.1
Taking a look at the documentation, I see these inconsistent orderings, in my opinion:

*Example:* Concatenate word parts and number parts, but not word and number parts that occur in the same token.

<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" catenateWords="1" catenateNumbers="1"/>
</analyzer>

*In:* hot-spot 100+42 XL40
*Tokenizer to Filter:* hot-spot(1), 100+42(2), XL40(3)
*Out:* hot(1), spot(2), hotspot(2) *(1?)*, 100(3), 42(4), 10042(4) *(2?)*, XL(5) *(3?)*, 40(6) *(4?)*

*Example:* Concatenate all. Word and/or number parts are joined together.

<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" catenateAll="1"/>
</analyzer>

*In:* XL-4000/ES
*Tokenizer to Filter:* XL-4000/ES(1)
*Out:* XL(1), 4000(2), ES(3), XL4000ES(3) *(1?)*

It is not clear to me why a token generated by a catenation should not occupy the same position as the original one. In your example, I am a little bit surprised by the first results as well:

RRR-COLECCION: COLECCIÓN: Gracita Morales foobar

Here are the final positions and terms that 4.7.2 yields for this on query analysis:

1 rrr-coleccion
1 rrr
2 coleccion
2 rrrcoleccion *(1)?*
3 coleccion
4 gracita
5 morales
6 foobar

It is not clear whether the tokens must simply inherit their position from the parent token, or whether they must arrange it based on the final list of tokens. 2015-07-08 16:03 GMT+01:00 Shawn Heisey apa...@elyograg.org: On 7/8/2015 8:44 AM, Shawn Heisey wrote: This is what 4.9.1 does with it:

1 rrr-coleccion
2 rrr
2 coleccion
2 rrrcoleccion
3 coleccion
4 gracita
5 morales
6 foobar

Followup: This is what Solr 5.2.1 does for query analysis, which also seems wrong, and doesn't match the phrase query:

1 rrr-coleccion
2 coleccion
2 rrr
2 rrrcoleccion
3 coleccion
4 gracita
5 morales
6 bleh

The index analysis on 5.2.1 is the same as in the other two versions.
Thanks, Shawn
Re: Indexed field to schema field
Sorry, I thought this was a common problem. Will present it in more detail in some time if I'm not able to solve it by then. Thanks and regards, Gajendra Dadheech On Wed, Jul 8, 2015 at 6:23 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I am really sorry Gajendra, but what do your latest mails mean? Why is classic field boosting not an option for you? Are you developing a custom query parser? What are the parameters expected for this query parser? What is the expected behaviour? It is really hard to help with such fragmented information. Cheers 2015-07-08 11:42 GMT+01:00 Gajendra Dadheech gajju3...@gmail.com: At the time of forming this request I am not sure which kind of field that would be. So I read the fields in newSearcher. Thanks and regards, Gajendra Dadheech On Wed, Jul 8, 2015 at 2:12 PM, Gajendra Dadheech gajju3...@gmail.com wrote: I wish to do it in code, so the schema browser is less of an option. Use case: I wish to boost particular fields while matching; for that I need to know my field to Solr field mapping, so that I can put that in the query. Thanks and regards, Gajendra Dadheech On Tue, Jul 7, 2015 at 9:23 PM, Erick Erickson erickerick...@gmail.com wrote: Feels like an XY problem. Why do you want to do this? What's the use-case? Perhaps there's an alternative approach that satisfies the need. Best, Erick On Tue, Jul 7, 2015 at 4:21 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Just an idea: Solr Admin/Schema Browser reports some info like this; hence, you can trace the way in which it does it. On Tue, Jul 7, 2015 at 10:34 AM, Gajendra Dadheech gajju3...@gmail.com wrote: Hi, Can I somehow translate fields which I read from newSearcher.getAtomicReader().fields() to schema fields? Does Solr expose any method to do this translation?
The alternative approach I am thinking of would involve lots of regex computation, as the fields would be _string, _float etc. and I would have to remove those suffixes; this becomes a little tricky when fields are dynamic. Thanks and regards, Gajendra Dadheech -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
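The suffix-stripping idea Gajendra describes can be sketched in a few lines. A rough illustration, assuming hypothetical dynamic-field suffixes like _string and _float (the suffix list would have to match the actual dynamicField rules in the schema):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SuffixStripper {

    // Hypothetical dynamic-field type suffixes; adjust to your schema's dynamicField rules.
    private static final Pattern SUFFIX =
            Pattern.compile("_(string|float|int|long|double|boolean)$");

    // Strip a known type suffix from a Solr field name, e.g. "price_float" -> "price".
    static String baseName(String solrField) {
        Matcher m = SUFFIX.matcher(solrField);
        return m.find() ? solrField.substring(0, m.start()) : solrField;
    }

    public static void main(String[] args) {
        System.out.println(baseName("price_float"));  // price
        System.out.println(baseName("title"));        // title
        System.out.println(baseName("tags_string"));  // tags
    }
}
```

Anchoring the pattern at the end of the name keeps it from mangling fields that merely contain an underscore.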
Re: Difference in WordDelimiterFilter behavior between 4.7.2 and 4.9.1
On 7/8/2015 8:44 AM, Shawn Heisey wrote: This is what 4.9.1 does with it:

1 rrr-coleccion
2 rrr
2 coleccion
2 rrrcoleccion
3 coleccion
4 gracita
5 morales
6 foobar

Followup: This is what Solr 5.2.1 does for query analysis, which also seems wrong, and doesn't match the phrase query:

1 rrr-coleccion
2 coleccion
2 rrr
2 rrrcoleccion
3 coleccion
4 gracita
5 morales
6 bleh

The index analysis on 5.2.1 is the same as in the other two versions. Thanks, Shawn
Re: Search Handler Question
Awesome. This looks like a great resource. Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Search-Handler-Question-tp4216341p4216348.html Sent from the Solr - User mailing list archive at Nabble.com.
Search Handler Question
Hello, I've been trying to tune my search handler to get better search results, and I have a general question about the search handler. This being the first time I've designed/implemented a search engine, I've been told that other engines operate on a kind of layered search. By layered I mean you can: 1. Prioritize exact phrasing first. 2. Return documents that match an AND, meaning they contain both words, not necessarily in that order; prioritize these second. 3. Return documents that hit an OR, meaning that one of the words appears. And so on... I guess my question is: could you do this in Solr with a SINGLE query, not multiple? I've tested some queries with the + modifier and it seems to only return documents if they contain both words, with no ORs or anything, which I suppose it should. But could you implement a layered search handler if you wanted to? -- View this message in context: http://lucene.472066.n3.nabble.com/Search-Handler-Question-tp4216341.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Search Handler Question
You are actually describing the Edismax query parser (which does what you quoted and even more): https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser Take a look there; probably with a little tuning this is going to be a good fit for you. If any additional questions come up, just let us know. Cheers
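As an illustration of the layering edismax provides, a request handler along these lines boosts exact phrase matches (pf) above plain term matches (qf), while mm controls how many of the OR'd terms must match. This is only a sketch, not from the thread; the field names (title, body), handler name, and boost values are placeholders:

```xml
<requestHandler name="/layered" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- term matches: both fields searched, title weighted higher -->
    <str name="qf">title^2.0 body</str>
    <!-- phrase boost: documents with the exact phrasing rank first -->
    <str name="pf">title^10.0 body^5.0</str>
    <!-- minimum-should-match: require at least one term, OR-like recall -->
    <str name="mm">1</str>
  </lst>
</requestHandler>
```

With this, a single query gets phrase matches first, then AND-style matches (both terms boost each other via qf), then OR matches, which is the layering described above.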
Re: Remove operation of partial update doesn't work
I’d like to unsubscribe please. On Jul 8, 2015, at 11:01 AM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I just tried it on my own, and it is working perfectly. Stupid question: have you committed after your update? Cheers 2015-07-08 15:41 GMT+01:00 Mohsen Saboorian mohs...@gmail.com:
Difference in WordDelimiterFilter behavior between 4.7.2 and 4.9.1
I'm not sure if this is a bug, but it does break searches that work fine in 4.7.2 if we put the same config and index on 4.9.1. Here's a slightly redacted bit of text that's been sent to the index, and is also used as a phrase query:

RRR-COLECCION: COLECCIÓN: Gracita Morales foobar

Here are the final positions and terms that 4.7.2 yields for this on query analysis:

1 rrr-coleccion
1 rrr
2 coleccion
2 rrrcoleccion
3 coleccion
4 gracita
5 morales
6 foobar

This is what 4.9.1 does with it:

1 rrr-coleccion
2 rrr
2 coleccion
2 rrrcoleccion
3 coleccion
4 gracita
5 morales
6 foobar

In both versions, this is what the index analysis generates:

1 rrr
2 coleccion
3 coleccion
4 gracita
5 morales
6 bleh

Remember that it's a phrase query. As you can see, only the query analysis from 4.7.2 matches. I'm not an expert, but the 4.9.1 WDF position output seems wrong. The difference in these positions happens at the WordDelimiterFilter step. I'm going to try my fieldType on the 5.2.1 example to see what it does, to see if maybe the problem has already been fixed. Unfortunately, due to a third-party component that has not been tested with anything newer, I cannot upgrade beyond 4.9.1 at this time. This is the fieldType present in both versions. The 4.7 config has a luceneMatchVersion of LUCENE_47, the 4.9.1 has LUCENE_4_9.
<fieldType name="genText" class="solr.TextField" sortMissingLast="true" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.ICUTokenizerFactory" rulefiles="Latn:Latin-break-only-on-whitespace.rbbi"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" splitOnNumerics="1"
            stemEnglishPossessive="1" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0" preserveOriginal="1"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.CJKBigramFilterFactory" outputUnigrams="true"/>
    <filter class="solr.LengthFilterFactory" min="1" max="512"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.ICUTokenizerFactory" rulefiles="Latn:Latin-break-only-on-whitespace.rbbi"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" splitOnNumerics="1"
            stemEnglishPossessive="1" generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="0"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.CJKBigramFilterFactory" outputUnigrams="false"/>
    <filter class="solr.LengthFilterFactory" min="1" max="512"/>
  </analyzer>
</fieldType>

Thanks, Shawn
Re: Remove operation of partial update doesn't work
I just tried it on my own, and it is working perfectly. Stupid question: have you committed after your update? Cheers 2015-07-08 15:41 GMT+01:00 Mohsen Saboorian mohs...@gmail.com:
unexpected hl.fragsize behavior
I'm seeing strange hl.fragsize behavior in Solr 4.6.0, the version I happen to be using. I've been testing with this mp500.xml file...

http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_4_6_0/solr/example/exampledocs/mp500.xml?view=markup

... using the query q=indication, and I get some highlights:

```
$ curl -s "http://localhost:8983/solr/collection1/select?wt=json&indent=true&hl=true&hl.fl=*&q=indication" | jq '.highlighting'
{
  "MA147LL/A": {
    "features": [
      ", Battery level <em>indication</em>"
    ]
  }
}
```

Great! I got a highlight snippet back! But what if I start playing with fragsize? According to https://wiki.apache.org/solr/HighlightingParameters#hl.fragsize , fragsize=0 should mean "the whole field value should be used with no fragmenting". And it does:

```
$ curl -s "http://localhost:8983/solr/collection1/select?wt=json&indent=true&hl=true&hl.fl=*&q=indication&hl.fragsize=0" | jq '.highlighting'
{
  "MA147LL/A": {
    "features": [
      "Notes, Calendar, Phone book, Hold button, Date display, Photo wallet, Built-in games, JPEG photo playback, Upgradeable firmware, USB 2.0 compatibility, Playback speed control, Rechargeable capability, Battery level <em>indication</em>"
    ]
  }
}
```

As the docs indicate, fragsize=100 is the default and gives me the same results as we saw above when we left out fragsize:

```
$ curl -s "http://localhost:8983/solr/collection1/select?wt=json&indent=true&hl=true&hl.fl=*&q=indication&hl.fragsize=100" | jq '.highlighting'
{
  "MA147LL/A": {
    "features": [
      ", Battery level <em>indication</em>"
    ]
  }
}
```

But wait a minute... fragsize is defined as "the size, in characters, of the snippets (aka fragments) created by the highlighter". Is that really 100 characters? More like 27 if I strip out the HTML tags:

```
$ echo -n ", Battery level <em>indication</em>" | awk '{gsub(/<[^>]*>/, "")}1'
, Battery level indication
$ echo -n ", Battery level <em>indication</em>" | awk '{gsub(/<[^>]*>/, "")}1' | wc -c
27
```

So that's weird. I ask for 100 characters but only get 27?
Let's try asking for 110 characters:

```
$ curl -s "http://localhost:8983/solr/collection1/select?wt=json&indent=true&hl=true&hl.fl=*&q=indication&hl.fragsize=110" | jq '.highlighting'
{
  "MA147LL/A": {
    "features": [
      ", Upgradeable firmware, USB 2.0 compatibility, Playback speed control, Rechargeable capability, Battery level <em>indication</em>"
    ]
  }
}
```

That's better. With fragsize=110 we got back a snippet of 121 characters that time. But why did we only get back 27 characters from fragsize=100? Here's something else that's strange. With fragsize=120 I get back *fewer* characters than with fragsize=110: only 108 characters rather than 121:

```
$ curl -s "http://localhost:8983/solr/collection1/select?wt=json&indent=true&hl=true&hl.fl=*&q=indication&hl.fragsize=120" | jq '.highlighting'
{
  "MA147LL/A": {
    "features": [
      "firmware, USB 2.0 compatibility, Playback speed control, Rechargeable capability, Battery level <em>indication</em>"
    ]
  }
}
```

As I increase the fragsize, shouldn't I get *more* characters back? And again, why do I only get 27 characters back from fragsize=100? I'm concerned about this because my fix for https://github.com/IQSS/dataverse/issues/2191 is to make fragsize configurable, but I'm getting such unexpected results playing with different fragsize values that I'm losing faith in it. We use highlighting heavily to indicate where in the document a query matched. To be clear, I haven't lost faith in Solr itself. It's a great project. I'm just trying to understand what's going on above. Any advice is welcome! Phil p.s. In case it's more readable, I also posted this (long) email as a gist: https://gist.github.com/pdurbin/1a7b55e5714b7424fa94 -- Philip Durbin Software Developer for http://dataverse.org http://www.iq.harvard.edu/people/philip-durbin
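One thing that may be worth experimenting with here (a guess, not a confirmed explanation of the 27-character result) is the fragmenter itself: hl.fragsize is a target, not an exact size, and the default "gap" fragmenter breaks only at token gaps near that target. The "regex" fragmenter instead tries to land fragment boundaries on a pattern within a slop factor of the requested size. A sketch of the extra query parameters, with illustrative values:

```
hl=true
hl.fl=features
hl.fragsize=100
hl.fragmenter=regex
hl.regex.slop=0.5                 # accept fragments between 50 and 150 chars
hl.regex.pattern=[-\w ,/\n\"']{20,200}
```

If the regex fragmenter produces fragments closer to the requested size, that would point at the gap fragmenter's boundary choices rather than a bug in fragsize handling.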
Re: Solr 5.2.1 - SolrCloud create collection, core is only loaded after restart
My _guess_ is that you're getting a cached page somehow and never getting to Solr at all when you don't see the new core. What happens if you look at the admin UI from another machine? Or perhaps a different browser? If you tail the Solr log while you are looking, you should see the request when you try to see the new core; if you don't see a request come through, then it's a caching issue... Or perhaps try issuing a core admin STATUS command? http://solr:port/solr/admin/cores?action=STATUS Best, Erick On Wed, Jul 8, 2015 at 6:14 AM, Jens Brandt bra...@docoloc.de wrote: Hi, there was a problem with zookeeper and IPv6 that could be solved by using -Djava.net.preferIPv4Stack=true. Now, the core is correctly created, but I am wondering why I cannot see the core on the web interface, neither on the core admin screen nor in the Core Selector field. Only after restarting Solr does the core show up on the web interface. Best Regards, Jens On 07.07.2015 at 12:49, Jens Brandt bra...@docoloc.de wrote: Hi Erick, thanks for your reply. After creating the new collection via the Collections API, I can see in the Solr log files that the core was created: Solr index directory '/var/lib/solr/gagel_shard1_replica1/data/index' doesn't exist. Creating new index... However, when calling curl "http://solrtest:8080/solr/gagel/query?q=*:*" I get an HTTP 404 error:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing /solr/tubs/query. Reason:
<pre>Not Found</pre></p><hr/><i><small>Powered by Jetty://</small></i>
</body>
</html>

On 06.07.2015 at 19:51, Erick Erickson erickerick...@gmail.com wrote: bq: However, the named core is created but not loaded in solr. I'm not quite sure what that means; what is the symptom you see? Because this should be fine. I've sometimes been fooled by looking at the core admin UI screen and not reloading it.
What happens if you try querying your new collection directly right after you create it? e.g. http://blah blah/solr/gagel/query?q=*:* You should get back a valid packet. Admittedly with 0 hits, but if the core were truly not loaded you'd get an error. And please, please, please do NOT use the core admin screen to try to add cores in SolrCloud mode. It's possible to use, but you must know _exactly_ what parameters to set or Bad Things Happen. Continue to use the collections API; it's safer. Best, Erick On Mon, Jul 6, 2015 at 8:54 AM, Jens Brandt bra...@docoloc.de wrote: Hi, I am trying to set up SolrCloud with an external zookeeper. Solr 5.2.1 is running on host solrtest at port 8080, and zookeeper already contains a config with the name customerSolr. When I create a new collection using the Collections API by calling the following url: http://solrtest:8080/solr/admin/collections?action=CREATE&numShards=1&collection.configName=customerSolr&name=gagel I get a positive response and the core name gagel_shard1_replica1 is returned. However, the named core is created but not loaded in Solr. When I try to manually add the core by using the Core Admin web interface, I get the error that the core already exists. After a restart of Solr, the core is loaded correctly. Can anyone please advise if I am doing something wrong, or maybe this is an issue in Solr 5.2.1? Best Regards, Jens
Re: Remove operation of partial update doesn't work
Won June Tai: Please follow the instructions here: http://lucene.apache.org/solr/resources.html and search for "unsubscribe". You must use the _exact_ e-mail you used to subscribe. Also see the "problems" link if it doesn't work the first time. Best, Erick On Wed, Jul 8, 2015 at 8:03 AM, Won June Tai wonjune@gmail.com wrote: I’d like to unsubscribe please.
Re: Synonym with Proximity search in solr 5.1.0
Hi Alessandro, I have gone through the above suggested links, but I am not able to achieve the expected result. The issue here is that my searched text is a part of the field 'text':

<field name="text">I like nokia mobile</field>

searched text: "nokia mobile"~500. Best Regards, Dinesh Naik On Wed, Jul 8, 2015 at 8:36 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: Showing your debug query would clarify the situation, but I assume you ran into a classic multi-word synonym problem [1]. Hope the documents I pointed out are good for you. Cheers [1] http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ [2] http://lucidworks.com/blog/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/ -- Best Regards, Dinesh Naik
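For reference, the approach in the LucidWorks post linked above hinges on turning each multi-word synonym into a single token before the synonym filter runs, so the synonym mapping is one-token-to-one-token. A sketch of what the index analyzer could look like, assuming the plugin's AutoPhrasingTokenFilterFactory is on the classpath and an autophrases.txt lists the phrases (cell phone, nokia mobile); the attribute names follow the blog post and should be checked against the plugin version in use:

```xml
<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <!-- joins "cell phone" / "nokia mobile" into single tokens -->
  <filter class="com.lucidworks.analysis.AutoPhrasingTokenFilterFactory"
          phrases="autophrases.txt" includeTokens="true"
          replaceWhitespaceWith="_"/>
  <!-- synonyms.txt then maps single tokens: cell_phone,nokia_mobile -->
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

With single-token synonyms in place, "cell phone" matches both doc1 and doc2 but not doc3, since "Nokia inc" never forms the phrase token.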
Can I instruct the Tika Entity Processor to skip the first page using the DIH?
Hello, I'm using the DIH to import some files from one of my local directories. However, every single one of these files has the same first page. So I want to skip that first page in order to optimize search. Can this be accomplished by an instruction within the dataimporthandler or, if not, how could you do this? -- View this message in context: http://lucene.472066.n3.nabble.com/Can-I-instruct-the-Tika-Entity-Processor-to-skip-the-first-page-using-the-DIH-tp4216373.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr 5 and schema.xml
bq: I've then read that solr 5 doesn't by default use the schema.xml file, but is using a managed schema by default. Apparently, I can't alter the schema.xml file (which I can't find) but now need to use a REST api. However, since I'm using dynamic fields, I'm not sure if this is still necessary.

Not at all, although it _is_ confusing. There's
1. classic, i.e. non-cloud Solr
2. SolrCloud
and the cross product of
1. standard (there should be a conf/schema.xml file to edit)
2. schemaless
3. managed schema
So you have six possible configurations, although whether you're running in cloud mode or not, the schemaless and managed schemas are used identically. So, which ones are you interested in? If you're running in SolrCloud mode, you won't find any conf directory to edit; the files are stored in ZooKeeper and you must use the zkcli script with the upconfig command to change them. Although for development there's a sweet little IntelliJ plugin that lets you edit them directly... Best, Erick

On Wed, Jul 8, 2015 at 3:31 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: You have the choice. You can use a dynamic schema and control it using the API, or use a classic schema and control it explicitly via schema.xml. You control that when you create the core by using different templates; it's just that the default one is a dynamic schema. Also, dynamic fields are not the same as a dynamic schema, but I think you knew that. You can use dynamic fields with either one of them. So, try something like this:

    bin/solr create_core -c classic_core -d basic_configs

Regards, Alex. P.s. You still get some APIs even with a classic schema, but that's more for overriding solrconfig.xml settings. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/

On 8 July 2015 at 05:11, spleenboy paul.br...@neilltech.com wrote: Had a look at previous postings, but am still thoroughly confused.
I installed Solr 5 out of the box, built a core and uploaded some documents using dynamic field types. I can see my uploaded docs using the get method. When I query those docs, results seem all over the place. The answer seems to be to alter my schema.xml file, but it doesn't appear to be in the conf directory where everyone seems to be directing me. I've then read that Solr 5 doesn't use the schema.xml file by default, but a managed schema. Apparently, I can't alter the schema.xml file (which I can't find) but now need to use a REST API. However, since I'm using dynamic fields, I'm not sure if this is still necessary. I've hunted high and low for clear documentation on this, but am still confused. I need to build a single index based upon customer data, searching by email address. Any help, or pointing in the right direction to where this is clearly documented, would be gratefully received.
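To make the two routes from the replies above concrete: a hedged sketch, assuming a core named `mycore` on the default port 8983 (the core name and field name are placeholders, not from the thread). With the classic template you edit schema.xml on disk; with the default managed schema you change fields over the Schema API instead:

```shell
# Classic schema: create the core from the basic_configs template,
# then edit server/solr/mycore/conf/schema.xml directly.
bin/solr create_core -c mycore -d basic_configs

# Managed schema (the default template): add a field via the Schema API.
curl -X POST -H 'Content-Type: application/json' \
  http://localhost:8983/solr/mycore/schema -d '{
    "add-field": {"name": "email", "type": "string", "indexed": true, "stored": true}
  }'
```

Either way, dynamic fields (`*_s`, `*_i`, ...) keep working; the choice only decides how explicit field definitions are maintained.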
Re: Remove operation of partial update doesn't work
Can you post your SolrJ code?

On 8 July 2015 at 19:32, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I just tried on my own, and it is working perfectly. Stupid question: have you committed after your update? Cheers

2015-07-08 15:41 GMT+01:00 Mohsen Saboorian mohs...@gmail.com: I use add and remove, both on a multivalued field (think of tags on a blog post). For this, set to null won't work, because I want only one value (tag) to be removed, and setting null removes neither one value nor all of the values (all tags here). So I use some SolrJ code which would translate to something like this:

    {"id": "docId", "tagId": {"remove": "someTagId"}}

After commit, there is still "tagId": "someTagId" in my document. Here is my schema part for tagId:

    <field name="tagId" type="int" indexed="true" stored="true" multiValued="true"/>

Thanks, Mohsen

On Wed, Jul 8, 2015 at 3:26 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: In these scenarios, documentation is key:

Modifier / Usage
- set: Set or replace the field value(s) with the specified value(s), or *remove the values if 'null' or an empty list is specified as the new value.* May be specified as a single value, or as a list for multivalued fields.
- add: Adds the specified values to a multivalued field. May be specified as a single value, or as a list.
- remove: Removes (all occurrences of) the specified values from a multivalued field. May be specified as a single value, or as a list.
- removeregex: Removes all occurrences of the specified regex from a multiValued field. May be specified as a single value, or as a list.
- inc: Increments a numeric value by a specific amount. Must be specified as a single numeric value.

In my opinion set is the right direction to look into. I'm not sure what happens if you use remove on a single-valued field. Can you explain what you noticed? Does an empty value remain for that field? That would be odd; I would expect the field to become null.
Cheers

2015-07-08 10:34 GMT+01:00 Mohsen Saboorian mohs...@gmail.com: In my code, when the operation is add it works correctly on a multivalued field, but no multivalued field value can be deleted with the remove operation.

The add operation adds a value to a multivalued field. The remove operation removes a value from a multivalued field. If you believe that something is not working, please state clearly why you believe it is not working. Start by describing the symptom. -- Jack Krupansky

On Mon, Jul 6, 2015 at 9:22 PM, Mohsen Saboorian mohs...@gmail.com wrote: I can partially 'add' fields to my Solr index, but the 'remove' operation seems not to work. I'm on Solr 4.10. Here is my SolrJ snippet:

    SolrInputDocument doc = new SolrInputDocument();
    Map<String, Object> partialUpdate = new HashMap<>();
    // value can be an object (string, number, etc.) or a list;
    // operation can be "add", "set" or "remove".
    partialUpdate.put(operation, value);
    doc.addField("id", id); // document id
    doc.addField(fieldName, partialUpdate);
    getSolrServer().add(doc, commitWithin);

Is there anything wrong with my code?
Re: Synonym with Proximity search in solr 5.1.0
What do you mean? Have you used the implemented plugins already? Can you show us the debugged query, please? Cheers

2015-07-08 16:48 GMT+01:00 dinesh naik dineshkumarn...@gmail.com: Hi Alessandro, I have gone through the suggested links, but I am not able to achieve the expected result. The issue here is that my searched text is part of the field 'text':

    <field name="text">I like nokia mobile</field>

searched text: "nokia mobile"~500. Best Regards, Dinesh Naik

On Wed, Jul 8, 2015 at 8:36 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: Showing your debug query would clarify the situation, but I assume you ran into a classic multi-word synonym problem [1]. Hope the documents I pointed out are good for you. Cheers
[1] http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
[2] http://lucidworks.com/blog/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/

2015-07-08 15:47 GMT+01:00 dinesh naik dineshkumarn...@gmail.com: Hi, We have a synonym file with the below content:

    cell phone,nokia mobile

And we have 3 documents:

doc1:
    <doc>
      <field name="id">1001</field>
      <field name="name">Doc 1</field>
      <field name="text">I like nokia mobile</field>
    </doc>

doc2:
    <doc>
      <field name="id">1002</field>
      <field name="name">Doc 2</field>
      <field name="text">I cant leave without cell phone</field>
    </doc>

doc3:
    <doc>
      <field name="id">1003</field>
      <field name="name">Doc 3</field>
      <field name="text">I work with Nokia inc</field>
    </doc>

When I search for "cell phone", I should get doc1 and doc2 returned but not doc3. The search syntax is: text:"cell phone"~500. How could I achieve this? Best Regards, Dinesh Naik

-- Best Regards, Dinesh Naik
Re: Grouping and recip function not working with Sharding
From the reference guide: "group.ngroups and group.facet require that all documents in each group must be co-located on the same shard in order for accurate counts to be returned. Document routing via composite keys can be a useful solution in many situations."

It's not clear what you think the problem here is. You say:

bq: Ex: Below response contains 5 groups (Which is correct) but ngroups is 11.

But you have rows set to 5, so only 5 groups come back. As far as your sorting issue, again, an example showing what you think is wrong would be very helpful. Best, Erick

On Wed, Jul 8, 2015 at 6:38 AM, Pankaj Sonawane pankaj4sonaw...@gmail.com wrote: Hi, I am using sharding (3 shards) with Zookeeper. When I query a collection using *group=true&group.field=NAME&group.ngroups=true* parameters, *ngroups* in the response is incorrect. However, I am getting the correct count in the doclist array. Ex: Below response contains 5 groups (which is correct) but ngroups is 11.

    {
      "responseHeader": {
        "status": 0,
        "QTime": 49,
        "params": {
          "group.ngroups": true,
          "indent": true,
          "start": 0,
          "q": "*:*",
          "group.field": "NAME",
          "group": true,
          "wt": "json",
          "rows": 5
        }
      },
      "grouped": {
        "NAME": {
          "matches": 18,
          "ngroups": 11,
          "groups": [
            {"groupValue": "A-SERIES", "doclist": {"numFound": 5, "start": 0, "maxScore": 1, "docs": [{"NAME": "A-SERIES", "_version_": 1505559209034383400}]}},
            {"groupValue": "B-SERIES", "doclist": {"numFound": 5, "start": 0, "docs": [{"NAME": "B-SERIES", "_version_": 1505559209034383400}]}},
            {"groupValue": "C-SERIES", "doclist": {"numFound": 1, "start": 0, "docs": [{"NAME": "C-SERIES", "_version_": 1505559209034383400}]}},
            {"groupValue": "D-SERIES", "doclist": {"numFound": 5, "start": 0, "docs": [{"NAME": "D-SERIES", "_version_": 1505559209034383400}]}},
            {"groupValue": "E-SERIES", "doclist": {"numFound": 3, "start": 0, "maxScore": 1, "docs": [{"NAME": "E-SERIES", "_version_": 1505559209034383400}]}}
          ]
        }
      }
    }

I am facing the same problem with the recip function to get the latest record on some date field when using sharding: it returns records in the wrong order. Note: the same configuration works fine on a single machine without sharding. Please help me find a solution.
Thanks.
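The co-location the reference guide asks for can be achieved with the default compositeId router: prefix each document id with its group value, and Solr routes on the part before the `!`, so all members of a group land on the same shard and group.ngroups becomes accurate. A sketch, reusing the NAME values from the thread — the collection name, port and doc ids are illustrative:

```shell
# With the compositeId router (the default when a collection is created
# with numShards), ids of the form <shardKey>!<docId> are routed by the
# prefix. All A-SERIES documents then live on the same shard.
curl -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/collection1/update?commit=true' -d '[
    {"id": "A-SERIES!1", "NAME": "A-SERIES"},
    {"id": "A-SERIES!2", "NAME": "A-SERIES"},
    {"id": "B-SERIES!7", "NAME": "B-SERIES"}
  ]'
```

The trade-off is that shard sizes can become uneven when group sizes are skewed.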
Re: Remove operation of partial update doesn't work
Yes I did. I use commitWithin to commit after a fixed timeout. Moreover, my add operation works!

On 8 July 2015 at 19:32, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I just tried on my own, and it is working perfectly. Stupid question: have you committed after your update? Cheers

2015-07-08 15:41 GMT+01:00 Mohsen Saboorian mohs...@gmail.com: I use add and remove, both on a multivalued field (think of tags on a blog post). For this, set to null won't work, because I want only one value (tag) to be removed, and setting null removes neither one value nor all of the values (all tags here). So I use some SolrJ code which would translate to something like this:

    {"id": "docId", "tagId": {"remove": "someTagId"}}

After commit, there is still "tagId": "someTagId" in my document. Here is my schema part for tagId:

    <field name="tagId" type="int" indexed="true" stored="true" multiValued="true"/>

Thanks, Mohsen

On Wed, Jul 8, 2015 at 3:26 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: In these scenarios, documentation is key:

Modifier / Usage
- set: Set or replace the field value(s) with the specified value(s), or *remove the values if 'null' or an empty list is specified as the new value.* May be specified as a single value, or as a list for multivalued fields.
- add: Adds the specified values to a multivalued field. May be specified as a single value, or as a list.
- remove: Removes (all occurrences of) the specified values from a multivalued field. May be specified as a single value, or as a list.
- removeregex: Removes all occurrences of the specified regex from a multiValued field. May be specified as a single value, or as a list.
- inc: Increments a numeric value by a specific amount. Must be specified as a single numeric value.

In my opinion set is the right direction to look into. I'm not sure what happens if you use remove on a single-valued field. Can you explain what you noticed? Does an empty value remain for that field? That would be odd; I would expect the field to become null.
Cheers

2015-07-08 10:34 GMT+01:00 Mohsen Saboorian mohs...@gmail.com: In my code, when the operation is add it works correctly on a multivalued field, but no multivalued field value can be deleted with the remove operation.

The add operation adds a value to a multivalued field. The remove operation removes a value from a multivalued field. If you believe that something is not working, please state clearly why you believe it is not working. Start by describing the symptom. -- Jack Krupansky

On Mon, Jul 6, 2015 at 9:22 PM, Mohsen Saboorian mohs...@gmail.com wrote: I can partially 'add' fields to my Solr index, but the 'remove' operation seems not to work. I'm on Solr 4.10. Here is my SolrJ snippet:

    SolrInputDocument doc = new SolrInputDocument();
    Map<String, Object> partialUpdate = new HashMap<>();
    // value can be an object (string, number, etc.) or a list;
    // operation can be "add", "set" or "remove".
    partialUpdate.put(operation, value);
    doc.addField("id", id); // document id
    doc.addField(fieldName, partialUpdate);
    getSolrServer().add(doc, commitWithin);

Is there anything wrong with my code?
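For reference, the atomic update that the SolrJ snippet serializes can also be sent directly as JSON, which makes it easy to rule the client code in or out. One assumption worth checking with this sketch: the value handed to remove has to match the stored value, and since tagId is an int field, sending the tag id as a string instead of a number is a plausible cause of a silent no-op — that is a hypothesis to verify, not a confirmed bug. The doc id and tag value below are placeholders:

```shell
# Atomic update: remove one value from the multivalued int field tagId.
# Note the value is sent as a number, matching the int field type.
curl -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/core1/update?commit=true' -d '[
    {"id": "docId", "tagId": {"remove": 123}}
  ]'
```

If the curl form works but the SolrJ form does not, the problem is in how the client builds the partial-update map; if neither works, it is on the server side.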
Re: Tlog replay
Thanks Alessandro! Any idea why I couldn't curl the Solr core and pass the flags param?

On Jul 8, 2015, at 7:12 AM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: Hi Summer, If you take a look at the CommitUpdateCommand class, you will notice no flag is in there:

    // this is the toString for example
    @Override
    public String toString() {
      return super.toString() + ",optimize=" + optimize
          + ",openSearcher=" + openSearcher
          + ",waitSearcher=" + waitSearcher
          + ",expungeDeletes=" + expungeDeletes
          + ",softCommit=" + softCommit
          + ",prepareCommit=" + prepareCommit
          + '}';
    }

If you then access the UpdateCommand object, you find the flags:

    public static int BUFFERING = 0x0001;         // update command is being buffered.
    public static int REPLAY = 0x0002;            // update command is from replaying a log.
    public static int PEER_SYNC = 0x0004;         // update command is a missing update being provided by a peer.
    public static int IGNORE_AUTOCOMMIT = 0x0008; // this update should not count toward triggering of autocommits.
    public static int CLEAR_CACHES = 0x0010;      // clear caches associated with the update log. used when applying reordered DBQ updates when doing an add.

So flags=2 is actually saying that the update command is from replaying a log (which is what you would expect). Cheers

2015-07-08 3:01 GMT+01:00 Summer Shire shiresum...@gmail.com: Hi, When I restart my Solr core the log replay starts, and just before it finishes I see the following commit:

    start commit{flags=2,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

What does the "flags=2" param do? When I try to send that param to the updateHandler manually, Solr does not like it:

    curl http://localhost:6600/solr/main/update -H "Content-Type: text/xml" \
      --data-binary '<commit openSearcher="true" flags="2" waitSearcher="false"/>'

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader"><int name="status">400</int><int name="QTime">0</int></lst>
      <lst name="error"><str name="msg">Unknown commit parameter 'flags'</str><int name="code">400</int></lst>
    </response>

thanks, Summer
Running Solr 5.2.1 on Windows using NSSM
Hi guys, I am looking to run Apache Solr v5.2.1 on a Windows machine. I tried to set up a Windows service using NSSM (Non-Sucking Service Manager), pointing the service at the solr.cmd file path itself and installing the service. After installation, I tried to start the Windows service but it gives back an alert message: "Windows could not start the SolrService service on Local Computer. The service did not return an error. This could be an internal Windows error or an internal service error." Most of the examples for older Apache Solr use the `java -jar start.jar` command to run Solr and seem to run okay with NSSM. I am not sure if this is a solr.cmd issue or NSSM's issue. Alternatively, I have tried to use Windows Task Scheduler to configure a task to point to solr.cmd and run whenever the computer starts (regardless of whether a user is logged in or not). The Task Scheduler reports back 'Task Start Failed' with a level of 'Error'. Additionally, checking Event Viewer shows the nssm error "Failed to open process handle for process with PID 3640 when terminating service Solr Service: The parameter is incorrect." Chances are this points back to the solr.cmd file itself. Thoughts? Regards, Adrian
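One wrinkle with solr.cmd under a service manager: `bin\solr.cmd start` spawns Jetty in the background and then exits, so the service manager sees the process it launched die — which matches both symptoms above. Running Solr in the foreground with `-f` keeps the process NSSM started alive. A configuration sketch, assuming Solr is installed at C:\solr on the default port; the install path and service name are placeholders:

```shell
nssm install SolrService "C:\solr\bin\solr.cmd" "start -f -p 8983"
nssm set SolrService AppDirectory "C:\solr"
nssm start SolrService
```

`-f` (foreground) and `-p` (port) are standard `bin/solr start` options; whether this resolves the particular NSSM error above is an assumption worth testing.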
Re: Indexed field to schema field
At the time of forming this request I am not sure which kind of field that would be, so I read the fields from the new searcher. Thanks and regards, Gajendra Dadheech

On Wed, Jul 8, 2015 at 2:12 PM, Gajendra Dadheech gajju3...@gmail.com wrote: I wish to do it in code, so the schema browser is less of an option. The use case is: I wish to boost particular fields while matching, and for that I need to know my field to Solr field mapping, so that I can put it in the query. Thanks and regards, Gajendra Dadheech

On Tue, Jul 7, 2015 at 9:23 PM, Erick Erickson erickerick...@gmail.com wrote: Feels like an XY problem. Why do you want to do this? What's the use-case? Perhaps there's an alternative approach that satisfies the need. Best, Erick

On Tue, Jul 7, 2015 at 4:21 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Just an idea: the Solr Admin Schema Browser reports some info like this; hence, you can trace the way it does it.

On Tue, Jul 7, 2015 at 10:34 AM, Gajendra Dadheech gajju3...@gmail.com wrote: Hi, Can I somehow translate fields which I read from newSearcher.getAtomicReader().fields() to schema fields? Does Solr expose any method to do this translation? The alternative approach I am thinking of will involve a lot of regex computation, as the fields would be _string, _float etc. and I would have to remove those suffixes; this becomes a little tricky when fields are dynamic. Thanks and regards, Gajendra Dadheech -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Adding field to query result
Maya - where’s the variable come from? You can compute a “pseudo-field”, something like this:

    $ bin/solr create -c test
    $ bin/post -c test -type text/csv -out yes -d $'id,type,price_td\n1,Toys,55.00'
    $ open "http://localhost:8983/solr/test/select?q=*:*&wt=xml&fl=id,type,price_td,sale_price:product(price_td,0.9)"

Note that I used price_td; otherwise the field type guessing will make it multivalued and not suitable for functions like that. Does that help? Or maybe you’re interested in something like Solr’s ExternalFileField? — Erik Hatcher, Senior Solutions Architect http://www.lucidworks.com

On Jul 8, 2015, at 2:03 PM, Maya G maiki...@gmail.com wrote: Hello, I'm using Solr 4.10. I'd like to know if it is possible to add a field only in the query response and calculate its value for the specific query. For example, assume this is the document:

    <doc>
      <Id>1</Id>
      <Type>Toys</Type>
      <Price>55</Price>
    </doc>

I would like the response to contain another field whose value is calculated from the value of 'Price' and a given variable. The field's value can change from query to query and shouldn't be indexed. Is it possible to run a query and create a new field 'price_sale' at runtime? Is there a way to do this in Solr? Thanks in advance, Maya
Re: Solr Encoding Issue?
Attachments are pretty aggressively stripped by the e-mail server, so there's nothing to see; you'll have to paste it somewhere else and provide a link. Usually, though, this is a character set issue, with the browser using a different charset than Solr; it's really the same character, just displayed differently. Shot in the dark though. Erick

On Wed, Jul 8, 2015 at 10:49 AM, Tarala, Magesh mtar...@bh.com wrote: I’m ingesting a .TXT file with HTML content into Solr. The content has the following character highlighted below. The file we get from CRM (also attached): [image attachment stripped] After ingesting into Solr, I see a different character. This is the query response from the Solr management console: [image attachment stripped] Anybody know how I can prevent this from happening? Thanks!
RE: Solr Encoding Issue?
Looks like the images did not come through. Here's the text... I'm ingesting a .TXT file with HTML content into Solr. The content has the following character highlighted below. The file we get from CRM (also attached):

    <td align="center" style="text-align:center;"><font size="3"><span style="font-size:12pt;"><b>Enter Data in TK Only</b><font face="Wingdings"><b>à</b></font></span></font></td>

After ingesting into Solr, I see a different character. This is the query response from the Solr management console:

    <td align=\"center\" style=\"text-align:center;\"><font size=\"3\"><span style=\"font-size:12pt;\"><b>Enter Data in TK Only</b><font face=\"Wingdings\"><b>à </b></font></span></font></td>

I'm expecting to see <b>à</b> but I'm seeing <b>à </b>. Anybody know how I can prevent this from happening? Thanks!
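The extra character after the "à" is the classic signature of a charset mismatch, as Erick guessed: "à" (U+00E0) encodes to the two bytes C3 A0 in UTF-8, and reading those bytes back as ISO-8859-1 (Latin-1) yields "Ã" followed by a no-break space. A minimal sketch of the round-trip, using only the JDK (no Solr involved), to show how one character becomes two:

```java
import java.nio.charset.StandardCharsets;

public class Mojibake {
    // Encode with UTF-8, then (wrongly) decode with ISO-8859-1 --
    // the mismatch that turns one accented character into two garbage ones.
    static String misdecode(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u00E0"; // "à"
        String garbled = misdecode(original);
        // C3 A0 read as Latin-1 is U+00C3 ("Ã") followed by U+00A0 (NBSP)
        System.out.println(garbled.equals("\u00C3\u00A0")); // true
    }
}
```

The fix is to make the charset explicit at ingest time, e.g. posting with a header like `Content-Type: text/xml; charset=utf-8` (or converting the file to the charset the sender actually declares) so Solr decodes the bytes the same way CRM encoded them.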
Adding field to query result
Hello, I'm using Solr 4.10. I'd like to know if it is possible to add a field only in the query response and calculate its value for the specific query. For example, assume this is the document:

    <doc>
      <Id>1</Id>
      <Type>Toys</Type>
      <Price>55</Price>
    </doc>

I would like the response to contain another field whose value is calculated from the value of 'Price' and a given variable. The field's value can change from query to query and shouldn't be indexed. Is it possible to run a query and create a new field 'price_sale' at runtime? Is there a way to do this in Solr? Thanks in advance, Maya
Solr Encoding Issue?
I'm ingesting a .TXT file with HTML content into Solr. The content has the following character highlighted below. The file we get from CRM (also attached): [image attachment stripped] After ingesting into Solr, I see a different character. This is the query response from the Solr management console: [image attachment stripped] Anybody know how I can prevent this from happening? Thanks!
Solr cache when using custom scoring
Hi, We are using Solr and implemented our own custom scoring. The custom scoring code uses a parameter which is passed with the Solr query; different parameter values change the score of the same query. The problem we have is that this parameter is not part of the query cache key, so running the same query with different parameter values returns the first cached result. What is the best way to work around this (without disabling the cache)? Is there a way to tell Solr to cache the query with the parameter value as well? Or maybe add a dummy clause to the query (the parameter is a pretty long JSON)? Thanks, Ami
Re: Difference in WordDelimiterFilter behavior between 4.7.2 and 4.9.1
On 7/8/2015 9:26 AM, Alessandro Benedetti wrote: Taking a look into the documentation I see this inconsistent orderings in my opinion : Alessandro, thank you for your reply. I couldn't really tell what you were saying. I *think* you were agreeing with me that the current behavior seems like a problem, but I'm not really sure. At this point I think I should probably file a bug in Jira ... anyone have any thoughts on that? Thanks, Shawn
Re: Difference in WordDelimiterFilter behavior between 4.7.2 and 4.9.1
On 7/8/2015 2:10 PM, Shawn Heisey wrote: At this point I think I should probably file a bug in Jira ... anyone have any thoughts on that? It appears that changing luceneMatchVersion from LUCENE_4_9 to LUCENE_47 has fixed this problem ... so I think somebody must have fixed WDF to its current behavior, but put in a version check for the old behavior. I think that WDF's position output with a current luceneMatchVersion is wrong, but I'd like the input of someone who's a little more familiar with the code and what SHOULD happen. Thanks, Shawn
SolrQueryRequest in SolrCloud vs Standalone Solr
Hi all, We have a cluster of standalone Solr cores (Solr 4.3) for which we had built some custom request handlers and filters which do query processing using the Terms API. I'm now trying to port the custom functionality to work in the SolrCloud world. The old configuration had standalone cores with the request handler embedded into each:

    core1 - requesthandler plugin
    core2 - requesthandler plugin

We built an external (non-Solr) component that sent every query request to each core and aggregated the results. When processing the request, within each request handler, it obtained an index searcher by doing

    SolrIndexSearcher searcher = solrQueryRequest.getSearcher();

followed by searcher.search()...

    Request1: http://localhost:xxx/solr/core1/plugin?q=blahblah
    Request2: http://localhost:xxx/solr/core2/plugin?q=blahblah

In the SolrCloud version, I expected things to work similarly but at the collection level. New configuration:

    SolrCloud collection with plugin - shard1 - shard2

So my expectation is that when I invoke SolrIndexSearcher searcher = solrQueryRequest.getSearcher() I obtain a searcher which can search against the collection, i.e. against all the shards. But this doesn't seem to happen. It seems that the searcher is executing the query only against shard1! Note: I peeked into the SolrQueryRequest object using a debugger and it has a reference to a SolrCore object which just points to shard1.

    Request: http://localhost:xxx/solr/collection1/plugin?q=blahblah

Am I doing something wrong? Is my expectation of how it should work flawed? Any help would be appreciated. Regards CV
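The observed behavior is expected: a searcher obtained via `solrQueryRequest.getSearcher()` is always local to the single core (replica) serving the request; fan-out across shards is done by the distributed-search machinery around the handler, not by the searcher itself. A quick way to see the difference from the outside, assuming the custom handler participates in distributed search the way SearchHandler does (an assumption worth verifying for this plugin); URLs are illustrative:

```shell
# distrib=false confines the query to the one core that receives it
curl 'http://localhost:8983/solr/collection1/plugin?q=blahblah&distrib=false'

# distrib=true asks Solr to fan the request out to one replica of
# every shard and merge the per-shard responses
curl 'http://localhost:8983/solr/collection1/plugin?q=blahblah&distrib=true'
```

If the handler does its own low-level Terms API work per core, it generally needs to implement the distributed phases (or keep the external aggregator) rather than expect a collection-wide searcher.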
Re: Difference in WordDelimiterFilter behavior between 4.7.2 and 4.9.1
In Lucene 4.8, LUCENE-5111: Fix WordDelimiterFilter offsets https://issues.apache.org/jira/browse/LUCENE-5111 Make sure the documents are queried and indexed with the same Lucene match version. -- Jack Krupansky On Wed, Jul 8, 2015 at 5:19 PM, Shawn Heisey apa...@elyograg.org wrote: On 7/8/2015 2:19 PM, Shawn Heisey wrote: It appears that changing luceneMatchVersion from LUCENE_4_9 to LUCENE_47 has fixed this problem ... so I think somebody must have fixed WDF to its current behavior, but put in a version check for the old behavior. The luceneMatchVersion change has fixed this specific issue with WDF, but these searches on 4.9.1 are still returning zero hits, and I don't yet know why. Thanks, Shawn
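The version pin Shawn describes is a one-line solrconfig.xml setting; a sketch of the relevant fragment (the value can also be written as LUCENE_47):

```xml
<!-- Pins analysis behavior -- including the WordDelimiterFilter offset
     changes made by LUCENE-5111 in 4.8 -- to the 4.7 rules. -->
<luceneMatchVersion>4.7</luceneMatchVersion>
```

As Jack notes, the key point is that index-time and query-time analysis run under the same match version; pinning it in solrconfig.xml applies to both, but documents indexed under a different version may need reindexing.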
RE: About indexing embed file with solr
This may have been an issue with Solr's wrapper of Tika. See: https://issues.apache.org/jira/browse/SOLR-7189 -Original Message- From: 步青云 [mailto:mailliup...@qq.com] Sent: Wednesday, June 17, 2015 10:17 PM To: solr-user Subject: About indexing embedded files with Solr

Hello, could anyone receive my email? I'm new to Solr and have some questions; could anyone help me with answers? I index files directly by extracting their content with the Tika embedded in Solr. There is no problem with normal files, but when I index a Word document that embeds another file, such as a PDF embedded in a Word file, I can't get the content of the embedded file. For example, I have a Word document (doc) with a PDF embedded in it, and I can't index the content of that PDF. When using the same Tika jar standalone to extract the content, I can get the content of the embedded file. I know Tika has been able to extract embedded files since version 1.3, and my Solr version is 4.9.1, whose bundled Tika is 1.5. I don't know why I can't get the content of the embedded file. Could anyone help me? Thank you very much. Ping Liu, 18 June 2015
Re: Adding field to query result
Hey, thanks for your response. Yes, I think what I'm looking for is a pseudo-field. Is the product function a function query? I assume I can replace the product function with an implementation of my own. BTW - is the score field a pseudo-field? Maya
Re: Solr cache when using custom scoring
On Wed, Jul 8, 2015 at 11:30 PM, amid a...@donanza.com wrote: "The custom scoring code uses a parameter which is passed with the Solr query" — this param should be evaluated in equals() and hashCode(), shouldn't it? -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
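Mikhail's point is the crux: Solr's queryResultCache keys on the query object, so if the custom query's equals()/hashCode() ignore the scoring parameter, two queries that differ only in that parameter collide in the cache and the first result is served for both. A minimal plain-Java sketch of the principle — the class and field names are hypothetical, not Solr API; in practice this is the equals/hashCode of the custom Query subclass:

```java
import java.util.Objects;

// Hypothetical cache key for a custom-scored query. The scoring
// parameter must participate in equals/hashCode, otherwise a cache
// cannot distinguish two queries that differ only in that parameter.
public class ScoredQueryKey {
    private final String queryString;
    private final String scoringParam; // e.g. the long JSON blob driving the scores

    public ScoredQueryKey(String queryString, String scoringParam) {
        this.queryString = queryString;
        this.scoringParam = scoringParam;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ScoredQueryKey)) return false;
        ScoredQueryKey other = (ScoredQueryKey) o;
        return queryString.equals(other.queryString)
            && scoringParam.equals(other.scoringParam); // the crucial part
    }

    @Override
    public int hashCode() {
        return Objects.hash(queryString, scoringParam);
    }

    public static void main(String[] args) {
        ScoredQueryKey a = new ScoredQueryKey("type:Toys", "{\"boost\":0.9}");
        ScoredQueryKey b = new ScoredQueryKey("type:Toys", "{\"boost\":0.5}");
        System.out.println(a.equals(b)); // false: different scoring param, separate cache entries
    }
}
```

With the parameter folded into equals/hashCode, each parameter value gets its own cache entry, so the cache can stay enabled.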
RE: Can I instruct the Tika Entity Processor to skip the first page using the DIH?
Unfortunately, no. We can't even do that now with straight Tika. I imagine this is for PDF files? If you'd like to add this as a feature, please submit a ticket over on Tika. -Original Message- From: Paden [mailto:rumsey...@gmail.com] Sent: Wednesday, July 08, 2015 12:14 PM To: solr-user@lucene.apache.org Subject: Can I instruct the Tika Entity Processor to skip the first page using the DIH? Hello, I'm using the DIH to import some files from one of my local directories. However, every single one of these files has the same first page, so I want to skip that first page in order to optimize search. Can this be accomplished by an instruction within the DataImportHandler or, if not, how could you do this?
Re: Difference in WordDelimiterFilter behavior between 4.7.2 and 4.9.1
On 7/8/2015 2:19 PM, Shawn Heisey wrote: It appears that changing luceneMatchVersion from LUCENE_4_9 to LUCENE_47 has fixed this problem ... so I think somebody must have changed WDF to its current behavior, but put in a version check to preserve the old behavior. The luceneMatchVersion change has fixed this specific issue with WDF, but these searches on 4.9.1 are still returning zero hits, and I don't yet know why. Thanks, Shawn
Re: Difference in WordDelimiterFilter behavior between 4.7.2 and 4.9.1
Yes Shawn, I was raising the fact that I see strange values in the positions as well. You said you fixed it by going back to an old version? That should not be OK; I assume the latest version should be the best… Any idea or clarification, guys? 2015-07-08 21:10 GMT+01:00 Shawn Heisey apa...@elyograg.org: On 7/8/2015 9:26 AM, Alessandro Benedetti wrote: Taking a look into the documentation I see these inconsistent orderings, in my opinion: Alessandro, thank you for your reply. I couldn't really tell what you were saying. I *think* you were agreeing with me that the current behavior seems like a problem, but I'm not really sure. At this point I think I should probably file a bug in Jira ... anyone have any thoughts on that? Thanks, Shawn -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
Re: Adding field to query result
Yes, product is a function query and yes you can write your own. Score is a _really_ special field, referenced by just the plain score field. You can also use doc transformers to return things like which shard the doc came from, but that's a different syntax ([shard]), just to make it confusing; see: https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents And you can make a custom one of these too; a place to start would be TestCustomDocTransformer in the Solr tests. Best, Erick On Wed, Jul 8, 2015 at 2:00 PM, Maya G maiki...@gmail.com wrote: Hey, Thanks for your response. Yes, I think what I'm looking for is a pseudo field. Is the product function a function query? I assume I can replace the product function with an implementation of my own. BTW - is the score field a pseudo field? Maya -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-field-to-query-result-tp4216396p4216424.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Boost Search word before Specific Content
Hi Jack, Here is a hypothetical example: product_title_1 : dell laptop with laptop bag product_title_2 : laptop bag with cover product_title_3 : laptop bag and table You create an artificial/additional field: before_field_1 : dell laptop before_field_2 : laptop bag before_field_3 : laptop bag You can implement/embed any complex/custom logic (in indexing code) for obtaining values of this new boostable before_field. You can even implement it in a custom update processor. Then, at search time, use (e)Dismax's field boosting mechanism: q=Laptop bag&qf=product_title^0.3 before_field^0.7&defType=edismax Ahmet On Wednesday, July 8, 2015 6:56 AM, JACK mfal...@gmail.com wrote: Hi Ahmet, Can you elaborate it more? Is it possible to solve my problem in Solr 5.0.0? If yes, can you explain how? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Boost-Search-word-before-Specific-Content-tp4216072p4216257.html Sent from the Solr - User mailing list archive at Nabble.com.
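Read literally, Ahmet's before_field values follow one simple rule: keep the leading words of each title up to the first connector word. A minimal indexing-side sketch of that rule (this is my reading of the example, not Ahmet's actual code; the connector list is an assumption):

```java
import java.util.Arrays;
import java.util.List;

public class BeforeFieldExtractor {
    // Assumed connector words; "dell laptop with laptop bag" -> "dell laptop".
    private static final List<String> CONNECTORS = Arrays.asList("with", "and");

    // Build the boostable before_field value from a product title.
    static String beforeField(String title) {
        StringBuilder sb = new StringBuilder();
        for (String word : title.split("\\s+")) {
            if (CONNECTORS.contains(word)) break;
            if (sb.length() > 0) sb.append(' ');
            sb.append(word);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(beforeField("dell laptop with laptop bag")); // dell laptop
        System.out.println(beforeField("laptop bag and table"));        // laptop bag
    }
}
```

The same logic could live in a custom update processor, as Ahmet suggests, writing its output into before_field before the document is indexed.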
Sorting documents by child documents
Hey, I'm using Solr 4.10.2 and I have child documents in every parent document. Previously, I used FunctionQuery to sort the documents: http://lucene.472066.n3.nabble.com/Sorting-documents-by-nested-child-docs-with-FunctionQueries-tp4209940.html Now, I want to sort the documents by their child documents with normal fields. It doesn't work when I use the sort parameter. Thanks in advance, Dor -- View this message in context: http://lucene.472066.n3.nabble.com/Sorting-documents-by-child-documents-tp4216263.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Jetty in Solr 5.2.0
On 7/7/2015 10:51 AM, Steven White wrote: What I am faced with is this. I have to create my own crawler, similar to DIH. I have to deploy this on the same server as Solr (this is given, I cannot change it). I have to manage this crawler just like I have to manage my Solr deployment using Solr API through HTTP request. I figured if I deploy my application under Jetty, with Solr, then problem is solved. At some point in the future, Jetty is expected to go away, with Solr becoming a true standalone application. There is no set timeframe for this to happen. It will hopefully happen before 6.0, but the work needs to be *started* before any kind of guess can be made. The other option I looked at is writing my own handler for my crawler and plugging it into Solr's solrconfig.xml. If I do this, then my crawler will run in the same JVM space as Solr, this is something I want to avoid. If you install another webapp into the same Jetty as Solr, then it will be running in the same JVM as Solr. Jetty is the application that the JVM runs, not Solr. This is not very different from a handler in solrconfig.xml. Yet another option is for me deploy a second instance of Jetty on the Solr server just for my crawler. This is over kill in my opinion. What do folks think about this and what's the best way to approach this issue? Deploy my crawler on a separate server is not an option and for my use case Solr will be used in a lightweight so there is plenty of CPU / RAM on this one server to host Solr and my crawler. As you've already been told, it's a very strong recommendation that you treat Solr as a standalone application and forget that it's running in a standard servlet container. That means that any other webapps, like the crawler you mention, should be installed completely separately. In my previous reply, I told you how you *could* install another application into the Jetty included with Solr, but we don't recommend it, because eventually you won't have that option. Thanks, Shawn
Re: Windows Version
On 7/7/2015 10:43 AM, Allan Elkowitz wrote: So I am a newbie at Solr and am having trouble getting the examples working on Windows 7. I downloaded and unzipped the distribution and have been able to get Solr up and running. I can access the admin page. However, when I try to follow the instructions for loading the examples I find that there is a file that I am supposed to have called post.jar which I cannot find in the directory specified, exampledocs. There is a file called post in another directory but it does not seem to be a .jar file. Two questions: 1. Has this been addressed on some site that I am not yet aware of? 2. What am I missing here? The post.jar file is in example\exampledocs in the Solr 5.2.1 download. The bin\post file is a shell script for Linux/UNIX systems that offers easier access to the SimplePostTool class included in the solr-core jar. Unfortunately, no Windows equivalent (post.cmd) exists yet. If you're getting the impression that Windows is a second-class citizen around here, you are not really wrong. A typical Solr user has found that the free operating systems offer better performance and stability, with the added advantage that they don't have to pay Microsoft a pile of money in order to get useful work done. Windows, especially the server operating systems, is a perfectly good platform, but it's not free. Thanks, Shawn
Re: Jetty in Solr 5.2.0
On 7/7/2015 10:03 AM, Steven White wrote: This may be a question to be posted on Jetty mailing list, but I figured I should start here first. Using Solr 5.2.0, when I start Solr, http://localhost:8983/solr/ is the entry point. My question is: 1) Where is solr on the file system? 2) How can I add http://localhost:8983/MyHandler/ to Jetty? For #2, I'm exploring the possibility of using the existing Web Server to see if I can have an additional application running on the same host as Solr. 1) The answer to this question is not simple. Solr is a Java servlet, written using the servlet API. The jetty home is the server directory, and most everything else is relative to that location. Solr comes in the download as the webapps/solr.war file (relative to that server directory) ... which, like a jar file, is a zip archive. The contexts/solr-jetty-context.xml file tells Jetty how to find that war file, where to extract it, and what URL path (normally /solr) will be used to access that application. The .war archive normally gets extracted to solr-webapp/webapp, and that is where Jetty finds all the bits that become Solr. Solr has a home directory, which defaults to ./solr (also relative to that server directory), where the solr.xml file tells Solr how to locate everything else. The solr home can be overridden with commandline options. Your question number 2 is indeed more properly addressed on the Jetty list. If what I've written below is not enough, ask further questions there. 2) You need to write (or find) a servlet and install its .war file into Jetty with a context fragment as we have done with Solr. A servlet container like Jetty is more complicated than a typical webserver like Apache httpd. It runs Java servlet applications, rather than simply serving html files and other similar resources out of a document root. A servlet can (and usually does) have static resources like html and image files. 
Solr's admin interface is mostly static html, css, images, and javascript that runs in the user's browser and pulls dynamic info from system handlers within Solr. Thanks, Shawn
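For reference, a contexts/solr-jetty-context.xml along the lines Shawn describes looks roughly like this (reconstructed from memory of the Solr 4.x/5.x distribution; treat the exact property names as approximate):

```xml
<Configure class="org.eclipse.jetty.webapp.WebAppContext">
  <!-- URL path under which the webapp is served -->
  <Set name="contextPath"><Property name="hostContext" default="/solr"/></Set>
  <!-- the war file Jetty extracts into solr-webapp/webapp -->
  <Set name="war"><Property name="jetty.home"/>/webapps/solr.war</Set>
  <Set name="tempDirectory"><Property name="jetty.home"/>/solr-webapp</Set>
</Configure>
```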
Re: Difference in WordDelimiterFilter behavior between 4.7.2 and 4.9.1
On Wed, Jul 8, 2015 at 6:50 PM, Shawn Heisey apa...@elyograg.org wrote: After the fix (with luceneMatchVersion at 4.9), both aaa and bbb end up at position 2. Yikes, that's definitely wrong. -Yonik
Re: Too many Soft commits and opening searchers realtime
Yonik, Mikhail, Alessandro After a lot of digging around and isolation, you were all right. I was using property-based values, and there was one place where it was 30 secs and that was overriding my main props. Also, Yonik, thanks for the explanation on the realtime searcher. I wasn't sure if the maxWarmingSearchers error I was getting also had something to do with it. Thanks a lot On Jul 8, 2015, at 5:28 AM, Yonik Seeley ysee...@gmail.com wrote: A realtime searcher is necessary for internal bookkeeping / uses if a normal searcher isn't opened on a commit. This searcher doesn't have caches and hence doesn't carry the weight that a normal searcher would. It's also invisible to clients (it doesn't change the view of the index for normal searches). Your hard autocommit at 8 minutes with openSearcher=false will trigger a realtime searcher to open every 8 minutes along with the hard commit. -Yonik On Tue, Jul 7, 2015 at 5:29 PM, Summer Shire shiresum...@gmail.com wrote: HI All, Can someone help me understand the following behavior. I have the following maxTimes on hard and soft commits, yet I see a lot of Opening Searchers in the log: org.apache.solr.search.SolrIndexSearcher - Opening Searcher@1656a258[main] realtime Also, I see a soft commit happening almost every 30 secs: org.apache.solr.update.UpdateHandler - start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false} <autoCommit> <maxTime>480000</maxTime> <openSearcher>false</openSearcher> </autoCommit> <autoSoftCommit> <maxTime>180000</maxTime> </autoSoftCommit> I tried disabling softCommit by setting maxTime to -1. On startup, solrCore recognized it and logged Soft AutoCommit: disabled, but I could still see softCommit=true: org.apache.solr.update.UpdateHandler - start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false} <autoSoftCommit> <maxTime>-1</maxTime> </autoSoftCommit> Thanks, Summer
RE: EmbeddedSolrServer No such core: collection1
Hi, My problem was that I didn't have a core.properties file, so it couldn't create the core. Thanks for the help Shani -Original Message- From: Upayavira [mailto:u...@odoko.co.uk] Sent: Sunday, July 05, 2015 18:25 To: solr-user@lucene.apache.org Subject: Re: EmbeddedSolrServer No such core: collection1 Hi Shani, What version of Solr are you using? The instructions you quote look like they are for something like 4.4 from what you have written below. The below is cloned from one of my projects, and hacked without testing, but I hope it gives you the idea of how it can be done.

public SolrServer getEmbeddedServer(String solrHome, String solrConfigurationPath, String myCore) throws IOException {
    // Create solr_home directory with solr.xml
    new File(solrHome).mkdirs();
    FileUtils.copyFile(new File(config.getSolrXmlPath()), new File(solrHome, "solr.xml"));
    // Create config dir for my new core
    File myCoreConfig = new File(solrHome + "/" + myCore + "/conf");
    myCoreConfig.mkdirs();
    FileUtils.copyDirectory(new File(solrConfigurationPath), myCoreConfig);
    // Create core.properties file
    FileUtils.write(new File(solrHome + "/" + myCore, "core.properties"), "name=" + myCore);
    // Create CoreContainer and EmbeddedSolrServer
    File solrXml = new File(solrHome, "solr.xml");
    CoreContainer coreContainer = CoreContainer.createAndLoad(solrHome, solrXml);
    return new EmbeddedSolrServer(coreContainer, myCore);
}

Upayavira On Sun, Jul 5, 2015, at 01:17 PM, Chaushu, Shani wrote: Hi, I'm using EmbeddedSolrServer for testing the solr. I went step by step through these instructions (for solr 4) https://wiki.searchtechnologies.com/index.php/Unit_Testing_with_Embedded_Solr I can see that the config loaded, but when I try to put a document, the error I get is: org.apache.solr.common.SolrException: No such core: collection1 I'm sure it's something in the solr.xml, but I couldn't find the issue. Any thoughts?
in the solr.xml I have: <solr> <solrcloud> <str name="host">${host:}</str> <int name="hostPort">${jetty.port:8983}</int> <str name="hostContext">${hostContext:solr}</str> <int name="zkClientTimeout">${zkClientTimeout:30000}</int> <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool> </solrcloud> <solr persistent="true"> <cores adminPath="collection1" defaultCoreName="collection1"> <core name="collection1" instanceDir="collection1" /> </cores> </solr> <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory"> <int name="socketTimeout">${socketTimeout:0}</int> <int name="connTimeout">${connTimeout:0}</int> </shardHandlerFactory> </solr> Thanks, Shani - Intel Electronics Ltd. This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.
RE: Solr Encoding Issue?
Shawn - Stupid coding error in my java code. Used default charset. Changed to UTF-8 and problem fixed. Thanks again! -Original Message- From: Tarala, Magesh Sent: Wednesday, July 08, 2015 8:11 PM To: solr-user@lucene.apache.org Subject: RE: Solr Encoding Issue? Wow, that makes total sense. Thanks Shawn!! I'll go down this path. Thanks, Magesh -Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: Wednesday, July 08, 2015 7:24 PM To: solr-user@lucene.apache.org Subject: Re: Solr Encoding Issue? On 7/8/2015 6:09 PM, Tarala, Magesh wrote: I believe the issue is in solr. The character "à" is getting stored in solr as "Ã ". Notice the space after Ã. I'm using solrj to ingest the documents into solr. So, one of those could be the culprit? Solr accepts and outputs text in UTF-8. The UTF-8 hex encoding for the à character is C3A0. In the latin1 character set, hex C3 is the Ã character. Similarly, in latin1, hex A0 is a non-breaking space. So it sounds like your input is encoded as UTF-8, therefore that character in your input source is hex c3a0, but something in your indexing process is incorrectly interpreting the UTF-8 representation as latin1, so it sees it as "Ã ". SolrJ is faithfully converting that input to UTF-8 and sending it to Solr. Thanks, Shawn
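Magesh's fix, always naming the charset instead of relying on the platform default, and Shawn's byte-level explanation can both be reproduced in a few lines of standalone Java (a sketch, not the actual indexing code from this thread):

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        String original = "à";
        // UTF-8 encodes à as the two bytes 0xC3 0xA0.
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        // Misreading those bytes as latin1 gives "Ã" + non-breaking space:
        // exactly the "Ã " corruption described above.
        String corrupted = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(corrupted.equals("\u00C3\u00A0")); // true
        // Decoding with the correct charset round-trips cleanly.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(original)); // true
    }
}
```

The practical lesson: wherever Java code turns bytes into a String (file readers, HTTP bodies), pass StandardCharsets.UTF_8 explicitly rather than using the overloads that fall back to the platform default charset.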
RE: Do I really need copyField when my app can do the copy?
Perhaps some people, like those using DIH to feed their index, might not have that luxury, and copyField is the better way for them. If you have an application, you can do it either way. I have done both ways in different situations. Robi -Original Message- From: Steven White [mailto:swhite4...@gmail.com] Sent: Wednesday, July 08, 2015 3:38 PM To: solr-user@lucene.apache.org Subject: Do I really need copyField when my app can do the copy? Hi Everyone, What good is the use of copyField in Solr's schema.xml if my application can do it into the designated field? Having my application do it helps me simplify the schema.xml maintenance task, thus my motivation. Thanks Steve
Re: Jetty in Solr 5.2.0
Thank you all for your help. I will leave Solr as-is and not step on its feet. Steve On Wed, Jul 8, 2015 at 2:29 AM, Shawn Heisey apa...@elyograg.org wrote: On 7/7/2015 10:51 AM, Steven White wrote: What I am faced with is this. I have to create my own crawler, similar to DIH. I have to deploy this on the same server as Solr (this is given, I cannot change it). I have to manage this crawler just like I have to manage my Solr deployment using Solr API through HTTP request. I figured if I deploy my application under Jetty, with Solr, then problem is solved. At some point in the future, Jetty is expected to go away, with Solr becoming a true standalone application. There is no set timeframe for this to happen. It will hopefully happen before 6.0, but the work needs to be *started* before any kind of guess can be made. The other option I looked at is writing my own handler for my crawler and plugging it into Solr's solrconfig.xml. If I do this, then my crawler will run in the same JVM space as Solr, this is something I want to avoid. If you install another webapp into the same Jetty as Solr, then it will be running in the same JVM as Solr. Jetty is the application that the JVM runs, not Solr. This is not very different from a handler in solrconfig.xml. Yet another option is for me deploy a second instance of Jetty on the Solr server just for my crawler. This is over kill in my opinion. What do folks think about this and what's the best way to approach this issue? Deploy my crawler on a separate server is not an option and for my use case Solr will be used in a lightweight so there is plenty of CPU / RAM on this one server to host Solr and my crawler. As you've already been told, it's a very strong recommendation that you treat Solr as a standalone application and forget that it's running in a standard servlet container. That means that any other webapps, like the crawler you mention, should be installed completely separately. 
In my previous reply, I told you how you *could* install another application into the Jetty included with Solr, but we don't recommend it, because eventually you won't have that option. Thanks, Shawn
Best way to facets with value preprocessing (w/ docValues)
Hi, folks. Earlier I used solr.TextField with preprocessing (ASCII folding, lowercase etc) on some fields for search and faceting. But on a larger index it takes several minutes to uninvert those fields for faceting (I use fieldValueCache warmup queries with facets). It becomes too expensive in case of frequent soft commits (5-10 mins), so I want to migrate to docValues to avoid the uninvert phase. Documentation[1] says that only Trie*Field, StrField and UUIDField (which itself is a subtype of StrField) support docValues=true. I have tried two ways to work around this issue: 1. Make a subtype of TextField which overrides `checkSchemaField`, effectively turning docValues on for this TextField. All preprocessing is specified in a TokenizerChain analyzer with KeywordTokenizerFactory (so it produces exactly one token for each value in this multivalued field), defined via schema.xml. It seems to work but I haven't tested it under load. What are the potential caveats in such a scheme? Why isn't it used in trunk Solr? 2. Make a subtype of StrField which performs hardcoded preprocessing (like ASCII folding, lowercasing), but I can't find an appropriate point to insert this behavior. The only working method was to override both toInternal and createFields (since creating the BytesRef for docValues doesn't use toInternal there) and do value preprocessing there. What are the potential caveats? Search becomes case-insensitive (since toInternal is used by createField and the default tokenizer), and facets become lowercase because the docValues are created lowercased by the createFields override. The StrField-based variant should be faster than the TextField-based one, since the TokenStream is reused internally in the first case and recreated on each doc with TokenizerChain in the second one. But the StrField-based approach hardcodes preprocessing. The next issue is that I want to use prefix and suffix wildcard search for some fields.
As I understand from the code, it works only on TextField (because it requires the Analyzer to be an instance of TokenizerChain with ReversedWildcardFilterFactory in the TokenFilter chain). Should I use it in the StrField-based variant by overriding getIndexAnalyzer/getQueryAnalyzer, or would that break something? [1]: https://cwiki.apache.org/confluence/display/solr/DocValues -- Best regards, Konstantin Gribov
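A third option, which sidesteps subclassing field types altogether, is to do the folding in the indexing client (or an update processor) and store the result in an ordinary docValues-enabled StrField used only for faceting. A stdlib-only sketch of the folding step, a rough approximation of Solr's ASCIIFoldingFilter plus lowercasing, not Solr code:

```java
import java.text.Normalizer;
import java.util.Locale;

public class FacetNormalizer {
    // Decompose accents (NFD), strip combining marks, lowercase.
    // ASCIIFoldingFilter handles many more mappings; this covers the
    // common accented-Latin cases only.
    static String fold(String value) {
        String decomposed = Normalizer.normalize(value, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(fold("Café Crème")); // cafe creme
    }
}
```

The trade-off is the one noted for option 2 above: the normalization is hardcoded in the client instead of declared in schema.xml.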
Re: Tlog replay
On Wed, Jul 8, 2015 at 12:31 PM, Summer Shire shiresum...@gmail.com wrote: Thanks Alessandro ! Any idea on why I couldn't curl the solr core and pass the flag param ? These flags are for internal use only. Solr sets them, the client doesn't. -Yonik
RE: Solr Encoding Issue?
Thanks Erick. I believe the issue is in solr. The character "à" is getting stored in solr as "Ã ". Notice the space after Ã. I'm using solrj to ingest the documents into solr. So, one of those could be the culprit? -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, July 08, 2015 1:36 PM To: solr-user@lucene.apache.org Subject: Re: Solr Encoding Issue? Attachments are pretty aggressively stripped by the e-mail server, so there's nothing to see; you'll have to paste it somewhere else and provide a link. Usually, though, this is a character set issue, with the browser using a different charset than Solr: it's really the same character, just displayed differently. Shot in the dark though. Erick On Wed, Jul 8, 2015 at 10:49 AM, Tarala, Magesh mtar...@bh.com wrote: I'm ingesting a .TXT file with HTML content into Solr. The content has the following character highlighted below: The file we get from CRM (also attached): [image: cid:image001.png@01D0B972.75BE23F0] After ingesting into solr, I see a different character. This is the query response from the solr management console. [image: cid:image003.png@01D0B972.D1AED290] Anybody know how I can prevent this from happening? Thanks!
Re: Difference in WordDelimiterFilter behavior between 4.7.2 and 4.9.1
On 7/8/2015 4:01 PM, Jack Krupansky wrote: In Lucene 4.8, LUCENE-5111: Fix WordDelimiterFilter offsets https://issues.apache.org/jira/browse/LUCENE-5111 Make sure the documents are queried and indexed with the same Lucene match version. Since I have updated the luceneMatchVersion on the 4.9.1 version to LUCENE_47, I am now reindexing it, to see if that helps. I discovered that I had some information backwards in my previous messages -- it is *index* time analysis that differs. Query time analysis is the same across versions. The reindex may very well fix this problem, but luceneMatchVersion is a band-aid, and I think there is a bug to be fixed. I have no doubt that LUCENE-5111 fixed a real issue, but I think it also caused some new problems. When faced with text like aaa-bbb, the original term (created by preserveOriginal) ends up at relative position 1. Prior to this fix, the next terms will be aaa at position 1 and bbb at position 2. The aaabbb term created by the catenation option also ends up at position 2. This arrangement makes perfect sense to me. After the fix (with luceneMatchVersion at 4.9), both aaa and bbb end up at position 2. I can't see how it is logical to end up with these positions. It breaks phrase queries on my index because the query-time analysis puts these two terms at position 1 and 2. The WDF options I chose seemed logical to me when I made them (about four years ago), but I admit that I don't remember the exact motivation behind those choices. You can find the entire fieldType definition in a previous message on this thread. The two analysis chains are the same except for WDF options. Should I use different options? 
Index-time options: <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="1" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" preserveOriginal="1" /> Query-time options: <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="1" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="0" /> Thanks, Shawn
RE: Solr Encoding Issue?
Wow, that makes total sense. Thanks Shawn!! I'll go down this path. Thanks, Magesh -Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: Wednesday, July 08, 2015 7:24 PM To: solr-user@lucene.apache.org Subject: Re: Solr Encoding Issue? On 7/8/2015 6:09 PM, Tarala, Magesh wrote: I believe the issue is in solr. The character "à" is getting stored in solr as "Ã ". Notice the space after Ã. I'm using solrj to ingest the documents into solr. So, one of those could be the culprit? Solr accepts and outputs text in UTF-8. The UTF-8 hex encoding for the à character is C3A0. In the latin1 character set, hex C3 is the Ã character. Similarly, in latin1, hex A0 is a non-breaking space. So it sounds like your input is encoded as UTF-8, therefore that character in your input source is hex c3a0, but something in your indexing process is incorrectly interpreting the UTF-8 representation as latin1, so it sees it as "Ã ". SolrJ is faithfully converting that input to UTF-8 and sending it to Solr. Thanks, Shawn
Do I really need copyField when my app can do the copy?
Hi Everyone, What good is the use of copyField in Solr's schema.xml if my application can do it into the designated field? Having my application do it helps me simplify the schema.xml maintenance task, thus my motivation. Thanks Steve
Re: Do I really need copyField when my app can do the copy?
On 7/8/2015 4:38 PM, Steven White wrote: What good is the use of copyField in Solr's schema.xml if my application can do it into the designated field? Having my application do it helps me simplify the schema.xml maintenance task, thus my motivation. I can think of two main uses for copyField. One is to combine the inputs for multiple fields into a catchall field; the other is to analyze the same input in multiple ways. For instance, you may want a field analyzed in one way for searching, but analyzed in a different way to use for facets. Your indexing application can indeed take care of that, but having Solr do it means that your indexing application doesn't need to worry about how the data is being used in search; it just has to get the information to Solr. There may be additional use cases, but those are the ones that came to me when I thought about it for a couple of minutes. Thanks, Shawn
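Both uses can be sketched in schema.xml with standard copyField declarations (the field names here are hypothetical, not from the thread):

```xml
<!-- use 1: combine several inputs into one catchall field -->
<field name="catchall" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="title" dest="catchall"/>
<copyField source="body" dest="catchall"/>

<!-- use 2: analyze the same input two ways, tokenized for search and raw string for facets -->
<field name="category" type="text_general" indexed="true" stored="true"/>
<field name="category_str" type="string" indexed="true" stored="false"/>
<copyField source="category" dest="category_str"/>
```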
Re: Solr Encoding Issue?
On 7/8/2015 6:09 PM, Tarala, Magesh wrote: I believe the issue is in solr. The character "à" is getting stored in solr as "Ã ". Notice the space after Ã. I'm using solrj to ingest the documents into solr. So, one of those could be the culprit? Solr accepts and outputs text in UTF-8. The UTF-8 hex encoding for the à character is C3A0. In the latin1 character set, hex C3 is the Ã character. Similarly, in latin1, hex A0 is a non-breaking space. So it sounds like your input is encoded as UTF-8, therefore that character in your input source is hex c3a0, but something in your indexing process is incorrectly interpreting the UTF-8 representation as latin1, so it sees it as "Ã ". SolrJ is faithfully converting that input to UTF-8 and sending it to Solr. Thanks, Shawn
Re: Solr cache when using custom scoring
Not sure I get you; the parameter is passed to Solr as a string. It seems like Solr uses only the query, sort, and range of documents for the caching key (from the docs: This cache holds the results of previous searches: ordered lists of document IDs (DocList) based on a query, a sort, and the range of documents requested). I'm searching for a good way to make sure this parameter is used as well, so that different parameter values with the same query will create different cache keys. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-cache-when-using-custom-scoring-tp4216419p4216479.html Sent from the Solr - User mailing list archive at Nabble.com.
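Mikhail's suggestion is the standard way out: make the scoring parameter part of the custom query's identity, so identical query strings with different parameter values produce different cache entries. A dependency-free sketch of the pattern (class and field names are hypothetical; a real implementation would override equals()/hashCode() on the Lucene Query subclass itself):

```java
import java.util.Objects;

// Sketch of the cache-key pattern: include every parameter that affects
// scoring in equals()/hashCode(), or the cache will serve stale results.
public class CustomScoredQueryKey {
    private final String queryString;
    private final String scoringParam;

    public CustomScoredQueryKey(String queryString, String scoringParam) {
        this.queryString = queryString;
        this.scoringParam = scoringParam;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CustomScoredQueryKey)) return false;
        CustomScoredQueryKey other = (CustomScoredQueryKey) o;
        return queryString.equals(other.queryString)
            && scoringParam.equals(other.scoringParam);
    }

    @Override
    public int hashCode() {
        return Objects.hash(queryString, scoringParam);
    }
}
```

With this in place, the same q with param=a and param=b are distinct keys, so the queryResultCache cannot hand back results scored under the other parameter value.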
Re: Grouping and recip function not working with Sharding
Hi Erick, The example below is for the grouping issue, not for sorting. I have indexed 1839 records with a 'NAME' field in all of them; there may be duplicate records for each 'NAME' value. Let's say there are 5 records with NAME='A-SERIES', similarly 3 records with NAME='E-SERIES', etc. I have 264 unique NAME values in total. So when I query the collection using grouping, it should return 264 unique groups with an ngroups value of 264. But the query returns a response with ngroups as 558, although the length of the groups array in the response is 264. { responseHeader:{ status:0, QTime:19, params:{ group.ngroups:true, indent:true, q:*:*, group.field:NAME, group:true, wt:json } }, grouped:{ NAME:{ matches:1839, ngroups:558, - This value should be 264 groups:[ { groupValue:A-SERIES, doclist:{ } }, { groupValue:B-SERIES, doclist:{ } }, { groupValue:C-SERIES, doclist:{ } }, ---Similarly there are total 264 such groups ] } } } From the reference guide: group.ngroups and group.facet require that all documents in each group must be co-located on the same shard in order for accurate counts to be returned. Document routing via composite keys can be a useful solution in many situations. It's not clear what you think the problem here is. You say: bq: Ex: Below response contains 5 groups (Which is correct) but ngroups is 11. But you have rows set to 5, so? As far as your sorting issue, again, an example showing what you think is wrong would be very helpful. Best, Erick On Wed, Jul 8, 2015 at 6:38 AM, Pankaj Sonawane pankaj4sonaw...@gmail.com wrote: Hi, I am using sharding (3 shards) with Zookeeper. When I query a collection using *group=true&group.field=NAME&group.ngroups=true* parameters, *ngroups* in the response is incorrect. However, I am getting the correct count in the doclist array. Ex: Below response contains 5 groups (Which is correct) but ngroups is 11.
{ responseHeader:{ status:0, QTime:49, params:{ group.ngroups:true, indent:true, start:0, q:*:*, group.field:NAME, group:true, wt:json, rows:5 } }, grouped:{ NAME:{ matches:18, ngroups:11, groups:[ { groupValue:A-SERIES, doclist:{ numFound:5, start:0, maxScore:1, docs:[ { NAME:A-SERIES, _version_:1505559209034383400 } ] } }, { groupValue:B-SERIES, doclist:{ numFound:5, start:0, docs:[ { NAME:B-SERIES, _version_:1505559209034383400 } ] } }, { groupValue:C-SERIES, doclist:{ numFound:1, start:0, docs:[ { NAME:C-SERIES, _version_:1505559209034383400 } ] } }, { groupValue:D-SERIES, doclist:{ numFound:5, start:0, docs:[ { NAME:D-SERIES, _version_:1505559209034383400 } ] } }, { groupValue:E-SERIES, doclist:{ numFound:3, start:0, maxScore:1, docs:[ { NAME:E-SERIES, _version_:1505559209034383400 } ] } } ] } } } I am facing the same problem with the recip function to get the latest record on a date field when using sharding; it returns records in the wrong order. Note: The same configuration works fine on a single machine without sharding. Please help me find a solution. Thanks. On Wed, Jul 8, 2015 at 7:08 PM, Pankaj Sonawane pankaj4sonaw...@gmail.com wrote: Hi, I am using sharding (3 shards) with Zookeeper. When I query a collection using *group=true&group.field=NAME&group.ngroups=true* parameters, *ngroups* in response is
Re: Grouping and recip function not working with Sharding
Erick Erickson erickerickson at gmail.com writes:

From the reference guide: "group.ngroups and group.facet require that all documents in each group must be co-located on the same shard in order for accurate counts to be returned." Document routing via composite keys can be a useful solution in many situations. It's not clear what you think the problem here is. You say:

bq: Ex: Below response contains 5 groups (Which is correct) but ngroups is 11.

But you have rows set to 5, so? As far as your sorting issue, again an example showing what you think is wrong would be very helpful.

Best,
Erick

On Wed, Jul 8, 2015 at 6:38 AM, Pankaj Sonawane pankaj4sonawane at gmail.com wrote:

Hi,

I am using sharding (3 shards) with Zookeeper. When I query a collection using *group=true&group.field=NAME&group.ngroups=true* parameters, *ngroups* in the response is incorrect. However, I am getting the correct count in the doclist array.

Ex: The response below contains 5 groups (which is correct) but ngroups is 11.

{
  "responseHeader": {
    "status": 0,
    "QTime": 49,
    "params": {
      "group.ngroups": "true",
      "indent": "true",
      "start": "0",
      "q": "*:*",
      "group.field": "NAME",
      "group": "true",
      "wt": "json",
      "rows": "5"
    }
  },
  "grouped": {
    "NAME": {
      "matches": 18,
      "ngroups": 11,
      "groups": [
        { "groupValue": "A-SERIES",
          "doclist": { "numFound": 5, "start": 0, "maxScore": 1.0,
            "docs": [ { "NAME": "A-SERIES", "_version_": 1505559209034383400 } ] } },
        { "groupValue": "B-SERIES",
          "doclist": { "numFound": 5, "start": 0,
            "docs": [ { "NAME": "B-SERIES", "_version_": 1505559209034383400 } ] } },
        { "groupValue": "C-SERIES",
          "doclist": { "numFound": 1, "start": 0,
            "docs": [ { "NAME": "C-SERIES", "_version_": 1505559209034383400 } ] } },
        { "groupValue": "D-SERIES",
          "doclist": { "numFound": 5, "start": 0,
            "docs": [ { "NAME": "D-SERIES", "_version_": 1505559209034383400 } ] } },
        { "groupValue": "E-SERIES",
          "doclist": { "numFound": 3, "start": 0, "maxScore": 1.0,
            "docs": [ { "NAME": "E-SERIES", "_version_": 1505559209034383400 } ] } }
      ]
    }
  }
}

I am facing the same problem with the recip function to get the latest record on some date field when using sharding. It returns records in the wrong order.
Note: The same configuration works fine on a single machine without sharding. Please help me find a solution. Thanks.

Hi Erick,

The example below is for the grouping issue, not for sorting. I have indexed 1839 records, all of which have a 'NAME' field; there may be duplicate records for each 'NAME' value. Let's say there are 5 records with NAME='A-SERIES', similarly 3 records with NAME='E-SERIES', etc. I have 264 unique NAME values in total. So when I query the collection using grouping, it should return 264 unique groups with an ngroups value of 264. But the query returns a response with ngroups as 558, although the length of the groups array in the response is 264.

{
  "responseHeader": {
    "status": 0,
    "QTime": 19,
    "params": {
      "group.ngroups": "true",
      "indent": "true",
      "q": "*:*",
      "group.field": "NAME",
      "group": "true",
      "wt": "json"
    }
  },
  "grouped": {
    "NAME": {
      "matches": 1839,
      "ngroups": 558,   <-- This value should be 264
      "groups": [
        { "groupValue": "A-SERIES", "doclist": { } },
        { "groupValue": "B-SERIES", "doclist": { } },
        { "groupValue": "C-SERIES", "doclist": { } }
        ... similarly there are 264 such groups in total
      ]
    }
  }
}
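The reference-guide passage Erick quotes explains the mechanism behind numbers like 558 vs 264: with default routing, documents are placed on shards by their unique id, not by NAME, so one NAME's documents can span several shards; each shard then reports its own local distinct-group count and the coordinator sums them. A minimal sketch of that arithmetic (Python; round-robin placement is just a stand-in for Solr's hash-on-id routing, and the 5/5/1/5/3 split mirrors the small example in this thread):

```python
# Why distributed group.ngroups overcounts: each shard reports the
# number of DISTINCT group values it holds locally, and the
# coordinator sums those numbers, so a NAME whose documents span
# several shards is counted once per shard it touches.

NUM_SHARDS = 3

names = (["A-SERIES"] * 5 + ["B-SERIES"] * 5 + ["C-SERIES"] * 1
         + ["D-SERIES"] * 5 + ["E-SERIES"] * 3)

per_shard = {s: set() for s in range(NUM_SHARDS)}
for i, name in enumerate(names):
    # Document lands on a shard by its position/id, not by NAME,
    # standing in for Solr's default hash-based routing.
    per_shard[i % NUM_SHARDS].add(name)

reported_ngroups = sum(len(groups) for groups in per_shard.values())
true_ngroups = len(set(names))

print(true_ngroups)      # 5  (the correct group count)
print(reported_ngroups)  # 13 (summed per-shard counts; same effect as 558 vs 264)
```

The groups array, by contrast, is merged by groupValue at the coordinator, which is why its length (264) stays correct even while ngroups is inflated.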
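Erick's suggested fix, document routing via composite keys, means indexing ids of the form "NAME!uniquePart": the compositeId router derives the target shard from the prefix before the "!", so every document sharing a NAME is co-located and the summed per-shard counts become exact. A sketch of the idea, with crc32 as an illustrative stand-in for Solr's actual MurmurHash3-based routing:

```python
import zlib

NUM_SHARDS = 3

def shard_for(doc_id: str) -> int:
    # compositeId-style routing: hash only the shard key before '!'.
    # (Real Solr uses MurmurHash3 and mixes prefix/suffix bits;
    # crc32 here just illustrates "prefix decides the shard".)
    shard_key = doc_id.split("!", 1)[0]
    return zlib.crc32(shard_key.encode()) % NUM_SHARDS

# Index ids like "A-SERIES!0", "A-SERIES!1", ... instead of bare ids.
docs = [("%s!%d" % (name, i), name)
        for name in ("A-SERIES", "B-SERIES", "C-SERIES", "D-SERIES", "E-SERIES")
        for i in range(3)]

per_shard = {}
for doc_id, name in docs:
    per_shard.setdefault(shard_for(doc_id), set()).add(name)

# Every NAME now lives on exactly one shard, so the coordinator's
# summed per-shard distinct counts equal the true group count.
reported_ngroups = sum(len(g) for g in per_shard.values())
assert reported_ngroups == len({n for _, n in docs})  # exact: 5
```

In practice this only changes the indexed id (e.g. "A-SERIES!doc123" rather than "doc123"); queries stay the same, and group.ngroups then counts whole groups, which is exactly the co-location requirement the reference guide states.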