Re: Suggester configuration queries.
Hi,

I am using the Solr Terms Component for auto-suggestion; it provides the functionality my requirements call for.
https://wiki.apache.org/solr/TermsComponent

Regards,
Sachin Vyas.
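For reference, a minimal prefix-suggestion request with the TermsComponent looks like the sketch below. This assumes a /terms request handler wired to the TermsComponent (present in the example solrconfig.xml of that era, or easy to add); the collection and field names are placeholders:

    http://localhost:8983/solr/collection1/terms?terms.fl=name&terms.prefix=ca&terms.limit=10

Each returned term/count pair can then be shown as a suggestion.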
Re: Suggester configuration queries.
Using the Terms Component to get auto-suggest is a very old approach and gives minimal features... If it is OK for you, OK! I would suggest these readings on auto-suggestion:

Suggester Solr wiki: https://cwiki.apache.org/confluence/display/solr/Suggester
Solr suggester: http://lucidworks.com/blog/solr-suggester/ (Erick's post)
http://alexbenedetti.blogspot.co.uk/2015/07/solr-you-complete-me.html (my post)

Hope they help!

Cheers

2015-07-13 11:51 GMT+01:00 ssharma7...@gmail.com:

  Hi,
  For my reply dated Jul 02, 2015; 4:47pm: actually *there is no difference in results* for the spellchecker & suggester components in Solr 4.6 and Solr 5.1. I was actually mixing up the two components.

  Thanks & Regards,
  Sachin Vyas.

--
Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England
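To make the switch concrete, a minimal SuggestComponent setup might look like the sketch below, following the linked reference guide page; the suggester name, source field, and analyzer field type are placeholders:

    <searchComponent name="suggest" class="solr.SuggestComponent">
      <lst name="suggester">
        <str name="name">mySuggester</str>
        <str name="lookupImpl">FuzzyLookupFactory</str>
        <str name="dictionaryImpl">DocumentDictionaryFactory</str>
        <str name="field">title</str>
        <str name="suggestAnalyzerFieldType">text_general</str>
      </lst>
    </searchComponent>

    <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
      <lst name="defaults">
        <str name="suggest">true</str>
        <str name="suggest.dictionary">mySuggester</str>
        <str name="suggest.count">10</str>
      </lst>
      <arr name="components">
        <str>suggest</str>
      </arr>
    </requestHandler>

Requests then go to /suggest?suggest.q=ca&wt=json.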
Re: Restore index API does not work in Solr 5.1.0?
Hi all,

How can we restore the index in Solr 5.1.0?

Best Regards,
Dinesh Naik

On Thu, Jul 9, 2015 at 6:54 PM, dinesh naik dineshkumarn...@gmail.com wrote:

  Hi all,
  How can we restore the index in Solr 5.1.0? We did the following:

  1. Started SolrCloud with: bin/solr start -e cloud -noprompt
  2. Posted some documents to Solr from the examples folder using: java -Dc=gettingstarted -jar post.jar *.xml
  3. Backed up the index using: http://localhost:8983/solr/gettingstarted/replication?command=backup
  4. Deleted one document using: http://localhost:8983/solr/gettingstarted/update?stream.body=<delete><query>id:IW-02</query></delete>&commit=true
  5. Restored the index using: http://localhost:8983/solr/gettingstarted/replication?command=restore

  The restore works fine with the same steps on 5.2, but not on 5.1. Is there any other way to restore an index in Solr 5.1.0?

  --
  Best Regards,
  Dinesh Naik
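A likely explanation: the replication handler's restore command appears to have been added only in Solr 5.2 (SOLR-6637), so on 5.1 the usual fallback is to copy the backup snapshot back over the index directory by hand. A rough sketch, with all paths illustrative:

    # stop the node (or unload the core) first
    rm -rf example/cloud/node1/solr/gettingstarted_shard1_replica1/data/index/*
    cp backup/snapshot.20150709.../* example/cloud/node1/solr/gettingstarted_shard1_replica1/data/index/
    # then restart the node / reload the core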
Re: Trouble getting a solr join query done
I was about to suggest the very same solution! I think this will satisfy the user requirement.

Thanks Antonio!

Cheers

2015-07-13 12:22 GMT+01:00 Antonio David Pérez Morales adperezmora...@gmail.com:

  Hi again Yusnel

  Just to confirm, I have tested your use case and the query which returns what you need is this one:

  http://localhost:8983/solr/category/select?q={!join from=categoryId fromIndex=product to=id}*:*&wt=json&indent=true&fq=name:clothes&hl=false

  Please check and let us know if it works for you.

  Regards

  2015-07-12 17:02 GMT+02:00 Antonio David Pérez Morales adperezmora...@gmail.com:

    Hi Yusnel

    I think the query is invalid. It should be q=clothes&fq={!join from=type_id to=id fromIndex=products} or q=*:*&fq={!join from=type_id to=id fromIndex=products}clothes, as long as you are using an edismax parser or a df param for the default field that the "clothes" query is matched against.

    Regards

    2015-07-11 2:23 GMT+02:00 Yusnel Rojas García yroj...@gmail.com:

      I have 2 indexes, products { id, name, type_id .. } and categories { id, name .. }, and I want to get all categories that match a name and have products in them. My best guess would be:

      http://localhost:8983/solr/categories/select?q=clothes&fl=*,score&fq={!join from=type_id to=id fromIndex=products}*:*

      but I always get an empty response. Help please! Is there a better way of doing this without using another index?

--
Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England
Re: Suggester configuration queries.
Hi,

For my reply dated Jul 02, 2015; 4:47pm: for my scenario / test data, the results of the Spellchecker in Solr 4.6 & 5.1 are fine. Also, the results of the Suggester in Solr 4.6 & 5.1 are fine. I was mixing up the two components.

Thanks & Regards,
Sachin Vyas.
Re: Trouble getting a solr join query done
Hi again Yusnel

Just to confirm, I have tested your use case and the query which returns what you need is this one:

http://localhost:8983/solr/category/select?q={!join from=categoryId fromIndex=product to=id}*:*&wt=json&indent=true&fq=name:clothes&hl=false

Please check and let us know if it works for you.

Regards

2015-07-12 17:02 GMT+02:00 Antonio David Pérez Morales adperezmora...@gmail.com:

  Hi Yusnel

  I think the query is invalid. It should be q=clothes&fq={!join from=type_id to=id fromIndex=products} or q=*:*&fq={!join from=type_id to=id fromIndex=products}clothes, as long as you are using an edismax parser or a df param for the default field that the "clothes" query is matched against.

  Regards

  2015-07-11 2:23 GMT+02:00 Yusnel Rojas García yroj...@gmail.com:

    I have 2 indexes, products { id, name, type_id .. } and categories { id, name .. }, and I want to get all categories that match a name and have products in them. My best guess would be:

    http://localhost:8983/solr/categories/select?q=clothes&fl=*,score&fq={!join from=type_id to=id fromIndex=products}*:*

    but I always get an empty response. Help please! Is there a better way of doing this without using another index?
Re: Solr search in different servers based on search keyword
Hi Arijit, let me clarify some points, OK?

2015-07-13 6:22 GMT+01:00 Arijit Saha arijitsaha...@gmail.com:

  Hi Solr/Lucene experts,
  We are planning to build a Solr/Lucene search application. As per the design requirement, the files (on which the search operation needs to be done) will be lying on separate servers.

OK, so the data sources for your search engine -- your sources of information -- will be files on different servers. This is perfectly fine. Lucene/Solr doesn't use the physical files you want to index when serving search. You feed Solr with Documents, which are indexed, producing an inverted index and a set of related data structures to provide search at query time. What you really care about is whether the index(es) built from your corpus of Documents will be distributed across different nodes or not.

  We want to use Solr/Lucene to perform search operations on files lying on different remote servers.

So, considering now the files to be index segments, the answer is yes. Lucene can search across different indexes, and Solr on top of it can as well. SolrCloud allows you to architect your search engine on a cluster of Solr instances. Each logical Collection can be partitioned into different shards (partitions of the whole index), and each shard can be replicated as much as you want. It is possible to implement your own routing strategy (the way your docs are assigned to shards), or use the routing strategies already available. You may be interested in compositeId routing, which relates to your search requirement. Take a look at these interesting docs:

https://lucidworks.com/blog/multi-level-composite-id-routing-solrcloud/
https://lucidworks.com/blog/solr-cloud-document-routing/

At indexing time you will be able to pick the shard to send your documents to, and to have your documents co-located according to a specific key (which can be the original server the doc is coming from).

  Do Solr/Lucene support the above feature of searching on different servers based on a search keyword?

Then, at query time, you can use the same key you configured at indexing time and query only a subset of documents, based on their original location.

  I am a newbie to Solr/Lucene. Please help. Also, let me know in case any additional details are required.

Happy to help again and with better details :)

Cheers

  Much appreciated.
  Thanks,
  Arijit

--
Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England
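A small sketch of what that looks like in practice, with collection, key, and field names as placeholders: with the default compositeId router, index each document with its origin server as a routing prefix in the id, e.g. id = "serverA!doc42". Documents sharing the prefix land on the same shard. At query time, restrict the search to that shard with the _route_ parameter:

    http://localhost:8983/solr/mycollection/select?q=keyword&_route_=serverA!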
Re: Suggester configuration queries.
Hi,

For my reply dated Jul 02, 2015; 4:47pm: actually *there is no difference in results* for the spellchecker & suggester components in Solr 4.6 and Solr 5.1. I was actually mixing up the two components.

Thanks & Regards,
Sachin Vyas.
Re: copying data from one collection to another collection (SolrCloud 5.2.1)
bq: does offline

No. I'm talking about collection aliasing. You can create an entirely new collection, index to it however you want, then switch to using that new collection.

bq: Any updates to EXISTING documents in the LIVE collection should NOT be replicated to the previous week(s) snapshot(s)

Then give it a new ID, maybe?

Best,
Erick

On Mon, Jul 13, 2015 at 3:21 PM, Raja Pothuganti rpothuga...@competitrack.com wrote:

  Thank you Erick

  > Actually, my question is why do it this way at all? Why not index directly to your live nodes? This is what SolrCloud is built for. You can use implicit routing to create shards, say, for each week, and age out the ones that are too old as well.

  Any updates to EXISTING documents in the LIVE collection should NOT be replicated to the previous week(s) snapshot(s). Think of the snapshot(s) as an archive of sorts, searchable independently of LIVE. We're aiming to support at most 2 archives of data in the past.

  > Another option would be to use collection aliasing to keep an offline index up to date then switch over when necessary.

  Does offline indexing refer to this link? https://github.com/cloudera/search/tree/0d47ff79d6ccc0129ffadcb50f9fe0b271f102aa/search-mr

  Thanks
  Raja

  On 7/13/15, 3:14 PM, Erick Erickson erickerick...@gmail.com wrote:

    Actually, my question is why do it this way at all? Why not index directly to your live nodes? This is what SolrCloud is built for.

    There's the new backup/restore functionality that's still a work in progress, see: https://issues.apache.org/jira/browse/SOLR-5750

    You can use implicit routing to create shards, say, for each week, and age out the ones that are too old as well.

    Another option would be to use collection aliasing to keep an offline index up to date then switch over when necessary.

    I'd really like to know this isn't an XY problem though; what's the high-level problem you're trying to solve?

    Best,
    Erick

    On Mon, Jul 13, 2015 at 12:49 PM, Raja Pothuganti rpothuga...@competitrack.com wrote:

      Hi,
      We are setting up a new SolrCloud environment with 5.2.1 on Ubuntu boxes. We currently ingest data into a large collection, call it LIVE. After the full ingest is done we then trigger a delta ingestion every 15 minutes to get the documents and data that have changed into this LIVE instance.

      In Solr 4.X, using a Master / Slave setup, we had slaves that would periodically (weekly, or monthly) refresh their data from the Master rather than every 15 minutes. We're now trying to figure out how to get this same type of setup using SolrCloud.

      Question(s):
      - Is there a way to copy data from one SolrCloud collection into another quickly and easily?
      - Is there a way to programmatically control when a replica receives its data, or possibly move it to another collection (without losing data) that updates on a different interval? It ideally would be another collection name, call it Week1 ... Week52 ... to avoid a replica in the same collection serving old data.

      One option we thought of was to create a backup and then restore that into a new clean cloud. This has a lot of moving parts and isn't nearly as neat as the Master / Slave controlled replication setup. It also has the side effect of potentially taking a very long time to backup and restore instead of just copying the indexes like the old M/S setup.

      Any ideas or thoughts? Thanks in advance for your help.

      Raja
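For reference, a sketch of the aliasing flow Erick describes, with collection names as placeholders: build the new collection offline, then atomically repoint the alias that clients query:

    http://localhost:8983/solr/admin/collections?action=CREATE&name=week29&numShards=3&replicationFactor=2
    ... index into week29, commit ...
    http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=live&collections=week29

Searches against "live" switch to week29 as soon as the alias is updated; the old collection can then be kept around as an archive or deleted.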
Re: Persistence problem with swapped cores after Solr restart -- 4.9.1
Uggghh. Not persistence again...

I'll stay tuned.

Erick

On Mon, Jul 13, 2015 at 2:44 PM, Shawn Heisey apa...@elyograg.org wrote:

  On Solr 4.9.1 with core discovery, I seem to be having trouble with core swaps not persisting through a full Solr restart.

  I apologize for the fact that this message is lean on details ... I've seen the problem twice now, but I don't have any concrete before/after information about what's in each core.properties file. I am attempting to set up the scenario again and gather that information.

  The entire directory structure is set up as a git repo, so I will be able to tell if any files (like core.properties) are modified for the rebuild/swap that I have started. The repo shows no changes at the moment, but I have done several of these rebuild/swap operations, so even if core.properties is being correctly updated, it might just have landed back on the original configuration.

  I have another copy of my index using Solr 4.7.2 with the old solr.xml format that seems to have no problems with core swapping and persistence. That works differently, though -- all cores are defined in solr.xml rather than with core.properties files.

  When I first set up these Solr instances, I don't recall having this problem, but full Solr restarts are really rare, so it's possible I just didn't create the right circumstances.

  Thanks,
  Shawn
Re: Why I get a hit on %, &, but not on !, @, #, $, ^, *
Oops... that's the "types" attribute.

-- Jack Krupansky

On Mon, Jul 13, 2015 at 11:11 PM, Jack Krupansky jack.krupan...@gmail.com wrote:

  The word delimiter filter is removing special characters. You can add a file containing a list of the special characters that you wish to treat as alpha, using the "type" parameter.

  -- Jack Krupansky

  On Mon, Jul 13, 2015 at 6:43 PM, Steven White swhite4...@gmail.com wrote:

    Hi Everyone,

    I think the subject line said it all. Here is the schema I'm using:

    <fieldType name="my_text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="0" splitOnNumerics="1" stemEnglishPossessive="1" preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

    I'm guessing this is due to how solr.WhitespaceTokenizerFactory works, and the characters that are not indexed are removed because they are considered whitespace? If so, how can I include %, &, etc. in this non-indexed list? I would rather see all of these not indexed vs. some indexed and some not, causing confusion for my users.

    Thanks

    Steve
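A sketch of what that fix looks like; the file name is a placeholder, and the other WordDelimiterFilterFactory attributes stay as in the original schema:

    # wdfftypes.txt -- treat these characters as ordinary letters
    % => ALPHA
    & => ALPHA
    ! => ALPHA
    @ => ALPHA

    <filter class="solr.WordDelimiterFilterFactory" types="wdfftypes.txt" preserveOriginal="1" ...other attributes unchanged... />

With the types file in place, the listed characters survive word delimiting instead of being stripped.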
Re: Running Solr 5.2.1 on Windows using NSSM
Adrian,

Do you know if this script creates a config file somewhere?

Would it be possible/helpful to have a script in Solr's bin/ to run it as a service? E.g.:

  bin\install_solr_service.cmd

It would assume these defaults:

  -nssm "c:\Program Files\nssm\win64\nssm"
  -servicename Solr
  -start true

The rest of the parameters would be the same as bin\solr.cmd. It would, behind the scenes, run:

  nssm install Solr bin/solr.cmd -f %*
  nssm set Solr AppDirectory .

And possibly:

  nssm start Solr

I don't have a Windows setup to try this on right now, but I'd like to see such a script inside the bin/ directory. Would this work?

Upayavira

On Tue, Jul 14, 2015, at 02:53 AM, Adrian Liew wrote:

  Hi Edwin,
  Sorry for the late reply. Was caught up yesterday. Yes, I did not use the start.jar command and followed this article using solr.cmd - http://www.norconex.com/how-to-run-solr5-as-a-service-on-windows/. I am using a Windows Server 2012 R2 server.
  The article example shows that it passes "start -f -p 8983" as arguments to the service. I believe it is important to have the -f. Did you try this example? If it didn't work for you, have you tried to remove the service via nssm and add it again?
  Best regards,
  Adrian

  -----Original Message-----
  From: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com]
  Sent: Monday, July 13, 2015 10:51 AM
  To: solr-user@lucene.apache.org
  Subject: Re: Running Solr 5.2.1 on Windows using NSSM

  Hi Adrian,
  I got this to work for Solr 5.1, but when I tried this in Solr 5.2.1, it gives the error "Windows could not start the solr5.2.1 service on Local Computer. The service did not return an error. This could be an internal Windows error or an internal service error."
  As Solr 5.2.1 is not using the start.jar command to run Solr, are we still able to use the same arguments to set up nssm?
  Regards,
  Edwin

  On 8 July 2015 at 17:38, Adrian Liew adrian.l...@avanade.com wrote:

    Answered my own question. :) It seems to work great for me by following this article: http://www.norconex.com/how-to-run-solr5-as-a-service-on-windows/
    Regards,
    Adrian

    -----Original Message-----
    From: Adrian Liew [mailto:adrian.l...@avanade.com]
    Sent: Wednesday, July 8, 2015 4:43 PM
    To: solr-user@lucene.apache.org
    Subject: Running Solr 5.2.1 on Windows using NSSM

    Hi guys,
    I am looking to run Apache Solr v5.2.1 on a Windows machine. I tried to set up a Windows service using NSSM (Non-Sucking Service Manager), pointing the service at the solr.cmd file path and installing the service. After installation, I tried to start the Windows service, but it gives back an alert message: "Windows could not start the SolrService service on Local Computer. The service did not return an error. This could be an internal Windows error or an internal service error."
    Most of the examples for older Apache Solr versions use the "java -jar start.jar" command to run Solr and seem to run okay with nssm. I am not sure if this could be a solr.cmd issue or NSSM's issue.
    Alternatively, I have tried to use Windows Task Scheduler to configure a task pointing to solr.cmd and run the task whenever the computer starts (regardless of whether a user is logged in or not). The task scheduler reports back 'Task Start Failed' with a level of 'Error'. Additionally, after checking Event Viewer, it returns the nssm error "Failed to open process handle for process with PID 3640 when terminating service Solr Service : The parameter is incorrect." Chances are this points back to the solr.cmd file itself.
    Thoughts?
    Regards,
    Adrian
Re: Why I get a hit on %, &, but not on !, @, #, $, ^, *
The word delimiter filter is removing special characters. You can add a file containing a list of the special characters that you wish to treat as alpha, using the "type" parameter.

-- Jack Krupansky

On Mon, Jul 13, 2015 at 6:43 PM, Steven White swhite4...@gmail.com wrote:

  Hi Everyone,

  I think the subject line said it all. Here is the schema I'm using:

  <fieldType name="my_text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="0" splitOnNumerics="1" stemEnglishPossessive="1" preserveOriginal="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
      <filter class="solr.PorterStemFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

  I'm guessing this is due to how solr.WhitespaceTokenizerFactory works, and the characters that are not indexed are removed because they are considered whitespace? If so, how can I include %, &, etc. in this non-indexed list? I would rather see all of these not indexed vs. some indexed and some not, causing confusion for my users.

  Thanks

  Steve
Field collapsing on parent document
Hello,

I use a blockjoin document structure with 3 levels (base, path and attributes). I am performing a facet query to count the number of different attributes, but I would like to group or collapse them at the path level. I can easily collapse them on base (by using _root_), but I want them grouped or collapsed at the intermediate level. Can I do that? So basically a query that combines a parent/child query with a collapse-field query. Something like this:

{!child of=type:path}{!collapse field=_root_}

Gr
RE: Lingo3g-Solr integration - ClassNotFoundException: com.google.common.base.MoreObjects
Just a quick update,

The version of Lingo3G (1.12.0) does not seem compatible with the older version of Guava packaged with Solr. Switching to an older version of Lingo3G has resolved the issue.

Thanks for the help!

Collin

Collin Mandris
Associate Engineer, Software Defense Solutions Division
General Dynamics Information Technology
55 Dodge Road
Buffalo, New York 14068-1205
Phone: 1-716-243-4022
Fax: (716) 691-3642
collin.mand...@gdit.com

-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Friday, July 10, 2015 15:50
To: solr-user@lucene.apache.org
Subject: Re: Lingo3g-Solr integration - ClassNotFoundException: com.google.common.base.MoreObjects

On 7/10/2015 10:09 AM, Mandris, Collin wrote:

  Hello, I am trying to integrate Lingo3g with Solr. I have arrived at a ClassNotFoundException error using Lingo3g (version 1.12.0) with Solr 4.8.0. I located guava-18.0.jar, which contains the com.google.common.base.MoreObjects class, and have tried putting it in multiple locations within our Solr deployment, but have had no luck in getting past the error. So far, I have tried:
  1) Adding "Class-Path: guava-18.0.jar" to the manifest file in start.jar, solr.war and lingo3g-1.12.0.jar, with guava-18.0.jar copied to the same folder as each respective jar file.
  2) Putting guava-18.0.jar in the contrib\clustering\lib folder with the other lingo3g jar files.
  3) Putting guava-18.0.jar in the Java JDK bin folder.

Solr already includes Guava, but it's a very old version -- 14.0.1. This means that you can't simply add a newer Guava jar... but I've just tried upgrading Guava in the Solr source code to 18.0, and Solr won't compile.

We have an unresolved issue to upgrade Guava to version 15. Somebody mentioned kite-morphlines as a blocker for that, but I'm not sure what the full story is. I've updated the issue with a comment about this thread.

https://issues.apache.org/jira/browse/SOLR-5584

At this time, you can't use anything that depends on Guava 18. This is a textbook case of jar hell ... we need to get Guava upgraded in Solr.

https://en.wikipedia.org/wiki/Java_Classloader#JAR_hell

Thanks,
Shawn
Multiple facet fields Query
Hi,

If I want to facet on multiple fields, I typically add multiple facet.field parameters to the query:

facet=true&facet.field=field1&facet.field=field2

Is there another way to do this instead of using facet.field multiple times -- say, using only facet.field=field1,field2? I am running into an issue integrating the repeated parameter with our ESB layer.

Thanks,
Ajeet Phansalkar
Re: Multiple facet fields Query
On Mon, Jul 13, 2015, at 03:09 PM, Phansalkar, Ajeet wrote:

  Hi,
  If I want to facet on multiple fields, I typically add multiple facet.field parameters to the query:
  facet=true&facet.field=field1&facet.field=field2
  Is there another way to do this instead of using facet.field multiple times -- say, using only facet.field=field1,field2? I am running into an issue integrating the repeated parameter with our ESB layer.

It is a common pattern within Solr to use multiple request parameters with the same name.

You may be able to get around it, if you are using the latest Solr, using the JSON facet or the JSON query API, which encapsulate similar functionality in a JSON snippet.

Upayavira
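A sketch of the JSON Facet API alternative (Solr 5.1+; the facet labels, field names, and collection name are placeholders) -- both facets travel inside a single json.facet parameter, sidestepping the repeated facet.field entirely:

    curl http://localhost:8983/solr/mycollection/query -d 'q=*:*&rows=0&json.facet={
      byField1 : {type: terms, field: field1},
      byField2 : {type: terms, field: field2}
    }'

This assumes the stock /query handler; posting the same parameters to /select works as well.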
Re: Querying Nested documents
What about?

http://localhost:8983/solr/demo/select?q={!parent which='type:parent'}&fl=*,[child parentFilter=type:parent childFilter=-type:parent]&indent=true

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
RE: Running Solr 5.2.1 on Windows using NSSM
Hi Edwin,

Sorry for the late reply. Was caught up yesterday. Yes, I did not use the start.jar command and followed this article using solr.cmd - http://www.norconex.com/how-to-run-solr5-as-a-service-on-windows/. I am using a Windows Server 2012 R2 server.

The article example shows that it passes "start -f -p 8983" as arguments to the service. I believe it is important to have the -f. Did you try this example? If it didn't work for you, have you tried to remove the service via nssm and add it again?

Best regards,
Adrian

-----Original Message-----
From: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com]
Sent: Monday, July 13, 2015 10:51 AM
To: solr-user@lucene.apache.org
Subject: Re: Running Solr 5.2.1 on Windows using NSSM

Hi Adrian,

I got this to work for Solr 5.1, but when I tried this in Solr 5.2.1, it gives the error "Windows could not start the solr5.2.1 service on Local Computer. The service did not return an error. This could be an internal Windows error or an internal service error."

As Solr 5.2.1 is not using the start.jar command to run Solr, are we still able to use the same arguments to set up nssm?

Regards,
Edwin

On 8 July 2015 at 17:38, Adrian Liew adrian.l...@avanade.com wrote:

  Answered my own question. :) It seems to work great for me by following this article: http://www.norconex.com/how-to-run-solr5-as-a-service-on-windows/
  Regards,
  Adrian

  -----Original Message-----
  From: Adrian Liew [mailto:adrian.l...@avanade.com]
  Sent: Wednesday, July 8, 2015 4:43 PM
  To: solr-user@lucene.apache.org
  Subject: Running Solr 5.2.1 on Windows using NSSM

  Hi guys,
  I am looking to run Apache Solr v5.2.1 on a Windows machine. I tried to set up a Windows service using NSSM (Non-Sucking Service Manager), pointing the service at the solr.cmd file path and installing the service. After installation, I tried to start the Windows service, but it gives back an alert message: "Windows could not start the SolrService service on Local Computer. The service did not return an error. This could be an internal Windows error or an internal service error."
  Most of the examples for older Apache Solr versions use the "java -jar start.jar" command to run Solr and seem to run okay with nssm. I am not sure if this could be a solr.cmd issue or NSSM's issue.
  Alternatively, I have tried to use Windows Task Scheduler to configure a task pointing to solr.cmd and run the task whenever the computer starts (regardless of whether a user is logged in or not). The task scheduler reports back 'Task Start Failed' with a level of 'Error'. Additionally, after checking Event Viewer, it returns the nssm error "Failed to open process handle for process with PID 3640 when terminating service Solr Service : The parameter is incorrect." Chances are this points back to the solr.cmd file itself.
  Thoughts?
  Regards,
  Adrian
Re: XML File Size for Post.jar
I don't think you can do files that big. The memory use would blow up. Are you sure you cannot chunk it into smaller document sets? Or parse it as a stream with DIH, in a pull fashion?

Regards,
  Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/

On 13 July 2015 at 14:56, EXTERNAL Taminidi Ravi (ETI, AA-AS/PAS-PTS) external.ravi.tamin...@us.bosch.com wrote:

  Hi, what do I have to change to support indexing an XML file larger than 2 GB in Solr, using the simple post tool (post.jar), with Jetty and Tomcat?

  Thanks
  Ravi
RE: XML File Size for Post.jar
I can break it into smaller files, but in that case the number of files grows into the hundreds. Can I parse XML files with DIH? Can you point me to a few examples?

Thanks
Ravi

-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Monday, July 13, 2015 3:01 PM
To: solr-user
Subject: Re: XML File Size for Post.jar

I don't think you can do files that big. The memory use would blow up. Are you sure you cannot chunk it into smaller document sets? Or parse it as a stream with DIH, in a pull fashion?

Regards,
  Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/

On 13 July 2015 at 14:56, EXTERNAL Taminidi Ravi (ETI, AA-AS/PAS-PTS) external.ravi.tamin...@us.bosch.com wrote:

  Hi, what do I have to change to support indexing an XML file larger than 2 GB in Solr, using the simple post tool (post.jar), with Jetty and Tomcat?

  Thanks
  Ravi
RE: Solr cloud error during document ingestion
Shawn, here are my responses:

> Is that the entire error, or is there additional error information? Do you have any way to know exactly what is in that request that's throwing the error?

That's the entire error stack. I don't see anything else in the Solr log. Probably need to turn on additional logging? I've identified the text in the email (.msg) that's causing it. This is it: (daños). The tilde on the n is the culprit. If I remove this and run the load, it works fine.

> You said 4.10.2 ... is this the Solr or SolrJ version? Are both of them the same version? Are you running Solr in the included jetty, or have you installed it into another servlet container? What Java vendor and version are you running, and is it 64-bit?

Solr version is 4.10.2. SolrJ version is 4.10.3. Using the built-in Jetty. Java(TM) SE Runtime Environment (build 1.7.0_67-b01).

> Can you share your SolrJ code, solrconfig, schema, and any other information you can think of that might be relevant?

Yes, absolutely. Where would you like to see it posted?

> Because the error is from a request, I doubt that autoCommit has anything to do with the problem, but I could be wrong about that.

Yes, agree. This is not related to autoCommit.

-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Sunday, July 12, 2015 6:25 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr cloud error during document ingestion

On 7/11/2015 9:33 PM, Tarala, Magesh wrote:

  I'm using 4.10.2 in a 3-node SolrCloud setup. I have a collection with 3 shards and 2 replicas each. I'm ingesting Solr documents via SolrJ. While ingesting the documents, I get the following error:

  264147944 [updateExecutor-1-thread-268] ERROR org.apache.solr.update.StreamingSolrServers ? error
  org.apache.solr.common.SolrException: Bad Request
  request: http://10.222.238.35:8983/solr/serviceorder_shard1_replica2/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F10.222.238.36%3A8983%2Fsolr%2Fserviceorder_shard2_replica1%2F&wt=javabin&version=2
    at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:241)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

  I commit after every 100 documents in SolrJ. And I also have the following solrconfig.xml setting:

  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

Is that the entire error, or is there additional error information? Do you have any way to know exactly what is in that request that's throwing the error?

You said 4.10.2 ... is this the Solr or SolrJ version? Are both of them the same version? Are you running Solr in the included jetty, or have you installed it into another servlet container? What Java vendor and version are you running, and is it 64-bit?

Can you share your SolrJ code, solrconfig, schema, and any other information you can think of that might be relevant?

Because the error is from a request, I doubt that autoCommit has anything to do with the problem, but I could be wrong about that.

Thanks,
Shawn
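One possible angle, offered as an assumption rather than a confirmed diagnosis: "daños" breaking the request often means the .msg text was decoded with the wrong character set before being handed to SolrJ, producing bytes that are not valid UTF-8 on the wire. A hedged client-side sketch; the file name and field name are placeholders, and "server" is an already-constructed SolrServer:

    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import org.apache.solr.common.SolrInputDocument;

    // decode the extracted message text with an explicit charset
    // instead of relying on the JVM platform default
    String body = new String(
        Files.readAllBytes(Paths.get("mail.msg.txt")), StandardCharsets.UTF_8);
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("body", body);
    server.add(doc);

If the source bytes are actually Windows-1252 rather than UTF-8, decode with that charset instead; the point is to never depend on the platform default.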
RE: Multiple facet fields Query
Indeed, it is built into the HTML Forms specification that any query parameter may be repeated any number of times. If your ESB tool didn't support this, it would be very broken. My expectation is that it does, and a bit more debugging and/or research into the product will yield results.

Are you using POST but not setting Content-Type: application/x-www-form-urlencoded? Also, check that you are encoding using the UTF-8 character set and have correctly escaped reserved characters.

Fwiw, SolrJ will do the right thing here. So, if nothing else, what it puts on the wire can be used as a reference.

See http://www.w3.org/TR/html401/interact/forms.html

Every professional Java/Perl/C/C++/etc. URL implementation I have ever worked with supports multiple values per name, encoded as name1=foo&name1=bar..., with a high degree of interoperability.

-----Original Message-----
From: Upayavira [mailto:u...@odoko.co.uk]
Sent: Monday, July 13, 2015 10:33 AM
To: solr-user@lucene.apache.org
Subject: Re: Multiple facet fields Query

On Mon, Jul 13, 2015, at 03:09 PM, Phansalkar, Ajeet wrote:

  Hi,
  If I want to facet on multiple fields, I typically add multiple facet.field parameters to the query:
  facet=true&facet.field=field1&facet.field=field2
  Is there another way to do this instead of using facet.field multiple times -- say, using only facet.field=field1,field2? I am running into an issue integrating the repeated parameter with our ESB layer.

It is a common pattern within Solr to use multiple request parameters with the same name.

You may be able to get around it, if you are using the latest Solr, using the JSON facet or the JSON query API, which encapsulate similar functionality in a JSON snippet.

Upayavira
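To illustrate the SolrJ reference point (core name and field names are placeholders): SolrJ builds the repeated parameters itself, so the ESB layer never has to:

    SolrQuery q = new SolrQuery("*:*");
    q.setFacet(true);
    // emitted on the wire as facet.field=field1&facet.field=field2
    q.addFacetField("field1", "field2");
    QueryResponse rsp = new HttpSolrServer("http://localhost:8983/solr/mycore").query(q);

Capturing that request (e.g. in the Solr request log) gives a known-good encoding to compare the ESB output against.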
XML File Size for Post.jar
Hi,

What do I have to change to support indexing an XML file larger than 2 GB in Solr, using the simple post tool (post.jar), with Jetty and Tomcat?

Thanks
Ravi
Re: Highlighting pre and post tags not working
You need to XML-encode the tags. So instead of <em>, put &lt;em&gt;, and instead of </em>, put &lt;/em&gt;.

Upayavira

On Mon, Jul 13, 2015, at 05:19 PM, Paden wrote:

  Hello, I'm trying to get some Solr highlighting going but I've run into a small problem. When I set the pre and post tags with my own custom tag, I get an XML error:

  XML Parsing Error: mismatched tag. Expected: </em>.
  Location: file:///home/paden/Downloads/solr-5.1.0/server/solr/Testcore2/conf/solrconfig.xml
  Line Number 476, Column 40: <str name="hl.simple.pre"><em></str>

  I've seen it done like this on a lot of other sites and I'm not sure if I'm missing an escape character or something. Just to emphasize that I did set a POST tag, I put it right after the pre in solrconfig.xml, like so:

  <str name="hl.simple.pre"><em></str>
  <str name="hl.simple.post"></em></str>

  What am I doing wrong here?
Re: Highlighting pre and post tags not working
Try

  <str name="hl.simple.pre">&lt;em&gt;</str>

or

  <str name="hl.simple.pre"><![CDATA[<em>]]></str>

The bare < and > confuse the XML parsing.

Best,
Erick

On Mon, Jul 13, 2015 at 9:19 AM, Paden rumsey...@gmail.com wrote:

  Hello, I'm trying to get some Solr highlighting going but I've run into a small problem. When I set the pre and post tags with my own custom tag, I get an XML error:

  XML Parsing Error: mismatched tag. Expected: </em>.
  Location: file:///home/paden/Downloads/solr-5.1.0/server/solr/Testcore2/conf/solrconfig.xml
  Line Number 476, Column 40: <str name="hl.simple.pre"><em></str>

  I've seen it done like this on a lot of other sites and I'm not sure if I'm missing an escape character or something. Just to emphasize that I did set a POST tag, I put it right after the pre in solrconfig.xml, like so:

  <str name="hl.simple.pre"><em></str>
  <str name="hl.simple.post"></em></str>

  What am I doing wrong here?
Re: Planning Solr migration to production: clean and autoSoftCommit
Hi Erick,

That status request shows whether the Solr instance is busy or idle. I think this is a workable option for checking whether the indexing process has completed (idle) or not (busy).

Now, I have some concern about the solution of not using the default polling mechanism from the slave instance to the master instance.

The load test showed that the initial batches of requests got much longer response times than later batches after the Solr server was started up. Gradually, the performance got much better, presumably due to the cache being warmed up. I understand that the indexing process will commit the changes and also autowarm queries in the existing cache. In this case, the indexing Solr instance will be in good shape to serve requests after the indexing process is completed.

The question: when the slave instances poll the indexing instance (master), do these slave instances also autowarm queries in the existing cache? If they do, then the polling mechanism will also make the slave instances more ready to serve requests (more performant) at any time.

When we talk about the forced replication solution, are we pushing/overwriting all the old index files with the new index files? Do we need to restart the Solr instance? In addition, will the slave instances be warmed up in any way?

If there are too many issues with forced replication, I might as well work out the incremental indexing option.

Thanks
Re: Planning Solr migration to production: clean and autoSoftCommit
bq: When the slave instances poll the indexing instance (master), do these slave instances also autowarm queries in the existing cache

Yes.

bq: When we talk about the forced replication solution, are we pushing/overwriting all the old index files with the new index files?

I believe so, but I don't know the entire details. In your situation this is what will happen anyway, since you're cleaning, right? So it really doesn't matter whether you do a fetchindex or just disable/enable polling; the work will essentially be the same.

bq: do we need to restart Solr instance?

No.

bq: In addition, will slave instances warmed up in any way?

All autowarming will be done.

Really, I'd just start by disabling replication on the master, doing the indexing, then re-enabling it. The rest should just happen.

Best,
Erick

On Mon, Jul 13, 2015 at 10:48 AM, wwang525 wwang...@gmail.com wrote:

  Hi Erick,

  That status request shows whether the Solr instance is busy or idle. I think this is a workable option for checking whether the indexing process has completed (idle) or not (busy).

  Now, I have some concern about the solution of not using the default polling mechanism from the slave instance to the master instance.

  The load test showed that the initial batches of requests got much longer response times than later batches after the Solr server was started up. Gradually, the performance got much better, presumably due to the cache being warmed up. I understand that the indexing process will commit the changes and also autowarm queries in the existing cache. In this case, the indexing Solr instance will be in good shape to serve requests after the indexing process is completed.

  The question: when the slave instances poll the indexing instance (master), do these slave instances also autowarm queries in the existing cache? If they do, then the polling mechanism will also make the slave instances more ready to serve requests (more performant) at any time.

  When we talk about the forced replication solution, are we pushing/overwriting all the old index files with the new index files? Do we need to restart the Solr instance? In addition, will the slave instances be warmed up in any way?

  If there are too many issues with forced replication, I might as well work out the incremental indexing option.

  Thanks
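A sketch of that disable/enable flow using the ReplicationHandler's HTTP API; host and core names are placeholders:

    # on the master, before the clean + full reindex:
    http://master:8983/solr/core1/replication?command=disablereplication
    # ... clean, reindex, commit ...
    http://master:8983/solr/core1/replication?command=enablereplication

    # optionally, force a slave to pull right away instead of waiting for its next poll:
    http://slave:8983/solr/core1/replication?command=fetchindex

Slaves polling a master with replication disabled simply see nothing new, so they keep serving the old index until replication is re-enabled.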
Re: Highlighting pre and post tags not working
Within XML, angle brackets must be escaped as &lt; and &gt;.

On Jul 13, 2015, at 12:19 PM, Paden rumsey...@gmail.com wrote:

  Hello, I'm trying to get some Solr highlighting going but I've run into a small problem. When I set the pre and post tags with my own custom tag, I get an XML error:

  XML Parsing Error: mismatched tag. Expected: </em>.
  Location: file:///home/paden/Downloads/solr-5.1.0/server/solr/Testcore2/conf/solrconfig.xml
  Line Number 476, Column 40: <str name="hl.simple.pre"><em></str>

  I've seen it done like this on a lot of other sites and I'm not sure if I'm missing an escape character or something. Just to emphasize that I did set a POST tag, I put it right after the pre in solrconfig.xml, like so:

  <str name="hl.simple.pre"><em></str>
  <str name="hl.simple.post"></em></str>

  What am I doing wrong here?
Re: FieldCache error for multivalued fields in json facets.
On Mon, Jul 13, 2015 at 1:55 AM, Iana Bondarska yana2...@gmail.com wrote:

  Hi,
  I'm using the JSON query API for Solr 5.2. When I query for metrics on multivalued fields, I get the error: "can not use FieldCache on multivalued field: sales". I've found in the Solr wiki that to avoid using the FieldCache I should set the facet.method parameter to enum. Now my question is: how can I add the facet.method=enum parameter to the query? My original query looks like this:

  {limit:0,offset:0,facet:{facet:{facet:{mechanicnumbers_sum:"sum(sales)"},limit:0,field:brand,type:terms}}}

sum(field) is currently only implemented for single-valued numeric fields. Can you make the sales field single-valued, or do you actually need multiple values per document?

-Yonik
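If single-valued works for the data, the schema change is just the following sketch; the type name is a placeholder, so match whatever numeric type the field already uses, and note the field must be re-indexed after changing multiValued:

    <field name="sales" type="tdouble" indexed="true" stored="true" multiValued="false"/>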
Highlighting pre and post tags not working
Hello,

I'm trying to get some Solr highlighting going but I've run into a small problem. When I set the pre and post tags with my own custom tag, I get an XML error:

XML Parsing Error: mismatched tag. Expected: </em>.
Location: file:///home/paden/Downloads/solr-5.1.0/server/solr/Testcore2/conf/solrconfig.xml
Line Number 476, Column 40: <str name="hl.simple.pre"><em></str>

I've seen it done like this on a lot of other sites and I'm not sure if I'm missing an escape character or something. Just to emphasize that I did set a POST tag, I put it right after the pre in solrconfig.xml, like so:

<str name="hl.simple.pre"><em></str>
<str name="hl.simple.post"></em></str>

What am I doing wrong here?
Querying Nested documents
Hi,

I have a question regarding nested documents. My document looks like this:

[sample document garbled in the archive; recoverable structure: a parent doc (id 1234, type:parent, with title, dates, and a URL) holding three children: "1234-images" (an image child with an image_uri_s field), "1234-platform-ios" and "1234-platform-android" (platform children with store links and date ranges)]

Right now I can query like this:

http://localhost:8983/solr/demo/select?q={!parent which='type:parent'}&fl=*,[child parentFilter=type:parent childFilter=image_uri_s:*]&indent=true

and get the parent and the child document matching the criteria (just the parent and the image child document). *But I want to get all the other children* (1234-platform-ios and 1234-platform-android) even if I query based on image_uri_s (1234-images), since they are other children that are part of the same parent document. Is that possible?

Appreciate your help!

Thanks,
Ramesh
Re: copying data from one collection to another collection (SolrCloud 5.2.1)
On 7/13/2015 1:49 PM, Raja Pothuganti wrote:

  We are setting up a new SolrCloud environment with 5.2.1 on Ubuntu boxes. We currently ingest data into a large collection, call it LIVE. After the full ingest is done we then trigger a delta ingestion every 15 minutes to get the documents and data that have changed into this LIVE instance.

  In Solr 4.X, using a Master / Slave setup, we had slaves that would periodically (weekly, or monthly) refresh their data from the Master rather than every 15 minutes. We're now trying to figure out how to get this same type of setup using SolrCloud.

  Question(s):
  - Is there a way to copy data from one SolrCloud collection into another quickly and easily?
  - Is there a way to programmatically control when a replica receives its data, or possibly move it to another collection (without losing data) that updates on a different interval? It ideally would be another collection name, call it Week1 ... Week52 ... to avoid a replica in the same collection serving old data.

  One option we thought of was to create a backup and then restore that into a new clean cloud. This has a lot of moving parts and isn't nearly as neat as the Master / Slave controlled replication setup. It also has the side effect of potentially taking a very long time to backup and restore instead of just copying the indexes like the old M/S setup.

SolrCloud works very differently than replication. When you send an indexing request, the documents are forwarded to the leader replica of the shard that will index them. The leader indexes the documents locally and sends a copy to all other replicas, each of which independently indexes those documents. There's no need to copy finished indexes (or even index segments) around -- each shard replica builds itself incrementally, in parallel with the others, as you index new documents. There is no polling interval -- replicas change at nearly the same time when you do an index update.

Rather than separate collections for each week, you might want to consider using the implicit router on a single collection and creating a new *shard* for each week. This would be done with the CREATESHARD action on the Collections API. The implicit router does create a new wrinkle for indexing -- you cannot index to the entire collection ... you must specifically index to one of the replicas of that specific shard. There might be some way to indicate on the update request which shard it should go to, but I haven't examined SolrCloud requests in that much detail.

As for copying indexes ... the newest versions of Solr include a backup/restore API, but if your indexes are very large, this will be quite slow.

TL;DR info: With enough digging, you will learn that SolrCloud *does* require a replication handler, which might be very confusing, since I've just told you that it's very different from replication. That handler is *only* used when a replica requires recovery. Recovery might be required because a replica has been down too long, has been newly created, or some similar situation. It is NOT used during normal SolrCloud operation.

Collections are made up of one or more shards. Shards have one or more replicas. Each replica is a core.

https://cwiki.apache.org/confluence/display/solr/How+SolrCloud+Works

There's a lot of info in a small space here. It will hopefully be enough for you to find more detail in the Solr documentation, the wiki, or possibly other locations.

Thanks,
Shawn
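A sketch of the implicit-router flow Shawn describes; all names are placeholders: create the collection with named shards, add a shard per week, and tell each update which shard it belongs to via the _route_ parameter (a router.field set at collection-creation time is the other documented option):

    http://localhost:8983/solr/admin/collections?action=CREATE&name=live&router.name=implicit&shards=week28,week29&replicationFactor=2&maxShardsPerNode=4
    http://localhost:8983/solr/admin/collections?action=CREATESHARD&collection=live&shard=week30

    curl 'http://localhost:8983/solr/live/update?commit=true&_route_=week30' \
      -H 'Content-Type: application/json' -d '[{"id":"doc1"}]'

Aging out a week is then a single DELETESHARD call rather than a delete-by-query.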
copying data from one collection to another collection (SolrCloud 5.2.1)
Hi,

We are setting up a new SolrCloud environment with 5.2.1 on Ubuntu boxes. We currently ingest data into a large collection, call it LIVE. After the full ingest is done we then trigger a delta ingestion every 15 minutes to get the documents and data that have changed into this LIVE instance.

In Solr 4.X, using a Master / Slave setup, we had slaves that would periodically (weekly, or monthly) refresh their data from the Master rather than every 15 minutes. We're now trying to figure out how to get this same type of setup using SolrCloud.

Question(s):
- Is there a way to copy data from one SolrCloud collection into another quickly and easily?
- Is there a way to programmatically control when a replica receives its data, or possibly move it to another collection (without losing data) that updates on a different interval? It ideally would be another collection name, call it Week1 ... Week52 ... to avoid a replica in the same collection serving old data.

One option we thought of was to create a backup and then restore that into a new clean cloud. This has a lot of moving parts and isn't nearly as neat as the Master / Slave controlled replication setup. It also has the side effect of potentially taking a very long time to backup and restore instead of just copying the indexes like the old M/S setup.

Any ideas or thoughts? Thanks in advance for your help.

Raja
Re: Planning Solr migration to production: clean and autoSoftCommit
Hi Erick,

I think this is a good solution. It is going to work, although I have not yet implemented it with the HTTP API, which I was able to find in https://wiki.apache.org/solr/SolrReplication.

On my local machine, a total of 800 MB of index files was downloaded to another folder within a minute. However, transferring the index files across the network could take longer. I will test it in a two-machine scenario.

Thanks
Re: XML File Size for Post.jar
If you have hundreds of files, the post command (SimplePostTool) can also push a directory of files up to Solr. (It is called Simple under the hood, but it is far from simple!)

Upayavira

On Mon, Jul 13, 2015, at 09:28 PM, Alexandre Rafalovitch wrote:

  Solr ships with an XML processing example for DIH in the examples directory (the RSS core). In your case, you will most probably read the file list or directory list and then run an XML processor as a nested entity. So, check the nested example at https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler

  Regards,
    Alex.

  Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/

  On 13 July 2015 at 15:12, EXTERNAL Taminidi Ravi (ETI, AA-AS/PAS-PTS) external.ravi.tamin...@us.bosch.com wrote:

    I can break it into smaller files, but in that case the number of files grows into the hundreds. Can I parse XML files with DIH? Can you point me to a few examples?

    Thanks
    Ravi

    -----Original Message-----
    From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
    Sent: Monday, July 13, 2015 3:01 PM
    To: solr-user
    Subject: Re: XML File Size for Post.jar

    I don't think you can do files that big. The memory use would blow up. Are you sure you cannot chunk it into smaller document sets? Or parse it as a stream with DIH, in a pull fashion?

    Regards,
      Alex.

    Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/

    On 13 July 2015 at 14:56, EXTERNAL Taminidi Ravi (ETI, AA-AS/PAS-PTS) external.ravi.tamin...@us.bosch.com wrote:

      Hi, what do I have to change to support indexing an XML file larger than 2 GB in Solr, using the simple post tool (post.jar), with Jetty and Tomcat?

      Thanks
      Ravi
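For example; the core name and path are placeholders, and in 5.x the bin/post wrapper is the easiest entry point:

    bin/post -c mycore /path/to/xml/dir
    # or, with the raw tool:
    java -Dc=mycore -Dauto=yes -Drecursive=yes -jar post.jar /path/to/xml/dir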
Re: copying data from one collection to another collection (SolrCloud 5.2.1)
Actually, my question is why do it this way at all? Why not index directly to your live nodes? This is what SolrCloud is built for.

There's the new backup/restore functionality that's still a work in progress, see: https://issues.apache.org/jira/browse/SOLR-5750

You can use implicit routing to create shards, say, for each week, and age out the ones that are too old as well.

Another option would be to use collection aliasing to keep an offline index up to date then switch over when necessary.

I'd really like to know this isn't an XY problem though; what's the high-level problem you're trying to solve?

Best,
Erick

On Mon, Jul 13, 2015 at 12:49 PM, Raja Pothuganti rpothuga...@competitrack.com wrote:

  Hi,
  We are setting up a new SolrCloud environment with 5.2.1 on Ubuntu boxes. We currently ingest data into a large collection, call it LIVE. After the full ingest is done we then trigger a delta ingestion every 15 minutes to get the documents and data that have changed into this LIVE instance.

  In Solr 4.X, using a Master / Slave setup, we had slaves that would periodically (weekly, or monthly) refresh their data from the Master rather than every 15 minutes. We're now trying to figure out how to get this same type of setup using SolrCloud.

  Question(s):
  - Is there a way to copy data from one SolrCloud collection into another quickly and easily?
  - Is there a way to programmatically control when a replica receives its data, or possibly move it to another collection (without losing data) that updates on a different interval? It ideally would be another collection name, call it Week1 ... Week52 ... to avoid a replica in the same collection serving old data.

  One option we thought of was to create a backup and then restore that into a new clean cloud. This has a lot of moving parts and isn't nearly as neat as the Master / Slave controlled replication setup. It also has the side effect of potentially taking a very long time to backup and restore instead of just copying the indexes like the old M/S setup.

  Any ideas or thoughts? Thanks in advance for your help.

  Raja
Re: XML File Size for Post.jar
Solr ships with an XML processing example for DIH in the examples directory (the RSS core). In your case, you will most probably read the file list or directory list and then run an XML processor as a nested entity. So, check the nested example at https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler

Regards,
  Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/

On 13 July 2015 at 15:12, EXTERNAL Taminidi Ravi (ETI, AA-AS/PAS-PTS) external.ravi.tamin...@us.bosch.com wrote:

  I can break it into smaller files, but in that case the number of files grows into the hundreds. Can I parse XML files with DIH? Can you point me to a few examples?

  Thanks
  Ravi

  -----Original Message-----
  From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
  Sent: Monday, July 13, 2015 3:01 PM
  To: solr-user
  Subject: Re: XML File Size for Post.jar

  I don't think you can do files that big. The memory use would blow up. Are you sure you cannot chunk it into smaller document sets? Or parse it as a stream with DIH, in a pull fashion?

  Regards,
    Alex.

  Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/

  On 13 July 2015 at 14:56, EXTERNAL Taminidi Ravi (ETI, AA-AS/PAS-PTS) external.ravi.tamin...@us.bosch.com wrote:

    Hi, what do I have to change to support indexing an XML file larger than 2 GB in Solr, using the simple post tool (post.jar), with Jetty and Tomcat?

    Thanks
    Ravi
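A hedged sketch of that nested setup; the paths, entity names, and xpaths are placeholders for your actual document shape:

    <dataConfig>
      <dataSource type="FileDataSource" encoding="UTF-8"/>
      <document>
        <entity name="files" processor="FileListEntityProcessor"
                baseDir="/path/to/xml" fileName=".*\.xml"
                recursive="true" rootEntity="false">
          <entity name="docs" processor="XPathEntityProcessor"
                  url="${files.fileAbsolutePath}" forEach="/add/doc" stream="true">
            <field column="id" xpath="/add/doc/field[@name='id']"/>
          </entity>
        </entity>
      </document>
    </dataConfig>

stream="true" keeps XPathEntityProcessor from loading each whole file into memory, which is what matters for the 2 GB case.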
Re: Range Facet queries for date ranges with non-constant gaps
Are there any examples/documentation for Interval Faceting using dates that I could refer to?

On Mon, Jul 13, 2015 at 6:36 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

  : Some of the buckets return with a count of '0' in the bucket even though
  : the facet.range.min is set to '1'.

  That is not the primary issue: facet.range.min has never been a supported (or documented) param -- you are most likely trying to use facet.mincount (which can be specified per field as a top-level f.my_field_name.facet.mincount, or as a localparam, ex: facet.range={!facet.mincount=1}my_field_name).

  : though. What I would like to get back are buckets of unevenly spaced
  : gaps. For example, counts for the last 7 days, last 30 days, last 90
  : days.

  What you are describing is exactly what the Interval Faceting feature provides...

  https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-IntervalFaceting

  -Hoss
  http://www.lucidworks.com/
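A hedged example of date intervals; the field name is a placeholder, and the field should have docValues enabled for interval faceting:

    q=*:*&rows=0&facet=true&facet.interval=event_date
    &f.event_date.facet.interval.set=[NOW-7DAYS,NOW]
    &f.event_date.facet.interval.set=[NOW-30DAYS,NOW]
    &f.event_date.facet.interval.set=[NOW-90DAYS,NOW]

Each interval.set produces one bucket, so the three "last N days" windows come back as three counts in a single request.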
Why do I get a hit on %, &, but not on !, @, #, $, ^, *
Hi Everyone, I think the subject line said it all. Here is the schema I'm using:

<fieldType name="my_text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="0" splitOnNumerics="1" stemEnglishPossessive="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

I'm guessing this is due to how solr.WhitespaceTokenizerFactory works, and that the characters it is not indexing are removed because they are considered whitespace? If so, how can I include %, &, etc. in this non-indexed list? I would rather see all of these not indexed than have some indexed and some not, causing confusion for my users. Thanks Steve
Re: copying data from one collection to another collection (SolrCloud 5.2.1)
Thank you, Erick. Actually, my question is why do it this way at all? Why not index directly to your live nodes? This is what SolrCloud is built for. You can use implicit routing to create shards, say, one for each week, and age out the ones that are too old as well. Any updates to an EXISTING document in the LIVE collection should NOT be replicated to the previous week(s) snapshot(s). Think of the snapshot(s) as an archive of sorts, searchable independently of LIVE. We're aiming to support at most 2 archives of data in the past. Another option would be to use collection aliasing to keep an offline index up to date and then switch over when necessary. Does offline indexing refer to this link: https://github.com/cloudera/search/tree/0d47ff79d6ccc0129ffadcb50f9fe0b271f102aa/search-mr Thanks Raja On 7/13/15, 3:14 PM, Erick Erickson erickerick...@gmail.com wrote: Actually, my question is why do it this way at all? Why not index directly to your live nodes? This is what SolrCloud is built for. There's the new backup/restore functionality that's still a work in progress, see: https://issues.apache.org/jira/browse/SOLR-5750 You can use implicit routing to create shards, say, one for each week, and age out the ones that are too old as well. Another option would be to use collection aliasing to keep an offline index up to date and then switch over when necessary. I'd really like to know this isn't an XY problem though; what's the high-level problem you're trying to solve? Best, Erick On Mon, Jul 13, 2015 at 12:49 PM, Raja Pothuganti rpothuga...@competitrack.com wrote: Hi, We are setting up a new SolrCloud environment with 5.2.1 on Ubuntu boxes. We currently ingest data into a large collection, call it LIVE. After the full ingest is done, we then trigger a delta ingestion every 15 minutes to get the documents that have changed into this LIVE instance. In Solr 4.x, using a Master/Slave setup, we had slaves that would periodically (weekly, or monthly) refresh their data from the Master rather than every 15 minutes. We're now trying to figure out how to get this same type of setup using SolrCloud. Question(s): - Is there a way to copy data from one SolrCloud collection into another quickly and easily? - Is there a way to programmatically control when a replica receives its data, or possibly move it to another collection (without losing data) that updates on a different interval? It would ideally be another collection name, call it Week1 ... Week52 ..., to avoid a replica in the same collection serving old data. One option we thought of was to create a backup and then restore that into a new, clean cloud. This has a lot of moving parts and isn't nearly as neat as the Master/Slave controlled replication setup. It also has the side effect of potentially taking a very long time to backup and restore instead of just copying the indexes like the old M/S setup. Any ideas or thoughts? Thanks in advance for your help. Raja
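A sketch of the alias switch-over option (alias and collection names are hypothetical): clients always query the alias, and the alias is re-pointed once a freshly built collection is ready.

http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=live&collections=build_week28

Running CREATEALIAS again with the same name re-points it -- e.g. to build_week29 the following week -- after which the retired collection can either be kept around as a searchable archive, matching the snapshot requirement above, or removed with action=DELETE.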
Re: Range Facet queries for date ranges with non-constant gaps
: Some of the buckets return with a count of '0' in the bucket even though : the facet.range.min is set to '1'. That is not the primary issue facet.range.min has never been a supported (or documented) param -- you are most likely trying to use facet.mincount (which can be specified per field as a top-level f.my_field_name.facet.mincount, or as a localparam, ex: facet.range={!facet.mincount=1}my_field_name). : though. What I would like to get back are buckets of unevenly spaced : gaps. For example, counts for the last 7 days, last 30 days, last 90 : days. what you are describing is exactly what the Interval Faceting feature provides... https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-IntervalFaceting -Hoss http://www.lucidworks.com/
Range Facet queries for date ranges with non-constant gaps
I am trying to do a range facet query on date ranges. The query below executes and returns results (almost) as desired for 60DAY buckets. http://localhost:8983/solr/mykeyspace2.user_data/select?wt=json&fq:id=7465033&q=*:*&rows=0&indent=true&facet=on&facet.range=login_event&facet.range.gap=%2B60DAY&facet.range.start=NOW/YEAR&facet.range.end=NOW/MONTH%2B1MONTH&facet.range.min=1 Some of the buckets return with a count of '0' in the bucket even though the facet.range.min is set to '1'. That is not the primary issue though. What I would like to get back are buckets of unevenly spaced gaps. For example, counts for the last 7 days, last 30 days, last 90 days. What would be the best way to accomplish this? And is there something wrong with the facet.range.min usage?
Re: Querying Nested documents
Hi Rameshn, I would suggest you rewrite your mail. It is really hard to understand! Try to format your document and nested documents in a nice way (remember, a document is a map of field -> value); let's try not to overcomplicate things! Furthermore, try to express the query unencoded as well. It will let us help you much more efficiently, without losing 10 minutes decoding the mail :) Cheers 2015-07-13 17:03 GMT+01:00 rameshn ramesh.nuthalap...@gmail.com: Hi, I have question regarding nested documents.My document looks like below, 1234xger00parent 2015-06-15T13:29:07ZegeDuperhttp://www.domain.com zoome1234-images http://somedomain.com/some.jpg1:1 1234-platform-iosios https://somedomain.comsomelinkfalse 2015-03-23T10:58:00Z-12-30T19:00:00Z 1234-platform-androidandroid somedomain.comsomelinkfalse 2015-03-23T10:58:00Z-12-30T19:00:00Z Right now I can query like thishttp://localhost:8983/solr/demo/select?q={!parent%20which=%27type:parent%27}fl=*,[child%20parentFilter=type:parent%20childFilter=image_uri_s:*]indent=trueand get the parent and child document with matching criteria (just parent and image child document).*But, I want to get all other children* (1234-platform-ios and 1234-platform-andriod) even if i query based on image_uri_s (1234-images) although they are other children which are part of the parent document.Is it possible ?Appreciate your help !Thanks,Ramesh -- View this message in context: http://lucene.472066.n3.nabble.com/Querying-Nested-documents-tp4217088.html Sent from the Solr - User mailing list archive at Nabble.com. -- -- Benedetti Alessandro Visiting card - http://about.me/alessandro_benedetti Blog - http://alexbenedetti.blogspot.co.uk Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
Persistence problem with swapped cores after Solr restart -- 4.9.1
On Solr 4.9.1 with core discovery, I seem to be having trouble with core swaps not persisting through a full Solr restart. I apologize for the fact that this message is lean on details ... I've seen the problem twice now, but I don't have any concrete before/after information about what's in each core.properties file. I am attempting to set up the scenario again and gather that information. The entire directory structure is set up as a git repo, so I will be able to tell if any files (like core.properties) are modified for the rebuild/swap that I have started. The repo shows no changes at the moment, but I have done several of these rebuild/swap operations, so even if core.properties is being correctly updated, it might just have landed back on the original configuration. I have another copy of my index using Solr 4.7.2 with the old solr.xml format that seems to have no problems with core swapping and persistence. That works differently, though -- all cores are defined in solr.xml rather than with core.properties files. When I first set up these Solr instances, I don't recall having this problem, but full Solr restarts are really rare, so it's possible I just didn't create the right circumstances. Thanks, Shawn
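For anyone trying to reproduce this, a sketch of the operation involved (core names here are hypothetical):

http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=build

With core discovery, a swap that persists correctly should be visible on disk: the name property in the two cores' core.properties files should be exchanged (the directory that previously held name=build ends up holding name=live, and vice versa). If neither file changes after a SWAP, the swap cannot survive a restart.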
RE: Range Facet queries for date ranges with non-constant gaps
Try facet.mincount=1. It will still apply to range facets. -Original Message- From: JoeSmith [mailto:fidw...@gmail.com] Sent: Monday, July 13, 2015 5:56 PM To: solr-user Subject: Range Facet queries for date ranges with non-constant gaps I am trying to do a range facet query on date ranges. The query below executes and returns results (almost) as desired for 60DAY buckets. http://localhost:8983/solr/mykeyspace2.user_data/select?wt=json&fq:id=7465033&q=*:*&rows=0&indent=true&facet=on&facet.range=login_event&facet.range.gap=%2B60DAY&facet.range.start=NOW/YEAR&facet.range.end=NOW/MONTH%2B1MONTH&facet.range.min=1 Some of the buckets return with a count of '0' in the bucket even though the facet.range.min is set to '1'. That is not the primary issue though. What I would like to get back are buckets of unevenly spaced gaps. For example, counts for the last 7 days, last 30 days, last 90 days. What would be the best way to accomplish this? And is there something wrong with the facet.range.min usage?
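As a full request, the mincount hint might look like this (same query as in the quoted message, with the unsupported facet.range.min replaced by a per-field facet.mincount override):

http://localhost:8983/solr/mykeyspace2.user_data/select?wt=json&q=*:*&rows=0&facet=on&facet.range=login_event&facet.range.gap=%2B60DAY&facet.range.start=NOW/YEAR&facet.range.end=NOW/MONTH%2B1MONTH&f.login_event.facet.mincount=1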
Re: Querying Nested documents
Hi rameshn, Nabble has a nasty habit of stripping out HTML and XML markup before sending your mail out to the mailing list - see your message quoted below for how it appears to people who aren’t reading via Nabble. My suggestion: directly subscribe to the solr-user mailing list[1] and avoid Nabble. (They’ve known about the problem for many years and AFAICT have done nothing about it.) Steve [1] https://lucene.apache.org/solr/resources.html#mailing-lists On Jul 13, 2015, at 12:03 PM, rameshn ramesh.nuthalap...@gmail.com wrote: Hi, I have question regarding nested documents.My document looks like below, 1234xger00parent 2015-06-15T13:29:07ZegeDuperhttp://www.domain.com zoome1234-images http://somedomain.com/some.jpg1:1 1234-platform-iosios https://somedomain.comsomelinkfalse 2015-03-23T10:58:00Z-12-30T19:00:00Z 1234-platform-androidandroid somedomain.comsomelinkfalse 2015-03-23T10:58:00Z-12-30T19:00:00Z Right now I can query like thishttp://localhost:8983/solr/demo/select?q={!parent%20which=%27type:parent%27}fl=*,[child%20parentFilter=type:parent%20childFilter=image_uri_s:*]indent=trueand get the parent and child document with matching criteria (just parent and image child document).*But, I want to get all other children* (1234-platform-ios and 1234-platform-andriod) even if i query based on image_uri_s (1234-images) although they are other children which are part of the parent document.Is it possible ?Appreciate your help !Thanks,Ramesh -- View this message in context: http://lucene.472066.n3.nabble.com/Querying-Nested-documents-tp4217088.html Sent from the Solr - User mailing list archive at Nabble.com.
Querying Nested documents
(Duplicate post as the XML is not formatted well in Nabble, so posting directly to the list) Hi, I have a question regarding nested documents. My document looks like below:

<doc>
  <field name="id">1234</field>
  <field name="pk_id">xger</field>
  <field name="title_t"><![CDATA[title]]></field>
  <field name="description_t"><![CDATA[this is a test]]></field>
  <field name="specCount_i">0</field>
  <field name="viewCount_i">0</field>
  <field name="lastModifiedDate_dt">2015-06-15T13:29:07Z</field>
  <field name="vert_id_s">ege</field>
  <field name="vert_name_s">Duper</field>
  <field name="vert_url_s">http://www.domain.com</field>
  <field name="sere_s">zoome</field>
  <field name="type">parent</field>
  <doc>
    <field name="id">1234-images</field>
    <field name="image_uri_s">http://somedomain.com/some.jpg</field>
    <field name="image_flatten_s">1:1</field>
  </doc>
  <doc>
    <field name="id">1234-platform-ios</field>
    <field name="platform_s">ios</field>
    <field name="downloadU_s">https://somedomain.com</field>
    <field name="link_s">somelink</field>
    <field name="authRequired_s">false</field>
    <field name="startDate_s">2015-03-23T10:58:00Z</field>
    <field name="endDate_s">-12-30T19:00:00Z</field>
  </doc>
  <doc>
    <field name="id">1234-platform-android</field>
    <field name="platform_s">android</field>
    <field name="downloadU_s">somedomain.com</field>
    <field name="link_s">somelink</field>
    <field name="authRequired_s">false</field>
    <field name="startDate_s">2015-03-23T10:58:00Z</field>
    <field name="endDate_s">-12-30T19:00:00Z</field>
  </doc>
</doc>

Right now I can query like this: http://localhost:8983/solr/demo/select?q={!parent%20which=%27type:parent%27}&fl=*,[child%20parentFilter=type:parent%20childFilter=image_uri_s:*]&indent=true and get the parent and child document with matching criteria (just the parent and the image child document). *But I want to get all other children* (1234-platform-ios and 1234-platform-android) even if I query based on image_uri_s (1234-images), although they are other children which are part of the parent document. Is it possible? Appreciate your help! Thanks, Ramesh
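An untested sketch of one way to get what is asked for here: move the match on image_uri_s into the block-join query itself and drop the childFilter, since the [child] transformer returns all children of each matched parent when no childFilter is given (the limit parameter is shown because the transformer defaults to 10 children):

http://localhost:8983/solr/demo/select?q={!parent which='type:parent'}image_uri_s:*&fl=*,[child parentFilter=type:parent limit=100]&indent=true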
Re: Querying Nested documents
My sincere apologies. Re-submitted directly to the list. Thank you. -- View this message in context: http://lucene.472066.n3.nabble.com/Querying-Nested-documents-tp4217088p4217166.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: FieldCache error for multivalued fields in json facets.
On Mon, Jul 13, 2015, at 06:55 AM, Iana Bondarska wrote: Hi, I'm using the JSON query API for Solr 5.2. When I query for metrics on multivalued fields, I get the error: "can not use FieldCache on multivalued field: sales". I've found in the Solr wiki that to avoid using the FieldCache I should set the facet.method parameter to enum. Now my question is, how can I add the facet.method=enum parameter to the query? My original query looks like this:

{
  "limit": 0,
  "offset": 0,
  "facet": {
    "facet": {
      "facet": {
        "mechanicnumbers_sum": "sum(sales)"
      },
      "limit": 0,
      "field": "brand",
      "type": "terms"
    }
  }
}

Adding method:enum inside the facet doesn't help. Adding facet.method=enum outside the json parameter also doesn't help. Can you provide the whole exception, including stack trace? This looks like a bug to me, as it should switch to using the FieldValueCache for multivalued fields rather than fail to use the FieldCache. Upayavira
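For comparison, a sketch of the same request expressed via the json.facet parameter (the collection name is a placeholder; the field and function names are taken from the query above). Note, hedged as a general observation: the aggregation functions in the early JSON Facet API expected a single-valued numeric field, so re-indexing sales as single-valued, or pursuing this as a bug as suggested above, is more likely to help than facet.method=enum, which controls how terms are enumerated rather than how functions are computed:

curl http://localhost:8983/solr/mycollection/query -d 'q=*:*&rows=0&json.facet={
  brands: {
    type: terms,
    field: brand,
    limit: -1,
    facet: {
      mechanicnumbers_sum: "sum(sales)"
    }
  }
}'

(limit:-1 asks for all buckets; note that the original query's limit:0 on the terms facet would return no buckets at all.)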