Re: How to Manage RAM Usage at Heavy Indexing
This is sounding like an XY problem. What are you measuring when you say RAM usage is 99%? Is this virtual memory? See: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

What errors are you seeing when you say "my node stops receiving documents"? How are you sending 10M documents? All at once in a huge packet, or some smaller number at a time? From where? How? And what does Hadoop have to do with anything? Are you putting the Solr index on Hadoop? How? The recent contrib?

In short, you haven't provided very many details. You've been around long enough that I'm surprised you're saying "it doesn't work, how can I fix it?" without providing much in the way of details to help us help you.

Best,
Erick

On Sat, Aug 24, 2013 at 1:52 PM, Furkan KAMACI wrote:
> I made a test on my SolrCloud. I tried to send 100 million documents to my
> node, which has no replica, via Hadoop. When the document count sent to
> that node is around 30 million, RAM usage on my machine becomes 99% (Solr
> heap usage is not 99%; it uses just 3GB - 4GB of RAM). After a while my
> node stops receiving documents to index and the Indexer Job fails as well.
>
> How can I force the OS cache to be cleaned (if it is the OS cache that
> blocks me), or what should I do (maybe sending 10 million documents and
> waiting a little, etc.)? What do fellows do in heavy indexing situations?
How to Manage RAM Usage at Heavy Indexing
I made a test on my SolrCloud. I tried to send 100 million documents to my node, which has no replica, via Hadoop. When the document count sent to that node is around 30 million, RAM usage on my machine becomes 99% (Solr heap usage is not 99%; it uses just 3GB - 4GB of RAM). After a while my node stops receiving documents to index and the Indexer Job fails as well.

How can I force the OS cache to be cleaned (if it is the OS cache that blocks me), or what should I do (maybe sending 10 million documents and waiting a little, etc.)? What do fellows do in heavy indexing situations?
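The "sending 10 million documents and waiting a little" idea from the question above amounts to bounded batching. A minimal sketch, not the poster's actual Hadoop job — `send_batch` is a hypothetical placeholder for whatever actually posts a batch to Solr (e.g. a SolrJ or pysolr call), and the batch size is an arbitrary example value:

```python
from itertools import islice

def batches(docs, size):
    """Yield successive lists of at most `size` docs from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def index_in_batches(docs, send_batch, batch_size=10_000):
    """Send docs in bounded batches instead of one huge request.

    `send_batch` is a stand-in for the real indexing call; a pause or a
    hard commit between batches could be added here to let the node
    catch up before more documents arrive.
    """
    sent = 0
    for chunk in batches(docs, batch_size):
        send_batch(chunk)
        sent += len(chunk)
    return sent
```

The point is only the shape: the producer never holds more than one batch in flight, so memory pressure on the receiving node is bounded by `batch_size` rather than by the total corpus.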
Re: SOLR Prevent solr of modifying fields when update doc
bq: "but the uniqueId is generated by me. But when solr indexes and there is an update in a doc, it deletes the doc and creates a new one, so it generates a new UUID."

Right, this is why I was saying that a UUID field may not fit your use case. The _point_ of a UUID field is to generate a unique entry for every added document; there's no concept of "only generate the UUID once per indexed document," which seems to be what you want.

So I'd do something like just use your <uniqueKey> field rather than a separate UUID field. That doesn't change, by definition. What advantage do you think you get from the UUID field over just using your field?

Best,
Erick

On Sat, Aug 24, 2013 at 6:26 AM, Luis Portela Afonso wrote:
> Hi,
>
> The uuid, which was being used as the id of a document, is generated by
> solr using an update chain. I just use the recommended method to generate
> uuids.
>
> I think an atomic update is not suitable for me, because I want solr to
> index the feeds, not me. I don't want to send information to solr; I want
> it to index every 15 minutes, for example, and right now it's doing that.
>
> Lance, I don't understand what you mean by "software that I use to
> index". I just use solr. I have a configuration with two entities: one
> that selects my rss sources from a database, and then the main entity
> that gets information from a URL and processes it.
>
> Thank you all for the answers.
> Much appreciated
>
> On Saturday, August 24, 2013, Greg Preston wrote:
>
> > But there is an API for sending a delta over the wire, and server side
> > it does a read, overlay, delete, and insert. And only the fields you
> > sent will be changed.
> >
> > *Might require your unchanged fields to all be stored, though.
> >
> > -Greg
> >
> > On Fri, Aug 23, 2013 at 7:08 PM, Lance Norskog wrote:
> >
> > > Solr does not by default generate unique IDs. It uses what you give
> > > as your unique field, usually called 'id'.
> > >
> > > What software do you use to index data from your RSS feeds? Maybe
> > > that is creating a new 'id' field?
> > >
> > > There is no partial update; Solr (Lucene) always rewrites the
> > > complete document.
> > >
> > > On 08/23/2013 09:03 AM, Greg Preston wrote:
> > >
> > >> Perhaps an atomic update that only changes the fields you want to
> > >> change?
> > >>
> > >> -Greg
> > >>
> > >> On Fri, Aug 23, 2013 at 4:16 AM, Luís Portela Afonso wrote:
> > >>
> > >>> Hi, thanks for the answer, but the uniqueId is generated by me. But
> > >>> when solr indexes and there is an update in a doc, it deletes the
> > >>> doc and creates a new one, so it generates a new UUID.
> > >>> It is not suitable for me, because I want solr to update just some
> > >>> fields, because the UUID is the key that I use to map it to a user
> > >>> in my database.
> > >>>
> > >>> Right now I'm using information that comes from the source and
> > >>> never changes as my uniqueId, like for example the guid that exists
> > >>> in some rss feeds, or, if it doesn't exist, the link.
> > >>>
> > >>> I think there is no simple solution for me, because from what I
> > >>> have read, when an update to a doc happens, SOLR deletes the old
> > >>> one and creates a new one, right?
> > >>>
> > >>> On Aug 23, 2013, at 12:07 PM, Erick Erickson <erickerick...@gmail.com>
> > >>> wrote:
> > >>>
> > >>> Well, not much in the way of help, because you can't do what you
> > >>> want AFAIK. I don't think UUID is suitable for your use-case. Why
> > >>> not use your <uniqueKey>?
> > >>>
> > >>> Or generate something yourself...
> > >>>
> > >>> Best,
> > >>> Erick
> > >>>
> > >>> On Thu, Aug 22, 2013 at 5:56 PM, Luís Portela Afonso
> > >>> <meligalet...@gmail.com> wrote:
> > >>>
> > >>> > Hi,
> > >>> >
> > >>> > How can I prevent solr from updating some fields when updating a
> > >>> > doc? The problem is, I have a uuid with the field name uuid, but
> > >>> > it is not a unique key. When an rss source updates a feed, solr
> > >>> > will update the doc with the same link, but it generates a new
> > >>> > uuid. This is not desired, because this id is used by me to
> > >>> > relate feeds to a user.
> > >>> >
> > >>> > Can someone help me?
> > >>> >
> > >>> > Many Thanks

--
Sent from Gmail Mobile
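Greg's point in the thread above — that an atomic update only touches the fields you send, leaving stored fields such as a uuid intact — comes down to the shape of the update payload. A minimal sketch of building that payload (field names here are illustrative, not from the thread; Solr's `"set"` modifier is the documented atomic-update syntax, and it requires the untouched fields to be stored):

```python
import json

def atomic_update(doc_id, changes):
    """Build a Solr atomic-update JSON body: each listed field gets a
    {"set": value} modifier, so only those fields are replaced; stored
    fields not mentioned (e.g. a uuid field) survive the update.
    """
    doc = {"id": doc_id}
    for field, value in changes.items():
        doc[field] = {"set": value}
    return json.dumps([doc])
```

Posting the returned body to `/update` (with `commit` as appropriate) is what SolrJ or curl would do with it; that part is omitted here since it depends on a running cluster.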
Re: Schema.xml definition problem
Solr does not index arbitrary XML; it only indexes XML in a very specific format. You could write some kind of SolrJ program that parses your XML docs and constructs the appropriate SolrInputDocuments. You could use DIH with some of the XML/XSL transformations, but be aware that the XSLT bits don't implement the full specification.

Best,
Erick

On Fri, Aug 23, 2013 at 2:34 PM, Everton Garcia wrote:
> Hello,
> I want to index the XML below with multivalued fields.
> What is the best way to define the schema.xml, since there is nested data?
> Thank you.
>
> [The XML sample was stripped by the archive; only its inline comments
> survive, indicating String and Date fields plus multivalued groups whose
> first registers contain further String fields.]
>
> --
> *Everton Rodrigues Garcia*
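Erick's suggestion — parse the nested XML yourself and build flat documents with multivalued fields — can be sketched briefly. Python's standard library is used purely for illustration (the thread suggests SolrJ); the dotted field names and the sample XML are hypothetical, not Everton's actual schema:

```python
import xml.etree.ElementTree as ET

def flatten(elem, prefix="", out=None):
    """Flatten nested XML into a dict of multivalued fields: repeated
    nested elements collapse into one list-valued key, which is the
    shape a flat Solr document with multivalued fields takes.
    """
    if out is None:
        out = {}
    for child in elem:
        name = f"{prefix}.{child.tag}" if prefix else child.tag
        if len(child):   # nested element: recurse with a dotted prefix
            flatten(child, name, out)
        else:            # leaf: append to a multivalued field
            out.setdefault(name, []).append((child.text or "").strip())
    return out

# Hypothetical nested input with a repeated <item> group:
root = ET.fromstring(
    "<order><id>7</id>"
    "<item><sku>A</sku></item><item><sku>B</sku></item></order>"
)
doc = flatten(root)  # field names like "item.sku" would be declared
                     # multivalued in schema.xml
```

A SolrJ version would do the same walk and call `SolrInputDocument.addField` per leaf; the flattening logic, not the client library, is the point here.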
Re: Storing query results
bq: "Also, my boss told me it unequivocally has to be this way :p"

Pesky bosses. But how often is the index changing? If you're not doing any updates to it, then the problem is moot.

The other way to approach this problem is to just control when the index changes. Would it suffice to only have the data (possibly) change once every hour? Day? Whatever?

FWIW,
Erick

On Fri, Aug 23, 2013 at 11:57 AM, jfeist wrote:
> I completely agree. I would prefer to just rerun the search each time.
> However, we are going to be replacing our rdb-based search with something
> like Solr, and the application currently behaves this way. Our users
> understand that the search is essentially a snapshot (and I would guess
> many prefer this over changing results), and we don't want to change
> existing behavior and confuse anyone. Also, my boss told me it
> unequivocally has to be this way :p
>
> Thanks for your input though; looks like I'm going to have to do something
> like you've suggested within our application.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Storing-query-results-tp4086182p4086349.html
> Sent from the Solr - User mailing list archive at Nabble.com.
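One application-side variant of "control when the index changes", not spelled out in the thread: stamp each document with a generation number at index time and pin every query to the generation current when the user first searched. The `gen_i` field name is hypothetical; a sketch of building such pinned query parameters:

```python
def snapshot_query(q, generation, rows=10, start=0):
    """Build Solr query params pinned to an index 'generation'.

    Assumes documents carry a hypothetical integer field `gen_i`,
    stamped when they are indexed. Re-running the query with the same
    generation filter keeps paging stable even as newer documents
    arrive, since anything indexed later has a higher generation.
    """
    return {
        "q": q,
        "fq": f"gen_i:[* TO {generation}]",
        "rows": rows,
        "start": start,
    }
```

The trade-off is that deletes and in-place updates still leak through; it only shields users from newly added documents, so it is a complement to, not a replacement for, scheduling index changes.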
Re: Indexing status when one tomcat goes down
Well, "it depends". If the tomcat that went down contains all the replicas (leader and follower) for a shard, then indexing will halt; searching should continue, with indications that you're getting partial results back.

If at least one node for each shard is still active, you should be fine. There may be some chatter while the new leader is elected, but the cluster should adjust and you should continue.

Best,
Erick

On Fri, Aug 23, 2013 at 5:47 AM, Prasi S wrote:
> Hi all,
> I'm running solr cloud with solr 4.4. I have 2 tomcat instances with 4
> shards (2 in each).
>
> What will happen if one of the tomcats goes down during indexing? The
> other tomcat shows the status "Leader not active" in the logs.
>
> Regards,
> Prasi
Re: Problem with SolrCloud + Zookeeper
Usually that error means you have a mix of old and new jars in your classpath somehow. How that's only being triggered when you have multiple nodes, I'm not sure. By chance have you copied any jars into different places somehow?

Best,
Erick

On Fri, Aug 23, 2013 at 2:48 AM, 兴涛孙 wrote:
> Hello, guys:
>
> I've encountered a problem configuring many clustered nodes with solr,
> so I want to ask for your help. Thanks in advance!
>
> The problems are as follows:
>
> 1. Installation platform:
> solr 4.3.1, zookeeper 3.4.5 and tomcat 7 with jdk1.7
>
> 2. When I configured a single node with DIH to build the index, it
> worked properly, but when I configured more shard nodes with solr (three
> nodes present) and then imported data with DIH, some errors occurred. I
> cannot find the reason; the error information is as follows:
>
> 2013-08-09 08:19:24,349 : ERROR [http-bio-8983-exec-72]
> null:java.lang.ClassCastException:
> org.apache.lucene.codecs.BlockTreeTermsWriter$PendingTerm cannot be cast to
> org.apache.lucene.codecs.BlockTreeTermsWriter$PendingBlock
>
> 3. The import succeeds on each node independently.
>
> 4. Detailed information and error messages are enclosed with this
> message; for more information, please refer to it.
>
> Looking forward to hearing from you. Thanks a lot.
>
> Yours,
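The "mix of old and new jars" Erick suspects can be hunted mechanically: scan the relevant lib and webapp directories and flag any artifact present under more than one version (e.g. lucene-core from both 4.3.1 and an older release). A rough sketch, with a simplistic name/version split that will not parse every jar naming scheme:

```python
import re
from collections import defaultdict
from pathlib import Path

def find_mixed_jars(*dirs):
    """Group jars by artifact name across the given directories; any
    artifact appearing with more than one version is a candidate for
    the classpath mix-up behind codec ClassCastExceptions.
    """
    versions = defaultdict(set)
    # crude "<name>-<version>.jar" split; version must start with a digit
    pat = re.compile(r"^(?P<name>[A-Za-z][\w.-]*?)-(?P<ver>\d[\w.]*)\.jar$")
    for d in dirs:
        for jar in Path(d).rglob("*.jar"):
            m = pat.match(jar.name)
            if m:
                versions[m.group("name")].add(m.group("ver"))
    return {name: sorted(v) for name, v in versions.items() if len(v) > 1}
```

Running it over each node's tomcat `lib/`, the exploded solr webapp, and any sharedLib directory would show whether one node is picking up stale Lucene jars that the single-node setup never sees.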
Re: SOLR Prevent solr of modifying fields when update doc
Hi,

The uuid, which was being used as the id of a document, is generated by solr using an update chain. I just use the recommended method to generate uuids.

I think an atomic update is not suitable for me, because I want solr to index the feeds, not me. I don't want to send information to solr; I want it to index every 15 minutes, for example, and right now it's doing that.

Lance, I don't understand what you mean by "software that I use to index". I just use solr. I have a configuration with two entities: one that selects my rss sources from a database, and then the main entity that gets information from a URL and processes it.

Thank you all for the answers.
Much appreciated

On Saturday, August 24, 2013, Greg Preston wrote:

> But there is an API for sending a delta over the wire, and server side
> it does a read, overlay, delete, and insert. And only the fields you
> sent will be changed.
>
> *Might require your unchanged fields to all be stored, though.
>
> -Greg
>
> On Fri, Aug 23, 2013 at 7:08 PM, Lance Norskog wrote:
>
> > Solr does not by default generate unique IDs. It uses what you give
> > as your unique field, usually called 'id'.
> >
> > What software do you use to index data from your RSS feeds? Maybe
> > that is creating a new 'id' field?
> >
> > There is no partial update; Solr (Lucene) always rewrites the
> > complete document.
> >
> > On 08/23/2013 09:03 AM, Greg Preston wrote:
> >
> >> Perhaps an atomic update that only changes the fields you want to
> >> change?
> >>
> >> -Greg
> >>
> >> On Fri, Aug 23, 2013 at 4:16 AM, Luís Portela Afonso wrote:
> >>
> >>> Hi, thanks for the answer, but the uniqueId is generated by me. But
> >>> when solr indexes and there is an update in a doc, it deletes the
> >>> doc and creates a new one, so it generates a new UUID.
> >>> It is not suitable for me, because I want solr to update just some
> >>> fields, because the UUID is the key that I use to map it to a user
> >>> in my database.
> >>>
> >>> Right now I'm using information that comes from the source and
> >>> never changes as my uniqueId, like for example the guid that exists
> >>> in some rss feeds, or, if it doesn't exist, the link.
> >>>
> >>> I think there is no simple solution for me, because from what I
> >>> have read, when an update to a doc happens, SOLR deletes the old
> >>> one and creates a new one, right?
> >>>
> >>> On Aug 23, 2013, at 12:07 PM, Erick Erickson wrote:
> >>>
> >>> Well, not much in the way of help, because you can't do what you
> >>> want AFAIK. I don't think UUID is suitable for your use-case. Why
> >>> not use your <uniqueKey>?
> >>>
> >>> Or generate something yourself...
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>> On Thu, Aug 22, 2013 at 5:56 PM, Luís Portela Afonso
> >>> <meligalet...@gmail.com> wrote:
> >>>
> >>> > Hi,
> >>> >
> >>> > How can I prevent solr from updating some fields when updating a
> >>> > doc? The problem is, I have a uuid with the field name uuid, but
> >>> > it is not a unique key. When an rss source updates a feed, solr
> >>> > will update the doc with the same link, but it generates a new
> >>> > uuid. This is not desired, because this id is used by me to
> >>> > relate feeds to a user.
> >>> >
> >>> > Can someone help me?
> >>> >
> >>> > Many Thanks

--
Sent from Gmail Mobile