Re: How to Manage RAM Usage at Heavy Indexing

2013-08-24 Thread Erick Erickson
This is sounding like an XY problem. What are you measuring
when you say RAM usage is 99%? Is this virtual memory? See:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
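
To make the "what are you measuring" question concrete, here is a rough sketch, assuming a Linux box, of splitting total RAM into memory held by processes and memory the kernel is using as reclaimable file cache. The sample numbers are hypothetical, not taken from the poster's machine, and treating Buffers + Cached as reclaimable is only an approximation.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def memory_breakdown(info):
    """Split total RAM into process use, file cache, and free, as percentages.

    Approximation: Buffers + Cached is treated as reclaimable cache, which is
    where memory-mapped index files show up.
    """
    total = info["MemTotal"]
    free = info["MemFree"]
    cache = info.get("Buffers", 0) + info.get("Cached", 0)
    used_by_apps = total - free - cache
    return {
        "apps_pct": 100.0 * used_by_apps / total,
        "cache_pct": 100.0 * cache / total,
        "free_pct": 100.0 * free / total,
    }


# Hypothetical sample: the machine looks 99% "used", yet most of it is cache.
SAMPLE = """MemTotal:       16000000 kB
MemFree:          160000 kB
Buffers:          240000 kB
Cached:         11600000 kB
"""

if __name__ == "__main__":
    # On a real Linux machine, read open("/proc/meminfo").read() instead.
    print(memory_breakdown(parse_meminfo(SAMPLE)))
```

If most of the "used" figure turns out to be cache, the 99% reading on its own says little about whether the machine is short of memory.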

What errors are you seeing when you say: "my node stops receiving
documents"?

How are you sending 10M documents? All at once in a huge packet
or some smaller number at a time? From where? How?
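
For reference, "some smaller number at a time" could look like the sketch below. It is only an illustration: `send` is a hypothetical callable standing in for whatever client call actually posts a batch to Solr, and the batch size of 1000 is an arbitrary assumption.

```python
from itertools import islice


def batches(docs, size):
    """Yield lists of at most `size` documents from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def index_all(docs, send, size=1000):
    """Send documents in fixed-size batches; return how many were sent.

    `send` is a placeholder for the real client call that posts one batch.
    """
    sent = 0
    for chunk in batches(docs, size):
        send(chunk)
        sent += len(chunk)
    return sent


if __name__ == "__main__":
    received = []
    total = index_all(range(2500), received.append, size=1000)
    print(total, [len(chunk) for chunk in received])
```

Because `batches` consumes the input lazily, the full document set never has to sit in memory on the sending side.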

And what does Hadoop have to do with anything? Are you putting
the Solr index on Hadoop? How? The recent contrib?

In short, you haven't provided very many details. You've been around
long enough that I'm surprised you're saying "it doesn't work, how can
I fix it?" without providing much in the way of details to help us help
you.

Best
Erick



On Sat, Aug 24, 2013 at 1:52 PM, Furkan KAMACI wrote:

> I am running a test on my SolrCloud. I am trying to send 100 million
> documents via Hadoop to a node that has no replica. When the number of
> documents sent to that node reaches around 30 million, RAM usage on my
> machine hits 99% (Solr heap usage is not at 99%; it uses just 3GB - 4GB
> of RAM). Some time later my node stops receiving documents to index, and
> the indexer job fails as well.
>
> How can I force the OS cache to be cleared (if it is the OS cache that is
> blocking me), or what else should I do (maybe send 10 million documents,
> wait a little, etc.)? What do others do in heavy indexing situations?
>


How to Manage RAM Usage at Heavy Indexing

2013-08-24 Thread Furkan KAMACI
I am running a test on my SolrCloud. I am trying to send 100 million
documents via Hadoop to a node that has no replica. When the number of
documents sent to that node reaches around 30 million, RAM usage on my
machine hits 99% (Solr heap usage is not at 99%; it uses just 3GB - 4GB
of RAM). Some time later my node stops receiving documents to index, and
the indexer job fails as well.

How can I force the OS cache to be cleared (if it is the OS cache that is
blocking me), or what else should I do (maybe send 10 million documents,
wait a little, etc.)? What do others do in heavy indexing situations?


Re: SOLR Prevent solr of modifying fields when update doc

2013-08-24 Thread Erick Erickson
bq:  but the uniqueId is generated by me. But when solr indexes and there
is an update in a doc, it deletes the doc and creates a new one, so it
generates a new UUID.

Right, this is why I was saying that a UUID field may not fit your use
case. The _point_ of a UUID field is to generate a unique entry for every
added document; there's no concept of "only generate the UUID once per
<uniqueKey> indexed", which seems to be what you want.

So I'd do something like just use the <uniqueKey> field rather than a
separate UUID field. That doesn't change, by definition. What advantage do
you think you get from the UUID field over just using your <uniqueKey>
field?
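
As a rough illustration of the idea (not something taken from the poster's setup), the sketch below picks the stable key the way the earlier message describes, guid if present and link otherwise, on the client side. If a UUID-shaped value is still wanted, one option, which is my own assumption rather than anything Solr does for you, is to derive it deterministically from that key with a name-based UUID, so re-indexing the same item always yields the same value. The feed-item dictionaries are hypothetical.

```python
import uuid


def stable_key(item):
    """Return the feed item's guid if it has one, otherwise its link."""
    key = item.get("guid") or item.get("link")
    if not key:
        raise ValueError("feed item has neither guid nor link")
    return key


def stable_uuid(item):
    """Derive a deterministic (name-based, version 5) UUID from the stable key."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, stable_key(item)))


if __name__ == "__main__":
    # Hypothetical feed items: the same link, fetched before and after an update.
    first = {"link": "http://example.com/post/1", "title": "Old title"}
    second = {"link": "http://example.com/post/1", "title": "New title"}
    print(stable_uuid(first) == stable_uuid(second))
```

Unlike a random UUID generated at index time, this value survives the delete-and-re-add that happens on every update, because it depends only on the key.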

Best,
Erick


On Sat, Aug 24, 2013 at 6:26 AM, Luis Portela Afonso wrote:

> Hi,
>
> The uuid, which was being used as the id of a document, is generated by
> Solr using an update chain.
> I just use the recommended method to generate UUIDs.
>
> I think an atomic update is not suitable for me, because I want Solr to
> index the feeds, not me. I don't want to send information to Solr; I want
> it to index the feeds every 15 minutes, for example, and right now it is
> doing that.
>
> Lance, I don't understand what you mean by the software that I use to
> index.
> I just use Solr. I have a configuration with two entities: one that selects
> my RSS sources from a database, and the main entity, which gets
> information from a URL and processes it.
>
> Thank you all for the answers.
> Much appreciated
>
> On Saturday, August 24, 2013, Greg Preston wrote:
>
> > But there is an API for sending a delta over the wire, and server side it
> > does a read, overlay, delete, and insert.  And only the fields you sent
> > will be changed.
> >
> > *Might require your unchanged fields to all be stored, though.
> >
> >
> > -Greg
> >
> >
> > On Fri, Aug 23, 2013 at 7:08 PM, Lance Norskog wrote:
> >
> > > Solr does not by default generate unique IDs. It uses what you give as
> > > your unique field, usually called 'id'.
> > >
> > > What software do you use to index data from your RSS feeds? Maybe that
> > > is creating a new 'id' field?
> > >
> > > There is no partial update, Solr (Lucene) always rewrites the complete
> > > document.
> > >
> > >
> > > On 08/23/2013 09:03 AM, Greg Preston wrote:
> > >
> > >> Perhaps an atomic update that only changes the fields you want to
> > >> change?
> > >>
> > >> -Greg
> > >>
> > >>
> > >> On Fri, Aug 23, 2013 at 4:16 AM, Luís Portela Afonso wrote:
> > >>
> > >>> Hi thanks by the answer, but the uniqueId is generated by me. But
> > >>> when solr indexes and there is an update in a doc, it deletes the doc
> > >>> and creates a new one, so it generates a new UUID.
> > >>> It is not suitable for me, because i want that solr just updates some
> > >>> fields, because the UUID is the key that i use to map it to an user
> > >>> in my database.
> > >>>
> > >>> Right now i'm using information that comes from the source and never
> > >>> changes, as my uniqueId, like for example the guid, that exists in
> > >>> some rss feeds, or if it doesn't exist i use link.
> > >>>
> > >>> I think there is any simple solution for me, because for what i have
> > >>> read, when an update to a doc exists, SOLR deletes the old one and
> > >>> creates a new one, right?
> > >>>
> > >>> On Aug 23, 2013, at 12:07 PM, Erick Erickson <erickerick...@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> Well, not much in the way of help because you can't do what you
> > >>>> want AFAIK. I don't think UUID is suitable for your use-case. Why
> > >>>> not use your <uniqueKey>?
> > >>>>
> > >>>> Or generate something yourself...
> > >>>>
> > >>>> Best
> > >>>> Erick
> > >>>>
> > >>>>
> > >>>> On Thu, Aug 22, 2013 at 5:56 PM, Luís Portela Afonso
> > >>>> <meligalet...@gmail.com> wrote:
> > >>>>
> > >>>>> Hi,
> > >>>>>
> > >>>>> How can i prevent solr from update some fields when updating a doc?
> > >>>>> The problem is, i have an uuid with the field name uuid, but it is
> > >>>>> not an unique key. When a rss source updates a feed, solr will
> > >>>>> update the doc with the same link but it generates a new uuid. This
> > >>>>> is not the desired because this id is used by me to relate feeds
> > >>>>> with an user.
> > >>>>>
> > >>>>> Can someone help me?
> > >>>>>
> > >>>>> Many Thanks
> > >>>>>
> >
>
>
> --
> Sent from Gmail Mobile
>


Re: Schema.xml definition problem

2013-08-24 Thread Erick Erickson
Solr does not index arbitrary XML, it only indexes XML in a very
specific format.

You could write some kind of SolrJ program that parsed your XML
docs and constructed the appropriate SolrInputDocuments.

You could use DIH with some of the XML/XSL transformations,
but be aware that the XSLT bits don't implement the full
specification.
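
To illustrate the first option in a language-neutral way, here is a small sketch (in Python rather than SolrJ, purely for brevity) that flattens nested XML records into Solr's `<add><doc><field name="...">` update format. The element names in the sample are hypothetical, since the original XML is not available to me, and flattening this way discards the nesting: repeated leaf elements simply become repeated fields, which is what a multiValued field expects.

```python
import xml.etree.ElementTree as ET


def to_solr_add_xml(source_xml, record_tag):
    """Convert each <record_tag> element into a Solr <doc> inside one <add>.

    Every descendant element that carries text becomes a <field> named after
    its tag. Repeated tags produce repeated fields.
    """
    root = ET.fromstring(source_xml)
    add = ET.Element("add")
    for record in root.iter(record_tag):
        doc = ET.SubElement(add, "doc")
        for child in record.iter():
            if child is record:
                continue
            text = (child.text or "").strip()
            if text:
                field = ET.SubElement(doc, "field", name=child.tag)
                field.text = text
    return ET.tostring(add, encoding="unicode")


# Hypothetical input; the real document structure is an assumption.
SAMPLE = """<records>
  <record>
    <title>First</title>
    <authors>
      <author>X</author>
      <author>Y</author>
    </authors>
  </record>
</records>"""

if __name__ == "__main__":
    print(to_solr_add_xml(SAMPLE, "record"))
```

The schema would then need a field per leaf element name, with multiValued="true" on any that can repeat.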

Best,
Erick


On Fri, Aug 23, 2013 at 2:34 PM, Everton Garcia wrote:

> Hello,
> I want to index the XML below, which has multivalued fields.
> What is the best way to set up schema.xml, given that the data is nested?
> Thank you.
>
> [XML sample lost in archiving; only its inline annotations survive, in
> this order: String, String, Date, String, Multivalued, First register,
> String, String, String, Multivalued, First register, String, String,
> Multivalued, First register, String]
>
> --
> *Everton Rodrigues Garcia*
>


Re: Storing query results

2013-08-24 Thread Erick Erickson
bq:  Also, my boss told me it unequivocally has to be this way :p

Pesky bosses.

But how often is the index changing? If you're not doing any updates
to it, then the problem is moot. The other way to approach this problem
is to just control when the index changes. Would it suffice to only have
the data (possibly) change once every hour? Day? Whatever?

FWIW,
Erick


On Fri, Aug 23, 2013 at 11:57 AM, jfeist wrote:

> I completely agree.  I would prefer to just rerun the search each time.
> However, we are going to be replacing our rdb based search with something
> like Solr, and the application currently behaves this way.  Our users
> understand that the search is essentially a snapshot (and I would guess
> many prefer this over changing results) and we don't want to change
> existing behavior and confuse anyone.  Also, my boss told me it
> unequivocally has to be this way :p
>
> Thanks for your input though, looks like I'm going to have to do something
> like you've suggested within our application.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Storing-query-results-tp4086182p4086349.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Indexing status when one tomcat goes down

2013-08-24 Thread Erick Erickson
Well, "it depends". If the tomcat that went down hosts all the
replicas (leader and followers) of a shard, then indexing will halt;
searching should continue, with indications that you're getting
partial results back.

If at least one node for each shard is still active, you should
be fine. There may be some chatter while the new leader is
elected, but the cluster should adjust and you should continue.
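
A toy model of that rule, for illustration only: a shard stays usable as long as at least one node hosting it is up. The node names and both layouts below are assumptions, not a description of the poster's actual cluster, and this is plain bookkeeping rather than anything Solr or ZooKeeper exposes.

```python
def shard_status(layout, down_nodes):
    """Map each shard to 'ok' if any hosting node is up, else 'unavailable'."""
    down = set(down_nodes)
    return {
        shard: "ok" if any(node not in down for node in nodes) else "unavailable"
        for shard, nodes in layout.items()
    }


def all_shards_available(layout, down_nodes):
    """True when every shard still has at least one live node."""
    return all(s == "ok" for s in shard_status(layout, down_nodes).values())


# Hypothetical layout A: 4 shards, 2 per Tomcat, no replicas.
NO_REPLICAS = {
    "shard1": ["tomcat1"],
    "shard2": ["tomcat1"],
    "shard3": ["tomcat2"],
    "shard4": ["tomcat2"],
}

# Hypothetical layout B: every shard has a replica on both Tomcats.
REPLICATED = {name: ["tomcat1", "tomcat2"] for name in NO_REPLICAS}

if __name__ == "__main__":
    print(shard_status(NO_REPLICAS, ["tomcat2"]))
    print(all_shards_available(REPLICATED, ["tomcat2"]))
```

In layout A, losing one Tomcat takes two shards away entirely; in layout B the same failure leaves every shard with a live node.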

Best,
Erick


On Fri, Aug 23, 2013 at 5:47 AM, Prasi S wrote:

> Hi all,
> I'm running SolrCloud with Solr 4.4. I have 2 Tomcat instances with 4
> shards (2 in each).
>
> What will happen if one of the Tomcats goes down during indexing? The other
> Tomcat reports the status "Leader not active" in its logs.
>
> Regards,
> Prasi
>


Re: Problem with SolrCloud + Zookeeper

2013-08-24 Thread Erick Erickson
Usually that error means you have a mix of old and new jars
in your classpath somehow. How that's only being triggered
when you have multiple nodes I'm not sure. By chance have
you copied any jars into different places somehow?

Best
Erick


On Fri, Aug 23, 2013 at 2:48 AM, 兴涛孙 wrote:

> Hello, guys:
>
> I've encountered a problem configuring multiple clustered nodes with
> Solr, so I want to ask for your help. Thanks in advance!
>
> The details are as follows:
>
> 1. Installation platform:
>
> Solr 4.3.1, ZooKeeper 3.4.5 and Tomcat 7 with JDK 1.7
>
> 2. When I configured a single node with DIH to build the index, it worked
> properly, but when I configured more shard nodes (three nodes present)
> and then imported data with DIH, some errors occurred. I cannot find the
> reason. The error information is as follows:
>
> 2013-08-09 08:19:24,349 : ERROR [http-bio-8983-exec-72]
> null:java.lang.ClassCastException:
> org.apache.lucene.codecs.BlockTreeTermsWriter$PendingTerm cannot be cast to
> org.apache.lucene.codecs.BlockTreeTermsWriter$PendingBlock
>
> 3. The import succeeds on each node independently.
>
> 4. Detailed information and error messages are enclosed with this message;
> please refer to them for more information.
>
> Looking forward to hearing from you. Thanks a lot.
>
> Yours,
>


Re: SOLR Prevent solr of modifying fields when update doc

2013-08-24 Thread Luis Portela Afonso
Hi,

The uuid, which was being used as the id of a document, is generated by
Solr using an update chain.
I just use the recommended method to generate UUIDs.

I think an atomic update is not suitable for me, because I want Solr to
index the feeds, not me. I don't want to send information to Solr; I want
it to index the feeds every 15 minutes, for example, and right now it is
doing that.

Lance, I don't understand what you mean by the software that I use to
index.
I just use Solr. I have a configuration with two entities: one that selects
my RSS sources from a database, and the main entity, which gets
information from a URL and processes it.

Thank you all for the answers.
Much appreciated

On Saturday, August 24, 2013, Greg Preston wrote:

> But there is an API for sending a delta over the wire, and server side it
> does a read, overlay, delete, and insert.  And only the fields you sent
> will be changed.
>
> *Might require your unchanged fields to all be stored, though.
>
>
> -Greg
>
>
> On Fri, Aug 23, 2013 at 7:08 PM, Lance Norskog wrote:
>
> > Solr does not by default generate unique IDs. It uses what you give as
> > your unique field, usually called 'id'.
> >
> > What software do you use to index data from your RSS feeds? Maybe that is
> > creating a new 'id' field?
> >
> > There is no partial update, Solr (Lucene) always rewrites the complete
> > document.
> >
> >
> > On 08/23/2013 09:03 AM, Greg Preston wrote:
> >
> >> Perhaps an atomic update that only changes the fields you want to
> >> change?
> >>
> >> -Greg
> >>
> >>
> >> On Fri, Aug 23, 2013 at 4:16 AM, Luís Portela Afonso wrote:
> >>
> >>> Hi thanks by the answer, but the uniqueId is generated by me. But when
> >>> solr indexes and there is an update in a doc, it deletes the doc and
> >>> creates a new one, so it generates a new UUID.
> >>> It is not suitable for me, because i want that solr just updates some
> >>> fields, because the UUID is the key that i use to map it to an user in
> >>> my database.
> >>>
> >>> Right now i'm using information that comes from the source and never
> >>> changes, as my uniqueId, like for example the guid, that exists in some
> >>> rss feeds, or if it doesn't exist i use link.
> >>>
> >>> I think there is any simple solution for me, because for what i have
> >>> read, when an update to a doc exists, SOLR deletes the old one and
> >>> creates a new one, right?
> >>>
> >>> On Aug 23, 2013, at 12:07 PM, Erick Erickson wrote:
> >>>
> >>>> Well, not much in the way of help because you can't do what you
> >>>> want AFAIK. I don't think UUID is suitable for your use-case. Why not
> >>>> use your <uniqueKey>?
> >>>>
> >>>> Or generate something yourself...
> >>>>
> >>>> Best
> >>>> Erick
> >>>>
> >>>>
> >>>> On Thu, Aug 22, 2013 at 5:56 PM, Luís Portela Afonso
> >>>> <meligalet...@gmail.com> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> How can i prevent solr from update some fields when updating a doc?
> >>>>> The problem is, i have an uuid with the field name uuid, but it is
> >>>>> not an unique key. When a rss source updates a feed, solr will update
> >>>>> the doc with the same link but it generates a new uuid. This is not
> >>>>> the desired because this id is used by me to relate feeds with an
> >>>>> user.
> >>>>>
> >>>>> Can someone help me?
> >>>>>
> >>>>> Many Thanks
> >>>>>
>


-- 
Sent from Gmail Mobile