Adding DocValues after or in the middle of indexing

2013-11-15 Thread Otis Gospodnetic
Hi,

Can one introduce DocValues (by adding them to the schema.xml) post facto?
If that is done, do newly added documents end up using DocValues,
while the old ones remain without DocValues?
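
For reference, the schema.xml change I mean is adding a docValues attribute to the
relevant field definitions - a minimal sketch, with a field name that is only
illustrative:

  <field name="manu_exact" type="string" indexed="true" stored="false" docValues="true"/>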

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


Very long warmup query vs. frequent soft commit with new searcher

2013-11-15 Thread Otis Gospodnetic
Hi,

What happens when one has a *single* very long *warming* query running
that takes, say, 10 minutes, and a soft commit that opens a new
searcher happening every 1 minute?

Could one run into a situation where each soft commit triggers the
same long warming query, thus queueing them one after the other and
making the queue endless, so to speak?
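
For context, by "warming query" I mean one registered as a newSearcher event listener
in solrconfig.xml - a minimal sketch, where the query itself is only illustrative:

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">some very expensive query</str><str name="facet">true</str></lst>
    </arr>
  </listener>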

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


RE: SolrCloud question

2013-11-15 Thread Beale, Jim (US-KOP)
Hi Mark,

Thanks for the reply.

I am struggling a bit here. Sorry if these are basic questions!  I can't find 
the answers anywhere.

I modified my solr.xml on all boxes to comment out the core definition for 'tp'.
Then, I used /admin/collections?action=CREATE&name=tp&numShards=1 against one 
of the boxes.  That created 'shard1' for the tp index.

(1) It named the dir 'tp_shard1_replica1'
(2) The core seems to be using the same config as the bn core
(3) I am unable to create a similar core on the other boxes.

When I use replicationFactor=5, it creates replicas of the index on the other 
boxes.

Can I then copy a pre-existing LCN index into the data/index directory and have 
it replicate to the other boxes?

Thanks!

Jim



-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Friday, November 15, 2013 11:55 AM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud question

We are moving away from pre-defining SolrCores for SolrCloud. The correct
approach would be to use the Collections API - then it is quite simple to
change the number of shards for each collection you create.

Hopefully our examples will move to doing this before long.

- Mark

On Nov 15, 2013, at 11:47 AM, Beale, Jim (US-KOP)  wrote:

> Hello all,
>
> I am trying to set up a SolrCloud deployment consisting of 5 boxes each of 
> which is running Solr under jetty.  A zookeeper ensemble is running 
> separately on 3 of the boxes.
>
> Each Solr instance has 2 cores, one of which is sharded across the five boxes 
> and the other not sharded at all because it is a much smaller index.  
> numShards is set to 5 in the command to start jetty, -DnumShards=5.
>
> It turns out that getting this configuration to work is not as easy as I had 
> hoped.  According to JIRA SOLR-3186, "If you are bootstrapping a multi-core 
> setup, you currently have to settle for the same
> numShards for every core."  Unfortunately that JIRA was closed without any 
> implementation.
>
> Is this limitation still in effect?  Does the new core discovery mode offer 
> anything in this regard?
>
> Is there any way at all to deploy two cores with different numShards?
>
> How hard would it be to implement this?  Is it compatible with the 
> architecture of Solr 5?
>
> Thanks,
> Jim Beale
>
>


Re: Solr Grouping

2013-11-15 Thread tamanjit.bin...@yahoo.co.in
My question is for group.format=simple. In normal grouping I know
group.offset would work.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Grouping-tp4101313p4101316.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr grouping performance problem

2013-11-15 Thread shamik
Thanks for the update Shawn, will look forward to the release.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-grouping-performance-porblem-tp4098565p4101314.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr Grouping

2013-11-15 Thread tamanjit.bin...@yahoo.co.in
Hi,
In grouping we can group docs by a field. Can we also have something
like pagination within a group?

For eg.
G1 has G1D1,G1D2,G1D3
G2 has G2D1, G2D2
G3 has G3D1, G3D2, G3D3, G3D4.

Can I fetch the results like (if group.format=simple)

Page1:
G1D1
G2D1
G3D1

Page 2:
G1D2
G2D2
G3D2

Page3:
G1D3
G2D2
G3D3

Page4:
G3D4
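
In other words (the field name here is only illustrative), I would expect something
like

  group=true&group.field=dupeHash&group.format=simple&group.limit=1&group.offset=1

to return "Page 2" above (and group.offset=N-1 to return page N), if group.offset is
applied within each group in the simple format - which is exactly what I am unsure
about.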







--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Grouping-tp4101313.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCoreAware

2013-11-15 Thread Steven Bower
And the close hook will basically only be fired once during shutdown?


On Fri, Nov 15, 2013 at 1:07 PM, Chris Hostetter
wrote:

>
> : So for a given instance of a handler it will only be called once during
> the
> : lifetime of that handler?
>
> correct (unless there is a bug somewhere)
>
> : Also, when the core is passed in as part of inform() is it guaranteed to
> be
> : ready to go? (ie I can start feeding content at this point?)
>
> Right, that's the point of the interface: way back in the day we had
> people writing plugins that were trying to use SolrCore from their init()
> methods and the SolrCore wasn't fully initialized yet (didn't have
> DirectUpdateHandler yet, didn't have all of the RequestHandlers
> initialized, didn't have an openSearcher, etc...)
>
> the inform(SolrCore) method is called after the SolrCore is initialized,
> and all plugins hanging off of it have been init()ed ... you can still get
> into trouble if you write FooA.inform(SolrCore) such that it asks the
> SolrCore for a pointer to some FooB plugin and expects that FooB's
> inform(SolrCore) method has already been called -- because there is no
> guaranteed order -- but the basic functionality and basic plugin
> initialization has all been done at that point.
>
>
> -Hoss
>


Re: SolrCoreAware

2013-11-15 Thread Chris Hostetter

: So for a given instance of a handler it will only be called once during the
: lifetime of that handler?

correct (unless there is a bug somewhere)

: Also, when the core is passed in as part of inform() is it guaranteed to be
: ready to go? (ie I can start feeding content at this point?)

Right, that's the point of the interface: way back in the day we had
people writing plugins that were trying to use SolrCore from their init()
methods and the SolrCore wasn't fully initialized yet (didn't have
DirectUpdateHandler yet, didn't have all of the RequestHandlers
initialized, didn't have an openSearcher, etc...)

the inform(SolrCore) method is called after the SolrCore is initialized,
and all plugins hanging off of it have been init()ed ... you can still get
into trouble if you write FooA.inform(SolrCore) such that it asks the
SolrCore for a pointer to some FooB plugin and expects that FooB's
inform(SolrCore) method has already been called -- because there is no
guaranteed order -- but the basic functionality and basic plugin
initialization has all been done at that point.


-Hoss


Re: SolrCoreAware

2013-11-15 Thread Steven Bower
>>> it should be called only once during the lifetime of a given plugin,
>>> usually not long after construction -- but it could be called many, many
>>> times in the lifetime of the solr process.

So for a given instance of a handler it will only be called once during the
lifetime of that handler?

Also, when the core is passed in as part of inform() is it guaranteed to be
ready to go? (ie I can start feeding content at this point?)

thanks,

steve




On Fri, Nov 15, 2013 at 12:52 PM, Chris Hostetter
wrote:

>
> : So it's something that can happen multiple times during the lifetime of
> : the process, but I'm guessing something not occurring very often?
>
> it should be called only once during the lifetime of a given plugin,
> usually not long after construction -- but it could be called many, many
> times in the lifetime of the solr process.
>
> : Also is there a way to hook the shutdown of the core?
>
> any object (SolrCoreAware or otherwise) can ask the SolrCore to add a
> CloseHook at anytime...
>
>
> https://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/core/SolrCore.html#addCloseHook%28org.apache.solr.core.CloseHook%29
>
>
> -Hoss
>


Re: SolrCoreAware

2013-11-15 Thread Chris Hostetter

: So it's something that can happen multiple times during the lifetime of
: the process, but I'm guessing something not occurring very often?

it should be called only once during the lifetime of a given plugin,
usually not long after construction -- but it could be called many, many 
times in the lifetime of the solr process.

: Also is there a way to hook the shutdown of the core?

any object (SolrCoreAware or otherwise) can ask the SolrCore to add a 
CloseHook at anytime...

https://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/core/SolrCore.html#addCloseHook%28org.apache.solr.core.CloseHook%29


-Hoss


Re: SolrCoreAware

2013-11-15 Thread Shalin Shekhar Mangar
On Fri, Nov 15, 2013 at 11:19 PM, Steven Bower  wrote:
> So it's something that can happen multiple times during the lifetime of
> the process, but I'm guessing something not occurring very often?

That's right.

>
> Also is there a way to hook the shutdown of the core?

You can use SolrCore.addCloseHook method.

>
> steve
>
>
> On Fri, Nov 15, 2013 at 12:08 PM, Alan Woodward  wrote:
>
>> Hi Steven,
>>
>> It's called when the handler is created, either at SolrCore construction
>> time (solr startup or core reload) or the first time the handler is
>> requested if it's a lazy-loading handler.
>>
>> Alan Woodward
>> www.flax.co.uk
>>
>>
>> On 15 Nov 2013, at 15:40, Steven Bower wrote:
>>
>> > Under what circumstances will a handler that implements SolrCoreAware
>> have
>> > its inform() method called?
>> >
>> > thanks,
>> >
>> > steve
>>
>>



-- 
Regards,
Shalin Shekhar Mangar.


Re: SolrCoreAware

2013-11-15 Thread Steven Bower
So it's something that can happen multiple times during the lifetime of
the process, but I'm guessing something not occurring very often?

Also is there a way to hook the shutdown of the core?

steve


On Fri, Nov 15, 2013 at 12:08 PM, Alan Woodward  wrote:

> Hi Steven,
>
> It's called when the handler is created, either at SolrCore construction
> time (solr startup or core reload) or the first time the handler is
> requested if it's a lazy-loading handler.
>
> Alan Woodward
> www.flax.co.uk
>
>
> On 15 Nov 2013, at 15:40, Steven Bower wrote:
>
> > Under what circumstances will a handler that implements SolrCoreAware
> have
> > its inform() method called?
> >
> > thanks,
> >
> > steve
>
>


Re: PDF indexing issues

2013-11-15 Thread Furkan KAMACI
You should check the Apache PDFBox project. A similar question:
https://issues.apache.org/jira/browse/PDFBOX-940


2013/11/15 Marcello Lorenzi 

> Hi,
> during our testing of Apache Solr 4.3, we have noticed some errors
> occurring during PDF indexing:
>
> ERROR - 2013-11-15 15:14:26.248; org.apache.pdfbox.pdmodel.font.PDCIDFont;
> Error: Could not parse predefined CMAP file for 'PDFXC30-Indentity0-UCS2'
> ERROR - 2013-11-15 15:14:36.108; org.apache.pdfbox.pdmodel.font.PDCIDFont;
> Error: Could not parse predefined CMAP file for '--UCS2'
>
> and
>
> ERROR - 2013-11-15 15:12:18.928; org.apache.pdfbox.filter.FlateFilter;
> FlateFilter: stop reading corrupt stream due to a DataFormatException
>
> Could these errors be related to the PDF file format?
>
> Thanks,
> Marcello
>


Re: SolrCoreAware

2013-11-15 Thread Alan Woodward
Hi Steven,

It's called when the handler is created, either at SolrCore construction time 
(solr startup or core reload) or the first time the handler is requested if 
it's a lazy-loading handler.  

Alan Woodward
www.flax.co.uk


On 15 Nov 2013, at 15:40, Steven Bower wrote:

> Under what circumstances will a handler that implements SolrCoreAware have
> its inform() method called?
> 
> thanks,
> 
> steve



Re: SolrCloud question

2013-11-15 Thread Mark Miller
We are moving away from pre-defining SolrCores for SolrCloud. The correct
approach would be to use the Collections API - then it is quite simple to
change the number of shards for each collection you create.
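
For example (collection names and counts here are only illustrative), two collections
with different shard counts could be created along these lines:

  /admin/collections?action=CREATE&name=bigindex&numShards=5&replicationFactor=1
  /admin/collections?action=CREATE&name=smallindex&numShards=1&replicationFactor=5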

Hopefully our examples will move to doing this before long.

- Mark

On Nov 15, 2013, at 11:47 AM, Beale, Jim (US-KOP)  wrote:

> Hello all,
> 
> I am trying to set up a SolrCloud deployment consisting of 5 boxes each of 
> which is running Solr under jetty.  A zookeeper ensemble is running 
> separately on 3 of the boxes.
> 
> Each Solr instance has 2 cores, one of which is sharded across the five boxes 
> and the other not sharded at all because it is a much smaller index.  
> numShards is set to 5 in the command to start jetty, -DnumShards=5.
> 
> It turns out that getting this configuration to work is not as easy as I had 
> hoped.  According to JIRA SOLR-3186, "If you are bootstrapping a multi-core 
> setup, you currently have to settle for the same
> numShards for every core."  Unfortunately that JIRA was closed without any 
> implementation.
> 
> Is this limitation still in effect?  Does the new core discovery mode offer 
> anything in this regard?
> 
> Is there any way at all to deploy two cores with different numShards?
> 
> How hard would it be to implement this?  Is it compatible with the 
> architecture of Solr 5?
> 
> Thanks,
> Jim Beale
> 
> 



SolrCloud question

2013-11-15 Thread Beale, Jim (US-KOP)
Hello all,

I am trying to set up a SolrCloud deployment consisting of 5 boxes each of 
which is running Solr under jetty.  A zookeeper ensemble is running separately 
on 3 of the boxes.

Each Solr instance has 2 cores, one of which is sharded across the five boxes 
and the other not sharded at all because it is a much smaller index.  numShards 
is set to 5 in the command to start jetty, -DnumShards=5.

It turns out that getting this configuration to work is not as easy as I had 
hoped.  According to JIRA SOLR-3186, "If you are bootstrapping a multi-core 
setup, you currently have to settle for the same
numShards for every core."  Unfortunately that JIRA was closed without any 
implementation.

Is this limitation still in effect?  Does the new core discovery mode offer 
anything in this regard?

Is there any way at all to deploy two cores with different numShards?

How hard would it be to implement this?  Is it compatible with the architecture 
of Solr 5?

Thanks,
Jim Beale


The information contained in this email message, including any attachments, is 
intended solely for use by the individual or entity named above and may be 
confidential. If the reader of this message is not the intended recipient, you 
are hereby notified that you must not read, use, disclose, distribute or copy 
any part of this communication. If you have received this communication in 
error, please immediately notify me by email and destroy the original message, 
including any attachments. Thank you.


PDF indexing issues

2013-11-15 Thread Marcello Lorenzi

Hi,
during our testing of Apache Solr 4.3, we have noticed some errors
occurring during PDF indexing:


ERROR - 2013-11-15 15:14:26.248; 
org.apache.pdfbox.pdmodel.font.PDCIDFont; Error: Could not parse 
predefined CMAP file for 'PDFXC30-Indentity0-UCS2'
ERROR - 2013-11-15 15:14:36.108; 
org.apache.pdfbox.pdmodel.font.PDCIDFont; Error: Could not parse 
predefined CMAP file for '--UCS2'


and

ERROR - 2013-11-15 15:12:18.928; org.apache.pdfbox.filter.FlateFilter; 
FlateFilter: stop reading corrupt stream due to a DataFormatException


Could these errors be related to the PDF file format?

Thanks,
Marcello


Re: field collapsing performance in sharded environment

2013-11-15 Thread Paul Masurel
That's not the way grouping is done.
In the first round all shards return their 10 best groups (represented as
their 10 best grouping values).

As a result it's a three-round process instead of the two rounds for regular
search, so observing an increase in latency is normal, but not in the
realm of what you are seeing here.

Most probably it is due to the performance issue of TermAllGroupsCollector
which you can patch very easily.


On Thu, Nov 14, 2013 at 3:56 PM, Erick Erickson wrote:

> bq:   Of the 10k docs,
> most have a unique near duplicate hash value, so there are about 10k unique
> values for the field that I'm grouping on.
>
> I suspect (but don't know the grouping code well) that this is the issue.
> You're
> getting the top N groups, right? But in the general case, you can't ensure
> that the topN from shard1 has any relation to the topN from shard2. So I
> _suspect_ that the code returns all of the groups. Say that shard1 for group
> 5 has 3 docs, but shard2 has 3,000 docs. To get the true top N, you need to
> collate all the values from all the groups; you can't just return the top 10
> groups from each shard and get correct counts.
>
> Since your group cardinality is about 10K/shard, you're pushing 10 packets
> each
> containing 10K entries back to the originating shard, which has to
> combine/sort
> them all to get the true top N. At least that's my theory.
>
> Your situation is special in that you say that your groups don't appear on
> more than
> one shard, so you'd probably have to write something that aborted this
> behavior and
> returned only the top N, if I'm right.
>
> But that begs the question of why you're doing this. What purpose is served
> by grouping when the groups probably only have 1 member each?
>
> Best,
> Erick
>
>
> On Wed, Nov 13, 2013 at 2:46 PM, David Anthony Troiano <
> dtroi...@basistech.com> wrote:
>
> > Hello,
> >
> > I'm hitting a performance issue when using field collapsing in a
> > distributed Solr setup and I'm wondering if others have seen it and if
> > anyone has an idea to work around it.
> >
> > I'm using field collapsing to deduplicate documents that have the same
> near
> > duplicate hash value, and deduplicating at query time (as opposed to
> > filtering at index time) is a requirement.  I have a sharded setup with
> 10
> > cores (not SolrCloud), each having ~1000 documents.  Of the 10k
> docs,
> > most have a unique near duplicate hash value, so there are about 10k
> unique
> > values for the field that I'm grouping on.  The grouping parameters that
> > I'm using are:
> >
> > group=true
> > group.field=
> > group.main=true
> >
> > I'm attempting distributed queries (&shards=s1,s2,...,s10) where the only
> > difference is the absence or presence of these three grouping parameters
> > and I'm consistently seeing a marked difference in performance (as a
> > representative data point, 200ms latency without grouping and 1600ms with
> > grouping).  Interestingly, if I put all 10k docs on the same core and
> query
> > that core independently with and without grouping, I don't see much of a
> > latency difference, so the performance degradation seems to exist only in
> > the sharded setup.
> >
> > Is there a known performance issue when field collapsing in a sharded
> setup
> > (perhaps only manifests when the grouping field has many unique values),
> or
> > have other people observed this?  Any ideas for a workaround?  Note that
> > docs in my sharded setup can only have the same signature if they're in
> the
> > same shard, so perhaps that can be used to boost perf, though I don't see
> > an exposed way to do so.
> >
> > A follow-on question is whether we're likely to see the same issue if /
> > when we move to SolrCloud.
> >
> > Thanks,
> > Dave
> >
>



-- 
__

 Masurel Paul
 e-mail: paul.masu...@gmail.com


SolrCoreAware

2013-11-15 Thread Steven Bower
Under what circumstances will a handler that implements SolrCoreAware have
its inform() method called?

thanks,

steve


Re: Document routing question.

2013-11-15 Thread Yago Riveiro
Joel,

Thanks for the explanation.

-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Friday, November 15, 2013 at 2:14 PM, Joel Bernstein wrote:

> Yago,
> 
> Now that I look back at this blog, I see how this can be confusing.
> 
> This is how to break down the composite id: tenant1/4!docXXX
> 
> "tenant1" is the shardkey.
> 
> "/" is a separator between the shardkey and bits to use from the shardkey.
> 
> "4" is the number of bits taken from the shardkey to create the composite
> 32 bit hashcode. The other 28 bits come from the unique document ID.
> 
> "!" separates the shardkey from the unique doc ID
> 
> "docXXX" is the unique document ID
> 
> This is taken from the blog:
> 
> "This will take 4 bits from the shard key and 28 bits from the unique doc
> id, spreading the tenant over 1/16th of the shards in the collection.
> 
> 3 bits would spread the tenant over 1/8th of the collection.
> 2 bits would spread the tenant over 1/4th of the collection.
> 1 bit would spread the tenant over 1/2 the collection.
> 0 bits would spread the tenant across the entire collection."
> 
> 
> You do have to specify the bits at query time as well so Solr knows which
> shards to query.
> 
> Joel
> 
> 
> 
> 
> 
> On Thu, Nov 14, 2013 at 10:46 AM, yriveiro  (mailto:yago.rive...@gmail.com)> wrote:
> 
> > Hi,
> > 
> > I read this post
> > http://searchhub.org/2013/06/13/solr-cloud-document-routing
> > and I have some questions.
> > 
> > When a tenant is too large to fit on one shard, we can specify the number
> > of bits from the shardKey that we want to use.
> > 
> > If we set a doc's key as "tenant1/4!docXXX" we are saying to spread the
> > docs
> > over the 1/4th of the collection. If the collection has 4 shards this means
> > that all docs with the same shardKey will go to the same shard, or we will
> > spread 25% in each shard?
> > 
> > Another question is: at query time, must we configure the shard.keys param as
> > "shard.keys=tenant1!" or as "shard.keys=tenant1/4!"
> > 
> > /Yago
> > 
> > 
> > 
> > -
> > Best regards
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Document-routing-question-tp4100938.html
> > Sent from the Solr - User mailing list archive at Nabble.com 
> > (http://Nabble.com).
> > 
> 
> 
> 
> 
> -- 
> Joel Bernstein
> Search Engineer at Heliosearch
> 
> 




Re: Date range faceting with various gap sizes?

2013-11-15 Thread jimi.hullegard
> Chris Hostetter wrote:
>
> You can see that in the resulting URL you got the params are duplicated -- the
> problem is that when expressed this way, Solr doesn't know when the
> different values of the start/end/gap params should be applied -- it just
> loops over each of the facet.range fields (in your case: the same field
> twice) and then looks for a corresponding start/end/gap value and finds the
> first one since there are duplicates.

OK, that explains it. I thought that it would match the params, so that it 
would match the first parameter "facet.range=scheduledate_start_tdate" with the 
first parameter 
"f.scheduledate_start_tdate.facet.range.start=1990-01-01T11:00:00.000Z", the 
second parameter "facet.range=scheduledate_start_tdate" with the second 
parameter 
"f.scheduledate_start_tdate.facet.range.start=2011-01-01T11:00:00.000Z" and so 
on.

> what you want to do can be accomplished (as of Solr 4.3 - see SOLR-1351) by
> using "local params" in the facet.range (or facet.date) params...
> 
> http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.range={!facet.range.start=NOW/MONTH%20facet.range.end=NOW/MONTH%2B1MONTH%20facet.range.gap=%2B1DAY}manufacturedate_dt&facet.range={!facet.range.start=NOW/MONTH%20facet.range.end=NOW/MONTH%2B1MONTH%20facet.range.gap=%2B5DAY}manufacturedate_dt

Thanks for this info. I'm not sure it is easy to upgrade Solr for us though, 
since it is more or less integrated into the CMS we use.

But I actually realized that for this particular case, we don't need different 
gap sizes. We can use the "before" and "after" metadata instead.
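
I.e., if I understand the range faceting parameters correctly, something along these
lines on a single range (the field name is ours, the rest is standard syntax):

f.scheduledate_start_tdate.facet.range.other=before&f.scheduledate_start_tdate.facet.range.other=after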
 
Regards
/Jimi




Re: An UpdateHandler to run following a MySql DataImport

2013-11-15 Thread Dileepa Jayakody
I found out that you can configure any requestHandler to run an update
request processor chain.
So in my /dataimport requestHandler I just referenced my custom chain;

eg:

  <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
      <str name="update.chain">stanbolInterceptor</str>
    </lst>
  </requestHandler>

It works.

Thanks,
Dileepa


On Fri, Nov 15, 2013 at 6:08 PM, Erick Erickson wrote:

> Hmmm, don't quite know the answer to that, but when things
> start getting complex with DIH, you should seriously consider
> a SolrJ solution unless someone comes up with a quick fix.
> Here's an example.
>
> http://searchhub.org/2012/02/14/indexing-with-solrj/
>
> Best,
> Erick
>
>
> On Fri, Nov 15, 2013 at 2:48 AM, Dileepa Jayakody <
> dileepajayak...@gmail.com
> > wrote:
>
> > Hi All,
> >
> > I have written a custom update request handler to do some custom
> processing
> > of documents and configured the /update handler to use my custom handler
> in
> > the default: update.chain.
> >
> > The same requestHandler should be configured for the data-import-handler
> > when it loads documents to solr index.
> > Is there a way to configure the dataimport handler to use my custom
> > update handler in an update.chain?
> >
> > If not how can I perform the required custom processing of the document
> > while importing data from a mysql database?
> >
> > Thanks,
> > Dileepa
> >
>


Re: Document routing question.

2013-11-15 Thread Joel Bernstein
Yago,

Now that I look back at this blog, I see how this can be confusing.

This is how to break down the composite id: tenant1/4!docXXX

"tenant1" is the shardkey.

"/" is a separator between the shardkey and bits to use from the shardkey.

"4" is the number of bits taken from the shardkey to create the composite
32 bit hashcode. The other 28 bits come from the unique document ID.

"!" separates the shardkey from the unique doc ID

"docXXX" is the unique document ID

This is taken from the blog:

"This will take 4 bits from the shard key and 28 bits from the unique doc
id, spreading the tenant over 1/16th of the shards in the collection.

3 bits would spread the tenant over 1/8th of the collection.
2 bits would spread the tenant over 1/4th of the collection.
1 bit would spread the tenant  over 1/2 the collection.
0 bits would spread the tenant across the entire collection."


You do have to specify the bits at query time as well so Solr knows which
shards to query.
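
As a concrete sketch (the tenant and document IDs here are made up), the indexed
documents simply carry composite IDs of that form, e.g. in an XML update:

  <add>
    <doc>
      <field name="id">tenant1/4!doc001</field>
      ...
    </doc>
  </add>

and queries for that tenant would then pass shard.keys=tenant1/4! so that only the
corresponding 1/16th of the shards is searched.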

Joel





On Thu, Nov 14, 2013 at 10:46 AM, yriveiro  wrote:

> Hi,
>
> I read this post
> http://searchhub.org/2013/06/13/solr-cloud-document-routing
> and I have some questions.
>
> When a tenant is too large to fit on one shard, we can specify the number
> of bits from the shardKey that we want to use.
>
> If we set a doc's key as "tenant1/4!docXXX" we are saying to spread the
> docs
> over the 1/4th of the collection. If the collection has 4 shards this means
> that all docs with the same shardKey will go to the same shard, or we will
> spread 25% in each shard?
>
> Another question is: at query time, must we configure the shard.keys param as
> "shard.keys=tenant1!" or as "shard.keys=tenant1/4!"
>
> /Yago
>
>
>
> -
> Best regards
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Document-routing-question-tp4100938.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Joel Bernstein
Search Engineer at Heliosearch


Re: Solr xml img parsing exception

2013-11-15 Thread Marcello Lorenzi

Hi Jack,
we have analyzed the issue and there were duplicated Tika jars in the Tomcat
classpath. After the removal of the duplicated library, the
search engine now works as expected.


Thanks for the support,
Marcello

On 11/14/2013 05:24 PM, Jack Krupansky wrote:

The actual error appears to be:

Caused by: org.xml.sax.SAXParseException; lineNumber: 91; columnNumber:
105; The element type "img" must be terminated by the matching end-tag
"</img>".

So, check the input document at line 91, column 105. There should be
an <img> tag there, but SAX is complaining that there is no matching
</img>.


-- Jack Krupansky

-Original Message- From: Marcello Lorenzi
Sent: Thursday, November 14, 2013 9:26 AM
To: solr-user@lucene.apache.org
Subject: Solr xml img parsing exception

Hi,
I have installed a Solr 4.3 instance and we have configured manifoldcf
to pass web content to the shard collection, but during the crawling we
have noticed a lot of this exception:

ERROR - 2013-11-14 15:13:57.954; org.apache.solr.common.SolrException;
org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: XML parse error
at
com.lsegroup.solr.handler.CwsExtractingDocumentLoader.load(CwsExtractingDocumentLoader.java:150) 


at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) 


at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) 


at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:242) 


at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656) 


at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359) 


at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155) 


at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) 


at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) 


at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:221) 


at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:107) 


at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:155) 


at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:76) 


at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:934)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:90) 


at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:515) 


at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1012) 


at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:642) 


at
org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:223) 


at
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1597) 


at
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1555) 


at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 


at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 


at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.tika.exception.TikaException: XML parse error
at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:78)
at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at
com.lsegroup.solr.handler.CwsExtractingDocumentLoader.load(CwsExtractingDocumentLoader.java:147) 


... 24 more
Caused by: org.xml.sax.SAXParseException; lineNumber: 91; columnNumber:
105; The element type "img" must be terminated by the matching end-tag
"</img>".
at
com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:198) 


at
com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177) 


at
com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:441) 


at
com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:368) 


at
com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1388) 


at
com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1753) 


at
com.sun.org.apache.xerces.internal.impl.XMLDocumentFragment

Re: exceeded limit of maxWarmingSearchers ERROR

2013-11-15 Thread Erick Erickson
That's a fine place to start. This form:

${solr.autoCommit.maxTime:15000}

just allows you to define a sysvar to override the 15 second default, like
java -Dsolr.autoCommit.maxTime=3 -jar start.jar


On Fri, Nov 15, 2013 at 8:11 AM, Loka  wrote:

> Hi Erickson,
>
> I have seen the following also from google, can I use the same in
> :
>  false
>
> If the above one is correct to add, can I add the below tags also in
>  along with the above tag:
>
> 
> 3
>   
>
>   
> 1
>   
>
>
> so finally, it will look like as:
>
> 
> 
> 3
>   
>
>   
> 1
>   
>  false
>
> 
>
>
> Is the above one fine?
>
>
> Regards,
> Lokanadham Ganta
>
>
>
>
> - Original Message -
> From: "Lokanadham Ganta" 
> To: "Erick Erickson [via Lucene]" <
> ml-node+s472066n4101203...@n3.nabble.com>
> Sent: Friday, November 15, 2013 6:33:20 PM
> Subject: Re: exceeded limit of maxWarmingSearchers ERROR
>
> Erickson,
>
> Thanks for your reply, before your reply, I have googled and found the
> following and added under
>  tag of solrconfig.xml
> file.
>
>
> 
> 3
>   
>
>   
> 1
>   
>
> Is the above one fine, or should I go strictly as per your suggestion,
> i.e. as below:
>
> 
>${solr.autoCommit.maxTime:15000}
>false
>  
>
> 
>
>  
>${solr.autoSoftCommit.maxTime:1}
>  
>
>
>
> Please confirm me.
>
> But how can I check how much autowarming I am doing? As of now I have
> set maxWarmingSearchers to 2; should I increase the value?
>
>
> Regards,
> Lokanadham Ganta
>
>
> - Original Message -
> From: "Erick Erickson [via Lucene]" <
> ml-node+s472066n4101203...@n3.nabble.com>
> To: "Loka" 
> Sent: Friday, November 15, 2013 6:07:12 PM
> Subject: Re: exceeded limit of maxWarmingSearchers ERROR
>
> Where did you get that syntax? I've never seen that before.
>
> What you want to configure is the "maxTime" in your
> autocommit and autosoftcommit sections of solrconfig.xml,
> as:
>
>  
>${solr.autoCommit.maxTime:15000}
>false
>  
>
> 
>
>  
>${solr.autoSoftCommit.maxTime:1}
>  
>
> And you do NOT want to commit from your client.
>
> Depending on how long autowarm takes, you may still see this error,
> so check how much autowarming you're doing, i.e. how you've
> configured the caches in solrconfig.xml and what you
> have for newSearcher and firstSearcher.
>
> I'd start with autowarm numbers of, maybe, 16 or so at most.
>
> Best,
> Erick
>
>
> On Fri, Nov 15, 2013 at 2:46 AM, Loka < [hidden email] > wrote:
>
>
> > Hi Erickson,
> >
> > Thanks for your reply, basically, I used commitWithin tag as below in
> > solrconfig.xml file
> >
> >
> >  
> >
> >  dedupe
> >
> > 
> >  
> >
> > 
> >  > class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
> >   true
> >   id
> >   false
> >   name,features,cat
> >>
> name="signatureClass">org.apache.solr.update.processor.Lookup3Signature
> > 
> > 
> > 
> >   
> >
> >
> > But this fix did not solve my problem, I mean I again got the same error.
> > PFA of schema.xml and solrconfig.xml file, solr-spring.xml,
> > messaging-spring.xml, can you sugest me where Iam doing wrong.
> >
> > Regards,
> > Lokanadham Ganta
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > - Original Message -
> > From: "Erick Erickson [via Lucene]" <
> > [hidden email] >
> > To: "Loka" < [hidden email] >
> > Sent: Thursday, November 14, 2013 8:38:17 PM
> > Subject: Re: exceeded limit of maxWarmingSearchers ERROR
> >
> > CommitWithin is either configured in solrconfig.xml for the
> >  or  tags as the maxTime tag. I
> > recommend you do use this.
> >
> > The other way you can do it is if you're using SolrJ, one of the
> > forms of the server.add() method takes a number of milliseconds
> > to force a commit.
> >
> > You really, really do NOT want to use ridiculously short times for this
> > like a few milliseconds. That will cause new searchers to be
> > warmed, and when too many of them are warming at once you
> > get this error.
> >
> > Seriously, make your commitWithin or autocommit parameters
> > as long as you can, for many reasons.
> >
> > Here's a bunch of background:
> >
> >
> http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> >
> > Best,
> > Erick
> >
> >
> > On Thu, Nov 14, 2013 at 5:13 AM, Loka < [hidden email] > wrote:
> >
> >
> > > Hi Naveen,
> > > Iam also getting the similar problem where I do not know how to use the
> > > commitWithin Tag, can you help me how to use commitWithin Tag. can you
> > give
> > > me the example
> > >
> > >
> > >
> > > --
> > > View this message in context:
> > >
> >
> http://lucene.472066.n3.nabble.com/exceeded-limit-of-maxWarmingSearchers-ERROR-tp3252844p4100864.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> > >
> >
> >
> >
> >
> >
> > If you reply to

Re: exceeded limit of maxWarmingSearchers ERROR

2013-11-15 Thread Loka
Hi Erickson,

I have seen the following also from google, can I use the same in 
:
 false

If the above one is correct to add, can I add the below tags also in
 along with the above tag:

 
3 
  

   
1 
  


so finally, it will look like this:

 
 
3 
  

   
1 
  
 false




Is the above one fine?


Regards,
Lokanadham Ganta




- Original Message -
From: "Lokanadham Ganta" 
To: "Erick Erickson [via Lucene]" 
Sent: Friday, November 15, 2013 6:33:20 PM
Subject: Re: exceeded limit of maxWarmingSearchers ERROR

Erickson,

Thanks for your reply, before your reply, I have googled and found the 
following and added under 
 tag of solrconfig.xml file.


 
3 
  

   
1 
  

Is the above one fine, or should I go strictly as per your suggestion, i.e.
as below:

 
   ${solr.autoCommit.maxTime:15000} 
   false 
  

 

  
   ${solr.autoSoftCommit.maxTime:1} 
  



Please confirm me.

But how can I check how much autowarming I am doing? As of now I have set
maxWarmingSearchers to 2; should I increase the value?


Regards,
Lokanadham Ganta


- Original Message -
From: "Erick Erickson [via Lucene]" 
To: "Loka" 
Sent: Friday, November 15, 2013 6:07:12 PM
Subject: Re: exceeded limit of maxWarmingSearchers ERROR

Where did you get that syntax? I've never seen that before. 

What you want to configure is the "maxTime" in your 
autocommit and autosoftcommit sections of solrconfig.xml, 
as: 

      
       ${solr.autoCommit.maxTime:15000} 
       false 
      

     

      
       ${solr.autoSoftCommit.maxTime:1} 
      

And you do NOT want to commit from your client. 

Depending on how long autowarm takes, you may still see this error, 
so check how much autowarming you're doing, i.e. how you've 
configured the caches in solrconfig.xml and what you 
have for newSearcher and firstSearcher. 

I'd start with autowarm numbers of, maybe, 16 or so at most. 

Best, 
Erick 


On Fri, Nov 15, 2013 at 2:46 AM, Loka < [hidden email] > wrote: 


> Hi Erickson, 
> 
> Thanks for your reply, basically, I used commitWithin tag as below in 
> solrconfig.xml file 
> 
> 
>   
>             
>              dedupe 
>             
>              
>           
> 
>  
>      class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory"> 
>       true 
>       id 
>       false 
>       name,features,cat 
>        name="signatureClass">org.apache.solr.update.processor.Lookup3Signature 
>      
>      
>      
>    
> 
> 
> But this fix did not solve my problem, I mean I again got the same error. 
> PFA of schema.xml and solrconfig.xml file, solr-spring.xml, 
> messaging-spring.xml, can you sugest me where Iam doing wrong. 
> 
> Regards, 
> Lokanadham Ganta 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> - Original Message - 
> From: "Erick Erickson [via Lucene]" < 
> [hidden email] > 
> To: "Loka" < [hidden email] > 
> Sent: Thursday, November 14, 2013 8:38:17 PM 
> Subject: Re: exceeded limit of maxWarmingSearchers ERROR 
> 
> CommitWithin is either configured in solrconfig.xml for the 
>  or  tags as the maxTime tag. I 
> recommend you do use this. 
> 
> The other way you can do it is if you're using SolrJ, one of the 
> forms of the server.add() method takes a number of milliseconds 
> to force a commit. 
> 
> You really, really do NOT want to use ridiculously short times for this 
> like a few milliseconds. That will cause new searchers to be 
> warmed, and when too many of them are warming at once you 
> get this error. 
> 
> Seriously, make your commitWithin or autocommit parameters 
> as long as you can, for many reasons. 
> 
> Here's a bunch of background: 
> 
> http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>  
> 
> Best, 
> Erick 
> 
> 
> On Thu, Nov 14, 2013 at 5:13 AM, Loka < [hidden email] > wrote: 
> 
> 
> > Hi Naveen, 
> > Iam also getting the similar problem where I do not know how to use the 
> > commitWithin Tag, can you help me how to use commitWithin Tag. can you 
> give 
> > me the example 
> > 
> > 
> > 
> > -- 
> > View this message in context: 
> > 
> http://lucene.472066.n3.nabble.com/exceeded-limit-of-maxWarmingSearchers-ERROR-tp3252844p4100864.html
>  
> > Sent from the Solr - User mailing list archive at Nabble.com. 
> > 
> 
> 
> 
> 
> 
> If you reply to this email, your message will be added to the discussion 
> below: 
> http://lucene.472066.n3.nabble.com/exceeded-limit-of-maxWarmingSearchers-ERROR-tp3252844p4100924.html
>  
> To unsubscribe from exceeded limit of maxWarmingSearchers ERROR, click 
> here . 
> NAML 
> 
> solr-spring.xml (2K) < 
> http://lucene.472066.n3.nabble.com/attachment/4101152/0/solr-spring.xml > 
> messaging-spring.xml (2K) < 
> http://lucene.472066.n3.nabble.com/attachment/4101152/1/messaging-spring.xml 
> > 
> schema.xml (6K) < 
> http://lucene.472066.n3.nabble.com/attachment/4101152/2/schema.xml > 
> solrconfig.xml (61K)

Re: exceeded limit of maxWarmingSearchers ERROR

2013-11-15 Thread Loka
Erickson,

Thanks for your reply, before your reply, I have googled and found the 
following and added under 
 tag of solrconfig.xml file.


 
3 
  

   
1 
  

Is the above one fine, or should I go strictly as per your suggestion, i.e.
as below:

 
   ${solr.autoCommit.maxTime:15000} 
   false 
  

 

  
   ${solr.autoSoftCommit.maxTime:1} 
  



Please confirm me.

But how can I check how much autowarming I am doing? As of now I have set
maxWarmingSearchers to 2; should I increase the value?


Regards,
Lokanadham Ganta


- Original Message -
From: "Erick Erickson [via Lucene]" 
To: "Loka" 
Sent: Friday, November 15, 2013 6:07:12 PM
Subject: Re: exceeded limit of maxWarmingSearchers ERROR

Where did you get that syntax? I've never seen that before. 

What you want to configure is the "maxTime" in your 
autocommit and autosoftcommit sections of solrconfig.xml, 
as: 

      
       ${solr.autoCommit.maxTime:15000} 
       false 
      

     

      
       ${solr.autoSoftCommit.maxTime:1} 
      

And you do NOT want to commit from your client. 

Depending on how long autowarm takes, you may still see this error, 
so check how much autowarming you're doing, i.e. how you've 
configured the caches in solrconfig.xml and what you 
have for newSearcher and firstSearcher. 

I'd start with autowarm numbers of, maybe, 16 or so at most. 

Best, 
Erick 


On Fri, Nov 15, 2013 at 2:46 AM, Loka < [hidden email] > wrote: 


> Hi Erickson, 
> 
> Thanks for your reply, basically, I used commitWithin tag as below in 
> solrconfig.xml file 
> 
> 
>   
>             
>              dedupe 
>             
>              
>           
> 
>  
>      class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory"> 
>       true 
>       id 
>       false 
>       name,features,cat 
>        name="signatureClass">org.apache.solr.update.processor.Lookup3Signature 
>      
>      
>      
>    
> 
> 
> But this fix did not solve my problem, I mean I again got the same error. 
> PFA of schema.xml and solrconfig.xml file, solr-spring.xml, 
> messaging-spring.xml, can you sugest me where Iam doing wrong. 
> 
> Regards, 
> Lokanadham Ganta 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> - Original Message - 
> From: "Erick Erickson [via Lucene]" < 
> [hidden email] > 
> To: "Loka" < [hidden email] > 
> Sent: Thursday, November 14, 2013 8:38:17 PM 
> Subject: Re: exceeded limit of maxWarmingSearchers ERROR 
> 
> CommitWithin is either configured in solrconfig.xml for the 
>  or  tags as the maxTime tag. I 
> recommend you do use this. 
> 
> The other way you can do it is if you're using SolrJ, one of the 
> forms of the server.add() method takes a number of milliseconds 
> to force a commit. 
> 
> You really, really do NOT want to use ridiculously short times for this 
> like a few milliseconds. That will cause new searchers to be 
> warmed, and when too many of them are warming at once you 
> get this error. 
> 
> Seriously, make your commitWithin or autocommit parameters 
> as long as you can, for many reasons. 
> 
> Here's a bunch of background: 
> 
> http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>  
> 
> Best, 
> Erick 
> 
> 
> On Thu, Nov 14, 2013 at 5:13 AM, Loka < [hidden email] > wrote: 
> 
> 
> > Hi Naveen, 
> > Iam also getting the similar problem where I do not know how to use the 
> > commitWithin Tag, can you help me how to use commitWithin Tag. can you 
> give 
> > me the example 
> > 
> > 
> > 
> > -- 
> > View this message in context: 
> > 
> http://lucene.472066.n3.nabble.com/exceeded-limit-of-maxWarmingSearchers-ERROR-tp3252844p4100864.html
>  
> > Sent from the Solr - User mailing list archive at Nabble.com. 
> > 
> 
> 
> 
> 
> 
> If you reply to this email, your message will be added to the discussion 
> below: 
> http://lucene.472066.n3.nabble.com/exceeded-limit-of-maxWarmingSearchers-ERROR-tp3252844p4100924.html
>  
> To unsubscribe from exceeded limit of maxWarmingSearchers ERROR, click 
> here . 
> NAML 
> 
> solr-spring.xml (2K) < 
> http://lucene.472066.n3.nabble.com/attachment/4101152/0/solr-spring.xml > 
> messaging-spring.xml (2K) < 
> http://lucene.472066.n3.nabble.com/attachment/4101152/1/messaging-spring.xml 
> > 
> schema.xml (6K) < 
> http://lucene.472066.n3.nabble.com/attachment/4101152/2/schema.xml > 
> solrconfig.xml (61K) < 
> http://lucene.472066.n3.nabble.com/attachment/4101152/3/solrconfig.xml > 
> 
> 
> 
> 
> -- 
> View this message in context: 
> http://lucene.472066.n3.nabble.com/exceeded-limit-of-maxWarmingSearchers-ERROR-tp3252844p4101152.html
>  
> Sent from the Solr - User mailing list archive at Nabble.com. 
> 





If you reply to this email, your message will be added to the discussion below: 
http://lucene.472066.n3.nabble.com/exceeded-limit-of-maxWarmingSearchers-ERROR-tp3252844p4101203.html
 
To unsubscribe from exceeded limit of maxWarmingSearchers ERROR, c

Re: An UpdateHandler to run following a MySql DataImport

2013-11-15 Thread Erick Erickson
Hmmm, don't quite know the answer to that, but when things
start getting complex with DIH, you should seriously consider
a SolrJ solution unless someone comes up with a quick fix.
Here's an example.

http://searchhub.org/2012/02/14/indexing-with-solrj/

Best,
Erick


On Fri, Nov 15, 2013 at 2:48 AM, Dileepa Jayakody  wrote:

> Hi All,
>
> I have written a custom update request handler to do some custom processing
> of documents and configured the /update handler to use my custom handler in
> the default: update.chain.
>
> The same requestHandler should be configured for the data-import-handler
> when it loads documents to solr index.
> Is there a way to configure the dataimport handler to use my custom
> update handler in an update.chain?
>
> If not how can I perform the required custom processing of the document
> while importing data from a mysql database?
>
> Thanks,
> Dileepa
>


Re: exceeded limit of maxWarmingSearchers ERROR

2013-11-15 Thread Erick Erickson
Where did you get that syntax? I've never seen that before.

What you want to configure is the "maxTime" in your
autocommit and autosoftcommit sections of solrconfig.xml,
as:

 
<autoCommit>
   <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
   <openSearcher>false</openSearcher>
 </autoCommit>

 <autoSoftCommit>
   <maxTime>${solr.autoSoftCommit.maxTime:1}</maxTime>
 </autoSoftCommit>

And you do NOT want to commit from your client.

Depending on how long autowarm takes, you may still see this error,
so check how much autowarming you're doing, i.e. how you've
configured the caches in solrconfig.xml and what you
have for newSearcher and firstSearcher.

I'd start with autowarm numbers of, maybe, 16 or so at most.

Best,
Erick


On Fri, Nov 15, 2013 at 2:46 AM, Loka  wrote:

> Hi Erickson,
>
> Thanks for your reply, basically, I used commitWithin tag as below in
> solrconfig.xml file
>
>
>  
>
>  dedupe
>
> 
>  
>
> 
>  class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
>   true
>   id
>   false
>   name,features,cat
>name="signatureClass">org.apache.solr.update.processor.Lookup3Signature
> 
> 
> 
>   
>
>
> But this fix did not solve my problem, I mean I again got the same error.
> PFA of schema.xml and solrconfig.xml file, solr-spring.xml,
> messaging-spring.xml, can you suggest where I am going wrong.
>
> Regards,
> Lokanadham Ganta
>
>
>
>
>
>
>
>
>
>
> - Original Message -
> From: "Erick Erickson [via Lucene]" <
> ml-node+s472066n4100924...@n3.nabble.com>
> To: "Loka" 
> Sent: Thursday, November 14, 2013 8:38:17 PM
> Subject: Re: exceeded limit of maxWarmingSearchers ERROR
>
> CommitWithin is either configured in solrconfig.xml for the
>  or  tags as the maxTime tag. I
> recommend you do use this.
>
> The other way you can do it is if you're using SolrJ, one of the
> forms of the server.add() method takes a number of milliseconds
> to force a commit.
>
> You really, really do NOT want to use ridiculously short times for this
> like a few milliseconds. That will cause new searchers to be
> warmed, and when too many of them are warming at once you
> get this error.
>
> Seriously, make your commitWithin or autocommit parameters
> as long as you can, for many reasons.
>
> Here's a bunch of background:
>
> http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> Best,
> Erick
>
>
> On Thu, Nov 14, 2013 at 5:13 AM, Loka < [hidden email] > wrote:
>
>
> > Hi Naveen,
> > Iam also getting the similar problem where I do not know how to use the
> > commitWithin Tag, can you help me how to use commitWithin Tag. can you
> give
> > me the example
> >
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/exceeded-limit-of-maxWarmingSearchers-ERROR-tp3252844p4100864.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>
>
>
>
>
> If you reply to this email, your message will be added to the discussion
> below:
> http://lucene.472066.n3.nabble.com/exceeded-limit-of-maxWarmingSearchers-ERROR-tp3252844p4100924.html
> To unsubscribe from exceeded limit of maxWarmingSearchers ERROR, click
> here .
> NAML
>
> solr-spring.xml (2K) <
> http://lucene.472066.n3.nabble.com/attachment/4101152/0/solr-spring.xml>
> messaging-spring.xml (2K) <
> http://lucene.472066.n3.nabble.com/attachment/4101152/1/messaging-spring.xml
> >
> schema.xml (6K) <
> http://lucene.472066.n3.nabble.com/attachment/4101152/2/schema.xml>
> solrconfig.xml (61K) <
> http://lucene.472066.n3.nabble.com/attachment/4101152/3/solrconfig.xml>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/exceeded-limit-of-maxWarmingSearchers-ERROR-tp3252844p4101152.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


RE: facet method=enum and uninvertedfield limitations

2013-11-15 Thread Lemke, Michael SZ/HZA-ZSW
On Thu, November 14, 2013 7:26 PM, Yonik Seeley wrote:
>On Thu, Nov 14, 2013 at 12:03 PM, Lemke, Michael  SZ/HZA-ZSW
> wrote:
>> I am running into performance problems with faceted queries.
>> If I do a
>>
>> q=word&facet.field=CONTENT&facet=true&facet.limit=10&facet.mincount=1&facet.method=fc&facet.prefix=a&rows=0
>>
>> I am getting an exception:
>> org.apache.solr.common.SolrException: Too many values for UnInvertedField 
>> faceting on field CONTENT
>> at 
>> org.apache.solr.request.UnInvertedField.uninvert(UnInvertedField.java:384)
>> at 
>> org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:178)
>> at 
>> org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:839)
>> ...
>>
>> I understand it's got something to do with a 24bit limit somewhere
>> in the code but I don't understand enough of it to be able to construct
>> a specialized index that can be queried with facet.method=enum.
>
>You shouldn't need to do anything differently to try facet.method=enum
>(just replace facet.method=fc with facet.method=enum)

This is true and facet.method=enum does work indeed.  The problem is
runtime.  In particular queries with an empty facet.prefix= run many
seconds if not minutes.  I initially asked about this here:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201310.mbox/%3c33ec3398272fbe47b64ee3b3e98f69a761427...@de011521.schaeffler.com%3E

It was suggested that fc is much faster than enum and I'd like to
test that.  We are still fairly free to design the index such that
it performs well.  But to do that we need to understand what is
killing it.

>
>You may also want to add the parameter
>facet.enum.cache.minDf=100000
>to lower memory usage by only using the filter cache for terms that
>match more than 100K docs.

That helped a little, cut down my particular test from 10 sec to 5 sec.
But still too slow.  Mind you this is for an autosuggest feature.

Thanks for your reply.

Michael



Re: Solr Core Reload causing JVM Memory Leak through FieldCache/LRUCache/LFUCache

2013-11-15 Thread Umesh Prasad
The mailing list by default removes attachments, so I uploaded it to Google Drive:

https://drive.google.com/file/d/0B-RnB4e-vaJhX280NVllMUdHYWs/edit?usp=sharing



On Fri, Nov 15, 2013 at 2:28 PM, Umesh Prasad  wrote:

> Hi All,
> We are seeing memory leaks in our Search application whenever core
> reload happens after replication.
>We are using Solr 3.6.2 and I have observed this consistently on all
> servers.
>
> The leak suspect analysis from MAT is attached with the mail.
>
> Problem Suspect 1
>
> One instance of *"org.apache.lucene.search.FieldCacheImpl"* loaded by
> *"org.apache.catalina.loader.WebappClassLoader @ 0x7f7b0a5b8b30"* occupies
> *8,726,099,312 (35.49%)* bytes. The memory is accumulated in one instance of
> *"java.util.HashMap$Entry[]"* loaded by *"<system class loader>"*.
>
> *Keywords*
> org.apache.catalina.loader.WebappClassLoader @ 0x7f7b0a5b8b30
> java.util.HashMap$Entry[]
> org.apache.lucene.search.FieldCacheImpl
>
> Problem Suspect 2
>
> 69 instances of *"org.apache.solr.util.ConcurrentLRUCache"*, loaded by 
> *"org.apache.catalina.loader.WebappClassLoader
> @ 0x7f7b0a5b8b30"* occupy *6,309,187,392 (25.66%)* bytes.
>
> Biggest instances:
>
>- org.apache.solr.util.ConcurrentLRUCache @
>0x7f7fe74ef120 - 755,575,672 (3.07%) bytes.
>- org.apache.solr.util.ConcurrentLRUCache @
>0x7f7e74b7a068 - 728,731,344 (2.96%) bytes.
>- org.apache.solr.util.ConcurrentLRUCache @
>0x7f7d0a6bd1b8 - 711,828,392 (2.90%) bytes.
>- org.apache.solr.util.ConcurrentLRUCache @
>0x7f7c6c12e800 - 708,657,624 (2.88%) bytes.
>- org.apache.solr.util.ConcurrentLRUCache @
>0x7f7fcb092058 - 568,473,352 (2.31%) bytes.
>- org.apache.solr.util.ConcurrentLRUCache @
>0x7f7f268cb2f0 - 568,400,040 (2.31%) bytes.
>- org.apache.solr.util.ConcurrentLRUCache @
>0x7f7e31b60c58 - 544,078,600 (2.21%) bytes.
>- org.apache.solr.util.ConcurrentLRUCache @
>0x7f7e65c2b2d8 - 489,578,480 (1.99%) bytes.
>- org.apache.solr.util.ConcurrentLRUCache @
>0x7f7d81ea8538 - 467,833,720 (1.90%) bytes.
>- org.apache.solr.util.ConcurrentLRUCache @
>0x7f7f31996508 - 444,383,992 (1.81%) bytes.
>
>
>
> *Keywords*
> org.apache.catalina.loader.WebappClassLoader @ 0x7f7b0a5b8b30
> org.apache.solr.util.ConcurrentLRUCache
>
> 194 instances of *"org.apache.solr.util.ConcurrentLFUCache"*, loaded by 
> *"org.apache.catalina.loader.WebappClassLoader
> @ 0x7f7b0a5b8b30"* occupy *4,583,727,104 (18.64%)* bytes.
>
> Biggest instances:
>
>- org.apache.solr.util.ConcurrentLFUCache @
>0x7f7cdd4735a0 - 410,628,176 (1.67%) bytes.
>- org.apache.solr.util.ConcurrentLFUCache @
>0x7f7c7d48e180 - 390,690,864 (1.59%) bytes.
>- org.apache.solr.util.ConcurrentLFUCache @
>0x7f7f1edfd008 - 348,193,312 (1.42%) bytes.
>- org.apache.solr.util.ConcurrentLFUCache @
>0x7f7f37b01990 - 340,595,920 (1.39%) bytes.
>- org.apache.solr.util.ConcurrentLFUCache @
>0x7f7fe02d8dd8 - 274,611,632 (1.12%) bytes.
>- org.apache.solr.util.ConcurrentLFUCache @
>0x7f7fa9dcfb20 - 253,848,232 (1.03%) bytes.
>
>
>
> *Keywords*
> org.apache.catalina.loader.WebappClassLoader @ 0x7f7b0a5b8b30
> org.apache.solr.util.ConcurrentLFUCache
>
>
> ---
> Thanks & Regards
> Umesh Prasad
>
> SDE @ Flipkart  : The Online Megastore at your doorstep ..
>



-- 
---
Thanks & Regards
Umesh Prasad


Is there a max size for synonym definitions?

2013-11-15 Thread Michael Bulla
Hi there,

yesterday I had a strange problem with using synonyms in Solr 4.3.0

In my schema the default configuration for synonyms is defined, roughly:

  <analyzer>
    ...
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    ...
  </analyzer>

Everything works fine with that config, except this line

Combustion, combustión, gases, gas, humo, humos, analizador, analizadores, 
emisiones, contaminante, O2, oxígeno, oxigeno, carbono, NOx, NO, NO2, SO2, CO, 
H2S, HC, hidrocarburos, inquemados, quemador, caldera, horno, chimenea

When searching for any of those terms, I don't get any results. Removing special
chars didn't make it better, removing numerics didn't make it better. When
shortening that list down to ~90 characters (10 terms) I got results again. Is
there some kind of length constraint when using synonyms?

Regards,
Michael


+++

Michael Bulla
___
iteratec GmbH
Am Sandtorkai 73
20457 Hamburg

mailto: michael.bu...@iteratec.de
phone: +49 40 28 46 830 - 31
fax: +49 40 28 46 830 - 10
http://www.iteratec.de
http://twitter.com/iteratec

iteratec is Hamburg's best IT employer 2012

___
Registered office and commercial register of iteratec GmbH: Munich, HRB 113 519
Managing directors: Klaus Eberhardt, Mark Goerke, Inge Hanschke