Re: collection exists but delete by query fails

2019-05-10 Thread Zheng Lin Edwin Yeo
Hi,

What is the query or code that you are using when you encounter the error?

Regards,
Edwin

On Thu, 9 May 2019 at 10:36, Aroop Ganguly  wrote:

>
>
> Hi
>
> I am on Solr 7.5, and I am issuing a delete-by-query using CloudSolrClient.
> The collection exists, but the delete-by-query fails every single
> time.
> I am wondering what is happening and how to debug this.
>
> org.apache.solr.client.solrj.SolrServerException:
> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:995)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:816)
> at
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:653)
> at java.util.ArrayList.get(ArrayList.java:429)
> at
> java.util.Collections$UnmodifiableList.get(Collections.java:1309)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.directUpdate(CloudSolrClient.java:486)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1012)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:883)
> ... 6 more
>


collection exists but delete by query fails

2019-05-08 Thread Aroop Ganguly


Hi 

I am on Solr 7.5, and I am issuing a delete-by-query using CloudSolrClient.
The collection exists, but the delete-by-query fails every single time.
I am wondering what is happening and how to debug this.

org.apache.solr.client.solrj.SolrServerException: 
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:995)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:816)
at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at java.util.Collections$UnmodifiableList.get(Collections.java:1309)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.directUpdate(CloudSolrClient.java:486)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1012)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:883)
... 6 more
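As a debugging aid (not from the thread; the host and collection names below are placeholders), one can send the same delete-by-query straight to the collection's update handler over plain HTTP. The stack trace fails inside CloudSolrClient.directUpdate while resolving replicas from the cached cluster state, so if the direct request succeeds, the problem likely lies in the client's view of the collection (for example, the collection name or default collection passed to CloudSolrClient) rather than in the query itself:

```python
import json
from urllib import request

SOLR_URL = "http://localhost:8983/solr"   # placeholder host, not from the thread
COLLECTION = "my_collection"              # placeholder collection name

def build_delete_payload(query: str) -> bytes:
    """JSON body for Solr's update handler that deletes by query."""
    return json.dumps({"delete": {"query": query}}).encode("utf-8")

def delete_by_query(query: str) -> bytes:
    """POST the delete directly to the collection's update handler.

    Requires a running Solr node; deliberately bypasses CloudSolrClient's
    cluster-state routing, which is where the stack trace fails."""
    req = request.Request(
        f"{SOLR_URL}/{COLLECTION}/update?commit=true",
        data=build_delete_payload(query),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return resp.read()
```

If the direct request works, a next step would be comparing the collection name the client uses against what is actually registered in ZooKeeper.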


Re: Delete by query in SOLR 6.3

2018-11-15 Thread Emir Arnautović
Hi Rakesh,
Since Solr has to maintain eventual consistency across all replicas, it has to 
block updates while a DBQ is running. Here is a blog post with a high-level 
explanation of the issue: 
http://www.od-bits.com/2018/03/dbq-or-delete-by-query.html 
<http://www.od-bits.com/2018/03/dbq-or-delete-by-query.html>

You should query for the matching documents and delete them by ID in order to avoid the issues caused by DBQ.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/
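Emir's query-then-delete-by-ID pattern might be sketched as follows (assumptions: plain HTTP with stdlib urllib against a placeholder collection; SolrJ's deleteById(List) would follow the same shape):

```python
import json
from urllib import request, parse

SOLR_URL = "http://localhost:8983/solr/my_collection"  # placeholder

def chunk(ids, size=500):
    """Split a list of ids into batches so each delete request stays small."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]

def fetch_matching_ids(query: str) -> list:
    """Collect the ids of documents matching the query (fl=id).

    Simplification: a fixed rows value; a real job would page with cursorMark."""
    params = parse.urlencode({"q": query, "fl": "id", "rows": 10000, "wt": "json"})
    with request.urlopen(f"{SOLR_URL}/select?{params}") as resp:
        docs = json.load(resp)["response"]["docs"]
    return [d["id"] for d in docs]

def delete_by_ids(ids: list) -> None:
    """Issue batched delete-by-id requests instead of a single DBQ."""
    for batch in chunk(ids):
        body = json.dumps({"delete": batch}).encode("utf-8")
        req = request.Request(f"{SOLR_URL}/update", data=body,
                              headers={"Content-Type": "application/json"},
                              method="POST")
        request.urlopen(req)
```

Unlike a DBQ, each batch here deletes known ids, so Solr does not have to block concurrent updates while reconciling the query across replicas.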



> On 15 Nov 2018, at 06:09, RAKESH KOTE  wrote:
> 
> Hi,   We are using Solr 6.3 in cloud mode, and we have created 2 collections in a 
> single Solr cluster consisting of 20 shards and 3 replicas each (overall 20x3 
> = 60 instances). The first collection has close to 2.5 billion records and 
> the second collection has 350 million records. Both collections use the 
> same instances, which have 4 cores and 26 GB RAM (10-12 GB assigned to the heap 
> and 14 GB left for the OS). The first collection's index size is close to 50 GB 
> and the second collection's index size is close to 5 GB on each of the instances. 
> We are using the default solrconfig values, and autoCommit and softCommit 
> are set to 5 minutes. The Solr cluster is supported by 3 ZooKeeper nodes.
> We are able to reach 5000 updates/s, and we are using SolrJ to index the data 
> into Solr. We also delete the documents in each of the collections periodically 
> using SolrJ's delete-by-query method (we use a non-id field in the delete 
> query; we are using Java 1.8). The updates happen without much issue, but 
> when we try to delete, it takes a considerable amount of time (close to 20 
> sec on average, but some take more than 4-5 minutes), which slows down 
> the whole application. We don't do an explicit commit after deletion and let 
> autoCommit take care of it every 5 minutes. Since we are not doing a 
> commit, we are wondering why the delete takes more time compared to 
> updates, which are very fast and finish in less than 50-100 ms. Could 
> you please let us know the reason, or how deletes differ from 
> updates in Solr.
> with warm regards, RK.



Delete by query in SOLR 6.3

2018-11-14 Thread RAKESH KOTE
Hi,   We are using Solr 6.3 in cloud mode, and we have created 2 collections in a 
single Solr cluster consisting of 20 shards and 3 replicas each (overall 20x3 = 
60 instances). The first collection has close to 2.5 billion records and the 
second collection has 350 million records. Both collections use the same 
instances, which have 4 cores and 26 GB RAM (10-12 GB assigned to the heap and 14 
GB left for the OS). The first collection's index size is close to 50 GB and the 
second collection's index size is close to 5 GB on each of the instances. We are 
using the default solrconfig values, and autoCommit and softCommit are set 
to 5 minutes. The Solr cluster is supported by 3 ZooKeeper nodes.
We are able to reach 5000 updates/s, and we are using SolrJ to index the data into 
Solr. We also delete the documents in each of the collections periodically using 
SolrJ's delete-by-query method (we use a non-id field in the delete query; we are 
using Java 1.8). The updates happen without much issue, but when we try to 
delete, it takes a considerable amount of time (close to 20 sec on average, 
but some take more than 4-5 minutes), which slows down the whole 
application. We don't do an explicit commit after deletion and let 
autoCommit take care of it every 5 minutes. Since we are not doing a commit, we 
are wondering why the delete takes more time compared to updates, which are 
very fast and finish in less than 50-100 ms. Could you please let us know 
the reason, or how deletes differ from updates in Solr.
with warm regards, RK.

Re: Delete By Query issue followed by Delete By Id Issues

2018-07-05 Thread sujatha sankaran
Hi Emir,

We are deleting a larger subset of docs with a particular value which we
know based on the id and only updating a few of the deleted. Our document
is of the form
__, we need to delete all that has the same ,
that are no longer in DB and then update only a few that has been updated
in DB.

Thanks,
Sujatha



On Sun, Jun 24, 2018 at 8:59 AM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Sujatha,
> Did I get it right that you are deleting the same documents that will be
> updated afterward? If that’s the case, then you can simply skip deleting,
> and just send updated version of document. Solr (Lucene) does not have
> delete - it’ll just flag document as deleted. Updating document (assuming
> id is the same) will result in the same thing - old document will not be
> retrievable and will be removed from index when segments holding it is
> merged.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 21 Jun 2018, at 19:59, sujatha sankaran 
> wrote:
> >
> > Thanks, Shawn.
> >
> > Our use case is something like this: in a batch load of several thousands
> > of documents, we do a delete first followed by an update. For example, we
> > delete all 1000 docs and then send an update request for the same 1000.
> >
> > What we see is that there are many missing docs due to DBQ re-ordering
> > the deletes and the updates. We also saw an issue with nodes going down,
> > similar to the issue described here:
> > http://lucene.472066.n3.nabble.com/SolrCloud-Nodes-going-to-recovery-state-during-indexing-td4369396.html
> >
> > We see at the end of this batch process many (several thousand) missing
> > docs.
> >
> > Due to this, and after reading the above thread, we decided to move to DBI
> > and are now facing issues due to the custom routing, or implicit routing,
> > which we have in place. So I don't think DBQ was working for us, but we did
> > have several such processes (DBQ followed by updates) for different
> > activities in the collection happening at the same time.
> >
> >
> > Sujatha
> >
> > On Thu, Jun 21, 2018 at 1:21 PM, Shawn Heisey 
> wrote:
> >
> >> On 6/21/2018 9:59 AM, sujatha sankaran wrote:
> >>> Currently from our business perspective we find that we are left with
> no
> >>> options for deleting docs in a batch load as :
> >>>
> >>> DBQ+ batch does not work well together
> >>> DBI+ custom routing (batch load / normal)would not work as well.
> >>
> >> I would expect DBQ to work, just with the caveat that if you are trying
> >> to do other indexing operations at the same time, you may run into
> >> significant delays, and if there are timeouts configured anywhere that
> >> are shorter than those delays, requests may return failure responses or
> >> log failures.
> >>
> >> If you are using DBQ, you just need to be sure that there are no other
> >> operations happening at the same time, or that your error handling is
> >> bulletproof.  Making sure that no other operations are happening at the
> >> same time as the DBQ is in my opinion a better option.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>
>


Re: Delete By Query issue followed by Delete By Id Issues

2018-06-24 Thread Emir Arnautović
Hi Sujatha,
Did I get it right that you are deleting the same documents that will be 
updated afterward? If that’s the case, then you can simply skip deleting, and 
just send updated version of document. Solr (Lucene) does not have delete - 
it’ll just flag document as deleted. Updating document (assuming id is the 
same) will result in the same thing - old document will not be retrievable and 
will be removed from index when segments holding it is merged.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/
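Emir's point, that re-adding a document under the same id supersedes the old version without an explicit delete, can be sketched as a plain JSON update (the URL and field names are placeholders, not from the thread):

```python
import json
from urllib import request

SOLR_UPDATE = "http://localhost:8983/solr/my_collection/update"  # placeholder

def build_update_payload(docs: list) -> bytes:
    """JSON body that (re)indexes documents. An existing document with the
    same id is flagged as deleted and replaced by the new version, so no
    separate delete request is needed beforehand."""
    return json.dumps(docs).encode("utf-8")

def reindex(docs: list) -> None:
    """POST the documents to the update handler (requires a running Solr)."""
    req = request.Request(SOLR_UPDATE, data=build_update_payload(docs),
                          headers={"Content-Type": "application/json"},
                          method="POST")
    request.urlopen(req)
```

The old versions remain in the index segments only until those segments are merged, exactly as Emir describes above.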



> On 21 Jun 2018, at 19:59, sujatha sankaran  wrote:
> 
> Thanks, Shawn.
> 
> Our use case is something like this: in a batch load of several thousands of
> documents, we do a delete first followed by an update. For example, we delete
> all 1000 docs and then send an update request for the same 1000.
> 
> What we see is that there are many missing docs due to DBQ re-ordering
> the deletes and the updates. We also saw an issue with nodes going down,
> similar to the issue described here:
> http://lucene.472066.n3.nabble.com/SolrCloud-Nodes-going-to-recovery-state-during-indexing-td4369396.html
> 
> We see at the end of this batch process many (several thousand) missing
> docs.
> 
> Due to this, and after reading the above thread, we decided to move to DBI
> and are now facing issues due to the custom routing, or implicit routing,
> which we have in place. So I don't think DBQ was working for us, but we did
> have several such processes (DBQ followed by updates) for different
> activities in the collection happening at the same time.
> 
> 
> Sujatha
> 
> On Thu, Jun 21, 2018 at 1:21 PM, Shawn Heisey  wrote:
> 
>> On 6/21/2018 9:59 AM, sujatha sankaran wrote:
>>> Currently from our business perspective we find that we are left with no
>>> options for deleting docs in a batch load as :
>>> 
>>> DBQ+ batch does not work well together
>>> DBI+ custom routing (batch load / normal)would not work as well.
>> 
>> I would expect DBQ to work, just with the caveat that if you are trying
>> to do other indexing operations at the same time, you may run into
>> significant delays, and if there are timeouts configured anywhere that
>> are shorter than those delays, requests may return failure responses or
>> log failures.
>> 
>> If you are using DBQ, you just need to be sure that there are no other
>> operations happening at the same time, or that your error handling is
>> bulletproof.  Making sure that no other operations are happening at the
>> same time as the DBQ is in my opinion a better option.
>> 
>> Thanks,
>> Shawn
>> 
>> 



Re: Delete By Query issue followed by Delete By Id Issues

2018-06-21 Thread sujatha sankaran
Thanks, Shawn.

Our use case is something like this: in a batch load of several thousands of
documents, we do a delete first followed by an update. For example, we delete
all 1000 docs and then send an update request for the same 1000.

What we see is that there are many missing docs due to DBQ re-ordering
the deletes and the updates. We also saw an issue with nodes going down,
similar to the issue described here:
http://lucene.472066.n3.nabble.com/SolrCloud-Nodes-going-to-recovery-state-during-indexing-td4369396.html

We see at the end of this batch process many (several thousand) missing
docs.

Due to this, and after reading the above thread, we decided to move to DBI
and are now facing issues due to the custom routing, or implicit routing,
which we have in place. So I don't think DBQ was working for us, but we did
have several such processes (DBQ followed by updates) for different
activities in the collection happening at the same time.


Sujatha

On Thu, Jun 21, 2018 at 1:21 PM, Shawn Heisey  wrote:

> On 6/21/2018 9:59 AM, sujatha sankaran wrote:
> > Currently from our business perspective we find that we are left with no
> > options for deleting docs in a batch load as :
> >
> > DBQ+ batch does not work well together
> > DBI+ custom routing (batch load / normal)would not work as well.
>
> I would expect DBQ to work, just with the caveat that if you are trying
> to do other indexing operations at the same time, you may run into
> significant delays, and if there are timeouts configured anywhere that
> are shorter than those delays, requests may return failure responses or
> log failures.
>
> If you are using DBQ, you just need to be sure that there are no other
> operations happening at the same time, or that your error handling is
> bulletproof.  Making sure that no other operations are happening at the
> same time as the DBQ is in my opinion a better option.
>
> Thanks,
> Shawn
>
>


Re: Delete By Query issue followed by Delete By Id Issues

2018-06-21 Thread Shawn Heisey
On 6/21/2018 9:59 AM, sujatha sankaran wrote:
> Currently from our business perspective we find that we are left with no
> options for deleting docs in a batch load as :
>
> DBQ+ batch does not work well together
> DBI+ custom routing (batch load / normal)would not work as well.

I would expect DBQ to work, just with the caveat that if you are trying
to do other indexing operations at the same time, you may run into
significant delays, and if there are timeouts configured anywhere that
are shorter than those delays, requests may return failure responses or
log failures.

If you are using DBQ, you just need to be sure that there are no other
operations happening at the same time, or that your error handling is
bulletproof.  Making sure that no other operations are happening at the
same time as the DBQ is in my opinion a better option.

Thanks,
Shawn



Re: Delete By Query issue followed by Delete By Id Issues

2018-06-21 Thread sujatha sankaran
Thanks, Shawn.

Currently, from our business perspective, we find that we are left with no
options for deleting docs in a batch load:

DBQ + batch does not work well together
DBI + custom routing (batch load / normal) would not work as well.

We are not sure how we can proceed, short of not deleting at all.

Thanks,
Sujatha



On Wed, Jun 20, 2018 at 8:31 PM, Shawn Heisey  wrote:

> On 6/20/2018 3:46 PM, sujatha sankaran wrote:
> > Thanks,Shawn.   Very useful information.
> >
> > Please find below the log details:-
>
> Is your collection using the implicit router?  You didn't say.  If it
> is, then I think you may not be able to use deleteById.  This is indeed
> a bug, one that has been reported at least once already, but hasn't been
> fixed yet.   I do not know why it hasn't been fixed yet.  Maybe the fix
> is very difficult, or maybe the reason for the problem is not yet fully
> understood.
>
> The log you shared shows an error trying to do an update -- the delete
> that failed.  This kind of error is indeed likely to cause SolrCloud to
> attempt index recovery, all in accordance with SolrCloud design goals.
>
> Thanks,
> Shawn
>
>


Re: Delete By Query issue followed by Delete By Id Issues

2018-06-20 Thread Shawn Heisey
On 6/20/2018 3:46 PM, sujatha sankaran wrote:
> Thanks,Shawn.   Very useful information.
>
> Please find below the log details:-

Is your collection using the implicit router?  You didn't say.  If it
is, then I think you may not be able to use deleteById.  This is indeed
a bug, one that has been reported at least once already, but hasn't been
fixed yet.   I do not know why it hasn't been fixed yet.  Maybe the fix
is very difficult, or maybe the reason for the problem is not yet fully
understood.

The log you shared shows an error trying to do an update -- the delete
that failed.  This kind of error is indeed likely to cause SolrCloud to
attempt index recovery, all in accordance with SolrCloud design goals.

Thanks,
Shawn



Re: Delete By Query issue followed by Delete By Id Issues

2018-06-20 Thread sujatha sankaran
Thanks,Shawn.   Very useful information.

Please find below the log details:-



2018-06-20 17:19:06.661 ERROR
(updateExecutor-2-thread-8226-processing-crm_v2_01_shard3_replica1
x:crm_v2_01_shard3_replica2 r:core_node4 n:masked:8983_solr s:shard3
c:crm_v2_01) [c:crm_v2_01 s:shard3 r:core_node4
x:crm_v2_01_shard3_replica2] o.a.s.u.StreamingSolrClients error

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at crm_v2_01_shard3_replica1: Bad Request

request:
crm_v2_01_shard3_replica1/update?update.chain=add-unknown-fields-to-the-schema&update.distrib=FROMLEADER&distrib.from=http%3A%2F%2Fmasked%3A8983%2Fsolr%2Fcrm_v2_01_shard3_replica2%2F&wt=javabin&version=2

Remote error message: missing _version_ on update from leader

   at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:345)

   at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)

   at
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)

   at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)

   at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.dt_access$292(ExecutorUtil.java)

   at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

   at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

   at java.lang.Thread.run(Thread.java:748)

2018-06-20 17:19:06.662 WARN  (qtp1002191352-169102) [c:crm_v2_01 s:shard3
r:core_node4 x:crm_v2_01_shard3_replica2]
o.a.s.u.p.DistributedUpdateProcessor Error sending update to
http://masked:8983/solr

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://masked:8983/solr/crm_v2_01_shard3_replica3: Bad
Request

request:
http://masked:8983/solr/crm_v2_01_shard3_replica3/update?update.chain=add-unknown-fields-to-the-schema&update.distrib=FROMLEADER&distrib.from=http%3A%2F%2Fmasked%3A8983%2Fsolr%2Fcrm_v2_01_shard3_replica2%2F&wt=javabin&version=2

Remote error message: missing _version_ on update from leader

   at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:345)

   at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)

   at
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)

   at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)

   at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.dt_access$292(ExecutorUtil.java)

   at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

   at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

   at java.lang.Thread.run(Thread.java:748)

2018-06-20 17:19:06.662 ERROR (qtp1002191352-169102) [c:crm_v2_01 s:shard3
r:core_node4 x:crm_v2_01_shard3_replica2]
o.a.s.u.p.DistributedUpdateProcessor Setting up to try to start recovery on
replica http://masked:8983/solr/crm_v2_01_shard3_replica3/

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://masked:8983/solr/crm_v2_01_shard3_replica3: Bad
Request

request:
http://masked:8983/solr/crm_v2_01_shard3_replica3/update?update.chain=add-unknown-fields-to-the-schema&update.distrib=FROMLEADER&distrib.from=http%3A%2F%2Fmasked%3A8983%2Fsolr%2Fcrm_v2_01_shard3_replica2%2F&wt=javabin&version=2

Remote error message: missing _version_ on update from leader

   at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:345)

   at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)

   at
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)

   at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)

   at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.dt_access$292(ExecutorUtil.java)

   at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

   at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

   at java.lang.Thread.run(Thread.java:748)

2018-06-20 17:19:06.662 INFO  (qtp1002191352-169102) [c:crm_v2_01 s:shard3
r:core_node4 x:crm_v2_01_shard3_replica2] o.a.s.c.ZkController Put replica
core=crm_v2_01_shard3_replica3 coreNodeName=core_node12 on masked:8983_solr
into leader-initiated recovery.


Re: Delete By Query issue followed by Delete By Id Issues

2018-06-20 Thread Shawn Heisey

On 6/15/2018 3:14 PM, sujatha sankaran wrote:

We were initially having an issue with DBQ and heavy batch updates, which
used to result in many missing updates.

After reading many mails on the mailing list mentioning that DBQ and batch
updates do not work well together, we switched to DBI. But we are seeing the
issue described in this Jira ticket:
https://issues.apache.org/jira/browse/SOLR-7384


If you're using the implicit router on your multi-shard collection, 
deleting by ID may not work for you.  There are a number of issues in 
Jira discussing various aspects of the problem.  On a collection using 
the compositeId router, I would expect those deletes to work well.



Specifically we are seeing a pattern:

- There are several ERRORs and WARNs about "missing _version_".

- The ERROR message is typically single.

- There are several WARNs after that, and after a couple of WARNs there
is a message that leader-initiated recovery has been kicked off.


Can you share these log entries?  The message on some of them is 
probably a dozen or more lines long, and may have multiple "Caused by" 
clauses that will also need to be included.  Seeing the whole log could 
be useful.



*Setup info*:

- Solr Cloud 6.6.2
- 5 node, 5 shard, 3 replica setup
- ~35 million docs in the collection
- Nodes have 90 GB RAM, 32 GB to the JVM
- Soft commit interval 2 seconds, hard commit (openSearcher false) 15 seconds


Side notes:

Solr would actually have more heap memory available if you set the heap 
to 31GB instead of 32GB.


https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/

A 2 second soft commit interval is extremely aggressive.  If your soft 
commits are happening really quickly (far less than 1 second) then this 
might not be a problem, but with an index as large as yours, it is very 
likely that soft commits are taking much longer than 2 seconds.


Thanks,
Shawn
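The commit settings Shawn refers to live in solrconfig.xml; a sketch matching the intervals reported in the thread (values in milliseconds) might look like:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flush to disk every 15 s without opening a new searcher -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit: makes changes visible to searches every 2 s,
       which Shawn notes is extremely aggressive for an index this large -->
  <autoSoftCommit>
    <maxTime>2000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

Raising the autoSoftCommit maxTime is the usual first adjustment when soft commits take longer than their interval.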



Delete By Query issue followed by Delete By Id Issues

2018-06-15 Thread sujatha sankaran
We were initially having an issue with DBQ and heavy batch updates, which
used to result in many missing updates.

After reading many mails on the mailing list mentioning that DBQ and batch
updates do not work well together, we switched to DBI. But we are seeing the
issue described in this Jira ticket:
https://issues.apache.org/jira/browse/SOLR-7384



Specifically we are seeing a pattern:

- There are several ERRORs and WARNs about "missing _version_".

- The ERROR message is typically single.

- There are several WARNs after that, and after a couple of WARNs there
is a message that leader-initiated recovery has been kicked off.



A few scenarios:

   - Batch update with DBI where deletes are followed by updates for some
   documents in the collection, and batch update with DBQ for some other docs =>
   results in missing docs across both types
   - Batch deletes with DBI with a route parameter: we see that about 20%
   of the deletes do not happen. At this point there could be parallel batch
   updates with DBQ/DBI
   - Pure DBI-based updates where deletes are followed by updates, no DBQ
   here, but we are seeing the missing _version_ error and leader-initiated
   recovery; deletes and updates seem fine for individual doc updates, but we
   have yet to test a batch with a heavy-load scenario

*Setup info*:

- Solr Cloud 6.6.2
- 5 node, 5 shard, 3 replica setup
- ~35 million docs in the collection
- Nodes have 90 GB RAM, 32 GB to the JVM
- Soft commit interval 2 seconds, hard commit (openSearcher false) 15 seconds



Are there any solutions to the missing _version_ updates for DBI followed by
LIR during heavy batch indexing when using custom routing?


Thanks,
Sujatha


Re: Delete by query, including negative filters

2016-04-09 Thread Robert Brown

Thanks Erick,

The *'s were accidental, if that makes any difference whatsoever.




On 09/04/16 15:42, Erick Erickson wrote:

Should work, or
-merchant_id:(12345 OR 9876*)

But do be aware that Solr is not strict boolean logic. The above is
close enough for this purpose. Here's an excellent writeup on this
subtlety:

https://lucidworks.com/blog/2011/12/28/why-not-and-or-and-not/

Best,
Erick

On Sat, Apr 9, 2016 at 3:51 AM, Robert Brown <r...@intelcompute.com> wrote:

Hi,

I have this delete query: "*partner:pg AND market:us AND last_seen:[* TO
2016-04-09T02:01:06Z]*"

And would like to add "AND merchant_id != 12345 AND merchant_id != 98765"

Would this be done by including "*AND -merchant_id:12345 AND
-merchant_id:98765*" ?

Thanks,
Rob





Re: Delete by query, including negative filters

2016-04-09 Thread Erick Erickson
Should work, or
-merchant_id:(12345 OR 9876*)

But do be aware that Solr is not strict boolean logic. The above is
close enough for this purpose. Here's an excellent writeup on this
subtlety:

https://lucidworks.com/blog/2011/12/28/why-not-and-or-and-not/

Best,
Erick

On Sat, Apr 9, 2016 at 3:51 AM, Robert Brown <r...@intelcompute.com> wrote:
> Hi,
>
> I have this delete query: "*partner:pg AND market:us AND last_seen:[* TO
> 2016-04-09T02:01:06Z]*"
>
> And would like to add "AND merchant_id != 12345 AND merchant_id != 98765"
>
> Would this be done by including "*AND -merchant_id:12345 AND
> -merchant_id:98765*" ?
>
> Thanks,
> Rob
>


Delete by query, including negative filters

2016-04-09 Thread Robert Brown

Hi,

I have this delete query: "*partner:pg AND market:us AND last_seen:[* TO 
2016-04-09T02:01:06Z]*"


And would like to add "AND merchant_id != 12345 AND merchant_id != 98765"

Would this be done by including "*AND -merchant_id:12345 AND 
-merchant_id:98765*" ?


Thanks,
Rob
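Putting the pieces of this thread together, the combined delete query, with the negative clauses grouped the way Erick suggests in his reply, could be assembled like this (a sketch; `build_delete_query` is an illustrative helper, not a Solr API):

```python
def build_delete_query(base: str, excluded_ids: list) -> str:
    """Append a grouped NOT clause excluding each merchant_id to a base query.

    Grouping as -field:(a OR b) keeps the query readable; note that Solr's
    boolean handling is not strictly boolean (see the lucidworks post
    linked in the thread)."""
    excluded = " OR ".join(str(i) for i in excluded_ids)
    return f"{base} AND -merchant_id:({excluded})"

query = build_delete_query(
    "partner:pg AND market:us AND last_seen:[* TO 2016-04-09T02:01:06Z]",
    [12345, 98765],
)
# query == "partner:pg AND market:us AND last_seen:[* TO 2016-04-09T02:01:06Z] AND -merchant_id:(12345 OR 98765)"
```

The resulting string would go into the delete command's query field as-is.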



Re: Delete by query using JSON?

2016-03-23 Thread Alexandre Rafalovitch
Ouch! I certainly did not mean to sound snippy. But perhaps it did, at
least to some people. I'll try to be even more PC in the future.

I am glad the problem was resolved in the end.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 23 March 2016 at 10:16, Robert Brown <r...@intelcompute.com> wrote:
> "why do you care? just do this ..."
>
> I see this a lot on mailing lists these days, it's usually a learning
> curve/task/question.  I know I fall into these types of questions/tasks
> regularly.
>
> Which usually leads to "don't tell me my approach is wrong, just explain
> what's going on, and why", or "just answer the straight-forward question I
> asked in first place.".
>
> Sorry for rambling, this just sounded familiar...
>
> :)
>
>
>
>
> On 22/03/16 22:50, Alexandre Rafalovitch wrote:
>>
>> Why do you care?
>>
>> The difference between Q and FQ are the scoring. For delete, you
>> delete all of them regardless of scoring and there is no difference.
>> Just chuck them all into Q.
>>
>> Regards,
>> Alex.
>> 
>> Newsletter and resources for Solr beginners and intermediates:
>> http://www.solr-start.com/
>>
>>
>> On 23 March 2016 at 06:07, Paul Hoffman <p...@flo.org> wrote:
>>>
>>> I've been struggling to find the right syntax for deleting by query
>>> using JSON, where the query includes an fq parameter.
>>>
>>> I know how to delete *all* documents, but how would I delete only
>>> documents with field doctype = "cres"?  I have tried the following along
>>> with a number of variations, all to no avail:
>>>
>>> $ curl -s -d @-
> >>> 'http://localhost:8983/solr/blacklight-core/update?wt=json' <<EOS
> >>> {
>>> "delete": { "query": "doctype:cres" }
>>> }
>>> EOS
>>>
>>> I can identify the documents like this:
>>>
>>> curl -s
> >>> 'http://localhost:8983/solr/blacklight-core/select?q=&fq=doctype%3Acres&wt=json&fl=id'
>>>
>>> It seems like such a simple thing, but I haven't found any examples that
>>> use an fq.  Could someone post an example?
>>>
>>> Thanks in advance,
>>>
>>> Paul.
>>>
>>> --
>>> Paul Hoffman <p...@flo.org>
>>> Systems Librarian
>>> Fenway Libraries Online
>>> c/o Wentworth Institute of Technology
>>> 550 Huntington Ave.
>>> Boston, MA 02115
>>> (617) 442-2384 (FLO main number)
>
>


Re: Delete by query using JSON?

2016-03-23 Thread Paul Hoffman
On Tue, Mar 22, 2016 at 10:25:06PM -0400, Jack Krupansky wrote:
> See the correct syntax example here:
> https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-SendingJSONUpdateCommands
> 
> Your query is fine.

Thanks; I thought the query was wrong, but the example you pointed 
me to clued me in to the real problem: I had neglected to specify 
Content-Type: application/json (d'oh!).

Paul.

-- 
Paul Hoffman 
Systems Librarian
Fenway Libraries Online
c/o Wentworth Institute of Technology
550 Huntington Ave.
Boston, MA 02115
(617) 442-2384 (FLO main number)
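For the record, the fix amounts to adding that header; a minimal sketch of the corrected request (same core and query as in the thread; the urlopen call is left commented out since it needs the running core):

```python
import json
from urllib import request

url = "http://localhost:8983/solr/blacklight-core/update?wt=json"
body = json.dumps({"delete": {"query": "doctype:cres"}}).encode("utf-8")

# The crucial part: the explicit Content-Type header that was missing.
req = request.Request(url, data=body,
                      headers={"Content-Type": "application/json"},
                      method="POST")
# request.urlopen(req)  # requires the Solr core from the thread to be running
```

The curl equivalent is simply `-H 'Content-Type: application/json'` added to the original command.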


Re: Delete by query using JSON?

2016-03-23 Thread Paul Hoffman
On Tue, Mar 22, 2016 at 04:27:03PM -0700, Walter Underwood wrote:
> “Why do you care?” might not be the best way to say it, but it is 
> essential to understand the difference between selection (filtering) 
> and ranking.
> 
> As Solr params:
> 
> * q is ranking and filtering
> * fq is filtering only
> * bq is ranking only

Thanks, that is a very useful and concise synopsis.

> When deleting documents, ordering does not matter, which is why we ask 
> why you care about the ordering.
> 
> If the response is familiar to you, imagine how the questions sound to 
> people who have been working in search for twenty years. But even when 
> we are snippy, we still try to help.
> 
> Many, many times, the question is wrong. The most common difficulty on 
> this list is an “XY problem”, where the poster has problem X and has 
> assumed solution Y, which is not the right solution. But they ask 
> about Y. So we will tell people that their approach is wrong, because 
> that is the most helpful thing we can do.

Alex's response didn't seem snippy to me at all, and I agree 
wholeheartedly about the wrong-question problem -- in my case, not only 
was I asking the wrong question, but I shouldn't even have had to ask 
the (right) question at all!

Thanks again, everyone.

Paul.

-- 
Paul Hoffman 
Systems Librarian
Fenway Libraries Online
c/o Wentworth Institute of Technology
550 Huntington Ave.
Boston, MA 02115
(617) 442-2384 (FLO main number)


Re: Delete by query using JSON?

2016-03-22 Thread Jack Krupansky
See the correct syntax example here:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-SendingJSONUpdateCommands

Your query is fine.

-- Jack Krupansky

On Tue, Mar 22, 2016 at 3:07 PM, Paul Hoffman <p...@flo.org> wrote:

> I've been struggling to find the right syntax for deleting by query
> using JSON, where the query includes an fq parameter.
>
> I know how to delete *all* documents, but how would I delete only
> documents with field doctype = "cres"?  I have tried the following along
> with a number of variations, all to no avail:
>
> $ curl -s -d @- 'http://localhost:8983/solr/blacklight-core/update?wt=json' <<EOS
> {
> "delete": { "query": "doctype:cres" }
> }
> EOS
>
> I can identify the documents like this:
>
> curl -s '
> http://localhost:8983/solr/blacklight-core/select?q=*:*&fq=doctype%3Acres&wt=json&fl=id
> '
>
> It seems like such a simple thing, but I haven't found any examples that
> use an fq.  Could someone post an example?
>
> Thanks in advance,
>
> Paul.
>
> --
> Paul Hoffman <p...@flo.org>
> Systems Librarian
> Fenway Libraries Online
> c/o Wentworth Institute of Technology
> 550 Huntington Ave.
> Boston, MA 02115
> (617) 442-2384 (FLO main number)
>


Re: Delete by query using JSON?

2016-03-22 Thread Walter Underwood
“Why do you care?” might not be the best way to say it, but it is essential to 
understand the difference between selection (filtering) and ranking.

As Solr params:

* q is ranking and filtering
* fq is filtering only
* bq is ranking only

When deleting documents, ordering does not matter, which is why we ask why you 
care about the ordering.
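Since only selection matters for a delete, a q plus its fq filters can be folded into one boolean query by AND-ing the clauses — a minimal sketch (helper name is mine):

```java
import java.util.List;

public class DeleteQueryBuilder {
    // For a delete, scoring is irrelevant, so a q plus any number of fq
    // filters collapse into a single boolean query: every clause becomes
    // a mandatory conjunct.
    static String combine(String q, List<String> fqs) {
        StringBuilder sb = new StringBuilder("(").append(q).append(")");
        for (String fq : fqs) {
            sb.append(" AND (").append(fq).append(")");
        }
        return sb.toString();
    }
}
```

The combined string then goes into the usual `{"delete": {"query": "..."}}` body.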

If the response is familiar to you, imagine how the questions sound to people 
who have been working in search for twenty years. But even when we are snippy, 
we still try to help.

Many, many times, the question is wrong. The most common difficulty on this 
list is an “XY problem”, where the poster has problem X and has assumed 
solution Y, which is not the right solution. But they ask about Y. So we will 
tell people that their approach is wrong, because that is the most helpful 
thing we can do.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Mar 22, 2016, at 4:16 PM, Robert Brown <r...@intelcompute.com> wrote:
> 
> "why do you care? just do this ..."
> 
> I see this a lot on mailing lists these days, it's usually a learning 
> curve/task/question.  I know I fall into these types of questions/tasks 
> regularly.
> 
> Which usually leads to "don't tell me my approach is wrong, just explain 
> what's going on, and why", or "just answer the straight-forward question I 
> asked in first place.".
> 
> Sorry for rambling, this just sounded familiar...
> 
> :)
> 
> 
> 
> On 22/03/16 22:50, Alexandre Rafalovitch wrote:
>> Why do you care?
>> 
>> The difference between Q and FQ are the scoring. For delete, you
>> delete all of them regardless of scoring and there is no difference.
>> Just chuck them all into Q.
>> 
>> Regards,
>>Alex.
>> 
>> Newsletter and resources for Solr beginners and intermediates:
>> http://www.solr-start.com/
>> 
>> 
>> On 23 March 2016 at 06:07, Paul Hoffman <p...@flo.org> wrote:
>>> I've been struggling to find the right syntax for deleting by query
>>> using JSON, where the query includes an fq parameter.
>>> 
>>> I know how to delete *all* documents, but how would I delete only
>>> documents with field doctype = "cres"?  I have tried the following along
>>> with a number of variations, all to no avail:
>>> 
>>> $ curl -s -d @- 'http://localhost:8983/solr/blacklight-core/update?wt=json' <<EOS
>>> {
>>> "delete": { "query": "doctype:cres" }
>>> }
>>> EOS
>>> 
>>> I can identify the documents like this:
>>> 
>>> curl -s 
>>> 'http://localhost:8983/solr/blacklight-core/select?q=*:*&fq=doctype%3Acres&wt=json&fl=id'
>>> 
>>> It seems like such a simple thing, but I haven't found any examples that
>>> use an fq.  Could someone post an example?
>>> 
>>> Thanks in advance,
>>> 
>>> Paul.
>>> 
>>> --
>>> Paul Hoffman <p...@flo.org>
>>> Systems Librarian
>>> Fenway Libraries Online
>>> c/o Wentworth Institute of Technology
>>> 550 Huntington Ave.
>>> Boston, MA 02115
>>> (617) 442-2384 (FLO main number)
> 



Re: Delete by query using JSON?

2016-03-22 Thread Robert Brown

"why do you care? just do this ..."

I see this a lot on mailing lists these days, it's usually a learning 
curve/task/question.  I know I fall into these types of questions/tasks 
regularly.


Which usually leads to "don't tell me my approach is wrong, just explain 
what's going on, and why", or "just answer the straight-forward question 
I asked in first place.".


Sorry for rambling, this just sounded familiar...

:)



On 22/03/16 22:50, Alexandre Rafalovitch wrote:

Why do you care?

The difference between Q and FQ are the scoring. For delete, you
delete all of them regardless of scoring and there is no difference.
Just chuck them all into Q.

Regards,
Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 23 March 2016 at 06:07, Paul Hoffman  wrote:

I've been struggling to find the right syntax for deleting by query
using JSON, where the query includes an fq parameter.

I know how to delete *all* documents, but how would I delete only
documents with field doctype = "cres"?  I have tried the following along
with a number of variations, all to no avail:

$ curl -s -d @- 'http://localhost:8983/solr/blacklight-core/update?wt=json' <<EOS
{
"delete": { "query": "doctype:cres" }
}
EOS

I can identify the documents like this:

curl -s 'http://localhost:8983/solr/blacklight-core/select?q=*:*&fq=doctype%3Acres&wt=json&fl=id'

It seems like such a simple thing, but I haven't found any examples that
use an fq.  Could someone post an example?

Thanks in advance,

Paul.

--
Paul Hoffman <p...@flo.org>
Systems Librarian
Fenway Libraries Online
c/o Wentworth Institute of Technology
550 Huntington Ave.
Boston, MA 02115
(617) 442-2384 (FLO main number)




Re: Delete by query using JSON?

2016-03-22 Thread Alexandre Rafalovitch
Why do you care?

The difference between Q and FQ are the scoring. For delete, you
delete all of them regardless of scoring and there is no difference.
Just chuck them all into Q.

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 23 March 2016 at 06:07, Paul Hoffman <p...@flo.org> wrote:
> I've been struggling to find the right syntax for deleting by query
> using JSON, where the query includes an fq parameter.
>
> I know how to delete *all* documents, but how would I delete only
> documents with field doctype = "cres"?  I have tried the following along
> with a number of variations, all to no avail:
>
> $ curl -s -d @- 'http://localhost:8983/solr/blacklight-core/update?wt=json' <<EOS
> {
> "delete": { "query": "doctype:cres" }
> }
> EOS
>
> I can identify the documents like this:
>
> curl -s 
> 'http://localhost:8983/solr/blacklight-core/select?q=*:*&fq=doctype%3Acres&wt=json&fl=id'
>
> It seems like such a simple thing, but I haven't found any examples that
> use an fq.  Could someone post an example?
>
> Thanks in advance,
>
> Paul.
>
> --
> Paul Hoffman <p...@flo.org>
> Systems Librarian
> Fenway Libraries Online
> c/o Wentworth Institute of Technology
> 550 Huntington Ave.
> Boston, MA 02115
> (617) 442-2384 (FLO main number)


Delete by query using JSON?

2016-03-22 Thread Paul Hoffman
I've been struggling to find the right syntax for deleting by query 
using JSON, where the query includes an fq parameter.

I know how to delete *all* documents, but how would I delete only 
documents with field doctype = "cres"?  I have tried the following along 
with a number of variations, all to no avail:

$ curl -s -d @- 'http://localhost:8983/solr/blacklight-core/update?wt=json' <<EOS
{
"delete": { "query": "doctype:cres" }
}
EOS

I can identify the documents like this:

curl -s 'http://localhost:8983/solr/blacklight-core/select?q=*:*&fq=doctype%3Acres&wt=json&fl=id'

It seems like such a simple thing, but I haven't found any examples that
use an fq.  Could someone post an example?

Thanks in advance,

Paul.

--
Paul Hoffman <p...@flo.org>
Systems Librarian
Fenway Libraries Online
c/o Wentworth Institute of Technology
550 Huntington Ave.
Boston, MA 02115
(617) 442-2384 (FLO main number)


Re: CloudSolrClient send routed update/delete by query

2015-10-20 Thread Shalin Shekhar Mangar
Instead of using SolrQuery to make a query request, you can use the
UpdateRequest class. Something like the following should do the same
as your intended request:

CloudSolrClient solr = new
CloudSolrClient("127.0.0.1:2181,127.0.0.2:2181, 127.0.0.3:2181/solr1");
solr.setIdField("cust");
solr.setDefaultCollection("asset");
UpdateRequest update = new UpdateRequest();
update.setParam("_route_", "b!");
update.setParam("json.filter", "cust:b");
update.deleteByQuery("cust:b");
update.setAction(AbstractUpdateRequest.ACTION.OPTIMIZE, true, true);
UpdateResponse response = update.process(solr);
System.out.println(response.toString());

On Tue, Oct 20, 2015 at 6:41 PM, Troy Collinsworth
<troycollinswo...@gmail.com> wrote:
> Is it possible to execute the following via CloudSolrClient? It works via
> curl.
>
>
> curl 
> 'http://localhost:8983/solr/asset/update/json?_route_="b!"&json.filter="cust:b"'
> -H 'Content-type:application/json' -d
> '{params:{},"delete":{"query":"cust:b"},"commit": {},"optimize": {}}'
>
>
>
> I've tried the following, but the error indicates the URL path did not end
> in /update/json. How do you set that via CloudSolrClient API?
>
> @Test
> public void test() throws SolrServerException, IOException {
> CloudSolrClient solr = new CloudSolrClient("127.0.0.1:2181,127.0.0.2:2181,
> 127.0.0.3:2181/solr1");
> solr.setIdField("cust");
> solr.setDefaultCollection("asset");
> SolrQuery q = new SolrQuery();
> q.setParam("_route_", "b!");
> q.setParam("json.filter", "cust:b");
> q.setQuery("{params:{},\"delete\":{\"query\":\"cust:b\"},\"commit\":
> {},\"optimize\": {}}");
> QueryResponse r = solr.query(q);
> System.out.println(r.toString());
> }
>
>
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
> from server at http://127.0.1.1:8984/solr/asset:
> org.apache.solr.search.SyntaxError: Cannot parse
> '{params:{},"delete":{"query":"cust:b"},"commit": {},"optimize": {}}':
> Encountered " "}" "} "" at line 1, column 9.
> Was expecting one of:
> "TO" ...
>  ...
>  ...
>
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:560)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:234)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:226)
> at
> org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:376)
> at
> org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:328)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1098)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:869)
> . . .



-- 
Regards,
Shalin Shekhar Mangar.


CloudSolrClient send routed update/delete by query

2015-10-20 Thread Troy Collinsworth
Is it possible to execute the following via CloudSolrClient? It works via
curl.


curl 
'http://localhost:8983/solr/asset/update/json?_route_="b!"&json.filter="cust:b"'
-H 'Content-type:application/json' -d
'{params:{},"delete":{"query":"cust:b"},"commit": {},"optimize": {}}'



I've tried the following, but the error indicates the URL path did not end
in /update/json. How do you set that via CloudSolrClient API?

@Test
public void test() throws SolrServerException, IOException {
CloudSolrClient solr = new CloudSolrClient("127.0.0.1:2181,127.0.0.2:2181,
127.0.0.3:2181/solr1");
solr.setIdField("cust");
solr.setDefaultCollection("asset");
SolrQuery q = new SolrQuery();
q.setParam("_route_", "b!");
q.setParam("json.filter", "cust:b");
q.setQuery("{params:{},\"delete\":{\"query\":\"cust:b\"},\"commit\":
{},\"optimize\": {}}");
QueryResponse r = solr.query(q);
System.out.println(r.toString());
}


org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://127.0.1.1:8984/solr/asset:
org.apache.solr.search.SyntaxError: Cannot parse
'{params:{},"delete":{"query":"cust:b"},"commit": {},"optimize": {}}':
Encountered " "}" "} "" at line 1, column 9.
Was expecting one of:
"TO" ...
 ...
 ...

at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:560)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:234)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:226)
at
org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:376)
at
org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:328)
at
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1098)
at
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:869)
. . .


Re: is there any way to tell delete by query actually deleted anything?

2015-09-02 Thread Renee Sun
Shawn,
thanks for the reply.

I have a sharded index. When I re-index a document (vs. indexing a new one,
which is a different process), I need to delete the old one first to avoid
dups. We all know that if there is only one core, the newly added document
will replace the old one, but with multiple core indexes, we have to issue
the delete command first to ALL shards since we do NOT know/remember which
core the old document was indexed to ...

I also wanted to know if there is a better way of handling this efficiently.

Anyways, we are sending the delete to all cores of this customer; one of them
hit, the others did not.

But consequently, when I need to decide about the commit, I do NOT want to
blindly commit to all cores; I want to know which one actually had the old
doc so I only send the commit to that core.

I could alternatively query first and skip the delete if it did not hit, but
delete if it does; and I can't short-circuit since we have dups :-( for
historical reasons.

any suggestion on how to make this more efficient?
 
thanks!






--
View this message in context: 
http://lucene.472066.n3.nabble.com/is-there-any-way-to-tell-delete-by-query-actually-deleted-anything-tp4226776p4226788.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: is there any way to tell delete by query actually deleted anything?

2015-09-02 Thread Shawn Heisey
On 9/2/2015 1:30 PM, Renee Sun wrote:
> Is there an easy way for me to get the actually deleted document number? I
> mean if the query did not hit any documents, I want to know that nothing got
> deleted. But if it did hit documents, I would like to know how many were
> deleted...

I do this by issuing the same query that I plan to use for the delete,
before doing the delete.  If numFound is zero, I don't do the delete. 
Either way I know how many docs are getting deleted.  Since the program
that does this is the only thing updating the index, I know that the
info is completely accurate.
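The query-then-delete pattern splits nicely so the decision logic is testable on its own — a sketch, not Shawn's actual program (names are mine; the caller would obtain numFound via SolrJ, e.g. client.query(...).getResults().getNumFound(), and send the deleteByQuery only when planDelete returns true):

```java
public class CountedDeleter {
    private long deletedTotal = 0;

    // Decide whether a deleteByQuery is worth sending, given the numFound of
    // the same query run beforehand; tally the count when we do delete.
    // The tally is accurate only while this program is the sole index writer.
    boolean planDelete(long numFound) {
        if (numFound == 0) {
            return false;          // nothing matches: skip the delete entirely
        }
        deletedTotal += numFound;  // caller now sends the deleteByQuery
        return true;
    }

    long deletedTotal() {
        return deletedTotal;
    }
}
```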

Thanks,
Shawn



is there any way to tell delete by query actually deleted anything?

2015-09-02 Thread Renee Sun
I run this curl trying to delete some messages :

curl
'http://localhost:8080/solr/mycore/update?commit=true&stream.body=<delete><query>abacd</query></delete>'
| xmllint --format -

or

curl
'http://localhost:8080/solr/mycore/update?commit=true&stream.body=<delete><query>myfield:mycriteria</query></delete>'
| xmllint --format -

the results I got are like:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time
Current
                                 Dload  Upload   Total   Spent    Left
Speed
148   148    0   148    0     0  11402      0 --:--:-- --:--:-- --:--:--
14800
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">10</int>
  </lst>
</response>


Is there an easy way for me to get the actually deleted document number? I
mean if the query did not hit any documents, I want to know that nothing got
deleted. But if it did hit documents, I would like to know how many were
deleted...

thanks
Renee



--
View this message in context: 
http://lucene.472066.n3.nabble.com/is-there-any-way-to-tell-delete-by-query-actually-deleted-anything-tp4226776.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: is there any way to tell delete by query actually deleted anything?

2015-09-02 Thread Mark Ehle
Do a search with the same criteria before and after?

On Wed, Sep 2, 2015 at 3:30 PM, Renee Sun <renee_...@mcafee.com> wrote:

> I run this curl trying to delete some messages :
>
> curl
> 'http://localhost:8080/solr/mycore/update?commit=true&stream.body=
> <delete><query>abacd</query></delete>'
> | xmllint --format -
>
> or
>
> curl
> 'http://localhost:8080/solr/mycore/update?commit=true&stream.body=
> <delete><query>myfield:mycriteria</query></delete>'
> | xmllint --format -
>
> the results I got is like:
>
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time
> Current
>                                  Dload  Upload   Total   Spent    Left
> Speed
> 148   148    0   148    0     0  11402      0 --:--:-- --:--:-- --:--:--
> 14800
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">10</int>
>   </lst>
> </response>
>
> Is there an easy way for me to get the actually deleted document number? I
> mean if the query did not hit any documents, I want to know that nothing
> got
> deleted. But if it did hit documents, I would like to know how many were
> deleted...
>
> thanks
> Renee
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/is-there-any-way-to-tell-delete-by-query-actually-deleted-anything-tp4226776.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: is there any way to tell delete by query actually deleted anything?

2015-09-02 Thread Erick Erickson
bq: I have a sharded index. When I re-index a document (vs new index, which is
different process), I need to delete the old one first to avoid dup

No, you do not need to issue the delete in a sharded collection
_assuming_ that the doc has the same uniqueKey. Why
do you think you do? If it's in some doc somewhere we need
to fix it.

Docs are routed by a hash on the uniqueKey in the default
case. So since it goes to the same shard, the fact that it's a
new version will be detected and it'll replace the old version.

Are you seeing anything different?

Best,
Erick

On Wed, Sep 2, 2015 at 1:24 PM, Renee Sun <renee_...@mcafee.com> wrote:
> Shawn,
> thanks for the reply.
>
> I have a sharded index. When I re-index a document (vs new index, which is
> different process), I need to delete the old one first to avoid dup. We all
> know that if there is only one core, the newly added document will replace
> the old one, but with multiple core indexes, we will have to issue delete
> command first to ALL shards since we do NOT know/remember which core the old
> document was indexed to ...
>
> I also wanted to know if there is a better way handling this efficiently.
>
> Anyways, we are sending delete to all cores of this customer, one of them
> hit, the others did not.
>
> But consequently, when I need to decide about commit, I do NOT want blindly
> commit to all cores, I want to know which one actually had the old doc so I
> only send commit to that core.
>
> I could alternatively use query first and skip if it did not hit, but delete
> if it does, and I can't short circuit since we have dups :-( based on a
> historical reason.
>
> any suggestion how to make this more efficiently?
>
> thanks!
>
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/is-there-any-way-to-tell-delete-by-query-actually-deleted-anything-tp4226776p4226788.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: is there any way to tell delete by query actually deleted anything?

2015-09-02 Thread Renee Sun
thanks Shawn...

on the other side, I have just created a thin-layer webapp that I deploy with
solr/tomcat. this webapp provides a RESTful API that allows all kinds of
clients in our system to call and request a commit on a certain core on that
solr server.

I put it in with the idea of having a central/final place to control commits
on the cores in the local solr server.

so far it works by reducing arbitrary requests: for example, I will not allow
2 commit requests from different clients on the same core to happen too close
to each other; I will disregard the second request if the first was done less
than 5 minutes ago.

I am thinking of enhancing this webapp to check the physical index dir
timestamp, and drop the request if the core has not changed since the last
commit. This will prevent a client from blindly committing on all cold cores
when only one of them was actually updated.

I mean to ask: is there any solr admin metadata I can fetch through a RESTful
API, to get data such as the index last-updated time, or something like that?
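If I recall correctly, the CoreAdmin STATUS action reports per-core index metadata (including a lastModified timestamp) that could drive exactly this check — a tiny sketch of building the request URL; treat the exact field names as an assumption to verify against your Solr version:

```java
public class CoreStatusUrl {
    // CoreAdmin STATUS is believed to report per-core index details such as
    // lastModified and sizeInBytes; verify the exact field names on your
    // Solr version before relying on them.
    static String statusUrl(String solrBase, String coreName) {
        return solrBase + "/admin/cores?action=STATUS&core=" + coreName + "&wt=json";
    }
}
```

Fetching that URL and comparing lastModified against the time of the last commit would let the webapp drop no-op commit requests without touching the index directory on disk.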



--
View this message in context: 
http://lucene.472066.n3.nabble.com/is-there-any-way-to-tell-delete-by-query-actually-deleted-anything-tp4226776p4226818.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: is there any way to tell delete by query actually deleted anything?

2015-09-02 Thread Renee Sun
Hi Erick... as Shawn pointed out... I am not using solrcloud, I am using a
more complicated sharding scheme, home grown... 

thanks for your response :-)
Renee



--
View this message in context: 
http://lucene.472066.n3.nabble.com/is-there-any-way-to-tell-delete-by-query-actually-deleted-anything-tp4226776p4226806.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: is there any way to tell delete by query actually deleted anything?

2015-09-02 Thread Renee Sun
Hi Shawn,
I think we have similar structure where we use frontier/back instead of
hot/cold :-)

so yes we will probably have to do the same.

since we have large customers and some of them may have terabytes of data and
end up with hundreds of cold cores... the blind delete broadcast to all
of them is a performance killer.

I am thinking of adding an in-memory inventory of coreID : docID so I can
identify which core the document is in efficiently... what do you think
about it?
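A minimal sketch of such an inventory (names are mine; the memory cost grows with the number of live docIDs, so at terabyte scale a persistent key-value store may be a better fit than the heap):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DocCoreInventory {
    // docID -> coreID, maintained at index time so deletes and commits can be
    // routed to the single core that actually holds the document.
    private final Map<String, String> coreByDoc = new ConcurrentHashMap<>();

    void recordIndexed(String docId, String coreId) {
        coreByDoc.put(docId, coreId);
    }

    // Returns the core to target, or null when the doc is unknown
    // (fall back to broadcasting the delete in that case).
    String coreFor(String docId) {
        return coreByDoc.get(docId);
    }

    void recordDeleted(String docId) {
        coreByDoc.remove(docId);
    }
}
```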

thanks
Renee



--
View this message in context: 
http://lucene.472066.n3.nabble.com/is-there-any-way-to-tell-delete-by-query-actually-deleted-anything-tp4226776p4226805.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: is there any way to tell delete by query actually deleted anything?

2015-09-02 Thread Shawn Heisey
On 9/2/2015 3:32 PM, Renee Sun wrote:
> I think we have similar structure where we use frontier/back instead of
> hot/cold :-)
>
> so yes we will probably have to do the same.
>
> since we have large customers and some of them may have terabytes of data and
> end up with hundreds of cold cores... the blind delete broadcast to all
> of them is a performance killer.
>
> I am thinking of adding an in-memory inventory of coreID : docID so I can
> identify which core the document is in efficiently... what do you think
> about it?

I could write code for the deleteByQuery method to figure out where to
send the requests.  Performance hasn't become a problem with the "send
to all shards" method.  If it does, then I know exactly what to do:

If the ID value that we use for sharding is larger than X, it goes to
the hot shard.  If not, then I would CRC32 hash the ID, mod the hash
value by the number of cold shards, and send it to the shard number (0
through 5 for our indexes) that comes out.
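That routing rule is simple enough to sketch as a pure function (a sketch of the scheme as described, not Shawn's actual code; names and the 6-shard count are just the example's):

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class ShardRouter {
    // Deterministic routing: IDs above the daily cutoff live in the hot
    // shard; everything else hashes via CRC32 into one of the cold shards.
    static String route(long id, long hotCutoff, int coldShards) {
        if (id > hotCutoff) {
            return "hot";
        }
        CRC32 crc = new CRC32();
        crc.update(Long.toString(id).getBytes(StandardCharsets.UTF_8));
        return "cold-" + (crc.getValue() % coldShards);
    }
}
```

Because the function depends only on the ID and the (rarely changing) layout, the same delete can always be routed to the one shard that holds the document.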

Our sharding ID field is actually not our uniqueKey field for Solr,
although it is the autoincrement primary key on the source MySQL
database.  Another way to think about this field is as the "delete id". 
Our Solr uniqueKey is a different field that has a unique-enforcing
index in MySQL.

If you want good performance with sharding operations, then you need a
sharding algorithm that is completely deterministic based on the key
value and the current shard layout.  If the shard layout changes then it
should not change frequently.  Our layout changes only once a day, at
which time the oldest documents are moved from the hot shard to the cold
shards.

Thanks,
Shawn



Re: is there any way to tell delete by query actually deleted anything?

2015-09-02 Thread Shawn Heisey
On 9/2/2015 2:24 PM, Renee Sun wrote:
> I have a sharded index. When I re-index a document (vs new index, which is
> different process), I need to delete the old one first to avoid dup. We all
> know that if there is only one core, the newly added document will replace
> the old one, but with multiple core indexes, we will have to issue delete
> command first to ALL shards since we do NOT know/remember which core the old
> document was indexed to ... 
>
> I also wanted to know if there is a better way handling this efficiently.
>
> Anyways, we are sending delete to all cores of this customer, one of them
> hit, the others did not.
>
> But consequently, when I need to decide about commit, I do NOT want blindly
> commit to all cores, I want to know which one actually had the old doc so I
> only send commit to that core.
>
> I could alternatively use query first and skip if it did not hit, but delete
> if it does, and I can't short circuit since we have dups :-( based on a
> historical reason. 
>
> any suggestion how to make this more efficiently?

I have a sharded index too.  It is a more complicated sharding mechanism
than you would get in a default SolrCloud install (and my servers are
NOT running in cloud mode).  It's a hot/cold shard system, with one hot
shard and six cold shards.  Even though the shard that contains any
given document is *always* something that can be calculated according to
a configuration that changes at most once a day, I send all deletes to
every shard like you do.  Each batch of documents in the delete list
(currently set to a batch size of 500) is sent to each shard.

The deleteByQuery method on my Core class (this is a java program)
queries the Solr core to see if any documents are found.  If they are,
then the delete request is sent to Solr.  Any successful Solr update
operation (add, delete, etc) will set a "commit" flag in the class
instance, which is checked by the commit method.  When a commit is
requested on the Core class, if the flag is true, a commit is sent to
Solr.  If the commit succeeds, the flag is cleared.
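The flag mechanics described here might look roughly like this (a sketch with the SolrJ call stubbed out as a comment, not the actual program):

```java
public class Core {
    private boolean commitNeeded = false;

    // Called after any successful update operation (add, delete, ...).
    void markUpdated() {
        commitNeeded = true;
    }

    // Send a commit only when something actually changed; clear the flag
    // only once the commit succeeds, so a failed commit is retried later.
    boolean commitIfNeeded() {
        if (!commitNeeded) {
            return false;
        }
        // client.commit();  // SolrJ call in the real implementation
        commitNeeded = false;
        return true;
    }
}
```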

Thanks,
Shawn



Re: SolrCloud delete by query performance

2015-05-20 Thread Ryan Cutter
GC is operating the way I think it should but I am lacking memory.  I am
just surprised because indexing is performing fine (documents going in) but
deletions are really bad (documents coming out).

Is it possible these deletes are hitting many segments, each of which I
assume must be re-built?  And if there isn't much slack memory laying
around to begin with, there's a bunch of contention/swap?

Thanks Shawn!

On Wed, May 20, 2015 at 4:50 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 5/20/2015 5:41 PM, Ryan Cutter wrote:
  I have a collection with 1 billion documents and I want to delete 500 of
  them.  The collection has a dozen shards and a couple replicas.  Using
 Solr
  4.4.
 
  Sent the delete query via HTTP:
 
  http://hostname:8983/solr/my_collection/update?stream.body=
  <delete><query>source:foo</query></delete>
 
  Took a couple minutes and several replicas got knocked into Recovery
 mode.
  They eventually came back and the desired docs were deleted but the
 cluster
  wasn't thrilled (high load, etc).
 
  Is this expected behavior?  Is there a better way to delete documents
 that
  I'm missing?

 That's the correct way to do the delete.  Before you'll see the change,
 a commit must happen in one way or another.  Hopefully you already knew
 that.

 I believe that your setup has some performance issues that are making it
 very slow and knocking out your Solr nodes temporarily.

 The most common root problems with SolrCloud and indexes going into
 recovery are:  1) Your heap is enormous but your garbage collection is
 not tuned.  2) You don't have enough RAM, separate from your Java heap,
 for adequate index caching.  With a billion documents in your
 collection, you might even be having problems with both.

 Here's a wiki page that includes some info on both of these problems,
 plus a few others:

 http://wiki.apache.org/solr/SolrPerformanceProblems

 Thanks,
 Shawn




SolrCloud delete by query performance

2015-05-20 Thread Ryan Cutter
I have a collection with 1 billion documents and I want to delete 500 of
them.  The collection has a dozen shards and a couple replicas.  Using Solr
4.4.

Sent the delete query via HTTP:

http://hostname:8983/solr/my_collection/update?stream.body=
<delete><query>source:foo</query></delete>

Took a couple minutes and several replicas got knocked into Recovery mode.
They eventually came back and the desired docs were deleted but the cluster
wasn't thrilled (high load, etc).

Is this expected behavior?  Is there a better way to delete documents that
I'm missing?

Thanks, Ryan


Re: SolrCloud delete by query performance

2015-05-20 Thread Shawn Heisey
On 5/20/2015 5:41 PM, Ryan Cutter wrote:
 I have a collection with 1 billion documents and I want to delete 500 of
 them.  The collection has a dozen shards and a couple replicas.  Using Solr
 4.4.
 
 Sent the delete query via HTTP:
 
 http://hostname:8983/solr/my_collection/update?stream.body=
 <delete><query>source:foo</query></delete>
 
 Took a couple minutes and several replicas got knocked into Recovery mode.
 They eventually came back and the desired docs were deleted but the cluster
 wasn't thrilled (high load, etc).
 
 Is this expected behavior?  Is there a better way to delete documents that
 I'm missing?

That's the correct way to do the delete.  Before you'll see the change,
a commit must happen in one way or another.  Hopefully you already knew
that.

I believe that your setup has some performance issues that are making it
very slow and knocking out your Solr nodes temporarily.

The most common root problems with SolrCloud and indexes going into
recovery are:  1) Your heap is enormous but your garbage collection is
not tuned.  2) You don't have enough RAM, separate from your Java heap,
for adequate index caching.  With a billion documents in your
collection, you might even be having problems with both.

Here's a wiki page that includes some info on both of these problems,
plus a few others:

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn



Re: SolrCloud delete by query performance

2015-05-20 Thread Shawn Heisey
On 5/20/2015 5:57 PM, Ryan Cutter wrote:
 GC is operating the way I think it should but I am lacking memory.  I am
 just surprised because indexing is performing fine (documents going in) but
 deletions are really bad (documents coming out).
 
 Is it possible these deletes are hitting many segments, each of which I
 assume must be re-built?  And if there isn't much slack memory laying
 around to begin with, there's a bunch of contention/swap?

A deleteByQuery must first query the entire index to determine which IDs
to delete.  That's going to hit every segment.  In the case of
SolrCloud, it will also hit at least one replica of every single shard
in the collection.

If the data required to satisfy the query is not already sitting in the
OS disk cache, then the actual disk must be read.  When RAM is extremely
tight, any disk operation will erase relevant data out of the OS disk
cache, so the next time it is needed, it must be read off the disk
again.  Disks are SLOW.  What I am describing is not swap, but the
performance impact is similar to swapping.

The actual delete operation (once the IDs are known) doesn't touch any
segments ... it writes Lucene document identifiers to a .del file, and
that file is consulted on all queries.  Any deleted documents found in
the query results are removed.

Thanks,
Shawn



Re: SolrCloud delete by query performance

2015-05-20 Thread Ryan Cutter
Shawn, thank you very much for that explanation.  It helps a lot.

Cheers, Ryan

On Wed, May 20, 2015 at 5:07 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 5/20/2015 5:57 PM, Ryan Cutter wrote:
  GC is operating the way I think it should but I am lacking memory.  I am
  just surprised because indexing is performing fine (documents going in)
 but
  deletions are really bad (documents coming out).
 
  Is it possible these deletes are hitting many segments, each of which I
  assume must be re-built?  And if there isn't much slack memory laying
  around to begin with, there's a bunch of contention/swap?

 A deleteByQuery must first query the entire index to determine which IDs
 to delete.  That's going to hit every segment.  In the case of
 SolrCloud, it will also hit at least one replica of every single shard
 in the collection.

 If the data required to satisfy the query is not already sitting in the
 OS disk cache, then the actual disk must be read.  When RAM is extremely
 tight, any disk operation will erase relevant data out of the OS disk
 cache, so the next time it is needed, it must be read off the disk
 again.  Disks are SLOW.  What I am describing is not swap, but the
 performance impact is similar to swapping.

 The actual delete operation (once the IDs are known) doesn't touch any
 segments ... it writes Lucene document identifiers to a .del file, and
 that file is consulted on all queries.  Any deleted documents found in
 the query results are removed.

 Thanks,
 Shawn




Re: Delete By query on a multi-value field

2015-02-03 Thread Jean-Sebastien Vachon
Hi Lokesh, 

thanks for the information. 

I forgot to mention that the system I am working on is still using 3.5 so I 
will probably have to reindex the whole set of documents.
Unless someone knows how to get around this...



From: Lokesh Chhaparwal xyzlu...@gmail.com
Sent: Monday, February 2, 2015 11:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Delete By query on a multi-value field

Hi Jean,

Please see the issues
https://issues.apache.org/jira/browse/SOLR-3862
https://issues.apache.org/jira/browse/SOLR-5992

Both of them are resolved. The *remove* clause (atomic update) was
added in the 4.9.0 release. I haven't tried it myself, though.

Thanks,
Lokesh


On Tue, Feb 3, 2015 at 7:26 AM, Jean-Sebastien Vachon 
jean-sebastien.vac...@wantedanalytics.com wrote:

 Hi All,


 Is there a way to delete a value from a Multi-value field without
 reindexing anything?


 Let's say I have three documents A, B, and C with field XYZ set to "1,2,3",
 "2,3,4", and "1" respectively. I'd like to remove anything that has the value
 '1' in field XYZ. That is, I want to remove the value '1' from the field,
 deleting the document only if '1' is its only value.


 Deleting documents such as C (single value) is easy with a Delete by query
 through the update handler but what about document A?



 Thanks for any hint



Delete By query on a multi-value field

2015-02-02 Thread Jean-Sebastien Vachon
Hi All,


Is there a way to delete a value from a Multi-value field without reindexing 
anything?


Let's say I have three documents A, B, and C with field XYZ set to "1,2,3",
"2,3,4", and "1" respectively. I'd like to remove anything that has the value '1'
in field XYZ. That is, I want to remove the value '1' from the field, deleting
the document only if '1' is its only value.


Deleting documents such as C (single value) is easy with a Delete by query 
through the update handler but what about document A?



Thanks for any hint


Re: Delete By query on a multi-value field

2015-02-02 Thread Lokesh Chhaparwal
Hi Jean,

Please see the issues
https://issues.apache.org/jira/browse/SOLR-3862
https://issues.apache.org/jira/browse/SOLR-5992

Both of them are resolved. The *remove* clause (atomic update) was
added in the 4.9.0 release. I haven't tried it myself, though.

Thanks,
Lokesh
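The *remove* modifier mentioned above is an atomic update, so it takes the shape of an update document rather than a delete command. A sketch of the JSON body posted to the /update handler; "id" as the uniqueKey is an assumption, the field name comes from the question, and this requires Solr 4.9+ with an updateLog configured:

```json
[ { "id": "A", "XYZ": { "remove": "1" } } ]
```

Note that an atomic remove never deletes the document itself, so the "delete the document if '1' is its only value" case from the question still needs a separate delete step.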


On Tue, Feb 3, 2015 at 7:26 AM, Jean-Sebastien Vachon 
jean-sebastien.vac...@wantedanalytics.com wrote:

 Hi All,


 Is there a way to delete a value from a Multi-value field without
 reindexing anything?


 Let's say I have three documents A, B, and C with field XYZ set to "1,2,3",
 "2,3,4", and "1" respectively. I'd like to remove anything that has the value
 '1' in field XYZ. That is, I want to remove the value '1' from the field,
 deleting the document only if '1' is its only value.


 Deleting documents such as C (single value) is easy with a Delete by query
 through the update handler but what about document A?



 Thanks for any hint



Re: Delete by query with soft commit

2014-04-12 Thread Furkan KAMACI
Hi Jess;

Could you check here first:
http://search-lucene.com/m/QTPaSxpsW/Commit+Within+and+%252Fupdate%252Fextract+handlersubj=Re+Commit+Within+and+update+extract+handler
and
then here:
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Thanks;
Furkan KAMACI


2014-04-09 0:24 GMT+03:00 youknow...@heroicefforts.net 
youknow...@heroicefforts.net:

 It appears that UpdateResponse.setCommitWithin is not honored when
 executing a delete query against SolrCloud (SolrJ 4.6).  However, setting
 the hard commit parameter functions as expected.  Is this a known bug?

 Thanks,

 -Jess


Delete by query with soft commit

2014-04-08 Thread youknow...@heroicefforts.net
It appears that UpdateResponse.setCommitWithin is not honored when executing a 
delete query against SolrCloud (SolrJ 4.6).  However, setting the hard commit 
parameter functions as expected.  Is this a known bug?

Thanks, 

-Jess

Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-07 Thread Brett Hoerner
Here is the other server when it's locked:
https://gist.github.com/3529b7b6415756ead413

To be clear, neither is really the replica, I have 32 shards and each
physical server is the leader for 16, and the replica for 16.

Also, related to the max threads hunch: my working cluster has many, many
fewer shards per Solr instance. I'm going to do some migration dancing on
this cluster today to have more Solr JVMs each with fewer cores, and see
how it affects the deletes.


On Wed, Mar 6, 2013 at 5:40 PM, Mark Miller markrmil...@gmail.com wrote:

 Any chance you can grab the stack trace of a replica as well? (also when
 it's locked up of course).

 - Mark

 On Mar 6, 2013, at 3:34 PM, Brett Hoerner br...@bretthoerner.com wrote:

  If there's anything I can try, let me know. Interestingly, I think I have
  noticed that if I stop my indexer, do my delete, and restart the indexer
  then I'm fine. Which goes along with the update thread contention theory.
 
 
  On Wed, Mar 6, 2013 at 5:03 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
  This is what I see:
 
  We currently limit the number of outstanding update requests at one time
  to avoid a crazy number of threads being used.
 
  It looks like a bunch of update requests are stuck in socket reads and
 are
  taking up the available threads. It looks like the deletes are hanging
 out
  waiting for a free thread.
 
  It seems the question is, why are the requests stuck in socket reads. I
  don't have an answer at the moment.
 
  We should probably get this into a JIRA issue though.
 
  - Mark
 
 
  On Mar 6, 2013, at 2:15 PM, Alexandre Rafalovitch arafa...@gmail.com
  wrote:
 
  It does not look like a deadlock, though it could be a distributed one.
  Or
  it could be a livelock, though that's less likely.
 
  Here is what we used to recommend in similar situations for large Java
  systems (BEA Weblogic):
  1) Do a thread dump of both systems before anything else, as close to
  simultaneously as you can make it.
  2) Do the first delete. Do a thread dump every 2 minutes on both servers
  (so, say, 3 dumps in that 5-minute wait).
  3) Do the second delete and do thread dumps every 30 seconds on both
  servers, from just before and then during, preferably all the way until
  the problem shows itself. Every 5 seconds if the problem shows itself
  really quickly.
 
  That gives you a LOT of thread dumps. But it also gives you something
  that lets you compare thread state before and after the problem starts
  showing itself, and identify moving (or unnaturally still) threads. I
  even wrote a tool a long time ago that parsed those thread dumps
  automatically and generated pretty deadlock graphs from them.
 
 
  Regards,
   Alex.
 
 
 
 
 
  Personal blog: http://blog.outerthoughts.com/
  LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
  - Time is the quality of nature that keeps events from happening all at
  once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)
 
 
  On Wed, Mar 6, 2013 at 5:04 PM, Mark Miller markrmil...@gmail.com
  wrote:
 
  Thanks Brett, good stuff (though not a good problem).
 
  We def need to look into this.
 
  - Mark
 
  On Mar 6, 2013, at 1:53 PM, Brett Hoerner br...@bretthoerner.com
  wrote:
 
  Here is a dump after the delete, indexing has been stopped:
  https://gist.github.com/bretthoerner/c7ea3bf3dc9e676a3f0e
 
  An interesting hint that I forgot to mention: it doesn't always
 happen
  on
  the first delete. I manually ran the delete cron, and the server
  continued
  to work. I waited about 5 minutes and ran it again and it stalled the
  indexer (as seen from indexer process):
 http://i.imgur.com/1Tt35u0.png
 
  Another thing I forgot to mention. To bring the cluster back to life
 I:
 
  1) stop my indexer
  2) stop server1, start server1
  3) stop server2, start server2
  4) manually rebalance half of the shards to be mastered on server2
  (unload/create on server1)
  5) restart indexer
 
  And it works again until a delete eventually kills it.
 
  To be clear again, select queries continue to work indefinitely.
 
  Thanks,
  Brett
 
 
  On Wed, Mar 6, 2013 at 1:50 PM, Mark Miller markrmil...@gmail.com
  wrote:
 
  Which version of Solr?
 
  Can you use jconsole, visualvm, or jstack to get some stack traces
 and
  see
  where things are halting?
 
  - Mark
 
  On Mar 6, 2013, at 11:45 AM, Brett Hoerner br...@bretthoerner.com
  wrote:
 
  I have a SolrCloud cluster (2 machines, 2 Solr instances, 32
 shards,
  replication factor of 2) that I've been using for over a month now
 in
  production.
 
  Suddenly, the hourly cron I run that dispatches a delete by query
  completely halts all indexing. Select queries still run (and
  quickly),
  there is no CPU or disk I/O happening, but suddenly my indexer
 (which
  runs
  at ~400 doc/sec steady) pauses, and everything blocks indefinitely.
 
  To clarify some on the schema, this is a moving window of data
  (imagine
  messages that don't matter after a 24 hour period) which are regularly
  chopped off by my hourly cron (deleting messages over 24 hours old) to
  keep the index size reasonable.

Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-07 Thread Mark Miller

On Mar 7, 2013, at 9:03 AM, Brett Hoerner br...@bretthoerner.com wrote:

 To be clear, neither is really the replica, I have 32 shards and each
 physical server is the leader for 16, and the replica for 16.

Ah, interesting. That actually could be part of the issue - some brain cells 
are firing. I'm away from home till this weekend, but I can try and duplicate 
this when I get to my home base setup.

- Mark

Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-07 Thread Brett Hoerner
As a side note, do you think that was a poor idea? I figured it's better to
spread the master load around?


On Thu, Mar 7, 2013 at 11:29 AM, Mark Miller markrmil...@gmail.com wrote:


 On Mar 7, 2013, at 9:03 AM, Brett Hoerner br...@bretthoerner.com wrote:

  To be clear, neither is really the replica, I have 32 shards and each
  physical server is the leader for 16, and the replica for 16.

 Ah, interesting. That actually could be part of the issue - some brain
 cells are firing. I'm away from home till this weekend, but I can try and
 duplicate this when I get to my home base setup.

 - Mark


Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-07 Thread Mark Miller
No, not a poor idea at all, definitely a valid setup.

- Mark

On Mar 7, 2013, at 9:30 AM, Brett Hoerner br...@bretthoerner.com wrote:

 As a side note, do you think that was a poor idea? I figured it's better to
 spread the master load around?
 
 
 On Thu, Mar 7, 2013 at 11:29 AM, Mark Miller markrmil...@gmail.com wrote:
 
 
 On Mar 7, 2013, at 9:03 AM, Brett Hoerner br...@bretthoerner.com wrote:
 
 To be clear, neither is really the replica, I have 32 shards and each
 physical server is the leader for 16, and the replica for 16.
 
 Ah, interesting. That actually could be part of the issue - some brain
 cells are firing. I'm away from home till this weekend, but I can try and
 duplicate this when I get to my home base setup.
 
 - Mark



Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-07 Thread Brett Hoerner




Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-07 Thread Mark Miller



Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-06 Thread Brett Hoerner
I have a SolrCloud cluster (2 machines, 2 Solr instances, 32 shards,
replication factor of 2) that I've been using for over a month now in
production.

Suddenly, the hourly cron I run that dispatches a delete by query
completely halts all indexing. Select queries still run (and quickly),
there is no CPU or disk I/O happening, but suddenly my indexer (which runs
at ~400 doc/sec steady) pauses, and everything blocks indefinitely.

To clarify some on the schema, this is a moving window of data (imagine
messages that don't matter after a 24 hour period) which are regularly
chopped off by my hourly cron (deleting messages over 24 hours old) to
keep the index size reasonable.

There are no errors (log level warn) in the logs. I'm not sure what to look
into. As I've said this has been running (delete included) for about a
month.

I'll also note that I have another cluster much like this one where I do
the very same thing... it has 4 machines, and indexes 10x the documents per
second, with more indexes... and yet I delete on a cron without issue...

Any ideas on where to start, or other information I could provide?

Thanks much.
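For reference, an hourly trim like the one described is usually issued as a delete-by-query body against the /update handler using Solr date math. The timestamp field name below is an assumption for illustration, not taken from the original message:

```xml
<delete><query>created_at:[* TO NOW-24HOURS]</query></delete>
```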


Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-06 Thread Mark Miller
Which version of Solr?

Can you use jconsole, visualvm, or jstack to get some stack traces and see 
where things are halting?

- Mark

On Mar 6, 2013, at 11:45 AM, Brett Hoerner br...@bretthoerner.com wrote:

 I have a SolrCloud cluster (2 machines, 2 Solr instances, 32 shards,
 replication factor of 2) that I've been using for over a month now in
 production.
 
 Suddenly, the hourly cron I run that dispatches a delete by query
 completely halts all indexing. Select queries still run (and quickly),
 there is no CPU or disk I/O happening, but suddenly my indexer (which runs
 at ~400 doc/sec steady) pauses, and everything blocks indefinitely.
 
 To clarify some on the schema, this is a moving window of data (imagine
 messages that don't matter after a 24 hour period) which are regularly
 chopped off by my hourly cron (deleting messages over 24 hours old) to
 keep the index size reasonable.
 
 There are no errors (log level warn) in the logs. I'm not sure what to look
 into. As I've said this has been running (delete included) for about a
 month.
 
 I'll also note that I have another cluster much like this one where I do
 the very same thing... it has 4 machines, and indexes 10x the documents per
 second, with more indexes... and yet I delete on a cron without issue...
 
 Any ideas on where to start, or other information I could provide?
 
 Thanks much.



Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-06 Thread Brett Hoerner
4.1, I'll induce it again and run jstack.


On Wed, Mar 6, 2013 at 1:50 PM, Mark Miller markrmil...@gmail.com wrote:

 Which version of Solr?

 Can you use jconsole, visualvm, or jstack to get some stack traces and see
 where things are halting?

 - Mark

 On Mar 6, 2013, at 11:45 AM, Brett Hoerner br...@bretthoerner.com wrote:

  I have a SolrCloud cluster (2 machines, 2 Solr instances, 32 shards,
  replication factor of 2) that I've been using for over a month now in
  production.
 
  Suddenly, the hourly cron I run that dispatches a delete by query
  completely halts all indexing. Select queries still run (and quickly),
  there is no CPU or disk I/O happening, but suddenly my indexer (which
 runs
  at ~400 doc/sec steady) pauses, and everything blocks indefinitely.
 
  To clarify some on the schema, this is a moving window of data (imagine
  messages that don't matter after a 24 hour period) which are regularly
  chopped off by my hourly cron (deleting messages over 24 hours old) to
  keep the index size reasonable.
 
  There are no errors (log level warn) in the logs. I'm not sure what to
 look
  into. As I've said this has been running (delete included) for about a
  month.
 
  I'll also note that I have another cluster much like this one where I do
  the very same thing... it has 4 machines, and indexes 10x the documents
 per
  second, with more indexes... and yet I delete on a cron without issue...
 
  Any ideas on where to start, or other information I could provide?
 
  Thanks much.




Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-06 Thread Brett Hoerner
Here is a dump after the delete, indexing has been stopped:
https://gist.github.com/bretthoerner/c7ea3bf3dc9e676a3f0e

An interesting hint that I forgot to mention: it doesn't always happen on
the first delete. I manually ran the delete cron, and the server continued
to work. I waited about 5 minutes and ran it again and it stalled the
indexer (as seen from indexer process): http://i.imgur.com/1Tt35u0.png

Another thing I forgot to mention. To bring the cluster back to life I:

1) stop my indexer
2) stop server1, start server1
3) stop server2, start server2
4) manually rebalance half of the shards to be mastered on server2
(unload/create on server1)
5) restart indexer

And it works again until a delete eventually kills it.

To be clear again, select queries continue to work indefinitely.

Thanks,
Brett


On Wed, Mar 6, 2013 at 1:50 PM, Mark Miller markrmil...@gmail.com wrote:

 Which version of Solr?

 Can you use jconsole, visualvm, or jstack to get some stack traces and see
 where things are halting?

 - Mark

 On Mar 6, 2013, at 11:45 AM, Brett Hoerner br...@bretthoerner.com wrote:

  I have a SolrCloud cluster (2 machines, 2 Solr instances, 32 shards,
  replication factor of 2) that I've been using for over a month now in
  production.
 
  Suddenly, the hourly cron I run that dispatches a delete by query
  completely halts all indexing. Select queries still run (and quickly),
  there is no CPU or disk I/O happening, but suddenly my indexer (which
 runs
  at ~400 doc/sec steady) pauses, and everything blocks indefinitely.
 
  To clarify some on the schema, this is a moving window of data (imagine
  messages that don't matter after a 24 hour period) which are regularly
  chopped off by my hourly cron (deleting messages over 24 hours old) to
  keep the index size reasonable.
 
  There are no errors (log level warn) in the logs. I'm not sure what to
 look
  into. As I've said this has been running (delete included) for about a
  month.
 
  I'll also note that I have another cluster much like this one where I do
  the very same thing... it has 4 machines, and indexes 10x the documents
 per
  second, with more indexes... and yet I delete on a cron without issue...
 
  Any ideas on where to start, or other information I could provide?
 
  Thanks much.




Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-06 Thread Mark Miller
Thanks Brett, good stuff (though not a good problem).

We def need to look into this. 

- Mark

On Mar 6, 2013, at 1:53 PM, Brett Hoerner br...@bretthoerner.com wrote:

 Here is a dump after the delete, indexing has been stopped:
 https://gist.github.com/bretthoerner/c7ea3bf3dc9e676a3f0e
 
 An interesting hint that I forgot to mention: it doesn't always happen on
 the first delete. I manually ran the delete cron, and the server continued
 to work. I waited about 5 minutes and ran it again and it stalled the
 indexer (as seen from indexer process): http://i.imgur.com/1Tt35u0.png
 
 Another thing I forgot to mention. To bring the cluster back to life I:
 
 1) stop my indexer
 2) stop server1, start server1
 3) stop server2, start server2
 4) manually rebalance half of the shards to be mastered on server2
 (unload/create on server1)
 5) restart indexer
 
 And it works again until a delete eventually kills it.
 
 To be clear again, select queries continue to work indefinitely.
 
 Thanks,
 Brett
 
 
 On Wed, Mar 6, 2013 at 1:50 PM, Mark Miller markrmil...@gmail.com wrote:
 
 Which version of Solr?
 
 Can you use jconsole, visualvm, or jstack to get some stack traces and see
 where things are halting?
 
 - Mark
 
 On Mar 6, 2013, at 11:45 AM, Brett Hoerner br...@bretthoerner.com wrote:
 
 I have a SolrCloud cluster (2 machines, 2 Solr instances, 32 shards,
 replication factor of 2) that I've been using for over a month now in
 production.
 
 Suddenly, the hourly cron I run that dispatches a delete by query
 completely halts all indexing. Select queries still run (and quickly),
 there is no CPU or disk I/O happening, but suddenly my indexer (which
 runs
 at ~400 doc/sec steady) pauses, and everything blocks indefinitely.
 
 To clarify some on the schema, this is a moving window of data (imagine
 messages that don't matter after a 24 hour period) which are regularly
 chopped off by my hourly cron (deleting messages over 24 hours old) to
 keep the index size reasonable.
 
 There are no errors (log level warn) in the logs. I'm not sure what to
 look
 into. As I've said this has been running (delete included) for about a
 month.
 
 I'll also note that I have another cluster much like this one where I do
 the very same thing... it has 4 machines, and indexes 10x the documents
 per
 second, with more indexes... and yet I delete on a cron without issue...
 
 Any ideas on where to start, or other information I could provide?
 
 Thanks much.
 
 



Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-06 Thread Alexandre Rafalovitch
It does not look like a deadlock, though it could be a distributed one. Or
it could be a livelock, though that's less likely.

Here is what we used to recommend in similar situations for large Java
systems (BEA WebLogic):
1) Do a thread dump of both systems before anything else, as close to
simultaneously as you can make it.
2) Do the first delete. Do a thread dump every 2 minutes on both servers
(so, say, 3 dumps in that 5-minute wait).
3) Do the second delete and do thread dumps every 30 seconds on both
servers, from just before and then during, preferably all the way until the
problem shows itself. Every 5 seconds if the problem shows itself really
quickly.

That gives you a LOT of thread dumps. But it also gives you something that
lets you compare thread state before and after the problem starts showing
itself, and identify moving (or unnaturally still) threads. I even wrote a
tool a long time ago that parsed those thread dumps automatically and
generated pretty deadlock graphs from them.
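The dump-collection loop above can be scripted. A minimal sketch, assuming the JDK's jstack is on the PATH and that you already know the Solr pid; the dump tool is a parameter so the same helper covers all three steps of the procedure:

```python
import subprocess
import time

def collect_dumps(pid, count, interval_s, cmd=("jstack",),
                  outfile_pattern="dump-{i:03d}.txt"):
    """Capture `count` thread dumps of process `pid`, `interval_s` apart.

    `cmd` is the dump tool invoked with the pid as its argument; jstack
    from the JDK is the usual choice for a Solr JVM.
    """
    paths = []
    for i in range(count):
        path = outfile_pattern.format(i=i)
        with open(path, "w") as out:
            subprocess.run([*cmd, str(pid)], stdout=out, check=True)
        paths.append(path)
        if i + 1 < count:          # no sleep after the final dump
            time.sleep(interval_s)
    return paths

# e.g. step 3 of the procedure: a dump every 30 seconds during the delete
# collect_dumps(solr_pid, count=10, interval_s=30)
```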


Regards,
   Alex.





Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Wed, Mar 6, 2013 at 5:04 PM, Mark Miller markrmil...@gmail.com wrote:

 Thanks Brett, good stuff (though not a good problem).

 We def need to look into this.

 - Mark

 On Mar 6, 2013, at 1:53 PM, Brett Hoerner br...@bretthoerner.com wrote:

  Here is a dump after the delete, indexing has been stopped:
  https://gist.github.com/bretthoerner/c7ea3bf3dc9e676a3f0e
 
  An interesting hint that I forgot to mention: it doesn't always happen on
  the first delete. I manually ran the delete cron, and the server
 continued
  to work. I waited about 5 minutes and ran it again and it stalled the
  indexer (as seen from indexer process): http://i.imgur.com/1Tt35u0.png
 
  Another thing I forgot to mention. To bring the cluster back to life I:
 
  1) stop my indexer
  2) stop server1, start server1
  3) stop server2, start server2
  4) manually rebalance half of the shards to be mastered on server2
  (unload/create on server1)
  5) restart indexer
 
  And it works again until a delete eventually kills it.
 
  To be clear again, select queries continue to work indefinitely.
 
  Thanks,
  Brett
 
 
  On Wed, Mar 6, 2013 at 1:50 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
  Which version of Solr?
 
  Can you use jconsole, visualvm, or jstack to get some stack traces and
 see
  where things are halting?
 
  - Mark
 
  On Mar 6, 2013, at 11:45 AM, Brett Hoerner br...@bretthoerner.com
 wrote:
 
  I have a SolrCloud cluster (2 machines, 2 Solr instances, 32 shards,
  replication factor of 2) that I've been using for over a month now in
  production.
 
  Suddenly, the hourly cron I run that dispatches a delete by query
  completely halts all indexing. Select queries still run (and quickly),
  there is no CPU or disk I/O happening, but suddenly my indexer (which
  runs
  at ~400 doc/sec steady) pauses, and everything blocks indefinitely.
 
  To clarify some on the schema, this is a moving window of data (imagine
  messages that don't matter after a 24 hour period) which are regularly
  chopped off by my hourly cron (deleting messages over 24 hours old)
 to
  keep the index size reasonable.
 
  There are no errors (log level warn) in the logs. I'm not sure what to
  look
  into. As I've said this has been running (delete included) for about a
  month.
 
  I'll also note that I have another cluster much like this one where I
 do
  the very same thing... it has 4 machines, and indexes 10x the documents
  per
  second, with more indexes... and yet I delete on a cron without
 issue...
 
  Any ideas on where to start, or other information I could provide?
 
  Thanks much.
 
 




Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-06 Thread Mark Miller
This is what I see:

We currently limit the number of outstanding update requests at one time to 
avoid a crazy number of threads being used.

It looks like a bunch of update requests are stuck in socket reads and are 
taking up the available threads. It looks like the deletes are hanging out 
waiting for a free thread.

It seems the question is, why are the requests stuck in socket reads. I don't 
have an answer at the moment.

We should probably get this into a JIRA issue though.

- Mark
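The starvation Mark describes (a bounded pool of update slots, all held by requests blocked in socket reads, so deletes queue behind them and never run) can be modelled in miniature. This is an illustrative sketch of the general thread-pool exhaustion pattern, not Solr's actual concurrency code:

```python
import threading

MAX_OUTSTANDING = 2                  # stand-in for Solr's update-request cap
slots = threading.BoundedSemaphore(MAX_OUTSTANDING)
holding = threading.Semaphore(0)     # signals that a worker holds a slot
stuck = threading.Event()            # simulates a socket read that never returns

def update_request():
    with slots:                      # each update occupies one limited slot
        holding.release()
        stuck.wait()                 # ...and then blocks in its "socket read"

workers = [threading.Thread(target=update_request, daemon=True)
           for _ in range(MAX_OUTSTANDING)]
for w in workers:
    w.start()
for _ in workers:
    holding.acquire()                # wait until every slot is taken

# The delete also needs a slot; it hangs behind the stuck reads.
delete_got_slot = slots.acquire(timeout=0.2)
print("delete got a slot:", delete_got_slot)   # -> delete got a slot: False

stuck.set()                          # the reads finally return; slots free up
for w in workers:
    w.join()
```

This matches the symptom in the thread: nothing is burning CPU or disk, everything is simply parked waiting for a slot to come free.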


On Mar 6, 2013, at 2:15 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

 It does not look like a deadlock, though it could be a distributed one. Or
 it could be a livelock, though that's less likely.
 
 Here is what we used to recommend in similar situations for large Java
 systems (BEA WebLogic):
 1) Do a thread dump of both systems before anything else, as close to
 simultaneously as you can make it.
 2) Do the first delete. Do a thread dump every 2 minutes on both servers
 (so, say, 3 dumps in that 5-minute wait).
 3) Do the second delete and do thread dumps every 30 seconds on both
 servers, from just before and then during, preferably all the way until the
 problem shows itself. Every 5 seconds if the problem shows itself really
 quickly.
 
 That gives you a LOT of thread dumps. But it also gives you something that
 lets you compare thread state before and after the problem starts showing
 itself, and identify moving (or unnaturally still) threads. I even wrote a
 tool a long time ago that parsed those thread dumps automatically and
 generated pretty deadlock graphs from them.
 
 
 Regards,
   Alex.
 
 
 
 
 
 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
 
 
 On Wed, Mar 6, 2013 at 5:04 PM, Mark Miller markrmil...@gmail.com wrote:
 
 Thanks Brett, good stuff (though not a good problem).
 
 We def need to look into this.
 
 - Mark
 
 On Mar 6, 2013, at 1:53 PM, Brett Hoerner br...@bretthoerner.com wrote:
 
 Here is a dump after the delete, indexing has been stopped:
 https://gist.github.com/bretthoerner/c7ea3bf3dc9e676a3f0e
 
 An interesting hint that I forgot to mention: it doesn't always happen on
 the first delete. I manually ran the delete cron, and the server
 continued
 to work. I waited about 5 minutes and ran it again and it stalled the
 indexer (as seen from indexer process): http://i.imgur.com/1Tt35u0.png
 
 Another thing I forgot to mention. To bring the cluster back to life I:
 
 1) stop my indexer
 2) stop server1, start server1
 3) stop server2, start server2
 4) manually rebalance half of the shards to be mastered on server2
 (unload/create on server1)
 5) restart indexer
 
 And it works again until a delete eventually kills it.
 
 To be clear again, select queries continue to work indefinitely.
 
 Thanks,
 Brett
 
 
 On Wed, Mar 6, 2013 at 1:50 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
 Which version of Solr?
 
 Can you use jconsole, visualvm, or jstack to get some stack traces and
 see
 where things are halting?
 
 - Mark
 
 On Mar 6, 2013, at 11:45 AM, Brett Hoerner br...@bretthoerner.com
 wrote:
 
 I have a SolrCloud cluster (2 machines, 2 Solr instances, 32 shards,
 replication factor of 2) that I've been using for over a month now in
 production.
 
 Suddenly, the hourly cron I run that dispatches a delete by query
 completely halts all indexing. Select queries still run (and quickly),
 there is no CPU or disk I/O happening, but suddenly my indexer (which
 runs
 at ~400 doc/sec steady) pauses, and everything blocks indefinitely.
 
 To clarify some on the schema, this is a moving window of data (imagine
 messages that don't matter after a 24 hour period) which are regularly
 chopped off by my hourly cron (deleting messages over 24 hours old)
 to
 keep the index size reasonable.
 
 There are no errors (log level warn) in the logs. I'm not sure what to
 look
 into. As I've said this has been running (delete included) for about a
 month.
 
 I'll also note that I have another cluster much like this one where I
 do
 the very same thing... it has 4 machines, and indexes 10x the documents
 per
 second, with more indexes... and yet I delete on a cron without
 issue...
 
 Any ideas on where to start, or other information I could provide?
 
 Thanks much.
 
 
 
 



Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-06 Thread Brett Hoerner
If there's anything I can try, let me know. Interestingly, I think I have
noticed that if I stop my indexer, do my delete, and restart the indexer
then I'm fine. Which goes along with the update thread contention theory.


On Wed, Mar 6, 2013 at 5:03 PM, Mark Miller markrmil...@gmail.com wrote:

 This is what I see:

 We currently limit the number of outstanding update requests at one time
 to avoid a crazy number of threads being used.

 It looks like a bunch of update requests are stuck in socket reads and are
 taking up the available threads. It looks like the deletes are hanging out
 waiting for a free thread.

 It seems the question is, why are the requests stuck in socket reads. I
 don't have an answer at the moment.

 We should probably get this into a JIRA issue though.

 - Mark


 On Mar 6, 2013, at 2:15 PM, Alexandre Rafalovitch arafa...@gmail.com
 wrote:

  It does not look like a deadlock, though it could be a distributed one.
 Or
  it could be a livelock, though that's less likely.
 
  Here is what we used to recommend in similar situations for large Java
  systems (BEA Weblogic):
  1) Do thread dump of both systems before anything. As simultaneous as you
  can make it.
  2) Do the first delete. Do a thread dump every 2 minutes on both servers
  (so, say 3 dumps in that 5 minute wait)
  3) Do the second delete and do thread dumps every 30 seconds on both
  servers from just before and then during. Preferably all the way until
 the
  problem shows itself. Every 5 seconds if the problem shows itself really
  quick.
 
  That gives you a LOT of thread dumps. But it also gives you something
 that
  allows you to compare thread state before and after the problem starts
  showing itself, and to identify moving (or unnaturally still) threads. I
  even wrote a tool long time ago that parsed those thread dumps
  automatically and generated pretty deadlock graphs of those.
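The periodic-dump procedure above can be scripted. A small sketch, assuming the Solr JVM pid is known and `jstack` is on the PATH (the pid, count, and interval values are placeholders):

```shell
# Take <count> jstack dumps of the given JVM pid, <interval> seconds apart,
# writing one numbered file per dump for later comparison.
dump_name() {
  # zero-padded sequence number keeps the files sorted: threaddump-001.txt ...
  printf 'threaddump-%03d.txt' "$1"
}
take_dumps() {
  pid=$1; count=$2; interval=$3
  i=1
  while [ "$i" -le "$count" ]; do
    jstack "$pid" > "$(dump_name "$i")"
    sleep "$interval"
    i=$((i + 1))
  done
}
# Example: around the second delete, 10 dumps 30 seconds apart:
# take_dumps 12345 10 30
```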
 
 
  Regards,
Alex.
 
 
 
 
 
  Personal blog: http://blog.outerthoughts.com/
  LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
  - Time is the quality of nature that keeps events from happening all at
  once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
 
 
  On Wed, Mar 6, 2013 at 5:04 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
  Thanks Brett, good stuff (though not a good problem).
 
  We def need to look into this.
 
  - Mark
 
  On Mar 6, 2013, at 1:53 PM, Brett Hoerner br...@bretthoerner.com
 wrote:
 
  Here is a dump after the delete, indexing has been stopped:
  https://gist.github.com/bretthoerner/c7ea3bf3dc9e676a3f0e
 
  An interesting hint that I forgot to mention: it doesn't always happen
 on
  the first delete. I manually ran the delete cron, and the server
  continued
  to work. I waited about 5 minutes and ran it again and it stalled the
  indexer (as seen from indexer process): http://i.imgur.com/1Tt35u0.png
 
  Another thing I forgot to mention. To bring the cluster back to life I:
 
  1) stop my indexer
  2) stop server1, start server1
  3) stop server2, start server2
  4) manually rebalance half of the shards to be mastered on server2
  (unload/create on server1)
  5) restart indexer
 
  And it works again until a delete eventually kills it.
 
  To be clear again, select queries continue to work indefinitely.
 
  Thanks,
  Brett
 
 
  On Wed, Mar 6, 2013 at 1:50 PM, Mark Miller markrmil...@gmail.com
  wrote:
 
  Which version of Solr?
 
  Can you use jconsole, visualvm, or jstack to get some stack traces and
  see
  where things are halting?
 
  - Mark
 
  On Mar 6, 2013, at 11:45 AM, Brett Hoerner br...@bretthoerner.com
  wrote:
 
  I have a SolrCloud cluster (2 machines, 2 Solr instances, 32 shards,
  replication factor of 2) that I've been using for over a month now in
  production.
 
  Suddenly, the hourly cron I run that dispatches a delete by query
  completely halts all indexing. Select queries still run (and
 quickly),
  there is no CPU or disk I/O happening, but suddenly my indexer (which
  runs
  at ~400 doc/sec steady) pauses, and everything blocks indefinitely.
 
  To clarify some on the schema, this is a moving window of data
 (imagine
  messages that don't matter after a 24 hour period) which are
 regularly
  chopped off by my hourly cron (deleting messages over 24 hours old)
  to
  keep the index size reasonable.
 
  There are no errors (log level warn) in the logs. I'm not sure what
 to
  look
  into. As I've said this has been running (delete included) for about
 a
  month.
 
  I'll also note that I have another cluster much like this one where I
  do
  the very same thing... it has 4 machines, and indexes 10x the
 documents
  per
  second, with more indexes... and yet I delete on a cron without
  issue...
 
  Any ideas on where to start, or other information I could provide?
 
  Thanks much.
 
 
 
 




Re: Delete By Query suddenly halts indexing on SolrCloud cluster

2013-03-06 Thread Mark Miller
Any chance you can grab the stack trace of a replica as well? (also when it's 
locked up of course).

- Mark

On Mar 6, 2013, at 3:34 PM, Brett Hoerner br...@bretthoerner.com wrote:

 If there's anything I can try, let me know. Interestingly, I think I have
 noticed that if I stop my indexer, do my delete, and restart the indexer
 then I'm fine. Which goes along with the update thread contention theory.
 
 
 On Wed, Mar 6, 2013 at 5:03 PM, Mark Miller markrmil...@gmail.com wrote:
 
 This is what I see:
 
 We currently limit the number of outstanding update requests at one time
 to avoid a crazy number of threads being used.
 
 It looks like a bunch of update requests are stuck in socket reads and are
 taking up the available threads. It looks like the deletes are hanging out
 waiting for a free thread.
 
 It seems the question is, why are the requests stuck in socket reads. I
 don't have an answer at the moment.
 
 We should probably get this into a JIRA issue though.
 
 - Mark
 
 
 On Mar 6, 2013, at 2:15 PM, Alexandre Rafalovitch arafa...@gmail.com
 wrote:
 
 It does not look like a deadlock, though it could be a distributed one.
 Or
 it could be a livelock, though that's less likely.
 
 Here is what we used to recommend in similar situations for large Java
 systems (BEA Weblogic):
 1) Do thread dump of both systems before anything. As simultaneous as you
 can make it.
 2) Do the first delete. Do a thread dump every 2 minutes on both servers
 (so, say 3 dumps in that 5 minute wait)
 3) Do the second delete and do thread dumps every 30 seconds on both
 servers from just before and then during. Preferably all the way until
 the
 problem shows itself. Every 5 seconds if the problem shows itself really
 quick.
 
 That gives you a LOT of thread dumps. But it also gives you something
 that
 allows you to compare thread state before and after the problem starts
 showing itself, and to identify moving (or unnaturally still) threads. I
 even wrote a tool long time ago that parsed those thread dumps
 automatically and generated pretty deadlock graphs of those.
 
 
 Regards,
  Alex.
 
 
 
 
 
 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
 
 
 On Wed, Mar 6, 2013 at 5:04 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
 Thanks Brett, good stuff (though not a good problem).
 
 We def need to look into this.
 
 - Mark
 
 On Mar 6, 2013, at 1:53 PM, Brett Hoerner br...@bretthoerner.com
 wrote:
 
 Here is a dump after the delete, indexing has been stopped:
 https://gist.github.com/bretthoerner/c7ea3bf3dc9e676a3f0e
 
 An interesting hint that I forgot to mention: it doesn't always happen
 on
 the first delete. I manually ran the delete cron, and the server
 continued
 to work. I waited about 5 minutes and ran it again and it stalled the
 indexer (as seen from indexer process): http://i.imgur.com/1Tt35u0.png
 
 Another thing I forgot to mention. To bring the cluster back to life I:
 
 1) stop my indexer
 2) stop server1, start server1
 3) stop server2, start server2
 4) manually rebalance half of the shards to be mastered on server2
 (unload/create on server1)
 5) restart indexer
 
 And it works again until a delete eventually kills it.
 
 To be clear again, select queries continue to work indefinitely.
 
 Thanks,
 Brett
 
 
 On Wed, Mar 6, 2013 at 1:50 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
 Which version of Solr?
 
 Can you use jconsole, visualvm, or jstack to get some stack traces and
 see
 where things are halting?
 
 - Mark
 
 On Mar 6, 2013, at 11:45 AM, Brett Hoerner br...@bretthoerner.com
 wrote:
 
 I have a SolrCloud cluster (2 machines, 2 Solr instances, 32 shards,
 replication factor of 2) that I've been using for over a month now in
 production.
 
 Suddenly, the hourly cron I run that dispatches a delete by query
 completely halts all indexing. Select queries still run (and
 quickly),
 there is no CPU or disk I/O happening, but suddenly my indexer (which
 runs
 at ~400 doc/sec steady) pauses, and everything blocks indefinitely.
 
 To clarify some on the schema, this is a moving window of data
 (imagine
 messages that don't matter after a 24 hour period) which are
 regularly
 chopped off by my hourly cron (deleting messages over 24 hours old)
 to
 keep the index size reasonable.
 
 There are no errors (log level warn) in the logs. I'm not sure what
 to
 look
 into. As I've said this has been running (delete included) for about
 a
 month.
 
 I'll also note that I have another cluster much like this one where I
 do
 the very same thing... it has 4 machines, and indexes 10x the
 documents
 per
 second, with more indexes... and yet I delete on a cron without
 issue...
 
 Any ideas on where to start, or other information I could provide?
 
 Thanks much.
 
 
 
 
 
 



Delete by Query not working properly

2012-12-18 Thread Dixline
Hi,

I've deleted a document using http://localhost:8983/solr/update?stream.body=
<delete><query>skills_s:Perl</query></delete> and then committed the delete.
If I search using q=perl I'm able to see the same document, but if I search
using q=skills_s:Perl it returns no results. Can someone explain: is this how
delete by query works?

Thanks,
Dixline.M



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Delete-by-Query-not-working-properly-tp4027681.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Delete by Query not working properly

2012-12-18 Thread Upayavira
Surely you are deleting documents that have the term Perl in the
skills_s field, but are leaving behind another document that has Perl in
the default field (usually 'text'). Before doing the delete, do 'q=Perl
skills_s:Perl' and see if you get more than one document.

That's what it looks like anyhow.

Upayavira

On Tue, Dec 18, 2012, at 06:03 AM, Dixline wrote:
 Hi,
 
 I've deleted a document using
 http://localhost:8983/solr/update?stream.body=
 <delete><query>skills_s:Perl</query></delete> and then committed the delete.
 If I search using q=perl I'm able to see the same document, but if I search
 using q=skills_s:Perl it returns no results. Can someone explain: is this
 how delete by query works?
 
 Thanks,
 Dixline.M
 
 
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Delete-by-Query-not-working-properly-tp4027681.html
 Sent from the Solr - User mailing list archive at Nabble.com.
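A way to run Upayavira's combined check from the command line (the core URL is an assumption; `--data-urlencode` handles the space and colon in the query):

```shell
# Query for documents matching "Perl" in the default field OR in skills_s,
# before deleting, to see how many documents are actually involved.
Q='Perl skills_s:Perl'
# Against a live server:
# curl -G 'http://localhost:8983/solr/select' \
#      --data-urlencode "q=$Q" --data-urlencode 'fl=id' --data-urlencode 'rows=10'
echo "q=$Q"
```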


Re: Delete by Query not working properly

2012-12-18 Thread Jack Krupansky

Make sure that your curl command includes:

 -H 'Content-Type: text/xml'

For example,

curl "http://localhost:8983/solr/update?commit=true" -H 'Content-Type: text/xml' \
  --data-binary '<delete><query>id:doc-2012-12-18</query></delete>'

-- Jack Krupansky

-Original Message- 
From: Dixline

Sent: Tuesday, December 18, 2012 1:03 AM
To: solr-user@lucene.apache.org
Subject: Delete by Query not working properly

Hi,

I've deleted a document using http://localhost:8983/solr/update?stream.body=
<delete><query>skills_s:Perl</query></delete> and then committed the delete.
If I search using q=perl I'm able to see the same document, but if I search
using q=skills_s:Perl it returns no results. Can someone explain: is this how
delete by query works?

Thanks,
Dixline.M



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Delete-by-Query-not-working-properly-tp4027681.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Delete by Query not working properly

2012-12-18 Thread Jack Krupansky
Oh, and since the field name suggests that the field type is string, be 
sure that you have the case exact - string fields are case sensitive.


So, change:

   <delete><query>skills_s:Perl</query></delete>

to

   <delete><query>skills_s:perl</query></delete>

-- Jack Krupansky

-Original Message- 
From: Jack Krupansky

Sent: Tuesday, December 18, 2012 9:03 AM
To: solr-user@lucene.apache.org
Subject: Re: Delete by Query not working properly

Make sure that your curl command includes:

 -H 'Content-Type: text/xml'

For example,

curl "http://localhost:8983/solr/update?commit=true" -H 'Content-Type: text/xml' \
  --data-binary '<delete><query>id:doc-2012-12-18</query></delete>'

-- Jack Krupansky

-Original Message- 
From: Dixline

Sent: Tuesday, December 18, 2012 1:03 AM
To: solr-user@lucene.apache.org
Subject: Delete by Query not working properly

Hi,

I've deleted a document using http://localhost:8983/solr/update?stream.body=
<delete><query>skills_s:Perl</query></delete> and then committed the delete.
If I search using q=perl I'm able to see the same document, but if I search
using q=skills_s:Perl it returns no results. Can someone explain: is this how
delete by query works?

Thanks,
Dixline.M



--
View this message in context:
http://lucene.472066.n3.nabble.com/Delete-by-Query-not-working-properly-tp4027681.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Delete by Query not working properly

2012-12-18 Thread Walter Underwood
Never use text/xml, always application/xml.

wunder

On Dec 18, 2012, at 6:03 AM, Jack Krupansky wrote:

 Make sure that your curl command includes:
 
 -H 'Content-Type: text/xml'
 
 For example,
 
 curl "http://localhost:8983/solr/update?commit=true" -H 'Content-Type: text/xml' \
   --data-binary '<delete><query>id:doc-2012-12-18</query></delete>'
 
 -- Jack Krupansky
 
 -Original Message- From: Dixline
 Sent: Tuesday, December 18, 2012 1:03 AM
 To: solr-user@lucene.apache.org
 Subject: Delete by Query not working properly
 
 Hi,
 
 I've deleted a document using http://localhost:8983/solr/update?stream.body=
 <delete><query>skills_s:Perl</query></delete> and then committed the delete.
 If I search using q=perl I'm able to see the same document, but if I search
 using q=skills_s:Perl it returns no results. Can someone explain: is this how
 delete by query works?
 
 Thanks,
 Dixline.M
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Delete-by-Query-not-working-properly-tp4027681.html
 Sent from the Solr - User mailing list archive at Nabble.com. 

--
Walter Underwood
wun...@wunderwood.org





Re: Delete by Query not working properly

2012-12-18 Thread Jack Krupansky

Yes, text/xml is deprecated. But it is less typing for curl commands!

Why Solr doesn't default to application/xml is a mystery.

-- Jack Krupansky

-Original Message- 
From: Walter Underwood

Sent: Tuesday, December 18, 2012 10:47 AM
To: solr-user@lucene.apache.org
Subject: Re: Delete by Query not working properly

Never use text/xml, always application/xml.

wunder

On Dec 18, 2012, at 6:03 AM, Jack Krupansky wrote:


Make sure that your curl command includes:

-H 'Content-Type: text/xml'

For example,

curl "http://localhost:8983/solr/update?commit=true" -H 'Content-Type: text/xml' \
  --data-binary '<delete><query>id:doc-2012-12-18</query></delete>'

-- Jack Krupansky

-Original Message- From: Dixline
Sent: Tuesday, December 18, 2012 1:03 AM
To: solr-user@lucene.apache.org
Subject: Delete by Query not working properly

Hi,

I've deleted a document using
http://localhost:8983/solr/update?stream.body=
<delete><query>skills_s:Perl</query></delete> and then committed the delete.
If I search using q=perl I'm able to see the same document, but if I search
using q=skills_s:Perl it returns no results. Can someone explain: is this how
delete by query works?

Thanks,
Dixline.M



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Delete-by-Query-not-working-properly-tp4027681.html

Sent from the Solr - User mailing list archive at Nabble.com.


--
Walter Underwood
wun...@wunderwood.org





Re: Solr 4.0.0 - index version and generation not changed after delete by query on master

2012-10-24 Thread Bill Au
I just filed a bug with all the details:

https://issues.apache.org/jira/browse/SOLR-3681

Bill

On Tue, Oct 23, 2012 at 2:47 PM, Chris Hostetter
hossman_luc...@fucit.orgwrote:

 : Just discovered that the replication admin REST API reports the correct
 : index version and generation:
 :
 : http://master_host:port/solr/replication?command=indexversion
 :
 : So is this a bug in the admin UI?

 Ya gotta be specific Bill: where in the admin UI do you think it's
 displaying the incorrect information?

 The Admin UI just adds pretty markup to information fetched from the
 admin handlers using javascript, so if there is a problem it's either in
 the admin handlers, or in the javascript possibly caching the old values.

 Off the cuff, this reminds me of...

 https://issues.apache.org/jira/browse/SOLR-3681

 The root confusion there was that /admin/replication explicitly shows data
 about the commit point available for replication -- not the current commit
 point being searched on the master.

 So if you are seeing a disconnect, then perhaps it's just that same
 discrepancy? -- although if you are *only* seeing a disconnect after a
 deleteByQuery (and not after document adds, or a deleteById) then that
 does smell fishy, and makes me wonder if there is a code path where the
 userData for the commits aren't being set properly.

 Can you file a bug with a unit test to reproduce?  or at the very least a
 set of specific commands to run against the solr example including what
 request handler URLs to hit (so there's no risk of confusion about the ui
 javascript behavior) to see the problem?


 -Hoss



Re: Solr 4.0.0 - index version and generation not changed after delete by query on master

2012-10-24 Thread Bill Au
Sorry, I had copy/paste the wrong link before.  Here is the correct one:

https://issues.apache.org/jira/browse/SOLR-3986

Bill

On Wed, Oct 24, 2012 at 10:26 AM, Bill Au bill.w...@gmail.com wrote:

 I just filed a bug with all the details:

 https://issues.apache.org/jira/browse/SOLR-3681

 Bill


 On Tue, Oct 23, 2012 at 2:47 PM, Chris Hostetter hossman_luc...@fucit.org
  wrote:

 : Just discovered that the replication admin REST API reports the correct
 : index version and generation:
 :
 : http://master_host:port/solr/replication?command=indexversion
 :
 : So is this a bug in the admin UI?

 Ya gotta be specific Bill: where in the admin UI do you think it's
 displaying the incorrect information?

 The Admin UI just adds pretty markup to information fetched from the
 admin handlers using javascript, so if there is a problem it's either in
 the admin handlers, or in the javascript possibly caching the old values.

 Off the cuff, this reminds me of...

 https://issues.apache.org/jira/browse/SOLR-3681

 The root confusion there was that /admin/replication explicitly shows data
 about the commit point available for replication -- not the current commit
 point being searched on the master.

 So if you are seeing a disconnect, then perhaps it's just that same
 discrepancy? -- although if you are *only* seeing a disconnect after a
 deleteByQuery (and not after document adds, or a deleteById) then that
 does smell fishy, and makes me wonder if there is a code path where the
 userData for the commits aren't being set properly.

 Can you file a bug with a unit test to reproduce?  or at the very least a
 set of specific commands to run against the solr example including what
 request handler URLs to hit (so there's no risk of confusion about the ui
 javascript behavior) to see the problem?


 -Hoss





Re: Solr 4.0.0 - index version and generation not changed after delete by query on master

2012-10-23 Thread Chris Hostetter
: Just discovered that the replication admin REST API reports the correct
: index version and generation:
: 
: http://master_host:port/solr/replication?command=indexversion
: 
: So is this a bug in the admin UI?

Ya gotta be specific Bill: where in the admin UI do you think it's 
displaying the incorrect information?

The Admin UI just adds pretty markup to information fetched from the 
admin handlers using javascript, so if there is a problem it's either in 
the admin handlers, or in the javascript possibly caching the old values.

Off the cuff, this reminds me of...

https://issues.apache.org/jira/browse/SOLR-3681

The root confusion there was that /admin/replication explicitly shows data 
about the commit point available for replication -- not the current commit 
point being searched on the master.

So if you are seeing a disconnect, then perhaps it's just that same 
discrepancy? -- although if you are *only* seeing a disconnect after a
deleteByQuery (and not after document adds, or a deleteById) then that 
does smell fishy, and makes me wonder if there is a code path where the 
userData for the commits aren't being set properly.

Can you file a bug with a unit test to reproduce?  or at the very least a
set of specific commands to run against the solr example including what 
request handler URLs to hit (so there's no risk of confusion about the ui 
javascript behavior) to see the problem?


-Hoss


Re: Solr 4.0.0 - index version and generation not changed after delete by query on master

2012-10-19 Thread Erick Erickson
I wonder if you're getting hit by the browser caching the admin page and
serving up the old version? What happens if you try from a different
browser or purge the browser cache?

Of course you have to refresh the master admin page, there's no
automatic update but I assume you did that.

Best
Erick

On Thu, Oct 18, 2012 at 1:59 PM, Bill Au bill.w...@gmail.com wrote:
 Just discovered that the replication admin REST API reports the correct
 index version and generation:

 http://master_host:port/solr/replication?command=indexversion

 So is this a bug in the admin UI?

 Bill

 On Thu, Oct 18, 2012 at 11:34 AM, Bill Au bill.w...@gmail.com wrote:

 I just upgraded to Solr 4.0.0.  I noticed that after a delete by query,
 the index version, generation, and size remain unchanged on the master even
 though the documents have been deleted (num docs changed and those deleted
 documents no longer show up in query responses).  But on the slave both the
 index version, generation, and size are updated.  So I thought the master
 and slave were out of sync but in reality that is not true.

 What's going on here?

 Bill



Re: Solr 4.0.0 - index version and generation not changed after delete by query on master

2012-10-19 Thread Bill Au
It's not the browser cache.  I have tried reloading the admin page and
accessing the admin page from another machine.  Both show the older index
version and generation.  On the slave, replication did kicked in and show
the new index version and generation for the slave.  But the slave admin
page also shows the older index version and generation for the master.

If I do a second delete by query on the master, the master index generation
reported the admin UI does go up by one on both the master and slave.  But
it is still one generation behind.

Bill

On Fri, Oct 19, 2012 at 7:09 AM, Erick Erickson erickerick...@gmail.comwrote:

 I wonder if you're getting hit by the browser caching the admin page and
 serving up the old version? What happens if you try from a different
 browser or purge the browser cache?

 Of course you have to refresh the master admin page, there's no
 automatic update but I assume you did that.

 Best
 Erick

 On Thu, Oct 18, 2012 at 1:59 PM, Bill Au bill.w...@gmail.com wrote:
  Just discovered that the replication admin REST API reports the correct
  index version and generation:
 
  http://master_host:port/solr/replication?command=indexversion
 
  So is this a bug in the admin UI?
 
  Bill
 
  On Thu, Oct 18, 2012 at 11:34 AM, Bill Au bill.w...@gmail.com wrote:
 
  I just upgraded to Solr 4.0.0.  I noticed that after a delete by query,
  the index version, generation, and size remain unchanged on the master
 even
  though the documents have been deleted (num docs changed and those
 deleted
  documents no longer show up in query responses).  But on the slave both
 the
  index version, generation, and size are updated.  So I thought the master
  and slave were out of sync but in reality that is not true.
 
  What's going on here?
 
  Bill
 



Solr 4.0.0 - index version and generation not changed after delete by query on master

2012-10-18 Thread Bill Au
I just upgraded to Solr 4.0.0.  I noticed that after a delete by query, the
index version, generation, and size remain unchanged on the master even
though the documents have been deleted (num docs changed and those deleted
documents no longer show up in query responses).  But on the slave both the
index version, generation, and size are updated.  So I thought the master
and slave were out of sync but in reality that is not true.

What's going on here?

Bill


Re: Solr 4.0.0 - index version and generation not changed after delete by query on master

2012-10-18 Thread Bill Au
Just discovered that the replication admin REST API reports the correct
index version and generation:

http://master_host:port/solr/replication?command=indexversion

So is this a bug in the admin UI?

Bill

On Thu, Oct 18, 2012 at 11:34 AM, Bill Au bill.w...@gmail.com wrote:

 I just upgraded to Solr 4.0.0.  I noticed that after a delete by query,
 the index version, generation, and size remain unchanged on the master even
 though the documents have been deleted (num docs changed and those deleted
 documents no longer show up in query responses).  But on the slave both the
 index version, generation, and size are updated.  So I thought the master
 and slave were out of sync but in reality that is not true.

 What's going on here?

 Bill
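The replication-handler check mentioned above can be scripted to compare master and slave directly, bypassing the admin UI entirely (the host names and port are assumptions):

```shell
# Build the replication handler URL for a given host:port.
indexversion_url() {
  printf 'http://%s/solr/replication?command=indexversion&wt=json' "$1"
}
# Against a live cluster you would compare the two responses:
# curl -s "$(indexversion_url master_host:8983)"
# curl -s "$(indexversion_url slave_host:8983)"
echo "$(indexversion_url master_host:8983)"
```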



Problem with delete by query in Solr 4.0 beta

2012-10-10 Thread Andrew Groh
I cannot seem to get delete by query working in my simple setup in Solr 4.0 
beta.

I have a single collection and I want to delete old documents from it.  There 
is a single solr node in the config (no replication, not distributed). This is 
something that I previously did in Solr 3.x

My collection is called dine, so I do:

curl "http://localhost:8080/solr/dine/update" -s -H 'Content-type: text/xml; charset=utf-8' \
  -d '<delete><query>timestamp_dt:[2012-09-01T00:00:00Z TO 2012-09-27T00:00:00Z]</query></delete>'

and then a commit.

The problem is that the documents are not deleted.  When I run the same query in 
the admin page, it still returns documents.

I walked through the code and found the code in 
DistributedUpdateProcessor::doDeleteByQuery to be suspicious.

Specifically, vinfo is not null, but I have no version field, so versionsStored 
is false.

So it gets to line 786, which looks like:
if (versionsStored) {

That then skips to line 813 (the finally clause) skipping all calls to 
doLocalDelete

Now, I do confess I don't understand exactly how this code should work.  
However, in the add code, the check for versionsStored does not skip the call 
to doLocalAdd.

Any suggestions would be welcome.

Andrew





Re: Problem with delete by query in Solr 4.0 beta

2012-10-10 Thread Ahmet Arslan

 Do you have a _version_ field in your schema? I believe Solr 4.0
 Beta requires that field.

Probably he is hitting this https://issues.apache.org/jira/browse/SOLR-3432
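SOLR-3432 concerns the `_version_` field used by the update log. In Solr 4.x example schemas the field is declared roughly like this (a sketch of the stock declaration, not taken from this thread):

```xml
<!-- Required by the update log / versioning in Solr 4.x; delete-by-query
     misbehaves when it is missing from the schema. -->
<field name="_version_" type="long" indexed="true" stored="true"/>
```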


Re: delete by query don't work

2012-06-19 Thread vidhya
In order to clear all the indexed data please try to use this code:

 private void Btn_Delete_Click(object sender, EventArgs e)
 {
     var solrUrl = this.textBoxSolrUrl.Text;
     indexer.FixtureSetup(solrUrl);
     indexer.Delete();

     MessageBox.Show("Delete of files is completed");
 }

 public void Delete()
 {
     var solr =
         ServiceLocator.Current.GetInstance<ISolrOperations<Document>>();

     solr.Delete(new SolrQueryByField("id", "*:*"));

     solr.Commit();
 }

Use this code to delete an individual document:

 solr.Delete(new SolrQueryByField("id", "SP2514N"));

Here the particular document with id=SP2514N will be removed from the indexed data.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/delete-by-query-don-t-work-tp3990077p3990243.html
Sent from the Solr - User mailing list archive at Nabble.com.


delete by query don't work

2012-06-18 Thread ramzesua
Hi all. I am using Solr 4.0 and trying to clear the index by query. At first I
used <delete><query>*:*</query></delete> with a commit, but the index is still not
empty. I tried other queries, but they did not help. Then I tried delete by
`id`. That works fine, but I need to clear the whole index. Can anyone help me?


--
View this message in context: 
http://lucene.472066.n3.nabble.com/delete-by-query-don-t-work-tp3990077.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: delete by query don't work

2012-06-18 Thread Erick Erickson
Well, it would help if you defined what behavior you're seeing. When you
say delete-by-query doesn't work, what is the symptom? What does empty
mean? Because if you're just looking at your index directory and expecting
to see files disappear, you'll be disappointed.

When you delete documents in Solr, the docs are just marked as deleted, they
aren't physically removed until segments are merged. Does a query for *:* return
any documents after you delete-by-query?

Running an optimize after you do the delete will force merging to happen BTW.

If this doesn't help, please post the exact URLs you use, and what
your evidence that
the index isn't empty is.

Best
Erick

On Mon, Jun 18, 2012 at 5:45 AM, ramzesua michaelnaza...@gmail.com wrote:
 Hi all. I am using Solr 4.0 and trying to clear the index by query. At first I
 used <delete><query>*:*</query></delete> with a commit, but the index is still not
 empty. I tried other queries, but they did not help. Then I tried delete by
 `id`. That works fine, but I need to clear the whole index. Can anyone help me?


 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/delete-by-query-don-t-work-tp3990077.html
 Sent from the Solr - User mailing list archive at Nabble.com.
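Erick's two checks (does `*:*` still match after the delete plus commit, then force a merge) look roughly like this from the shell. The core URL is an assumption, and optimize can be expensive on large indexes:

```shell
# Handler URLs on an assumed single-core Solr at localhost (adjust as needed).
solr_url() { printf 'http://localhost:8983/solr/%s' "$1"; }
# 1) Does q=*:* still return documents after the delete + commit?
# curl -G "$(solr_url select)" --data-urlencode 'q=*:*' --data-urlencode 'rows=0'
# 2) Force segment merging so documents marked deleted are physically removed:
# curl "$(solr_url update)?optimize=true"
echo "$(solr_url update)?optimize=true"
```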


Re: delete by query don't work

2012-06-18 Thread Toke Eskildsen
On Mon, 2012-06-18 at 11:45 +0200, ramzesua wrote:
 Hi all. I am using Solr 4.0 and trying to clear the index by query. At first I
 used <delete><query>*:*</query></delete> with a commit, but the index is still not
 empty. I tried other queries, but they did not help. Then I tried delete by
 `id`. That works fine, but I need to clear the whole index. Can anyone help me?

It's a subtle bug/problem in the default schema. Fortunately it is
easily fixable. See https://issues.apache.org/jira/browse/SOLR-3432



Re: Delete by Query with limited number of rows

2011-11-14 Thread mikr00
Hi Erick, hi Yury,

thanks to your input I found a perfect solution for my case. Even though
this is not a solr-only solution, I will just briefly describe how it works
since it might be of interest to others:

I have put up a mysql database holding two tables. The first only has a
primarykey with auto-increment and nothing else. The second has a primarykey
but without auto-increment and also fields for the content I store in solr. 

Now, before I add something to the solr core, I add an entry to the first
mysql database. After the insertion, I get the primarykey for the action. I
check, whether it is above my limit of documents. If so, I empty the first
mysql table and reset the auto-increment to zero. I then insert a mysql
entry to the second table using the primarykey taken from the first table
(if the primarykey exists, I do not add an entry but update the existing
one). And finally I have a solr core which holds my searchable data and has
a uniquekey field. Into this core I add a new document by using the
primarykey from the first mysql table for the uniquekey field.

The solution has two main benefits for me:

- I can precisely control the number of documents in my solr core.
- I do now also have a backup of my data in mysql

Thank you very much for your help!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Delete-by-Query-with-limited-number-of-rows-tp3503094p3506380.html


Re: Delete by Query with limited number of rows

2011-11-13 Thread mikr00
Hi Yury,

thank you very much for your quick reply. Currently I have a timestamp field
(solr.DateField) and every time I add a document I use NOW for the
timestamp field. I only commit documents on the core every four hours. This
works fine with the timestamp since I can use NOW. However, I couldn't
figure out, how to define some kind of auto-increment for a particular
field. I think I can't handle this from outside since I can have several
adds in parallel from different clients. So I was wondering, whether there
could be a field type that automatically increases its value
for each added (committed) document? So that I could use a placeholder like
NOW in the case of the DateField to indicate that I would like to
auto-increment the field.

Cheers

Michael

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Delete-by-Query-with-limited-number-of-rows-tp3503094p3504924.html


Re: Delete by Query with limited number of rows

2011-11-13 Thread Erick Erickson
There's nothing built into Solr that lets you do this automatically. About
the best you can do is probably a delete by query going back some fixed
time interval. So rather than keeping the last N documents, you keep
documents that are, say, no more than 1 month old (or whatever you
determine your interval is that allows you to keep around 1M docs
around).

Then you'd have to monitor this on a running system to see how close
you were to your target numbers.
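Erick's interval approach boils down to issuing a single delete-by-query with a date cutoff. Below is a small sketch of building that query string; the field name `timestamp` and the helper name are hypothetical, and Solr's own date math (e.g. `timestamp:[* TO NOW-30DAYS]`) achieves the same thing server-side without computing the cutoff yourself.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Illustrative sketch: delete everything older than a fixed interval by
// computing an explicit UTC cutoff, so the query is reproducible in logs.
public class AgeCutoff {

    // Builds a delete-by-query string for documents whose (hypothetical)
    // "timestamp" field is older than maxAgeDays, relative to "now".
    static String deleteQuery(Instant now, long maxAgeDays) {
        Instant cutoff = now.minus(maxAgeDays, ChronoUnit.DAYS);
        return "timestamp:[* TO " + cutoff + "]";
    }

    public static void main(String[] args) {
        // The resulting string would be sent as the body of a
        // <delete><query>...</query></delete> request.
        System.out.println(deleteQuery(Instant.parse("2011-11-13T00:00:00Z"), 30));
    }
}
```

The explicit cutoff is handy when you want to log exactly which range was deleted on each run.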

Best
Erick

On Sun, Nov 13, 2011 at 1:23 PM, mikr00 kruppa.mich...@gmail.com wrote:
 Hi Yury,

 thank you very much for your quick reply. Currently I have a timestamp field
 (solr.DateField) and every time I add a document I use NOW for the
 timestamp field. I only commit documents on the core every four hours. This
 works fine with the timestamp since I can use NOW. However, I couldn't
 figure out, how to define some kind of auto-increment for a particular
 field. I think I can't handle this from outside since I can have several
 adds in parallel from different clients. So I was wondering, whether there
  could be a field type that automatically increases its value
  for each added (committed) document? So that I could use a placeholder like
 NOW in the case of the DateField to indicate that I would like to
 auto-increment the field.

 Cheers

 Michael

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Delete-by-Query-with-limited-number-of-rows-tp3503094p3504924.html



Delete by Query with limited number of rows

2011-11-12 Thread mikr00
I have the following problem and can't seem to find a solution:

I'm building up a frequently updated solr index. In order to deal with
limited resources I would like to limit the total number of documents in
the index. In other words: I would like to declare that no more than (for
example) 1.000.000 documents should be in the index. Whenever new documents
are added (or better: when newly added documents are being committed), I
would like to:

- check, whether the limit is exceeded
- delete as many of the oldest documents from the index as necessary, such
that the limit is no longer exceeded.

Similar to a first in first out list. The problem is: It's easy to check the
limit, but how can I delete the oldest documents to go again below the
limit? Can I do it with a delete by query request? In that case, I would
probably have to limit the number of rows? But I can't seem to find a way to
do that. Or would you see a different solution (maybe there is a way to
configure the solr core such that it automatically behaves as described?)?

I would very much appreciate any help!

Thanks in Advance.

Cheers

Michael

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Delete-by-Query-with-limited-number-of-rows-tp3503094p3503094.html


Re: Delete by Query with limited number of rows

2011-11-12 Thread Yury Kats
On 11/12/2011 4:08 PM, mikr00 wrote:
 Similar to a first in first out list. The problem is: It's easy to check the
 limit, but how can I delete the oldest documents to go again below the
 limit? Can I do it with a delete by query request? In that case, I would
 probably have to limit the number of rows? But I can't seem to find a way to
 do that. Or would you see a different solution (maybe there is a way to
 configure the solr core such that it automatically behaves as desribed?)?

You can certainly delete a set of documents using delete by query,
but you need to somehow identify what documents you want to have deleted.
For that, you'd need to have a field, such as a sequence number or a timestamp
when the document was added.

Alternatively, if you can control the uniqueKey field when adding documents,
you can just cycle it between 1 and 1,000,000. When you reach 1,000,000
set the uniqueKey back to 1 and keep adding. The new document will automatically
replace the old document with the key of 1.
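The cycling-key idea above reduces to one line of modular arithmetic: each new add reuses a key in 1..1,000,000, so Solr's uniqueKey overwrite semantics evict the oldest document automatically. A minimal sketch (class and method names are illustrative, not from any Solr API):

```java
// Sketch of Yury's "cycling uniqueKey" idea: keep at most MAX_DOCS documents
// by reusing keys 1..MAX_DOCS, so each new add overwrites the oldest one.
public class CyclingKey {
    static final long MAX_DOCS = 1_000_000L;

    // Returns the key for the next document, wrapping back to 1 after
    // MAX_DOCS keys have been used. Start with lastKey = 0.
    static long nextKey(long lastKey) {
        return (lastKey % MAX_DOCS) + 1;
    }

    public static void main(String[] args) {
        long key = 0;
        for (int i = 0; i < 3; i++) {
            key = nextKey(key);
            // Each key would be written into the document's uniqueKey field.
            System.out.println("uniqueKey for add #" + (i + 1) + " = " + key);
        }
    }
}
```

Note this assumes a single writer (or some coordination), since two clients handing out keys independently would collide.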


delete by query

2011-03-28 Thread Gastone Penzo
Hi,
I want to use the delete-by-query method to delete documents from the index.
I tried, for example:

http://10.0.0.178:8983/solr/update?stream.body=<delete><query>field1:value</query></delete>

and it works

but how can delete indexes by 2 filters?

http://10.0.0.178:8983/solr/update?stream.body=<delete><query>field1:value1 AND field2:value2</query></delete>

It doesn't work. I need a logical AND because I want Solr to delete documents which
have field1 with value1 and field2 with value2.
is it possible?

thanx


-- 
Gastone Penzo
*www.solr-italia.it*
*The first italian blog about Apache Solr*


Re: delete by query

2011-03-28 Thread Gastone Penzo
i resolved:
http://10.0.0.178:8983/solr/update?stream.body=<delete><query>(field1:value1)AND(field2:value2)</query></delete>

Thanx
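For anyone generating these stream.body payloads from code, here is a small hypothetical helper (not from this thread) that builds the delete body and XML-escapes the query, so values containing `&` or `<` don't corrupt the XML:

```java
// Hypothetical helper for composing delete-by-query bodies like the one above.
public class DeleteBody {

    // Minimal XML escaping for text content: & must be escaped first.
    static String xmlEscape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Produces <delete><query>(f1:v1) AND (f2:v2)</query></delete>
    static String deleteByTwoFields(String f1, String v1, String f2, String v2) {
        String q = "(" + f1 + ":" + v1 + ") AND (" + f2 + ":" + v2 + ")";
        return "<delete><query>" + xmlEscape(q) + "</query></delete>";
    }

    public static void main(String[] args) {
        System.out.println(deleteByTwoFields("field1", "value1", "field2", "value2"));
    }
}
```

The same body can be POSTed directly instead of going through stream.body, which also avoids URL-encoding issues.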

2011/3/28 Gastone Penzo gastone.pe...@gmail.com

 Hi,
 i want to use delete by query method to delete indexes.
 i try for example:

 http://10.0.0.178:8983/solr/update?stream.body=<delete><query>field1:value</query></delete>

 and it works

 but how can delete indexes by 2 filters?

 http://10.0.0.178:8983/solr/update?stream.body=<delete><query>field1:value1 AND field2:value2</query></delete>

 it doesn't work. i need a logic AND cause i want solr delete indexes which
 have field1 with value1 and field2 with value2.
 is it possible?

 thanx


 --
 Gastone Penzo
 *www.solr-italia.it*
 *The first italian blog about Apache Solr*




-- 
Gastone Penzo
*www.solr-italia.it*
*The first italian blog about Apache Solr*


Re: Delete by query or Id very slow

2010-12-10 Thread bbarani

Hi,

As Tom suggested, removing the optimize and passing the ids as a list (instead
of a for loop) will surely increase the speed of deletion.

We have a program which fetches the complete list of IDs from the back end
(around 10 million), compares it with the complete list of IDs present in the
Solr index, and deletes the documents whose IDs are no longer present in the
DB. The whole program takes just a couple of minutes.

You can have a separate batch program to optimize your index. Optimization
will be really useful when you are using the terms component, as the terms won't
get refreshed unless you optimize your index.

Thanks,
Barani
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Delete-by-query-or-Id-very-slow-tp2053699p2067952.html


Re: Delete by query or Id very slow

2010-12-09 Thread Ravi Kiran
Thank you Tom for responding. On average the docs are around 25-35 KB.
The code is as follows; kindly let me know if you see anything weird, a
second pair of eyes always helps :-)

public List<String> deleteDocs(List<String> ids) throws SolrCustomException {
    CommonsHttpSolrServer server = (CommonsHttpSolrServer) getServerInstance();
    List<String> unsuccessful = new ArrayList<String>();
    try {
        if (ids != null && !ids.isEmpty()) {
            for (String id : ids) {
                server.deleteById(id);
            }
            server.commit();
            server.optimize();
        }
    } catch (IOException ioex) {
        throw new SolrCustomException("IOException while deleting : ", ioex);
    } catch (SolrServerException solrex) {
        throw new SolrCustomException("Could not delete : ", solrex);
    }

    return unsuccessful;
}

private SolrServer getServerInstance() throws SolrCustomException {
    if (server != null) {
        return server;
    } else {
        String url = getServerURL();
        log.debug("Server URL: " + url);
        try {
            server = new CommonsHttpSolrServer(url);
            server.setSoTimeout(100); // socket read timeout
            server.setConnectionTimeout(100);
            server.setDefaultMaxConnectionsPerHost(1000);
            server.setMaxTotalConnections(1000);
            server.setFollowRedirects(false); // defaults to false
            // allowCompression defaults to false. Server side must
            // support gzip or deflate for this to have any effect.
            server.setAllowCompression(true);
            server.setMaxRetries(1); // defaults to 0. > 1 not recommended.
        } catch (MalformedURLException mex) {
            throw new SolrCustomException("Cannot resolve Solr Server at '" + url + "'\n", mex);
        }
        return server;
    }
}

Thanks,

Ravi Kiran Bhaskar

On Wed, Dec 8, 2010 at 6:16 PM, Tom Hill solr-l...@worldware.com wrote:

 That's a pretty low maxDocs for autocommit. It means
 that by the time you get to 850,000 documents, you will have created 8,500 segments,
 and that's not counting merges.

 How big are your documents? I just created an 850,000 document (and a
 3.5 m doc index) with tiny documents (id and title), and they deleted
 quickly (17 milliseconds).

 Maybe if you post your delete code? Are you doing anything else (like
 commit/optimize?)

 Tom



 On Wed, Dec 8, 2010 at 12:55 PM, Ravi Kiran ravi.bhas...@gmail.com
 wrote:
  Hello,
 
  Iam using solr 1.4.1 when I delete by query or Id from solrj
 it
  is very very slow almost like a hang. The core form which Iam deleting
 has
  close to 850K documents in the index. In the solrconfig.xml autocommit is
  set as follows. Any idea how to speed up the deletion process. Please let
 me
  know if any more info is required
 
 
 
   updateHandler class=*solr.DirectUpdateHandler2*
 
 !-- Perform a commit/ automatically under certain conditions:
 
  maxDocs - number of updates since last commit is greater than
 this
 
  maxTime - oldest *uncommited* update (in *ms*) is this long ago
  --
 
 autoCommit
 
   maxDocs100/maxDocs
 
   maxTime12/maxTime
 
 /autoCommit
 
   /updateHandler
 
 
 
  Thanks,
 
 
 
  *Ravi Kiran Bhaskar*
 



Re: Delete by query or Id very slow

2010-12-09 Thread Tom Hill
I'd bet it's the optimize that's taking the time, and not the delete.
You don't really need to optimize these days, and you certainly don't
need to do it on every delete.

And you can give solr a list of ids to delete, which would be more efficient.
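The batching Tom describes is essentially chunking the id list before handing each chunk to SolrJ (recent SolrJ versions accept a `List<String>` in `deleteById`) and committing once at the end. A sketch of just the chunking step, with illustrative names:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: send ids in batches instead of one deleteById() call per id.
// Each batch would go to server.deleteById(batch), then commit once.
public class BatchDelete {

    // Splits ids into chunks of at most batchSize, preserving order.
    static List<List<String>> batches(List<String> ids, int batchSize) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += batchSize) {
            out.add(new ArrayList<>(ids.subList(i, Math.min(i + batchSize, ids.size()))));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("a", "b", "c", "d", "e");
        // Yields three batches: [a, b], [c, d], [e]
        System.out.println(batches(ids, 2));
    }
}
```

Committing once per run, rather than per batch, is what gives most of the speedup in practice.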

I don't believe you can tell which ones have failed, if any do, if you
delete with a list, but you are not using unsuccessful now anyway.

Tom
On Thu, Dec 9, 2010 at 7:55 AM, Ravi Kiran ravi.bhas...@gmail.com wrote:
 Thank you Tom for responding. On an average the docs are around 25-35 KB.
 The code is as follows, Kindly let me know if you see anything weird, a
 second pair of eyes always helps :-)

     public List<String> deleteDocs(List<String> ids) throws SolrCustomException {
         CommonsHttpSolrServer server = (CommonsHttpSolrServer) getServerInstance();
         List<String> unsuccessful = new ArrayList<String>();
         try {
             if (ids != null && !ids.isEmpty()) {
                 for (String id : ids) {
                     server.deleteById(id);
                 }
                 server.commit();
                 server.optimize();
             }
         } catch (IOException ioex) {
             throw new SolrCustomException("IOException while deleting : ", ioex);
         } catch (SolrServerException solrex) {
             throw new SolrCustomException("Could not delete : ", solrex);
         }

         return unsuccessful;
     }

     private SolrServer getServerInstance() throws SolrCustomException {
         if (server != null) {
             return server;
         } else {
             String url = getServerURL();
             log.debug("Server URL: " + url);
             try {
                 server = new CommonsHttpSolrServer(url);
                 server.setSoTimeout(100); // socket read timeout
                 server.setConnectionTimeout(100);
                 server.setDefaultMaxConnectionsPerHost(1000);
                 server.setMaxTotalConnections(1000);
                 server.setFollowRedirects(false); // defaults to false
                 // allowCompression defaults to false. Server side must
                 // support gzip or deflate for this to have any effect.
                 server.setAllowCompression(true);
                 server.setMaxRetries(1); // defaults to 0. > 1 not recommended.
             } catch (MalformedURLException mex) {
                 throw new SolrCustomException("Cannot resolve Solr Server at '" + url + "'\n", mex);
             }
             return server;
         }
     }

 Thanks,

 Ravi Kiran Bhaskar

 On Wed, Dec 8, 2010 at 6:16 PM, Tom Hill solr-l...@worldware.com wrote:

 That''s a pretty low number of documents for auto complete. It means
 that when getting to 850,000 documents, you will create 8500 segments,
 and that's not counting merges.

 How big are your documents? I just created an 850,000 document (and a
 3.5 m doc index) with tiny documents (id and title), and they deleted
 quickly (17 milliseconds).

 Maybe if you post your delete code? Are you doing anything else (like
 commit/optimize?)

 Tom



 On Wed, Dec 8, 2010 at 12:55 PM, Ravi Kiran ravi.bhas...@gmail.com
 wrote:
  Hello,
 
              Iam using solr 1.4.1 when I delete by query or Id from solrj
 it
  is very very slow almost like a hang. The core form which Iam deleting
 has
  close to 850K documents in the index. In the solrconfig.xml autocommit is
  set as follows. Any idea how to speed up the deletion process. Please let
 me
  know if any more info is required
 
 
 
   updateHandler class=*solr.DirectUpdateHandler2*
 
     !-- Perform a commit/ automatically under certain conditions:
 
          maxDocs - number of updates since last commit is greater than
 this
 
          maxTime - oldest *uncommited* update (in *ms*) is this long ago
  --
 
     autoCommit
 
       maxDocs100/maxDocs
 
       maxTime12/maxTime
 
     /autoCommit
 
   /updateHandler
 
 
 
  Thanks,
 
 
 
  *Ravi Kiran Bhaskar*
 




Delete by query or Id very slow

2010-12-08 Thread Ravi Kiran
Hello,

 I am using Solr 1.4.1. When I delete by query or by id from SolrJ it
is very, very slow, almost like a hang. The core from which I am deleting has
close to 850K documents in the index. In solrconfig.xml, autocommit is
set as follows. Any idea how to speed up the deletion process? Please let me
know if any more info is required.



  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- Perform a <commit/> automatically under certain conditions:
         maxDocs - number of updates since last commit is greater than this
         maxTime - oldest uncommitted update (in ms) is this long ago -->
    <autoCommit>
      <maxDocs>100</maxDocs>
      <maxTime>12</maxTime>
    </autoCommit>
  </updateHandler>



Thanks,



*Ravi Kiran Bhaskar*


Re: Delete by query or Id very slow

2010-12-08 Thread Tom Hill
That's a pretty low maxDocs for autocommit. It means
that by the time you get to 850,000 documents, you will have created 8,500 segments,
and that's not counting merges.

How big are your documents? I just created an 850,000 document (and a
3.5 m doc index) with tiny documents (id and title), and they deleted
quickly (17 milliseconds).

Maybe if you post your delete code? Are you doing anything else (like
commit/optimize?)

Tom



On Wed, Dec 8, 2010 at 12:55 PM, Ravi Kiran ravi.bhas...@gmail.com wrote:
 Hello,

             Iam using solr 1.4.1 when I delete by query or Id from solrj it
 is very very slow almost like a hang. The core form which Iam deleting has
 close to 850K documents in the index. In the solrconfig.xml autocommit is
 set as follows. Any idea how to speed up the deletion process. Please let me
 know if any more info is required



  updateHandler class=*solr.DirectUpdateHandler2*

    !-- Perform a commit/ automatically under certain conditions:

         maxDocs - number of updates since last commit is greater than this

         maxTime - oldest *uncommited* update (in *ms*) is this long ago
 --

    autoCommit

      maxDocs100/maxDocs

      maxTime12/maxTime

    /autoCommit

  /updateHandler



 Thanks,



 *Ravi Kiran Bhaskar*



Re: Delete by query issue

2010-08-26 Thread Chris Hostetter
: Here's the problem: the standard Solr parser is a little weird about
: negative queries. The way to make this work is to say
: *:* AND -field:[* TO *]

the default parser actually works ok ... it's a bug specific to 
deletion...
https://issues.apache.org/jira/browse/SOLR-381


-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



Delete by query issue

2010-08-25 Thread Max Lynch
Hi,
I am trying to delete all documents that have null values for a certain
field.  To that effect I can see all of the documents I want to delete by
doing this query:
-date_added_solr:[* TO *]

This returns about 32,000 documents.

However, when I try to put that into a curl call, no documents get deleted:
curl "http://localhost:8985/solr/newsblog/update?commit=true" -H
'Content-Type: text/xml' --data-binary '<delete><query>-date_added_solr:[*
TO *]</query></delete>'

Solr responds with:
<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">364</int></lst>
</response>

But nothing happens, even if I explicitly issue a commit afterward.

Any ideas?

Thanks.


Re: Delete by query issue

2010-08-25 Thread 朱炎詹
Excuse me, what's the hyphen before  the field name 'date_added_solr'? Is this 
some kind of new query format that I didn't know?

<delete><query>-date_added_solr:[* TO *]</query></delete>'

- Original Message - 
From: Max Lynch ihas...@gmail.com
To: solr-user@lucene.apache.org
Sent: Thursday, August 26, 2010 6:12 AM
Subject: Delete by query issue


 Hi,
 I am trying to delete all documents that have null values for a certain
 field.  To that effect I can see all of the documents I want to delete by
 doing this query:
 -date_added_solr:[* TO *]
 
 This returns about 32,000 documents.
 
 However, when I try to put that into a curl call, no documents get deleted:
 curl http://localhost:8985/solr/newsblog/update?commit=true -H
 Content-Type: text/xml --data-binary 'deletequery-date_added_solr:[*
 TO *]/query/delete'
 
 Solr responds with:
 response
 lst name=responseHeaderint name=status0/intint
 name=QTime364/int/lst
 /response
 
 But nothing happens, even if I explicitly issue a commit afterward.
 
 Any ideas?
 
 Thanks.









Re: Delete by query issue

2010-08-25 Thread Max Lynch
I was trying to filter out all documents that HAVE that field.  I was trying
to delete any documents where that field had empty values.

I just found a way to do it, but I did a range query on a string date in the
Lucene DateTools format and it worked, so I'm satisfied.  However, I believe
it worked because all of my documents have values for that field.

Oh well.

-max

On Wed, Aug 25, 2010 at 9:45 PM, scott chu (朱炎詹) scott@udngroup.comwrote:

 Excuse me, what's the hyphen before  the field name 'date_added_solr'? Is
 this some kind of new query format that I didn't know?

 <delete><query>-date_added_solr:[* TO *]</query></delete>'

 - Original Message -
 From: Max Lynch ihas...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Thursday, August 26, 2010 6:12 AM
 Subject: Delete by query issue


  Hi,
  I am trying to delete all documents that have null values for a certain
  field.  To that effect I can see all of the documents I want to delete by
  doing this query:
  -date_added_solr:[* TO *]
 
  This returns about 32,000 documents.
 
  However, when I try to put that into a curl call, no documents get
 deleted:
  curl http://localhost:8985/solr/newsblog/update?commit=true -H
  Content-Type: text/xml --data-binary
 'deletequery-date_added_solr:[*
  TO *]/query/delete'
 
  Solr responds with:
  response
  lst name=responseHeaderint name=status0/intint
  name=QTime364/int/lst
  /response
 
  But nothing happens, even if I explicitly issue a commit afterward.
 
  Any ideas?
 
  Thanks.
 



 






Re: Delete by query issue

2010-08-25 Thread Lance Norskog
Here's the problem: the standard Solr parser is a little weird about
negative queries. The way to make this work is to say
*:* AND -field:[* TO *]

This means select everything AND only these documents without a value
in the field.
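A tiny sketch of applying Lance's workaround mechanically before sending a delete. The leading-`-` check here is a rough heuristic for "purely negative query", not Solr's real parser logic, and the names are illustrative:

```java
// Sketch of Lance's workaround: a purely negative query matches nothing on
// its own in a delete, so prefix it with the match-all query *:*.
public class NegativeQueryFix {

    // If the query starts with '-' (a crude pure-negative heuristic),
    // rewrite it as "*:* AND <query>"; otherwise leave it untouched.
    static String fixPureNegative(String q) {
        return q.startsWith("-") ? "*:* AND " + q : q;
    }

    public static void main(String[] args) {
        System.out.println(fixPureNegative("-date_added_solr:[* TO *]"));
        System.out.println(fixPureNegative("type:[* TO *]"));
    }
}
```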

On Wed, Aug 25, 2010 at 7:55 PM, Max Lynch ihas...@gmail.com wrote:
 I was trying to filter out all documents that HAVE that field.  I was trying
 to delete any documents where that field had empty values.

 I just found a way to do it, but I did a range query on a string date in the
 Lucene DateTools format and it worked, so I'm satisfied.  However, I believe
 it worked because all of my documents have values for that field.

 Oh well.

 -max

 On Wed, Aug 25, 2010 at 9:45 PM, scott chu (朱炎詹) 
 scott@udngroup.comwrote:

 Excuse me, what's the hyphen before  the field name 'date_added_solr'? Is
 this some kind of new query format that I didn't know?

  <delete><query>-date_added_solr:[* TO *]</query></delete>'

 - Original Message -
 From: Max Lynch ihas...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Thursday, August 26, 2010 6:12 AM
 Subject: Delete by query issue


  Hi,
  I am trying to delete all documents that have null values for a certain
  field.  To that effect I can see all of the documents I want to delete by
  doing this query:
  -date_added_solr:[* TO *]
 
  This returns about 32,000 documents.
 
  However, when I try to put that into a curl call, no documents get
 deleted:
  curl http://localhost:8985/solr/newsblog/update?commit=true -H
  Content-Type: text/xml --data-binary
 'deletequery-date_added_solr:[*
  TO *]/query/delete'
 
  Solr responds with:
  response
  lst name=responseHeaderint name=status0/intint
  name=QTime364/int/lst
  /response
 
  But nothing happens, even if I explicitly issue a commit afterward.
 
  Any ideas?
 
  Thanks.
 



 








-- 
Lance Norskog
goks...@gmail.com


Re: Delete by query issue

2010-08-25 Thread Max Lynch
Thanks Lance.  I'll give that a try going forward.

On Wed, Aug 25, 2010 at 9:59 PM, Lance Norskog goks...@gmail.com wrote:

 Here's the problem: the standard Solr parser is a little weird about
 negative queries. The way to make this work is to say
*:* AND -field:[* TO *]

 This means select everything AND only these documents without a value
 in the field.

 On Wed, Aug 25, 2010 at 7:55 PM, Max Lynch ihas...@gmail.com wrote:
  I was trying to filter out all documents that HAVE that field.  I was
 trying
  to delete any documents where that field had empty values.
 
  I just found a way to do it, but I did a range query on a string date in
 the
  Lucene DateTools format and it worked, so I'm satisfied.  However, I
 believe
  it worked because all of my documents have values for that field.
 
  Oh well.
 
  -max
 
  On Wed, Aug 25, 2010 at 9:45 PM, scott chu (朱炎詹) scott@udngroup.com
 wrote:
 
  Excuse me, what's the hyphen before  the field name 'date_added_solr'?
 Is
  this some kind of new query format that I didn't know?
 
   <delete><query>-date_added_solr:[* TO *]</query></delete>'
 
  - Original Message -
  From: Max Lynch ihas...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Thursday, August 26, 2010 6:12 AM
  Subject: Delete by query issue
 
 
   Hi,
   I am trying to delete all documents that have null values for a
 certain
   field.  To that effect I can see all of the documents I want to delete
 by
   doing this query:
   -date_added_solr:[* TO *]
  
   This returns about 32,000 documents.
  
   However, when I try to put that into a curl call, no documents get
  deleted:
   curl http://localhost:8985/solr/newsblog/update?commit=true -H
   Content-Type: text/xml --data-binary
  'deletequery-date_added_solr:[*
   TO *]/query/delete'
  
   Solr responds with:
   response
   lst name=responseHeaderint name=status0/intint
   name=QTime364/int/lst
   /response
  
   But nothing happens, even if I explicitly issue a commit afterward.
  
   Any ideas?
  
   Thanks.
  
 
 
 
 
 
 
 
 
 
 



 --
 Lance Norskog
 goks...@gmail.com



Delete by query discrepancy

2010-02-16 Thread Mat Brown
Hi all,

Trying to debug a very sneaky bug in a small Solr extension that I
wrote, and I've come across an odd situation. Here's what my test
suite does:

deleteByQuery("*:*");
// add some documents
commit();
// test the search

This works fine. The test suite that exposed the error (which is
actually for a Ruby client library I maintain) was doing almost the
exact same thing, with one exception - the deleteByQuery() passed the
query type:[* TO *] instead of *:* (in an attempt to isolate the
error, I made sure that the input document and search parameters were
identical between the two test suites).

In the schema, the type field has at least one value for every
document (in practice it has more than one for all the documents in
this test suite). Changing the test setup code to pass type:[* TO *]
to deleteByQuery() causes it to fail.

So I'm a bit confused - wouldn't deleteByQuery(type:[* TO *]) have
the same effect as deleteByQuery(*:*), assuming every document has a
value for the type field? Or is there something subtler going on in
the internals - perhaps optimizing the *:* deleteByQuery() to just
tear down the whole index and build a new one from scratch? Something
that might have some subtle side effect? Now that I'm finally able to
reproduce the error in my extension's test suite, I can start actually
figuring out what's causing it, but I was surprised to find out that
the deleteByQuery() query is what makes the difference between passing
and failing.

Any insight much appreciated!

Mat Brown


Re: Delete by query discrepancy

2010-02-16 Thread Mark Miller
Mat Brown wrote:
 Hi all,

 Trying to debug a very sneaky bug in a small Solr extension that I
 wrote, and I've come across an odd situation. Here's what my test
 suite does:

 deleteByQuery("*:*");
 // add some documents
 commit();
 // test the search

 This works fine. The test suite that exposed the error (which is
 actually for a Ruby client library I maintain) was doing almost the
 exact same thing, with one exception - the deleteByQuery() passed the
 query type:[* TO *] instead of *:* (in an attempt to isolate the
 error, I made sure that the input document and search parameters were
 identical between the two test suites).

 In the schema, the type field has at least one value for every
 document (in practice it has more than one for all the documents in
 this test suite). Changing the test setup code to pass type:[* TO *]
 to deleteByQuery() causes it to fail.

 So I'm a bit confused - wouldn't deleteByQuery(type:[* TO *]) have
 the same effect as deleteByQuery(*:*), assuming every document has a
 value for the type field? Or is there something subtler going on in
 the internals - perhaps optimizing the *:* deleteByQuery() to just
 tear down the whole index and build a new one from scratch? Something
 that might have some subtle side effect? Now that I'm finally able to
 reproduce the error in my extension's test suite, I can start actually
 figuring out what's causing it, but I was surprised to find out that
 the deleteByQuery() query is what makes the difference between passing
 and failing.

 Any insight much appreciated!

 Mat Brown
   
Not sure why the tests would be affected, but yes, Solr detects a
delete of *:* and just creates a new
index instead of deleting every document.

-- 
- Mark

http://www.lucidimagination.com




