Re: Solr exceptions during batch indexing

2014-11-08 Thread Erick Erickson
bq: Just trying to understand what's the challenge in returning the bad doc

Mostly, nobody has done it yet. There's some complication around
asynchronous updates (ConcurrentUpdateSolrServer, for instance). I also
suspect that one has to write error-handling logic in the client anyway,
so the motivation is reduced.

And now it would need to handle SolrCloud mode.

All that said, this has bugged me for a long time, but I haven't gotten around
to it. Which says something about its priority, I suspect.

FWIW,
Erick

On Sat, Nov 8, 2014 at 2:51 AM, Anurag Sharma  wrote:
> Just trying to understand: what's the challenge in returning the bad doc
> id(s)?
> Solr already knows which doc(s) failed on update and can return their id(s)
> in the response or a callback. Can we have a JIRA ticket for it if one doesn't exist?
>
> This looks like a common use case, and every Solr consumer is probably writing
> their own version to handle this issue.
>
> On Sat, Nov 8, 2014 at 1:17 AM, Walter Underwood 
> wrote:
>
>> Right, that is why we batch.
>>
>> When a batch of 1000 fails, drop to a batch size of 1 and start the batch
>> over. Then it can report the exact document with problems.
>>
>> If you want to continue, go back to the bigger batch size. I usually fail
>> the whole batch on one error.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/
>>
>>
>> On Nov 7, 2014, at 11:44 AM, Peter Keegan  wrote:
>>
>> > I'm seeing 9X throughput with 1000 docs/batch vs 1 doc/batch, with a
>> single
>> > thread, so it's certainly worth it.
>> >
>> > Thanks,
>> > Peter
>> >
>> >
>> > On Fri, Nov 7, 2014 at 2:18 PM, Erick Erickson 
>> > wrote:
>> >
>> >> And Walter has also been around for a _long_ time ;)
>> >>
>> >> (sorry, couldn't resist)
>> >>
>> >> Erick
>> >>
>> >> On Fri, Nov 7, 2014 at 11:12 AM, Walter Underwood <
>> wun...@wunderwood.org>
>> >> wrote:
>> >>> Yes, I implemented exactly that fallback for Solr 1.2 at Netflix.
>> >>>
>> >>> It isn’t too hard if the code is structured for it; retry with a batch
>> >> size of 1.
>> >>>
>> >>> wunder
>> >>>
>> >>> On Nov 7, 2014, at 11:01 AM, Erick Erickson 
>> >> wrote:
>> >>>
>>  Yeah, this has been an ongoing issue for a _long_ time. Basically,
>>  you can't. So far, people have essentially written fallback logic to
>>  index the docs of a failing packet one at a time and report it.
>> 
>>  I'd really like better reporting back, but we haven't gotten there
>> yet.
>> 
>>  Best,
>>  Erick
>> 
>>  On Fri, Nov 7, 2014 at 8:25 AM, Peter Keegan 
>> >> wrote:
>> > How are folks handling Solr exceptions that occur during batch
>> >> indexing?
>> > Solr stops parsing the docs stream when an error occurs (e.g. a doc
>> >> with a
>> > missing mandatory field), and stops indexing the batch. The bad
>> >> document is
>> > not identified, so it would be hard for the client to recover by
>> >> skipping
>> > over it.
>> >
>> > Peter
>> >>>
>> >>
>>
>>


Re: Solr exceptions during batch indexing

2014-11-08 Thread Anurag Sharma
Just trying to understand: what's the challenge in returning the bad doc
id(s)?
Solr already knows which doc(s) failed on update and can return their id(s)
in the response or a callback. Can we have a JIRA ticket for it if one doesn't exist?

This looks like a common use case, and every Solr consumer is probably writing
their own version to handle this issue.

On Sat, Nov 8, 2014 at 1:17 AM, Walter Underwood 
wrote:

> Right, that is why we batch.
>
> When a batch of 1000 fails, drop to a batch size of 1 and start the batch
> over. Then it can report the exact document with problems.
>
> If you want to continue, go back to the bigger batch size. I usually fail
> the whole batch on one error.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/
>
>
> On Nov 7, 2014, at 11:44 AM, Peter Keegan  wrote:
>
> > I'm seeing 9X throughput with 1000 docs/batch vs 1 doc/batch, with a
> single
> > thread, so it's certainly worth it.
> >
> > Thanks,
> > Peter
> >
> >
> > On Fri, Nov 7, 2014 at 2:18 PM, Erick Erickson 
> > wrote:
> >
> >> And Walter has also been around for a _long_ time ;)
> >>
> >> (sorry, couldn't resist)
> >>
> >> Erick
> >>
> >> On Fri, Nov 7, 2014 at 11:12 AM, Walter Underwood <
> wun...@wunderwood.org>
> >> wrote:
> >>> Yes, I implemented exactly that fallback for Solr 1.2 at Netflix.
> >>>
> >>> It isn’t too hard if the code is structured for it; retry with a batch
> >> size of 1.
> >>>
> >>> wunder
> >>>
> >>> On Nov 7, 2014, at 11:01 AM, Erick Erickson 
> >> wrote:
> >>>
>  Yeah, this has been an ongoing issue for a _long_ time. Basically,
>  you can't. So far, people have essentially written fallback logic to
>  index the docs of a failing packet one at a time and report it.
> 
>  I'd really like better reporting back, but we haven't gotten there
> yet.
> 
>  Best,
>  Erick
> 
>  On Fri, Nov 7, 2014 at 8:25 AM, Peter Keegan 
> >> wrote:
> > How are folks handling Solr exceptions that occur during batch
> >> indexing?
> > Solr stops parsing the docs stream when an error occurs (e.g. a doc
> >> with a
> > missing mandatory field), and stops indexing the batch. The bad
> >> document is
> > not identified, so it would be hard for the client to recover by
> >> skipping
> > over it.
> >
> > Peter
> >>>
> >>
>
>


Re: Solr exceptions during batch indexing

2014-11-07 Thread Walter Underwood
Right, that is why we batch.

When a batch of 1000 fails, drop to a batch size of 1 and start the batch over. 
Then it can report the exact document with problems.

If you want to continue, go back to the bigger batch size. I usually fail the 
whole batch on one error.
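
That fallback is easy to sketch language-agnostically. Here is a minimal
Python version, where `index_batch` is a hypothetical stand-in for whatever
client call actually sends documents to Solr (e.g. SolrJ's `add()`), assumed
to raise an exception when the batch fails:

```python
def index_with_fallback(index_batch, docs, batch_size=1000):
    """Index docs in batches; on a batch failure, retry the same docs one
    at a time so the exact failing document(s) can be reported.

    `index_batch` is a hypothetical callable that sends a list of docs to
    Solr and raises on error; each doc is assumed to carry an "id" field.
    Returns the ids of the documents that failed individually.
    """
    failed = []
    for start in range(0, len(docs), batch_size):
        chunk = docs[start:start + batch_size]
        try:
            index_batch(chunk)
        except Exception:
            # The batch failed somewhere; resend each doc alone to
            # pinpoint the culprit(s) instead of losing the whole batch.
            for doc in chunk:
                try:
                    index_batch([doc])
                except Exception:
                    failed.append(doc["id"])
    return failed
```

If you would rather fail the whole batch on the first error (as I usually
do), raise as soon as `failed` becomes non-empty instead of continuing.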

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/


On Nov 7, 2014, at 11:44 AM, Peter Keegan  wrote:

> I'm seeing 9X throughput with 1000 docs/batch vs 1 doc/batch, with a single
> thread, so it's certainly worth it.
> 
> Thanks,
> Peter
> 
> 
> On Fri, Nov 7, 2014 at 2:18 PM, Erick Erickson 
> wrote:
> 
>> And Walter has also been around for a _long_ time ;)
>> 
>> (sorry, couldn't resist)
>> 
>> Erick
>> 
>> On Fri, Nov 7, 2014 at 11:12 AM, Walter Underwood 
>> wrote:
>>> Yes, I implemented exactly that fallback for Solr 1.2 at Netflix.
>>> 
>>> It isn’t too hard if the code is structured for it; retry with a batch
>> size of 1.
>>> 
>>> wunder
>>> 
>>> On Nov 7, 2014, at 11:01 AM, Erick Erickson 
>> wrote:
>>> 
 Yeah, this has been an ongoing issue for a _long_ time. Basically,
 you can't. So far, people have essentially written fallback logic to
 index the docs of a failing packet one at a time and report it.
 
 I'd really like better reporting back, but we haven't gotten there yet.
 
 Best,
 Erick
 
 On Fri, Nov 7, 2014 at 8:25 AM, Peter Keegan 
>> wrote:
> How are folks handling Solr exceptions that occur during batch
>> indexing?
> Solr stops parsing the docs stream when an error occurs (e.g. a doc
>> with a
> missing mandatory field), and stops indexing the batch. The bad
>> document is
> not identified, so it would be hard for the client to recover by
>> skipping
> over it.
> 
> Peter
>>> 
>> 



Re: Solr exceptions during batch indexing

2014-11-07 Thread Peter Keegan
I'm seeing 9X throughput with 1000 docs/batch vs 1 doc/batch, with a single
thread, so it's certainly worth it.

Thanks,
Peter


On Fri, Nov 7, 2014 at 2:18 PM, Erick Erickson 
wrote:

> And Walter has also been around for a _long_ time ;)
>
> (sorry, couldn't resist)
>
> Erick
>
> On Fri, Nov 7, 2014 at 11:12 AM, Walter Underwood 
> wrote:
> > Yes, I implemented exactly that fallback for Solr 1.2 at Netflix.
> >
> > It isn’t too hard if the code is structured for it; retry with a batch
> size of 1.
> >
> > wunder
> >
> > On Nov 7, 2014, at 11:01 AM, Erick Erickson 
> wrote:
> >
> >> Yeah, this has been an ongoing issue for a _long_ time. Basically,
> >> you can't. So far, people have essentially written fallback logic to
> >> index the docs of a failing packet one at a time and report it.
> >>
> >> I'd really like better reporting back, but we haven't gotten there yet.
> >>
> >> Best,
> >> Erick
> >>
> >> On Fri, Nov 7, 2014 at 8:25 AM, Peter Keegan 
> wrote:
> >>> How are folks handling Solr exceptions that occur during batch
> indexing?
> >>> Solr stops parsing the docs stream when an error occurs (e.g. a doc
> with a
> >>> missing mandatory field), and stops indexing the batch. The bad
> document is
> >>> not identified, so it would be hard for the client to recover by
> skipping
> >>> over it.
> >>>
> >>> Peter
> >
>


Re: Solr exceptions during batch indexing

2014-11-07 Thread Erick Erickson
And Walter has also been around for a _long_ time ;)

(sorry, couldn't resist)

Erick

On Fri, Nov 7, 2014 at 11:12 AM, Walter Underwood  wrote:
> Yes, I implemented exactly that fallback for Solr 1.2 at Netflix.
>
> It isn’t too hard if the code is structured for it; retry with a batch size of
> 1.
>
> wunder
>
> On Nov 7, 2014, at 11:01 AM, Erick Erickson  wrote:
>
>> Yeah, this has been an ongoing issue for a _long_ time. Basically,
>> you can't. So far, people have essentially written fallback logic to
>> index the docs of a failing packet one at a time and report it.
>>
>> I'd really like better reporting back, but we haven't gotten there yet.
>>
>> Best,
>> Erick
>>
>> On Fri, Nov 7, 2014 at 8:25 AM, Peter Keegan  wrote:
>>> How are folks handling Solr exceptions that occur during batch indexing?
>>> Solr stops parsing the docs stream when an error occurs (e.g. a doc with a
>>> missing mandatory field), and stops indexing the batch. The bad document is
>>> not identified, so it would be hard for the client to recover by skipping
>>> over it.
>>>
>>> Peter
>


Re: Solr exceptions during batch indexing

2014-11-07 Thread Walter Underwood
Yes, I implemented exactly that fallback for Solr 1.2 at Netflix.

It isn’t too hard if the code is structured for it; retry with a batch size of 1.

wunder

On Nov 7, 2014, at 11:01 AM, Erick Erickson  wrote:

> Yeah, this has been an ongoing issue for a _long_ time. Basically,
> you can't. So far, people have essentially written fallback logic to
> index the docs of a failing packet one at a time and report it.
> 
> I'd really like better reporting back, but we haven't gotten there yet.
> 
> Best,
> Erick
> 
> On Fri, Nov 7, 2014 at 8:25 AM, Peter Keegan  wrote:
>> How are folks handling Solr exceptions that occur during batch indexing?
>> Solr stops parsing the docs stream when an error occurs (e.g. a doc with a
>> missing mandatory field), and stops indexing the batch. The bad document is
>> not identified, so it would be hard for the client to recover by skipping
>> over it.
>> 
>> Peter



Re: Solr exceptions during batch indexing

2014-11-07 Thread Erick Erickson
Yeah, this has been an ongoing issue for a _long_ time. Basically,
you can't. So far, people have essentially written fallback logic to
index the docs of a failing packet one at a time and report it.

I'd really like better reporting back, but we haven't gotten there yet.

Best,
Erick

On Fri, Nov 7, 2014 at 8:25 AM, Peter Keegan  wrote:
> How are folks handling Solr exceptions that occur during batch indexing?
> Solr stops parsing the docs stream when an error occurs (e.g. a doc with a
> missing mandatory field), and stops indexing the batch. The bad document is
> not identified, so it would be hard for the client to recover by skipping
> over it.
>
> Peter


Solr exceptions during batch indexing

2014-11-07 Thread Peter Keegan
How are folks handling Solr exceptions that occur during batch indexing?
Solr stops parsing the docs stream when an error occurs (e.g. a doc with a
missing mandatory field), and stops indexing the batch. The bad document is
not identified, so it would be hard for the client to recover by skipping
over it.

Peter