Re: A Last Message to the Solr Users

2019-11-30 Thread Mark Miller
I’d also like to say the last 5 years of my life have been spent being paid
to upgrade Solr systems. I’ve made a lot of money doing this.

As I said from the start - take this for what it’s worth. For some guys it’s
not worth much. That’s cool.

And it’s a little inside joke that I’ll be back :) I joke a lot.

But seriously, you have a second chance here.

This mostly concerns SolrCloud. That’s why I recommend standalone mode. But
key people know what to do. I know it will happen - but their lives will be
easier if you help.

Lol.

- Mark

On Sat, Nov 30, 2019 at 9:25 PM Mark Miller  wrote:

> I said the key people understand :)
>
> I’ve worked in Lucene since 2006 and have an insane amount of the code
> footprint in Solr and SolrCloud :) Look up the stats. Do you have any
> contributions?
>
> I said the key people know.
>
> Solr stand-alone is and has been very capable. People are working around
> SolrCloud too. All fine and good. Millions are being made and saved.
> Everyone is comfortable. Some might think the sky looks clear and blue.
> I’ve spent a lot of capital to make sure the wrong people don’t think that
> anymore ;)
>
> Unless you are a Developer, you won’t understand the other issues. But you
> don’t need to.
>
> Mark
>
> On Sat, Nov 30, 2019 at 7:05 PM Dave  wrote:
>
>> I’m young here I think, not even 40 and only been using solr since like
>> 2008 or so, so like 1.4 give or take. But I know a really good therapist if
>> you want to talk about it.
>>
>> > On Nov 30, 2019, at 6:56 PM, Mark Miller  wrote:
>> >
>> > Now I have sacrificed to give you a new chance. A little for my
>> community.
>> > It was my community. But it was mostly for me. The developer I started
>> as
>> > would kick my ass today.  Circumstances and luck have brought money to
>> our
>> > project. And it has corrupted our process, our community, and our code.
>> >
>> > In college I would talk about past Mark screwing future Mark and too bad
>> > for him. What did he ever do for me? Well, he got me again ;)
>> >
>> > I’m out of steam, time and wife patience.
>> >
>> > Enough key people are aware of the scope of the problem now that you
>> won’t
>> > need me. I was never actually part of the package. To the many, many
>> people
>> > that offered me private notes of encouragement and future help - thank
>> you
>> > so much. Your help will be needed.
>> >
>> > You will reset. You will fix this. Or I will be back.
>> >
>> > Mark
>> >
>> >
>> > --
>> > - Mark
>> >
>> > http://about.me/markrmiller
>>
> --
> - Mark
>
> http://about.me/markrmiller
>
-- 
- Mark

http://about.me/markrmiller


Re: A Last Message to the Solr Users

2019-11-30 Thread Mark Miller
I said the key people understand :)

I’ve worked in Lucene since 2006 and have an insane amount of the code
footprint in Solr and SolrCloud :) Look up the stats. Do you have any
contributions?

I said the key people know.

Solr stand-alone is and has been very capable. People are working around
SolrCloud too. All fine and good. Millions are being made and saved.
Everyone is comfortable. Some might think the sky looks clear and blue.
I’ve spent a lot of capital to make sure the wrong people don’t think that
anymore ;)

Unless you are a Developer, you won’t understand the other issues. But you
don’t need to.

Mark

On Sat, Nov 30, 2019 at 7:05 PM Dave  wrote:

> I’m young here I think, not even 40 and only been using solr since like
> 2008 or so, so like 1.4 give or take. But I know a really good therapist if
> you want to talk about it.
>
> > On Nov 30, 2019, at 6:56 PM, Mark Miller  wrote:
> >
> > Now I have sacrificed to give you a new chance. A little for my
> community.
> > It was my community. But it was mostly for me. The developer I started as
> > would kick my ass today.  Circumstances and luck have brought money to our
> > project. And it has corrupted our process, our community, and our code.
> >
> > In college I would talk about past Mark screwing future Mark and too bad
> > for him. What did he ever do for me? Well, he got me again ;)
> >
> > I’m out of steam, time and wife patience.
> >
> > Enough key people are aware of the scope of the problem now that you
> won’t
> > need me. I was never actually part of the package. To the many, many
> people
> > that offered me private notes of encouragement and future help - thank
> you
> > so much. Your help will be needed.
> >
> > You will reset. You will fix this. Or I will be back.
> >
> > Mark
> >
> >
> > --
> > - Mark
> >
> > http://about.me/markrmiller
>
-- 
- Mark

http://about.me/markrmiller


Re: A Last Message to the Solr Users

2019-11-30 Thread Mark Miller
Now I have sacrificed to give you a new chance. A little for my community.
It was my community. But it was mostly for me. The developer I started as
would kick my ass today.  Circumstances and luck have brought money to our
project. And it has corrupted our process, our community, and our code.

In college I would talk about past Mark screwing future Mark and too bad
for him. What did he ever do for me? Well, he got me again ;)

I’m out of steam, time and wife patience.

Enough key people are aware of the scope of the problem now that you won’t
need me. I was never actually part of the package. To the many, many people
that offered me private notes of encouragement and future help - thank you
so much. Your help will be needed.

You will reset. You will fix this. Or I will be back.

Mark


-- 
- Mark

http://about.me/markrmiller


Re: A Last Message to the Solr Users

2019-11-28 Thread Mark Miller
It’s going to haunt me if I don’t bring up Hossman. I don’t feel I have to,
because who doesn’t know him?

He is a treasure who doesn’t spend much time on SolrCloud and has checked
out of leadership for the most part, for reasons I won’t argue with.

Why doesn’t he do much with SolrCloud in a real way? I can only guess. He
will tell you it’s above his pay grade or some dumb shit.

IMO, it’s probably more that super thorough people try to be thorough with
SolrCloud and when you do that, it will poke your eye out with a stick. And
then throw you over a cliff.

Make it something he can work on more than tangentially.

Mark
-- 
- Mark

http://about.me/markrmiller


Re: A Last Message to the Solr Users

2019-11-28 Thread Mark Miller
I’m including this response to a private email because it’s not something
I’ve brought up and I also think it’s a critical note:

“Yes. That is our biggest advantage. Being Apache. Almost no one seems to
be employed to help other contributors get their work in at the right
level, and all the money has ensured the end of the hobbyist. I hope that
changes too.”

-- 
- Mark

http://about.me/markrmiller


Re: A Last Message to the Solr Users

2019-11-28 Thread Mark Miller
Yes. That is our biggest advantage. Being Apache. Almost no one seems to be
employed to help other contributors get their work in at the right level,
and all the money has ensured the end of the hobbyist. I hope that changes
too.

Thanks for the note.

Mark

On Thu, Nov 28, 2019 at 1:55 PM Paras Lehana 
wrote:

> Hey Mark,
>
> I was actually expecting (and wanting) this after your LinkedIn post.
>
> At this point, the best way to use Solr is as it’s always been - avoid
>> SolrCloud and set up your own system in standalone mode.
>
>
> That's what I have been telling people who are just getting started with
> Solr and thinking that SolrCloud is actually something superior to the
> standalone mode. That may depend on the use case, but for me, I always
> prefer to achieve things from a standalone perspective instead of investing
> my time in switching to Cloud.
>
> I handle Auto-Suggest at IndiaMART. We have over 60 million docs. Single
> server of *standalone* Solr is capable of handling 800 req/sec. In fact,
> on production, we get ~300 req/sec and the single Solr is still able to
> provide responses within 25 ms!
>
> Anyways, I don't think that the project was a failure. All these were
> small drops in the big Solr ocean. We, the community and you, tried, we
> tested, and we are still here as the open community of one of the most
> powerful search platforms. SolrCloud also needed to be introduced at some
> point. Notwithstanding, I do think that the project needs to be more open
> with community commits. The community and open-source nature of Solr is
> what I love over that of Elasticsearch.
>
> Anyways, keep rocking! You have already left your footprints in the
> history of this beast project. 🤘
>
> On Thu, 28 Nov 2019 at 09:10, Mark Miller  wrote:
>
>> Now one company thinks I’m after them because they were the main source of
>> the jokes.
>>
>> Companies is not a typo.
>>
>> If you are using Solr to make or save tons of money or run your business
>> and you employ developers, please include yourself in this list.
>>
>> You are taking, and in my opinion Solr is going down. It’s all against your
>> own interest even.
>>
>> I know of enough people that want to solve this now, that it’s likely only
>> a matter of time before they fix the situation - you never know though.
>> Things change, people get new jobs, jobs change. It will take at least 3-6
>> months to make things reasonable even with a good group banding together.
>>
>> But if you are extracting value from this project and have Solr developers
>> - I’d like to think you have enough of a stake in this to think about
>> changing the approach everyone has been taking. It’s not working, and the
>> longer it goes on, the harder it’s getting to fix things.
>>
>>
>> --
>> - Mark
>>
>> http://about.me/markrmiller
>>
>
>
> --
> --
> Regards,
>
> *Paras Lehana* [65871]
> Development Engineer, Auto-Suggest,
> IndiaMART Intermesh Ltd.
>
> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
> Noida, UP, IN - 201303
>
> Mob.: +91-9560911996
> Work: 01203916600 | Extn:  *8173*
>
>
> <https://www.facebook.com/IndiaMART/videos/578196442936091/>
>
-- 
- Mark

http://about.me/markrmiller


Re: A Last Message to the Solr Users

2019-11-28 Thread Mark Miller
The people I have identified that I have the most faith in to lead the
fixing of Solr are Ishan, Noble and David. I encourage you all to look at
and follow and join in their leadership.

You can do this.


Mark
-- 
- Mark

http://about.me/markrmiller


Re: A Last Message to the Solr Users

2019-11-27 Thread Mark Miller
Now one company thinks I’m after them because they were the main source of
the jokes.

Companies is not a typo.

If you are using Solr to make or save tons of money or run your business
and you employ developers, please include yourself in this list.

You are taking, and in my opinion Solr is going down. It’s all against your
own interest even.

I know of enough people that want to solve this now, that it’s likely only
a matter of time before they fix the situation - you never know though.
Things change, people get new jobs, jobs change. It will take at least 3-6
months to make things reasonable even with a good group banding together.

But if you are extracting value from this project and have Solr developers
- I’d like to think you have enough of a stake in this to think about
changing the approach everyone has been taking. It’s not working, and the
longer it goes on, the harder it’s getting to fix things.


-- 
- Mark

http://about.me/markrmiller


Re: A Last Message to the Solr Users

2019-11-27 Thread Mark Miller
If SolrCloud worked well I’d still agree both options are very valid
depending on your use case. As it is, I’m embarrassed that people give me
any credit for this. I’m here to try and delight users and I have failed in
that. I tried to put a lot of my own time into addressing things outside of
working on my job of integrating Hadoop and upgrading Solr 4 instances for
years. But I couldn’t convince anyone of what was necessary to address what
has been happening, and my paid job has always been doing other things
since 2012.

On Wed, Nov 27, 2019 at 6:23 PM David Hastings 
wrote:

> Personally I found nothing in SolrCloud worth changing from standalone
> for; it just added more complications, more servers, and required becoming
> an expert in ZooKeeper. I’d rather spend my time developing
> than becoming a systems administrator.
>
> On Wed, Nov 27, 2019 at 3:45 AM Mark Miller  wrote:
>
>> This is your cue to come and make your jokes with your name attached.
>> I’m
>> sure the Solr users will appreciate them more than I do. I can’t laugh at
>> this situation because I take production code seriously.
>>
>> --
>> - Mark
>>
>> http://about.me/markrmiller
>>
> --
- Mark

http://about.me/markrmiller


Re: A Last Message to the Solr Users

2019-11-27 Thread Mark Miller
This is your cue to come and make your jokes with your name attached. I’m
sure the Solr users will appreciate them more than I do. I can’t laugh at
this situation because I take production code seriously.

-- 
- Mark

http://about.me/markrmiller


Re: A Last Message to the Solr Users

2019-11-27 Thread Mark Miller
And if you are a developer, enjoy that Gradle build! It was the highlight
of my year.

On Wed, Nov 27, 2019 at 10:00 AM Mark Miller  wrote:

> If you have a SolrCloud installation that is somehow working for you,
> personally I would never upgrade. The software is getting progressively
> more unstable every release.
>
>
> I wrote most of the core of SolrCloud in a prototype fashion many, many
> years ago. Only Yonik’s isolated work is solid and most of my work still
> stands as it was. This situation has me abandoning that project so that
> people understand I won’t stand by garbage work.
>
> Given that no one seems to understand what is happening in SolrCloud under
> the covers or how it was intended to work, their best bet is to start
> rewriting. Until they do this, I recommend you do not upgrade from an
> install that is working for your needs. A new feature will not be worth the
> headaches.
>
>
> Some of the other committers, who certainly do not understand the scope of
> the problem or my code (they would have touched it a bit if they did) would
> prefer to laugh or form a defensive posture rather than fix the situation. Wait
> them out. The project will collapse or get better. If I ran a production
> instance of SolrCloud, I would wait to see which happens first before
> embracing any update.
>
>
> At this point, the best way to use Solr is as it’s always been - avoid
> SolrCloud and set up your own system in standalone mode. If I had to build a
> new Solr install today, this is what I would do.
>
>
> In my opinion, the companies that have been claiming to back Solr and
> SolrCloud have been negligent, and all of the users are paying the price.
> It hasn’t been my job to work on it in any real fashion since 2012. I’m
> sorry I couldn’t help improve the situation for you.
>
>
> Take it for what it’s worth. To some, not much I’m sure.
>
>
> Mark Miller
> --
> - Mark
>
> http://about.me/markrmiller
>
-- 
- Mark

http://about.me/markrmiller


A Last Message to the Solr Users

2019-11-27 Thread Mark Miller
If you have a SolrCloud installation that is somehow working for you,
personally I would never upgrade. The software is getting progressively
more unstable every release.


I wrote most of the core of SolrCloud in a prototype fashion many, many
years ago. Only Yonik’s isolated work is solid and most of my work still
stands as it was. This situation has me abandoning that project so that
people understand I won’t stand by garbage work.

Given that no one seems to understand what is happening in SolrCloud under
the covers or how it was intended to work, their best bet is to start
rewriting. Until they do this, I recommend you do not upgrade from an
install that is working for your needs. A new feature will not be worth the
headaches.


Some of the other committers, who certainly do not understand the scope of
the problem or my code (they would have touched it a bit if they did) would
prefer to laugh or form a defensive posture rather than fix the situation. Wait
them out. The project will collapse or get better. If I ran a production
instance of SolrCloud, I would wait to see which happens first before
embracing any update.


At this point, the best way to use Solr is as it’s always been - avoid
SolrCloud and set up your own system in standalone mode. If I had to build a
new Solr install today, this is what I would do.


In my opinion, the companies that have been claiming to back Solr and
SolrCloud have been negligent, and all of the users are paying the price.
It hasn’t been my job to work on it in any real fashion since 2012. I’m
sorry I couldn’t help improve the situation for you.


Take it for what it’s worth. To some, not much I’m sure.


Mark Miller
-- 
- Mark

http://about.me/markrmiller


Re: Solr 7.7.2 Autoscaling policy - Poor performance

2019-09-03 Thread Mark Miller
Hook up a profiler to the overseer and see what it's doing, file a JIRA and
note the hotspots or what methods appear to be hanging out.

On Tue, Sep 3, 2019 at 1:15 PM Andrew Kettmann 
wrote:

>
> > You’re going to want to start by having more than 3gb for memory in my
> opinion but the rest of your set up is more complex than I’ve dealt with.
>
> right now the overseer is set to a max heap of 3GB, but is only using
> ~260MB of heap, so memory doesn't seem to be the issue unless there is a
> part of the picture I am missing there?
>
> Our overseers only jobs are being overseer and holding the .system
> collection. I would imagine if the overseer were hitting memory constraints
> it would have allocated more than 300MB of the total 3GB it is allowed,
> right?
>
> evolve24 Confidential & Proprietary Statement: This email and any
> attachments are confidential and may contain information that is
> privileged, confidential or exempt from disclosure under applicable law. It
> is intended for the use of the recipients. If you are not the intended
> recipient, or believe that you have received this communication in error,
> please do not read, print, copy, retransmit, disseminate, or otherwise use
> the information. Please delete this email and attachments, without reading,
> printing, copying, forwarding or saving them, and notify the Sender
> immediately by reply email. No confidentiality or privilege is waived or
> lost by any transmission in error.
>


-- 
- Mark

http://about.me/markrmiller


Re: Setting up MiniSolrCloudCluster to use pre-built index

2018-10-24 Thread Mark Miller
The merge can be really fast - it can just dump in the new segments and
rewrite the segments file basically.

I guess for what you want, that’s perhaps not the ideal route though. You could
maybe try and use collection aliases.

I thought about adding shard aliases way back, but never got to it.
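
For the collection alias route, a minimal SolrJ sketch might look like the
following (the base URL, alias, and collection names are hypothetical; any
node in the cluster can service the admin call):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class AliasSwapSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical Solr base URL; point it at any node in the cluster.
    try (SolrClient client =
             new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      // Searchers query the alias "films"; repointing it at the freshly built
      // collection swaps in the new shards without a merge or extra disk space.
      CollectionAdminRequest.createAlias("films", "films_rebuilt").process(client);
    }
  }
}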

On Tue, Oct 23, 2018 at 7:10 PM Ken Krugler 
wrote:

> Hi Mark,
>
> I’ll have a completely new, rebuilt index that’s (a) large, and (b)
> already sharded appropriately.
>
> In that case, using the merge API isn’t great, in that it would take
> significant time and temporarily use double (or more) disk space.
>
> E.g. I’ve got an index with 250M+ records, and about 200GB. There are
> other indexes, still big but not quite as large as this one.
>
> So I’m still wondering if there’s any robust way to swap in a fresh set of
> shards, especially without relying on legacy cloud mode.
>
> I think I can figure out where the data is being stored for an existing
> (empty) collection, shut that down, swap in the new files, and reload.
>
> But I’m wondering if that’s really the best (or even sane) approach.
>
> Thanks,
>
> — Ken
>
> On May 19, 2018, at 6:24 PM, Mark Miller  wrote:
>
> You create MiniSolrCloudCluster with a base directory and then each Jetty
> instance created gets a SolrHome in a subfolder called node{i}. So if
> legacyCloud=true you can just preconfigure a core and index under the right
> node{i} subfolder. legacyCloud=true should not even exist anymore though,
> so the long term way to do this would be to create a collection and then
> use the merge API or something to merge your index into the empty
> collection.
>
> - Mark
>
> On Sat, May 19, 2018 at 5:25 PM Ken Krugler 
> wrote:
>
> Hi all,
>
> Wondering if anyone has experience (this is with Solr 6.6) in setting up
> MiniSolrCloudCluster for unit testing, where we want to use an existing
> index.
>
> Note that this index wasn’t built with SolrCloud, as it’s generated by a
> distributed (Hadoop) workflow.
>
> So there’s no “restore from backup” option, or swapping collection
> aliases, etc.
>
> We can push our configset to Zookeeper and create the collection as per
> other unit tests in Solr, but what’s the right way to set up data dirs for
> the cores such that Solr is running with this existing index (or indexes,
> for our sharded test case)?
>
> Thanks!
>
> — Ken
>
> PS - yes, we’re aware of the routing issue with generating our own shards….
>
> --
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> Custom big data solutions & training
> Flink, Solr, Hadoop, Cascading & Cassandra
>
> --
>
> - Mark
> about.me/markrmiller
>
>
> --
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> Custom big data solutions & training
> Flink, Solr, Hadoop, Cascading & Cassandra
>
>

-- 
- Mark

http://about.me/markrmiller


Re: Setting up MiniSolrCloudCluster to use pre-built index

2018-05-19 Thread Mark Miller
You create MiniSolrCloudCluster with a base directory and then each Jetty
instance created gets a SolrHome in a subfolder called node{i}. So if
legacyCloud=true you can just preconfigure a core and index under the right
node{i} subfolder. legacyCloud=true should not even exist anymore though,
so the long term way to do this would be to create a collection and then
use the merge API or something to merge your index into the empty
collection.
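
A rough sketch of that test setup (this assumes the solr-test-framework
dependency; the paths, configset name, and collection name below are made up):

import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.solr.client.solrj.embedded.JettyConfig;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;
import org.apache.solr.cloud.MiniSolrCloudCluster;

public class MiniClusterSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical paths; in a real test these usually come from createTempDir()
    // and the project's test resources.
    Path baseDir = Paths.get("/tmp/mini-solr");
    Path configDir = Paths.get("src/test/resources/configsets/films/conf");

    // Two Jetty nodes; each gets its SolrHome under baseDir/node1, baseDir/node2.
    MiniSolrCloudCluster cluster =
        new MiniSolrCloudCluster(2, baseDir, JettyConfig.builder().build());
    try {
      cluster.uploadConfigSet(configDir, "films_conf");
      CollectionAdminRequest.createCollection("films", "films_conf", 2, 1)
          .process(cluster.getSolrClient());
      // From here, either index test data or merge the pre-built index into
      // the empty collection as described above.
    } finally {
      cluster.shutdown();
    }
  }
}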

 - Mark

On Sat, May 19, 2018 at 5:25 PM Ken Krugler 
wrote:

> Hi all,
>
> Wondering if anyone has experience (this is with Solr 6.6) in setting up
> MiniSolrCloudCluster for unit testing, where we want to use an existing
> index.
>
> Note that this index wasn’t built with SolrCloud, as it’s generated by a
> distributed (Hadoop) workflow.
>
> So there’s no “restore from backup” option, or swapping collection
> aliases, etc.
>
> We can push our configset to Zookeeper and create the collection as per
> other unit tests in Solr, but what’s the right way to set up data dirs for
> the cores such that Solr is running with this existing index (or indexes,
> for our sharded test case)?
>
> Thanks!
>
> — Ken
>
> PS - yes, we’re aware of the routing issue with generating our own shards….
>
> --
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> Custom big data solutions & training
> Flink, Solr, Hadoop, Cascading & Cassandra
>
> --
- Mark
about.me/markrmiller


Re: question about updates to shard leaders only

2018-05-15 Thread Mark Miller
Yeah, basically ConcurrentUpdateSolrClient is a shortcut to getting
multi-threaded bulk API updates out of the single-threaded, single update API.
The downsides to this are: it is not cloud aware (you have to point it at a
server), you have to add special code to see if there are any errors, you
don't get any fine-grained error information back, and you still basically
have to break up updates into batches of success/fail units, but with fewer
guard rails.

If you want to bulk load, it usually makes much more sense to use the bulk
API on CloudSolrClient and treat the whole group of updates as a single
success/fail unit.
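
A hedged sketch of that batched approach with CloudSolrClient (the ZooKeeper
address, collection, field names, and batch size are assumptions; the builder
shown is the SolrJ 7.x style):

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BulkLoadSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical ZooKeeper address and collection name.
    try (CloudSolrClient client = new CloudSolrClient.Builder(
        Collections.singletonList("localhost:2181"), Optional.empty()).build()) {
      client.setDefaultCollection("mycollection");

      List<SolrInputDocument> batch = new ArrayList<>();
      for (int i = 0; i < 100000; i++) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", Integer.toString(i));
        doc.addField("title_s", "doc " + i);
        batch.add(doc);
        if (batch.size() == 1000) {
          sendBatch(client, batch); // the whole batch succeeds or fails together
        }
      }
      if (!batch.isEmpty()) {
        sendBatch(client, batch); // don't forget the final partial batch
      }
      client.commit();
    }
  }

  private static void sendBatch(CloudSolrClient client, List<SolrInputDocument> batch)
      throws SolrServerException, IOException {
    client.add(batch); // errors surface here as an exception, per batch
    batch.clear();     // clear so the next batch starts empty
  }
}

Each add() call either succeeds or throws, so a failed batch can be retried or
logged as a unit.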

- Mark

On Tue, May 15, 2018 at 9:25 AM Erick Erickson 
wrote:

> bq. But don't forget a final client.add(list) after the while-loop ;-)
>
> Ha! But only "if (list.size() > 0)"
>
> And then there was the memorable time I forgot the "list.clear()" when
> I sent the batch and wondered why my indexing progress got slower and
> slower...
>
> Not to mention the time I re-used the same SolrInputDocument that got
> bigger and bigger and bigger.
>
> Not to mention the other zillion screw-ups I've managed to perpetrate
> in my career "Who wrote this stupid code? Oh, wait, it was me.
> DON'T LOOK!!!"...
>
> Astronomy anecdote
>
> Dale Vrabeck...was at a party with [Rudolph] Minkowski and Dale said
> he’d heard about the astronomer who had exposed a plate all night and
> then put it in the hypo first. Minkowski said, “It was three nights,
> and it was me.”
>
> On Tue, May 15, 2018 at 10:10 AM, Shawn Heisey 
> wrote:
> > On 5/15/2018 12:12 AM, Bernd Fehling wrote:
> >>
> >> OK, I have the CloudSolrClient with SolrJ now running but it seems
> >> a bit slower compared to ConcurrentUpdateSolrClient.
> >> This was not expected.
> >> The logs show that CloudSolrClient send the docs only to the leaders.
> >>
> >> So the only advantage of CloudSolrClient is that it is "Cloud aware"?
> >>
> >> With ConcurrentUpdateSolrClient I get about 1600 docs/sec for loading.
> >> With CloudSolrClient I get only about 1200 docs/sec.
> >
> >
> > ConcurrentUpdateSolrClient internally puts all indexing requests on a
> queue
> > and then can use multiple threads to do parallel indexing in the
> background.
> > The design of the client has one big disadvantage -- it returns control
> to
> > your program immediately (before indexing actually begins) and always
> > indicates success.  All indexing errors are swallowed.  They are logged,
> but
> > the calling program is never informed that any errors have occurred.
> >
> > Like all other SolrClient implementations, CloudSolrClient is
> thread-safe,
> > but it is not multi-threaded unless YOU create multiple threads that all
> use
> > the same client object.  Full error handling is possible with this
> client.
> > It is also fully cloud aware, adding and removing Solr servers as the
> > SolrCloud changes, without needing to be reconfigured or recreated.
> >
> > Thanks,
> > Shawn
> >
>
-- 
- Mark
about.me/markrmiller


Re: Solr soft commits

2018-05-10 Thread Mark Miller
A soft commit does not control merging. The IndexWriter controls merging
and hard commits go through the IndexWriter. A soft commit tells Solr to
try and open a new SolrIndexSearcher with the latest view of the index. It
does this with a mix of using the on disk index and talking to the
IndexWriter to see updates that have not been committed.

Opening a new SolrIndexSearcher using the IndexWriter this way does have a
cost. You may flush segments, you may apply deletes, you may have to
rebuild partial or full in memory data structures. It's generally much
faster than a hard commit to get a refreshed view of the index though.

Given how SolrCloud was designed, it's usually best to set an auto hard
commit to something that works for you, given how large it will make tlogs
(affecting recovery times), and how much RAM is used. Then use soft commits
for visibility. It's best to use them as infrequently as your use case
allows.
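
If the soft commits come from the client rather than from autoSoftCommit, a
minimal SolrJ sketch (hypothetical URL and collection name; the four-argument
commit overload sets waitFlush, waitSearcher, and softCommit):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class SoftCommitSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical base URL and collection name.
    try (SolrClient client =
             new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "1");
      client.add("mycollection", doc);

      // Soft commit for visibility only; hard commits are left to autoCommit
      // in solrconfig.xml. Args: waitFlush=true, waitSearcher=true, softCommit=true.
      client.commit("mycollection", true, true, true);
    }
  }
}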

- Mark

On Thu, May 10, 2018 at 10:49 AM Shivam Omar 
wrote:

> Hi,
>
> I need some help in understanding Solr soft commits.  As soft commits are
> about visibility and are fast in nature, they are advised for NRT use
> cases. I want to understand whether a soft commit also honors merge policies
> and does segment merging for docs in memory. For example, if I keep the hard
> commit interval very high and allow a few million documents to be in memory
> by using soft commits with no hard commit, can it affect Solr query time
> performance?
>
>
> Shivam
>
> Get Outlook for Android
>
> DISCLAIMER
> This email and any files transmitted with it are intended solely for the
> person or the entity to whom they are addressed and may contain information
> which is Confidential and Privileged. Any misuse of the information
> contained in this email, including but not limited to retransmission or
> dissemination of the said information by person or entities other than the
> intended recipient is unauthorized and strictly prohibited. If you are not
> the intended recipient of this email, please delete this email and contact
> the sender immediately.
>
-- 
- Mark
about.me/markrmiller


Re: question about updates to shard leaders only

2018-05-09 Thread Mark Miller
It's been a while since I've been in this deeply, but it should be
something like:

sendUpdatesOnlyToShardLeaders will select the leaders for each shard as the
load balanced targets for update. The updates may not go to the *right*
leader, but only the leaders will be chosen, followers (non leader
replicas) will not be part of the load balanced server list.

sendDirectUpdatesToShardLeadersOnly is the same, followers are not part of
the mix, but also, updates are sent directly to the right leader as long as
the right hashing field is specified (id by default). We hash the id client
side and know where it should end up.

Optimally, you want sendDirectUpdatesToShardLeadersOnly to be true
configured with the correct id field.
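
A small sketch of wiring that up with the CloudSolrClient builder (the
ZooKeeper address and collection name are hypothetical, and the builder method
names are as found in recent SolrJ releases, so check them against your
version):

import java.util.Collections;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;

public class LeaderRoutingSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical ZooKeeper address.
    try (CloudSolrClient client = new CloudSolrClient.Builder(
            Collections.singletonList("localhost:2181"), Optional.empty())
        // Hash the id client-side and send each update straight to its shard leader.
        .sendDirectUpdatesToShardLeadersOnly()
        .build()) {
      client.setDefaultCollection("mycollection");
      // client.setIdField(...) only matters if the uniqueKey field is not "id".
      // ... client.add(...) as usual ...
    }
  }
}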

- Mark

On Wed, May 9, 2018 at 4:54 AM Bernd Fehling 
wrote:

> Hi list,
>
> while going from single core master/slave to cloud multi core/node
> with leader/replica I want to change my SolrJ loading, because
> ConcurrentUpdateSolrClient isn't cloud aware and has performance
> impacts.
> I want to use CloudSolrClient with LBHttpSolrClient and updates
> should only go to shard leaders.
>
> Question, what is the difference between sendUpdatesOnlyToShardLeaders
> and sendDirectUpdatesToShardLeadersOnly?
>
> Regards,
> Bernd
>
-- 
- Mark
about.me/markrmiller


Re: 7.3 pull replica with 7.2 tlog leader

2018-05-06 Thread Mark Miller
Yeah, the project should never use built in serialization. I'd file a JIRA
issue. We should remove this when we can.
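
For background on why adding a method broke the wire format, here is a small,
purely illustrative class (not Solr code) showing what an explicit
serialVersionUID buys with built-in java.io serialization:

import java.io.Serializable;

// Illustrative only. Without an explicit serialVersionUID, the JVM computes
// one from the class's visible members, so merely adding a public method
// changes it, and mixed-version nodes fail with java.io.InvalidClassException
// as in the report quoted below. Pinning the field keeps the forms compatible.
public class ExampleResponse implements Serializable {
  private static final long serialVersionUID = 1L;

  private String message;

  public String getMessage() { return message; }
  public void setMessage(String message) { this.message = message; }

  // A method added in a later release no longer changes stream compatibility.
}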

- Mark

On Sun, May 6, 2018 at 9:39 PM Will Currie  wrote:

> Premise: During an upgrade I should be able to run a 7.3 pull replica
> against a 7.2 tlog leader. Or vice versa.
>
> Maybe I'm totally wrong in assuming that!
>
> Assuming that's correct it looks like adding a new method[1] to
> SolrResponse has broken binary compatibility. When I try to register a new
> pull replica using the admin API[2] I get an HTTP 500 response. I see this
> error logged: java.io.InvalidClassException:
> org.apache.solr.client.solrj.SolrResponse; local class incompatible: stream
> classdesc serialVersionUID = 3945300637328478755, local class
> serialVersionUID = -793110010336024264
>
> The replica actually seems to register OK; it just can't read the response
> because the bytes from the 7.2 leader include a different serialVersionUID.
>
> Should SolrResponse include a serialVersionUID? All subclasses too.
>
> It looks like stock java serialization is only used for these admin
> responses. Query responses use JavaBinCodec instead..
>
> Full(ish) stack trace:
>
> ERROR HttpSolrCall null:org.apache.solr.common.SolrException:
> java.io.InvalidClassException: org.apache.solr.client.solrj.SolrResponse;
> local class incompatible: st
> ream classdesc serialVersionUID = 3945300637328478755, local class
> serialVersionUID = -7931100103360242645
> at
> org.apache.solr.client.solrj.SolrResponse.deserialize(SolrResponse.java:73)
> at
>
> org.apache.solr.handler.admin.CollectionsHandler.sendToOCPQueue(CollectionsHandler.java:348)
> at
>
> org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:256)
> at
>
> org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:230)
> at
>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:195)
> at
> org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:736)
> at
>
> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:717)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:498)
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:384)
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:330)
>
> [1]
>
> https://github.com/apache/lucene-solr/commit/5ce83237e804ac1130eaf5cf793955667793fee0#diff-b809fa594f93aa6805381029a188e4e2R46
> [2]
>
> http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=blah&shard=shard1&node=blah&type=pull
>
> Thanks,
> Will
>
-- 
- Mark
about.me/markrmiller


Re: Solr node not found in ZK live_nodes

2016-12-07 Thread Mark Miller
That already happens. The ZK client itself will reconnect when it can and
trigger everything to be setup like when the cluster first starts up,
including a live node and leader election, etc.

You may have hit a bug or something else missing from this conversation,
but reconnecting after losing the ZK connection is a basic feature from day
one.

Mark
On Wed, Dec 7, 2016 at 12:34 AM Manohar Sripada 
wrote:

> Thanks Erick! Should I create a JIRA issue for the same?
>
> Regarding the logs, I have changed the log level to WARN. That may be the
> reason, I couldn't get anything from it.
>
> Thanks,
> Manohar
>
> On Tue, Dec 6, 2016 at 9:58 PM, Erick Erickson 
> wrote:
>
> > Most likely reason is that the Solr node in question,
> > was not reachable thus it was removed from
> > live_nodes. Perhaps due to temporary network
> > glitch, long GC pause or the like. If you're rolling
> > your logs over it's quite possible that any illuminating
> > messages were lost. The default 4M size for each
> > log is quite low at INFO level...
> >
> > It does seem possible for a Solr node to periodically
> > check its status and re-insert itself into live_nodes,
> > go through recovery and all that. So far most of that
> > registration logic is baked into startup code. What
> > do others think? Worth a JIRA?
> >
> > Erick
> >
> > On Tue, Dec 6, 2016 at 3:53 AM, Manohar Sripada 
> > wrote:
> > > We have a 16 node cluster of Solr (5.2.1) and 5 node Zookeeper (3.4.6).
> > >
> > > All the Solr nodes were registered to Zookeeper (ls /live_nodes) when
> > setup
> > > was done 3 months back. Suddenly, few days back our search started
> > failing
> > > because one of the solr node(consider s16) was not seen in Zookeeper,
> > i.e.,
> > > when we checked for *"ls /live_nodes"*, *s16 *solr node was not found.
> > > However, the corresponding Solr process was up and running.
> > >
> > > To my surprise, I couldn't find any errors or warnings in solr or
> > zookeeper
> > > logs related to this. I have few questions -
> > >
> > > 1. Is there any reason why this registration to ZK was lost? I know
> logs
> > > should provide some information, but it didn't. Did anyone encounter a
> > > similar issue? If so, what can be the root cause?
> > > 2. Shouldn't Solr be clever enough to detect that the registration to
> ZK
> > > was lost (for some reason) and should try to re-register again?
> > >
> > > PS: The issue is resolved by restarting the Solr node. However, I am
> > > curious to know why it happened in the first place.
> > >
> > > Thanks
> >
>
-- 
- Mark
about.me/markrmiller


Re: solr shutdown

2016-11-15 Thread Mark Miller
That is probably partly because of hdfs cache key unmapping. I think I
improved that in some issue at some point.

We really want to wait by default for a long time though - even 10 minutes
or more. If you have tons of SolrCores, each of them has to be torn down,
each of them might commit on close, custom code and resources can be used
and need to be released, and a lot of time can be spent legit. Given these
long shutdowns will normally be legit and not some hang, I think we want to
be willing to wait a long time. A user that finds this too long can always
kill the process themselves, or lower the wait. But most of the time you
will pay for that with a non-clean shutdown, except in exceptional situations.

- Mark

On Fri, Oct 21, 2016 at 12:10 PM Joe Obernberger <
joseph.obernber...@gmail.com> wrote:

> Thanks Shawn - We've had to increase this to 300 seconds when using a
> large cache size with HDFS, and a fairly heavily loaded index routine (3
> million docs per day).  I don't know if that's why it takes a long time
> to shutdown, but it can take a while for solr cloud to shutdown
> gracefully.  If it does not, you end up with write.lock files for some
> (if not all) of the shards, and have to delete them manually before
> restarting.
>
> -Joe
>
>
> On 10/21/2016 9:01 AM, Shawn Heisey wrote:
> > On 10/21/2016 6:56 AM, Hendrik Haddorp wrote:
> >> I'm running solrcloud in foreground mode (-f). Does it make a
> >> difference for Solr if I stop it by pressing ctrl-c, sending it a
> >> SIGTERM or using "solr stop"?
> > All of those should produce the same result in the end -- Solr's
> > shutdown hook will be called and a graceful shutdown will commence.
> >
> > Note that in the case of the "bin/solr stop" command, the default is to
> > only wait five seconds for graceful shutdown before proceeding to a
> > forced kill, which for a typical install, means that forced kills become
> > the norm rather than the exception.  We have an issue to increase the
> > max timeout, but it hasn't been done yet.
> >
> > I strongly recommend anyone going into production should edit the script
> > to increase the timeout.  For the shell script I would do at least 60
> > seconds.  The Windows script just does a pause, not an intelligent wait,
> > so going that high probably isn't advisable on Windows.
> >
> > Thanks,
> > Shawn
> >
>
> --
- Mark
about.me/markrmiller


Re: autoAddReplicas:true not working

2016-11-15 Thread Mark Miller
Look at the Overseer host and see if there are any relevant logs for
autoAddReplicas.

- Mark

On Mon, Oct 24, 2016 at 3:01 PM Chetas Joshi  wrote:

> Hello,
>
> I have the following configuration for the Solr cloud and a Solr collection.
> This is Solr on HDFS and the Solr version I am using is 5.5.0.
>
> No. of hosts: 52 (Solr Cloud)
>
> shard count:   50
> replicationFactor:   1
> MaxShardsPerNode: 1
> autoAddReplicas:   true
>
> Now, one of my shards is down. Although there are two hosts which are
> available in my cloud on which a new replica could be created, it just does
> not create a replica. All 52 hosts are healthy. What could be the reason
> for this?
>
> Thanks,
>
> Chetas.
>
-- 
- Mark
about.me/markrmiller


Re: ClassNotFoundException with Custom ZkACLProvider

2016-11-15 Thread Mark Miller
Could you file a JIRA issue so that this report does not get lost?

- Mark

On Tue, Nov 15, 2016 at 10:49 AM Solr User  wrote:

> For those interested, I ended up bundling the customized ACL provider with
> the solr.war.  I could not stomach looking at the stack trace in the logs.
>
> On Mon, Nov 7, 2016 at 4:47 PM, Solr User  wrote:
>
> > This is mostly just an FYI regarding future work on issues like
> SOLR-8792.
> >
> > I wanted admin update but world read on ZK since I do not have anything
> > sensitive from a read perspective in the Solr data and did not want to
> > force all SolrCloud clients to implement authentication just for read.
> So,
> > I extended DefaultZkACLProvider and implemented a replacement for
> > VMParamsAllAndReadonlyDigestZkACLProvider.
> >
> > My custom code is loaded from the sharedLib in solr.xml.  However, there
> > is a temporary ZK lookup to read solr.xml (and chroot) which is obviously
> > done before loading sharedLib.  Therefore, I am faced with a
> > ClassNotFoundException.  This has no negative effect on the ACL
> > functionalityjust the annoying stack trace in the logs.  I do not
> want
> > to package this custom code with the Solr code and do not want to package
> > this along with Solr dependencies in the Jetty lib/ext.
> >
> > So, I am planning to live with the stack trace and just wanted to share
> > this for any future work on the dynamic solr.xml and chroot lookups or in
> > case I am missing some work-around.
> >
> > Thanks!
> >
> >
>
-- 
- Mark
about.me/markrmiller


Re: "I was asked to wait on state recovering for shard.... but I still do not see the request state"

2016-02-04 Thread Mark Miller
Only INFO level, so I suspect not bad...

If that Overseer closed, another node should have picked up where it left
off. See that in another log?

Generally an Overseer close means a node or cluster restart.

This can cause a lot of DOWN state publishing. If it's a cluster restart, a
lot of those DOWN publishes are not processed until the cluster is started
back up - which can lead to the Overseer being overwhelmed and things not
responding fast enough. You should be able to see an active Overseer
working on publishing those states though (it shows that at INFO logging
level).

If the Overseer is simply down and another did not take over, that is just
some kind of bug. If it's overwhelmed, 5x is much much faster,
and SOLR-7281 should also help, but that is no real help for 4.x at this
point.

Anyway, the key is what the active Overseer is doing. Is there no active
Overseer? Or is it busy trying to push through a backlog of operations?

- Mark

On Wed, Feb 3, 2016 at 8:46 PM hawk  wrote:

> Thanks Mark.
>
> I was able to search "Overseer" in the solr logs around the time frame of
> the condition. This particular message was from the leader node of the
> shard.
>
> 160201 11:26:36.380 localhost-startStop-1 Overseer (id=null) closing
>
> Also I found this message in the zookeeper logs.
>
> 11:26:35,218 [myid:02] - INFO [ProcessThread(sid:2
> cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when
> processing sessionid:0x15297c0fe2e3f2d type:create cxid:0x3
> zxid:0xf0001be48
> txntype:-1 reqpath:n/a Error Path:/overseer Error:KeeperErrorCode =
> NodeExists for /overseer
>
> Any thoughts what these messages suggest?
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/I-was-asked-to-wait-on-state-recovering-for-shard-but-I-still-do-not-see-the-request-state-tp4204348p4255105.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
- Mark
about.me/markrmiller


Re: "I was asked to wait on state recovering for shard.... but I still do not see the request state"

2016-02-03 Thread Mark Miller
You get this when the Overseer is either bogged down or not processing
events generally.

The Overseer is way, way faster at processing events in 5x.

If you search your logs for .Overseer you can see what it's doing. Either
nothing at the time, or bogged down processing state updates probably.

Along with 5x Overseer processing being much more efficient, SOLR-7281 is
going to take out a lot of state publishing on shutdown that can end up
getting processed on the next startup.

- Mark

On Wed, Feb 3, 2016 at 6:39 PM hawk  wrote:

> Here are more details around the event.
>
> 160201 11:57:22.272 http-bio-8082-exec-18 [] webapp=/solr path=/update
> params={waitSearcher=true&distrib.from=http://x:x
> /solr//&update.distrib=FROMLEADER&openSearcher=true&commit=true&wt=javabin&expungeDeletes=false&commit_end_point=true&version=2&softCommit=false}
> {commit=} 0 134
>
> 160201 11:57:25.993 RecoveryThread Error while trying to recover.
> core=x
> java.util.concurrent.ExecutionException:
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: I was
> asked to wait on state recovering for shard2 in xxx on xxx:xx_solr but I
> still do not see the requested state. I see state: recovering live:true
> leader from ZK: http://x:x/solr//
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:188)
> at
>
> org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:615)
> at
>
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:371)
> at
> org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)
> Caused by:
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: I was
> asked to wait on state recovering for shard2 in xxx on xxx:xx_solr but I
> still do not see the requested state. I see state: recovering live:true
> leader from ZK: http://x:x/solr//
> at
>
> org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:550)
> at
>
> org.apache.solr.client.solrj.impl.HttpSolrServer$1.call(HttpSolrServer.java:245)
> at
>
> org.apache.solr.client.solrj.impl.HttpSolrServer$1.call(HttpSolrServer.java:241)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
>
> 160201 11:57:25.993 RecoveryThread Recovery failed - trying again... (7)
> core=
>
> 160201 11:57:25.994 RecoveryThread Wait 256.0 seconds before trying to
> recover again (8)
>
> 160201 11:57:30.370 http-bio-8082-exec-3
> org.apache.solr.common.SolrException: no servers hosting shard:
> at
>
> org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:149)
> at
>
> org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/I-was-asked-to-wait-on-state-recovering-for-shard-but-I-still-do-not-see-the-request-state-tp4204348p4255073.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
- Mark
about.me/markrmiller


Re: Solr has multiple log lines for single search

2016-01-11 Thread Mark Miller
Two of them are sub requests. They have params isShard=true and
distrib=false. The top level user query will not have distrib or isShard
because they default the other way.
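
A rough, hypothetical sketch of applying that rule when parsing the logs (the
log path and the q= extraction pattern are assumptions, not part of Solr):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class TopLevelQueryGrep {
  // Pulls the q= parameter out of the logged params={...} blob.
  private static final Pattern Q_PARAM = Pattern.compile("[{&?]q=([^&}]*)");

  public static void main(String[] args) throws IOException {
    try (Stream<String> lines = Files.lines(Paths.get("/var/solr/logs/solr.log"))) {
      lines.filter(line -> line.contains("path=/select"))
           .filter(line -> !line.contains("isShard=true")) // drop per-shard sub-requests
           .forEach(line -> {
             Matcher m = Q_PARAM.matcher(line);
             if (m.find()) {
               System.out.println(m.group(1)); // the user's search terms (still URL-encoded)
             }
           });
    }
  }
}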

- Mark

On Mon, Jan 11, 2016 at 6:30 AM Syed Mudasseer 
wrote:

> Hi,
> I have solr configured on cloud with the following details:
> Every collection has 3 shards and each shard consists of 3 replicas.
> Whenever I search for any field in solr, having faceting and highlighting
> query checked, then I get more than 2 search logs stored in the log file.
> (sometimes, it goes up to 8 log lines).
> I am trying to get the search terms entered by user, but due to duplicate
> records I am not able to decide which query is more appropriate to parse.
> Here is an example of log lines (field search with faceting) that gives me 3
> results in the log:
> INFO  - 2016-01-11 11:07:09.321; org.apache.solr.core.SolrCore;
> [mycollection_shard2_replica1] webapp=/solr path=/select
> params={f.ab_model.facet.limit=160&lowercaseOperators=true&facet=true&qf=description&distrib=false&hl.simple.pre=&wt=javabin&hl=false&version=2&rows=100&defType=edismax&NOW=1452510429317&shard.url=
> http://MyURL:8983/solr/mycollection_shard2_replica1/|http://MyURL:8983/solr/mycollection_shard2_replica3/|http://MyURL:8983/solr/mycollection_shard2_replica2/&fl=id&fl=score&df=search&start=0&q=MySearchTerm&f.ab_model.facet.mincount=0&_=9652510428630&hl.simple.post=&facet.field=ab_model&isShard=true&stopwords=true&fsv=true}
> hits=753 status=0 QTime=1
> INFO  - 2016-01-11 11:07:09.349; org.apache.solr.core.SolrCore;
> [mycollection_shard2_replica1] webapp=/solr path=/select
> params={lowercaseOperators=true&facet=false&ids=2547891056_HDR,3618199460_HDR,3618192453_HDR,3618277839_HDR,3618186992_HDR,3618081995_HDR,3618074192_HDR,3618189660_HDR,3618073929_HDR,3618078287_HDR,3618084580_HDR,3618075438_HDR,3618170375_HDR,3618195949_HDR,3618074030_HDR,3618085730_HDR,3618078288_HDR,3618072500_HDR,3618086961_HDR,3618170928_HDR,3618077108_HDR,3618074090_HDR,3618181279_HDR,3618188058_HDR,3618181018_HDR,3618199309_HDR,3618195610_HDR,3618281575_HDR,3618195568_HDR,3618080877_HDR,3618199114_HDR,3618199132_HDR,3618084030_HDR,3618280868_HDR,3618193086_HDR,3618275194_HDR,3618074917_HDR,3618195102_HDR,3618086958_HDR,3618084870_HDR,3618174630_HDR,3618075776_HDR,3618190529_HDR,3618192993_HDR,3618084217_HDR,3618176677_HDR,3618183612_HDR&qf=description&distrib=false&hl.simple.pre=&wt=javabin&hl=true&version=2&rows=100&defType=edismax&NOW=1452510429317&shard.url=
> http://MyURL:8983/solr/mycollection_shard2_replica1/|http://MyURL:8983/solr/mycollection_shard2_replica3/|http://MyURL:8983/solr/mycollection_shard2_replica2/&df=search&q=MySearchTerm&_=1452510428630&hl.simple.post=&facet.field=ab_model&isShard=true&stopwords=true}
> status=0 QTime=15
> INFO  - 2016-01-11 11:07:09.352; org.apache.solr.core.SolrCore;
> [mycollection_shard1_replica1] webapp=/solr path=/select
> params={lowercaseOperators=true&facet=true&indent=true&qf=description&hl.simple.pre=&wt=json&hl=true&defType=edismax&q=MySearchTerm&_=1452510428630&hl.simple.post=&facet.field=ab_model&stopwords=true}
> hits=2276 status=0 QTime=35
> If I have the highlighting query checked, then I get more than 3 logs.
> So my question is: which line is more appropriate to get the search query
> entered by the user? Or should I consider all of the log lines?
>

-- 
- Mark
about.me/markrmiller


Re: Possible Bug - MDC handling in org.apache.solr.common.util.ExecutorUtil.MDCAwareThreadPoolExecutor.execute(Runnable)

2016-01-11 Thread Mark Miller
Not sure I'm onboard with the first proposed solution, but yes, I'd open a
JIRA issue to discuss.

- Mark

On Mon, Jan 11, 2016 at 4:01 AM Konstantin Hollerith 
wrote:

> Hi,
>
> I'm using SLF4J MDC to log additional Information in my WebApp. Some of my
> MDC-Parameters even include Line-Breaks.
> It seems that Solr takes _all_ MDC parameters and puts them into the
> Thread-Name, see
>
> org.apache.solr.common.util.ExecutorUtil.MDCAwareThreadPoolExecutor.execute(Runnable).
>
> When there is some logging of Solr, the log gets cluttered:
>
> [11.01.16 09:14:19:170 CET] 02a3 SystemOut O 09:14:19,169
> [zkCallback-14-thread-1-processing-My
> Custom
> MDC
> Parameter ROraqiFWaoXqP21gu4uLpMh SANDHO] WARN
> common.cloud.ConnectionManager [session=ROraqiFWaoXqP21gu4uLpMh]
> [user=SANDHO]: zkClient received AuthFailed
>
> (some of my MDC-Parameters are only active in Email-Logs and are not
> included in the file-log)
>
> I think this is a Bug. Solr should only put its own MDC-Parameter into the
> Thread-Name.
>
> Possible solution: since all (as far as I can check) invocations of MDC.put
> in Solr use a prefix like "ConcurrentUpdateSolrClient" or
> "CloudSolrClient" etc., it would be possible to put a check into
> MDCAwareThreadPoolExecutor.execute(Runnable) that process only those
> Prefixes.
>
> Should I open a JIRA issue for this?
>
> Thanks,
>
> Konstantin
>
> Environment: JSF-Based App with WebSphrere 8.5, Solr 5.3.0, slf4j-1.7.12,
> all jars are in WEB-INF/lib.
>
-- 
- Mark
about.me/markrmiller


Re: Specifying a different txn log directory

2016-01-09 Thread Mark Miller
dataDir and tlog dir cannot be changed with a core reload.

- Mark

On Sat, Jan 9, 2016 at 1:20 PM Erick Erickson 
wrote:

> Please show us exactly what you did. and exactly
> what you saw to say that "does not seem to work".
>
> Best,
> Erick
>
> On Fri, Jan 8, 2016 at 7:47 PM, KNitin  wrote:
> > Hi,
> >
> > How do I specify a different directory for transaction logs? I tried
> using
> > the updatelog entry in solrconfig.xml and reloaded the collection but
> that
> > does not seem to work.
> >
> > Is there another setting I need to change?
> >
> > Thanks
> > Nitin
>
-- 
- Mark
about.me/markrmiller


Re: SolrCloud 4.8.1 - commit wait

2015-12-11 Thread Mark Miller
It looks like he has waitSearcher as false, so all the time should be in the
commit. So that amount of time does sound odd.

I would certainly change those commit settings though. I would not use
maxDocs, that is an ugly way to control this. And one second is much too
aggressive as Erick says.

If you want to attempt that kind of visibility, you should use the
softAutoCommit. The regular autoCommit should be at least 15 or 20 seconds.

- Mark

On Fri, Dec 11, 2015 at 1:22 PM Erick Erickson 
wrote:

> First of all, your autocommit settings are _very_ aggressive. Committing
> every second is far too frequent IMO.
>
> As an aside, I generally prefer to omit the maxDocs as it's not all
> that predictable,
> but that's a personal preference and really doesn't bear on your problem..
>
> My _guess_ is that you are doing a lot of autowarming. The number of docs
> doesn't really matter if your autowarming is taking forever, your Solr logs
> should report the autowarm times at INFO level, have you checked those?
>
> The commit settings shouldn't be a problem in terms of your server dying,
> the indexing process flushes docs to the tlog independent of committing so
> upon restart they should be recovered. Here's a blog on the subject:
>
>
> https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> Best,
> Erick
>
> On Fri, Dec 11, 2015 at 8:24 AM, Vincenzo D'Amore 
> wrote:
> > Hi all,
> >
> > I have a SolrCloud cluster with a collection (2.5M docs) with 3 shards
> and
> > 15 replicas.
> > There is a SolrJ application that feeds the collection, updating a few
> > documents every hour. I don't understand why, at the end of the process,
> > the hard commit takes about 8-10 minutes.
> >
> > Even if there are only few hundreds of documents.
> >
> > This is the autocommit configuration:
> >
> > <autoCommit>
> >   <maxDocs>1</maxDocs>
> >   <maxTime>1000</maxTime>
> >   <openSearcher>false</openSearcher>
> > </autoCommit>
> >
> > In your experience why hard commit takes so long even for so few
> documents?
> >
> > Now I'm changing the code to softcommit, calling commit (waitFlush =
> > false, waitSearcher
> > = false, softCommit = true);
> >
> > solrServer.commit(false, false, true);.
> >
> > I have configured NRTCachingDirectoryFactory, but I'm a little bit
> worried
> > if a server goes down (something like: kill -9, SolrCloud crashes, out of
> > memory, etc.), and if, using this strategy
> softcommit+NRTCachingDirectory,
> > SolrCloud instance could not recover a replica.
> >
> > Should I worry about this new configuration? I was thinking to take a
> > snapshot of everything every day, in order to recover immediately the
> > index. Could this be considered a best practice?
> >
> > Thanks in advance for your time,
> > Vincenzo
> >
> > --
> > Vincenzo D'Amore
> > email: v.dam...@gmail.com
> > skype: free.dev
> > mobile: +39 349 8513251
>
-- 
- Mark
about.me/markrmiller


Re: CloudSolrCloud - Commit returns but not all data is visible (occasionally)

2015-11-18 Thread Mark Miller
If you see "WARNING: too many searchers on deck" or something like that in
the logs, that could cause this behavior and would indicate you are opening
searchers faster than Solr can keep up.

- Mark

On Tue, Nov 17, 2015 at 2:05 PM Erick Erickson 
wrote:

> That's what was behind my earlier comment about perhaps
> the call is timing out, thus the commit call is returning
> _before_ the actual searcher is opened. But the call
> coming back is not a return from commit, but from Jetty
> even though the commit hasn't really returned.
>
> Just a guess however.
>
> Best,
> Erick
>
> On Tue, Nov 17, 2015 at 12:11 AM, adfel70  wrote:
> > Thanks Eric,
> > I'll try to play with the autowarm config.
> >
> > But I have a more direct question - why does the commit return without
> > waiting till the searchers are fully refreshed?
> >
> > Could it be that the parameter waitSearcher=true doesn't really work?
> > or maybe I don't understand something here...
> >
> > Thanks,
> >
> >
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/CloudSolrCloud-Commit-returns-but-not-all-data-is-visible-occasionally-tp4240368p4240518.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
- Mark
about.me/markrmiller


Re: Explicit commit with openSearcher=false

2015-11-12 Thread Mark Miller
You can pass arbitrary params with Solrj. The API usage is just a little
more arcane.
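
For example, a hedged SolrJ sketch of a hard commit that keeps the current
searcher (the URL and collection are hypothetical, and the class names are
those of SolrJ 5+; openSearcher is passed through as an arbitrary request
parameter):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.UpdateRequest;

public class OpenSearcherFalseCommit {
  public static void main(String[] args) throws Exception {
    // Hypothetical core/collection URL.
    try (SolrClient client = new HttpSolrClient.Builder(
        "http://localhost:8983/solr/mycollection").build()) {
      UpdateRequest req = new UpdateRequest();
      // Hard commit, but do not open a new searcher.
      req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true); // waitFlush, waitSearcher
      req.setParam("openSearcher", "false"); // arbitrary update param passed straight through
      req.process(client);
    }
  }
}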

- Mark

On Wed, Nov 11, 2015 at 11:33 PM Sathyakumar Seshachalam <
sathyakumar_seshacha...@trimble.com> wrote:

> I intend to use SolrJ. I only saw the below overloaded commit method in
> documentation (http://lucene.apache.org/solr/4_10_3/solr-solrj/index.html)
> of class "org.apache.solr.client.solrj.SolrServer"
>
> public UpdateResponse commit(boolean waitFlush, boolean waitSearcher,
> boolean softCommit).
>
>
> And I assumed waitSearcher is not the same as openSearcher.  (From the
> documentation at least it would seem that waitSearcher when false only does
> not block the call, but a searcher is still opened).
> None of the add methods take a openSearcher param either.
>
> Regards
> Sathya
>
>
> On 11/11/15, 11:58 PM, "Chris Hostetter"  wrote:
>
> >
> >: I saw mention of openSearcher for SolrJ, so I looked in the source of
> >: the UpdateRequestHandler, and there is no mention of openSearcher in
> >: there that I can see, for XML, JSON or SolrJ requests.
> >:
> >: So my take is that this isn't possible right now :-(
> >
> >It's handled by the Loaders - all of which (i think?) delegate to
> >RequestHandlerUtils.handleCommit to generate the CommitUpdateCommand
> >according to the relevant UpdateParams.
> >
> >Most of the constants you see in UpdateRequestHandler look like dead code
> >that should be removed.
> >
> >
> >-Hoss
> >http://www.lucidworks.com/
>
> --
- Mark
about.me/markrmiller


Re: Explicit commit with openSearcher=false

2015-11-11 Thread Mark Miller
openSearcher is a valid param for a commit whatever the api you are using
to issue it.

- Mark

On Wed, Nov 11, 2015 at 12:32 PM Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Does waitSearcher=false works like you need?
>
> On Wed, Nov 11, 2015 at 1:34 PM, Sathyakumar Seshachalam <
> sathyakumar_seshacha...@trimble.com> wrote:
>
> > Hi,
> >
> > I have a Search system based on Solr that relies on autoCommit
> > configuration (with openSearcher=false). I now have a use-case that
> > requires me to disable autoCommit and issue explicit commit commands, But
> > as I understand an explicit commit command "always" opens a searcher. Is
> > this correct ? Is there anyway to work-around this?  I really do not want
> > to open searcher overtime I hard commit (I rely on autoSoftCommit for
> this).
> >
> > Regards,
> > Sathya
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>
-- 
- Mark
about.me/markrmiller


Re: No live SolrServers available to handle this request

2015-10-08 Thread Mark Miller
Your Lucene and Solr versions must match.

On Thu, Oct 8, 2015 at 4:02 PM Steve  wrote:

> I've loaded the Films data into a 4 node cluster.  Indexing went well, but
> when I issue a query, I get this:
>
> "error": {
> "msg": "org.apache.solr.client.solrj.SolrServerException: No live
> SolrServers available to handle this request:
> [
>
> http://host-192-168-0-63.openstacklocal:8081/solr/CollectionFilms_shard1_replica2
> ,
>
>
> http://host-192-168-0-62.openstacklocal:8081/solr/CollectionFilms_shard2_replica2
> ,
>
>
> http://host-192-168-0-60.openstacklocal:8081/solr/CollectionFilms_shard2_replica1
> ]",
> ...
>
> and further down in the stacktrace:
>
> Server Error
> Caused by:
> java.lang.NoSuchMethodError:
>
> org.apache.lucene.index.TermsEnum.postings(Lorg/apache/lucene/index/PostingsEnum;I)Lorg/apache/lucene/index/PostingsEnum;\n\tat
>
> org.apache.solr.search.SolrIndexSearcher.getFirstMatch(SolrIndexSearcher.java:802)\n\tat
>
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:333)\n\tat
> ...
>
>
> I'm using:
>
> solr version 5.3.1
>
> lucene 5.2.1
>
> zookeeper version 3.4.6
>
> indexing with:
>
>cd /opt/solr/example/films;
>
> /opt/solr/bin/post -c CollectionFilms -port 8081  films.json
>
>
>
> thx,
> .strick
>
-- 
- Mark
about.me/markrmiller


Re: Recovery Thread Blocked

2015-10-06 Thread Mark Miller
If it's a thread and you have plenty of RAM and the heap is fine, have you
checked raising OS thread limits?

- Mark

On Tue, Oct 6, 2015 at 4:54 PM Rallavagu  wrote:

> GC logging shows normal. The "OutOfMemoryError" appears to be pertaining
> to a thread but not to JVM.
>
> On 10/6/15 1:07 PM, Mark Miller wrote:
> > That amount of RAM can easily be eaten up depending on your sorting,
> > faceting, data.
> >
> > Do you have gc logging enabled? That should describe what is happening
> with
> > the heap.
> >
> > - Mark
> >
> > On Tue, Oct 6, 2015 at 4:04 PM Rallavagu  wrote:
> >
> >> Mark - currently 5.3 is being evaluated for upgrade purposes and
> >> hopefully get there sooner. Meanwhile, following exception is noted from
> >> logs during updates
> >>
> >> ERROR org.apache.solr.update.CommitTracker  – auto commit
> >> error...:java.lang.IllegalStateException: this writer hit an
> >> OutOfMemoryError; cannot commit
> >>   at
> >>
> >>
> org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2807)
> >>   at
> >>
> org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2984)
> >>   at
> >>
> >>
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:559)
> >>   at
> >> org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
> >>   at
> >> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:440)
> >>   at
> >>
> >>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
> >>   at
> >>
> >>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
> >>   at
> >>
> >>
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:896)
> >>   at
> >>
> >>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
> >>   at java.lang.Thread.run(Thread.java:682)
> >>
> >> Considering the fact that the machine is configured with 48G (24G for
> >> JVM which will be reduced in future) wondering how would it still go out
> >> of memory. For memory mapped index files the remaining 24G or what is
> >> available off of it should be available. Looking at the lsof output the
> >> memory mapped files were around 10G.
> >>
> >> Thanks.
> >>
> >>
> >> On 10/5/15 5:41 PM, Mark Miller wrote:
> >>> I'd make two guess:
> >>>
> >>> Looks like you are using Jrocket? I don't think that is common or well
> >>> tested at this point.
> >>>
> >>> There are a billion or so bug fixes from 4.6.1 to 5.3.2. Given the pace
> >> of
> >>> SolrCloud, you are dealing with something fairly ancient and so it will
> >> be
> >>> harder to find help with older issues most likely.
> >>>
> >>> - Mark
> >>>
> >>> On Mon, Oct 5, 2015 at 12:46 PM Rallavagu  wrote:
> >>>
> >>>> Any takers on this? Any kinda clue would help. Thanks.
> >>>>
> >>>> On 10/4/15 10:14 AM, Rallavagu wrote:
> >>>>> As there were no responses so far, I assume that this is not a very
> >>>>> common issue that folks come across. So, I went into source (4.6.1)
> to
> >>>>> see if I can figure out what could be the cause.
> >>>>>
> >>>>>
> >>>>> The thread that is locking is in this block of code
> >>>>>
> >>>>> synchronized (recoveryLock) {
> >>>>>  // to be air tight we must also check after lock
> >>>>>  if (cc.isShutDown()) {
> >>>>>log.warn("Skipping recovery because Solr is shutdown");
> >>>>>return;
> >>>>>  }
> >>>>>  log.info("Running recovery - first canceling any ongoing
> >>>> recovery");
> >>>>>  cancelRecovery();
> >>>>>
> >>>>>  while (recoveryRunning) {
> >>>>>try {
> >>>>>  recoveryLock.wait(1000);
> >>>>>} catch (InterruptedExceptio

Re: Recovery Thread Blocked

2015-10-06 Thread Mark Miller
That amount of RAM can easily be eaten up depending on your sorting,
faceting, data.

Do you have gc logging enabled? That should describe what is happening with
the heap.

- Mark

On Tue, Oct 6, 2015 at 4:04 PM Rallavagu  wrote:

> Mark - currently 5.3 is being evaluated for upgrade purposes and
> hopefully get there sooner. Meanwhile, following exception is noted from
> logs during updates
>
> ERROR org.apache.solr.update.CommitTracker  – auto commit
> error...:java.lang.IllegalStateException: this writer hit an
> OutOfMemoryError; cannot commit
>  at
>
> org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2807)
>  at
> org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2984)
>  at
>
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:559)
>  at
> org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
>  at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:440)
>  at
>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
>  at
>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
>  at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:896)
>  at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
>  at java.lang.Thread.run(Thread.java:682)
>
> Considering the fact that the machine is configured with 48G (24G for
> JVM which will be reduced in future) wondering how would it still go out
> of memory. For memory mapped index files the remaining 24G or what is
> available off of it should be available. Looking at the lsof output the
> memory mapped files were around 10G.
>
> Thanks.
>
>
> On 10/5/15 5:41 PM, Mark Miller wrote:
> > I'd make two guess:
> >
> > Looks like you are using Jrocket? I don't think that is common or well
> > tested at this point.
> >
> > There are a billion or so bug fixes from 4.6.1 to 5.3.2. Given the pace
> of
> > SolrCloud, you are dealing with something fairly ancient and so it will
> be
> > harder to find help with older issues most likely.
> >
> > - Mark
> >
> > On Mon, Oct 5, 2015 at 12:46 PM Rallavagu  wrote:
> >
> >> Any takers on this? Any kinda clue would help. Thanks.
> >>
> >> On 10/4/15 10:14 AM, Rallavagu wrote:
> >>> As there were no responses so far, I assume that this is not a very
> >>> common issue that folks come across. So, I went into source (4.6.1) to
> >>> see if I can figure out what could be the cause.
> >>>
> >>>
> >>> The thread that is locking is in this block of code
> >>>
> >>> synchronized (recoveryLock) {
> >>> // to be air tight we must also check after lock
> >>> if (cc.isShutDown()) {
> >>>   log.warn("Skipping recovery because Solr is shutdown");
> >>>   return;
> >>> }
> >>> log.info("Running recovery - first canceling any ongoing
> >> recovery");
> >>> cancelRecovery();
> >>>
> >>> while (recoveryRunning) {
> >>>   try {
> >>> recoveryLock.wait(1000);
> >>>   } catch (InterruptedException e) {
> >>>
> >>>   }
> >>>   // check again for those that were waiting
> >>>   if (cc.isShutDown()) {
> >>> log.warn("Skipping recovery because Solr is shutdown");
> >>> return;
> >>>   }
> >>>   if (closed) return;
> >>> }
> >>>
> >>> Subsequently, the thread will get into cancelRecovery method as below,
> >>>
> >>> public void cancelRecovery() {
> >>>   synchronized (recoveryLock) {
> >>> if (recoveryStrat != null && recoveryRunning) {
> >>>   recoveryStrat.close();
> >>>   while (true) {
> >>> try {
> >>>   recoveryStrat.join();
> >>> } catch (InterruptedException e) {
> >>>   // not interruptible - keep waiting
> >>>   continue;
> >>> }
> >>> break;
> >>>   }
> >>>
> >>&

Re: Solr Log Analysis

2015-10-05 Thread Mark Miller
Best tool for this job really depends on your needs, but one option:

I have a dev tool for Solr log analysis:
https://github.com/markrmiller/SolrLogReader

If you use the -o option, it will spill out just the queries to a file with
qtimes.

- Mark

On Wed, Sep 23, 2015 at 8:16 PM Tarala, Magesh  wrote:

> I'm using Solr 4.10.4 in a 3 node cloud setup. I have 3 shards and 3
> replicas for the collection.
>
> I want to analyze the logs to extract the queries and query times. Is
> there a tool or script someone has created already for this?
>
> Thanks,
> Magesh
>
-- 
- Mark
about.me/markrmiller


Re: Recovery Thread Blocked

2015-10-05 Thread Mark Miller
I'd make two guess:

Looks like you are using Jrocket? I don't think that is common or well
tested at this point.

There are a billion or so bug fixes from 4.6.1 to 5.3.2. Given the pace of
SolrCloud, you are dealing with something fairly ancient and so it will be
harder to find help with older issues most likely.

- Mark

On Mon, Oct 5, 2015 at 12:46 PM Rallavagu  wrote:

> Any takers on this? Any kinda clue would help. Thanks.
>
> On 10/4/15 10:14 AM, Rallavagu wrote:
> > As there were no responses so far, I assume that this is not a very
> > common issue that folks come across. So, I went into source (4.6.1) to
> > see if I can figure out what could be the cause.
> >
> >
> > The thread that is locking is in this block of code
> >
> > synchronized (recoveryLock) {
> >// to be air tight we must also check after lock
> >if (cc.isShutDown()) {
> >  log.warn("Skipping recovery because Solr is shutdown");
> >  return;
> >}
> >log.info("Running recovery - first canceling any ongoing
> recovery");
> >cancelRecovery();
> >
> >while (recoveryRunning) {
> >  try {
> >recoveryLock.wait(1000);
> >  } catch (InterruptedException e) {
> >
> >  }
> >  // check again for those that were waiting
> >  if (cc.isShutDown()) {
> >log.warn("Skipping recovery because Solr is shutdown");
> >return;
> >  }
> >  if (closed) return;
> >}
> >
> > Subsequently, the thread will get into cancelRecovery method as below,
> >
> > public void cancelRecovery() {
> >  synchronized (recoveryLock) {
> >if (recoveryStrat != null && recoveryRunning) {
> >  recoveryStrat.close();
> >  while (true) {
> >try {
> >  recoveryStrat.join();
> >} catch (InterruptedException e) {
> >  // not interruptible - keep waiting
> >  continue;
> >}
> >break;
> >  }
> >
> >  recoveryRunning = false;
> >  recoveryLock.notifyAll();
> >}
> >  }
> >}
> >
> > As per the stack trace "recoveryStrat.join()" is where things are
> > holding up.
> >
> > I wonder why/how cancelRecovery would take time so around 870 threads
> > would be waiting on. Is it possible that ZK is not responding or
> > something else like Operating System resources could cause this? Thanks.
> >
> >
> > On 10/2/15 4:17 PM, Rallavagu wrote:
> >> Here is the stack trace of the thread that is holding the lock.
> >>
> >>
> >> "Thread-55266" id=77142 idx=0xc18 tid=992 prio=5 alive, waiting,
> >> native_blocked, daemon
> >>  -- Waiting for notification on:
> >> org/apache/solr/cloud/RecoveryStrategy@0x3f34e8480[fat lock]
> >>  at pthread_cond_wait@@GLIBC_2.3.2+202(:0)@0x3d4180b5ba
> >>  at eventTimedWaitNoTransitionImpl+71(event.c:90)@0x7ff3133b6ba8
> >>  at
> >> syncWaitForSignalNoTransition+65(synchronization.c:28)@0x7ff31354a0b2
> >>  at syncWaitForSignal+189(synchronization.c:85)@0x7ff31354a20e
> >>  at syncWaitForJavaSignal+38(synchronization.c:93)@0x7ff31354a327
> >>  at
> >>
> RJNI_jrockit_vm_Threads_waitForNotifySignal+73(rnithreads.c:72)@0x7ff31351939a
> >>
> >>
> >>  at
> >> jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native
> >> Method)
> >>  at java/lang/Object.wait(J)V(Native Method)
> >>  at java/lang/Thread.join(Thread.java:1206)
> >>  ^-- Lock released while waiting:
> >> org/apache/solr/cloud/RecoveryStrategy@0x3f34e8480[fat lock]
> >>  at java/lang/Thread.join(Thread.java:1259)
> >>  at
> >>
> org/apache/solr/update/DefaultSolrCoreState.cancelRecovery(DefaultSolrCoreState.java:331)
> >>
> >>
> >>  ^-- Holding lock: java/lang/Object@0x114d8dd00[recursive]
> >>  at
> >>
> org/apache/solr/update/DefaultSolrCoreState.doRecovery(DefaultSolrCoreState.java:297)
> >>
> >>
> >>  ^-- Holding lock: java/lang/Object@0x114d8dd00[fat lock]
> >>  at
> >>
> org/apache/solr/handler/admin/CoreAdminHandler$2.run(CoreAdminHandler.java:770)
> >>
> >>
> >>  at jrockit/vm/RNI.c2java(J)V(Native Method)
> >>
> >>
> >> Stack trace of one of the 870 threads that is waiting for the lock to be
> >> released.
> >>
> >> "Thread-55489" id=77520 idx=0xebc tid=1494 prio=5 alive, blocked,
> >> native_blocked, daemon
> >>  -- Blocked trying to get lock: java/lang/Object@0x114d8dd00[fat
> >> lock]
> >>  at pthread_cond_wait@@GLIBC_2.3.2+202(:0)@0x3d4180b5ba
> >>  at eventTimedWaitNoTransitionImpl+71(event.c:90)@0x7ff3133b6ba8
> >>  at
> >> syncWaitForSignalNoTransition+65(synchronization.c:28)@0x7ff31354a0b2
> >>  at syncWaitForSignal+189(synchronization.c:85)@0x7ff31354a20e
> >>  at syncWaitForJavaSignal+38(synchronization.c:93)@0x7ff31354a327
> >>  at jrockit/vm/Threads.waitForUnblockSignal()V(Native Method)
> >>  at jrockit/vm/Locks.fatLockBlockOrSpin(Locks.java:1411)[optimized]
> >>  

Re: Implementing AbstractFullDistribZkTestBase

2015-10-05 Thread Mark Miller
Not sure what that means :)

SOLR-5776 would not happen all the time, but too frequently. It also
wouldn't matter the power of CPU, cores or RAM :)

Do you see fails without https is what you want to check.

- mark

On Mon, Oct 5, 2015 at 2:16 PM Markus Jelsma 
wrote:

> Hi - no, i don't think so, it doesn't happen all the time, but too
> frequently. The machine running the tests has a high powered CPU, plenty of
> cores and RAM.
>
> Markus
>
>
>
> -Original message-
> > From:Mark Miller 
> > Sent: Monday 5th October 2015 19:52
> > To: solr-user@lucene.apache.org
> > Subject: Re: Implementing AbstractFullDistribZkTestBase
> >
> > If it's always when using https as in your examples, perhaps it's
> SOLR-5776.
> >
> > - mark
> >
> > On Mon, Oct 5, 2015 at 10:36 AM Markus Jelsma <
> markus.jel...@openindex.io>
> > wrote:
> >
> > > Hmmm, i tried that just now but i sometimes get tons of Connection
> reset
> > > errors. The tests then end with "There are still nodes recoverying -
> waited
> > > for 30 seconds".
> > >
> > > [RecoveryThread-collection1] ERROR
> org.apache.solr.cloud.RecoveryStrategy
> > > - Error while trying to
> recover.:java.util.concurrent.ExecutionException:
> > > org.apache.solr.client.solrj.SolrServerException: IOException occured
> when
> > > talking to server at: https://127.0.0.1:49146
> > > at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> > > at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> > > at
> > >
> org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:598)
> > > at
> > >
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:361)
> > > at
> > > org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227)
> > > Caused by: org.apache.solr.client.solrj.SolrServerException:
> IOException
> > > occured when talking to server at: https://127.0.0.1:49146
> > > at
> > >
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:574)
> > > at
> > >
> org.apache.solr.client.solrj.impl.HttpSolrClient$1.call(HttpSolrClient.java:270)
> > > at
> > >
> org.apache.solr.client.solrj.impl.HttpSolrClient$1.call(HttpSolrClient.java:266)
> > > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > > at
> > >
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:210)
> > > at
> > >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > > at
> > >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > > at java.lang.Thread.run(Thread.java:745)
> > > Caused by: java.net.SocketException: Connection reset
> > > at java.net.SocketInputStream.read(SocketInputStream.java:209)
> > > at java.net.SocketInputStream.read(SocketInputStream.java:141)
> > > at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
> > > at sun.security.ssl.InputRecord.read(InputRecord.java:503)
> > > at
> > > sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:954)
> > > at
> > >
> sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1343)
> > > at
> > > sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1371)
> > > at
> > > sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1355)
> > > at
> > >
> org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:543)
> > > at
> > >
> org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:409)
> > > at
> > >
> org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:177)
> > > at
> > >
> org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:304)
> > > at
> > >
> org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:611)
> > > at
> > >
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:446)
> > > at
> > >
> org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
> > > at
> > >
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
> > > at
> > >
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
> > > at
> > >
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
> > > at
> > >
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:465)
> > > ... 7 more
> > >
> > > [RecoveryThread-collection1] ERROR
> org.apache.solr.cloud.RecoveryStrategy
> > > - Recovery failed - trying again... (1)
> > > [RecoveryThread-collection1] INFO
> org.apache.solr.cloud.RecoveryStrategy -
> > > Wait 4.0 se

Re: Implementing AbstractFullDistribZkTestBase

2015-10-05 Thread Mark Miller
If it's always when using https as in your examples, perhaps it's SOLR-5776.

- mark

On Mon, Oct 5, 2015 at 10:36 AM Markus Jelsma 
wrote:

> Hmmm, i tried that just now but i sometimes get tons of Connection reset
> errors. The tests then end with "There are still nodes recoverying - waited
> for 30 seconds".
>
> [RecoveryThread-collection1] ERROR org.apache.solr.cloud.RecoveryStrategy
> - Error while trying to recover.:java.util.concurrent.ExecutionException:
> org.apache.solr.client.solrj.SolrServerException: IOException occured when
> talking to server at: https://127.0.0.1:49146
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at
> org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:598)
> at
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:361)
> at
> org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227)
> Caused by: org.apache.solr.client.solrj.SolrServerException: IOException
> occured when talking to server at: https://127.0.0.1:49146
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:574)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient$1.call(HttpSolrClient.java:270)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient$1.call(HttpSolrClient.java:266)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:210)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.SocketException: Connection reset
> at java.net.SocketInputStream.read(SocketInputStream.java:209)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
> at sun.security.ssl.InputRecord.read(InputRecord.java:503)
> at
> sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:954)
> at
> sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1343)
> at
> sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1371)
> at
> sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1355)
> at
> org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:543)
> at
> org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:409)
> at
> org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:177)
> at
> org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:304)
> at
> org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:611)
> at
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:446)
> at
> org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
> at
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
> at
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
> at
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:465)
> ... 7 more
>
> [RecoveryThread-collection1] ERROR org.apache.solr.cloud.RecoveryStrategy
> - Recovery failed - trying again... (1)
> [RecoveryThread-collection1] INFO org.apache.solr.cloud.RecoveryStrategy -
> Wait 4.0 seconds before trying to recover again (2)
>
>
>
> -Original message-
> > From:Erick Erickson 
> > Sent: Monday 5th October 2015 15:59
> > To: solr-user@lucene.apache.org
> > Subject: Re: Implementing AbstractFullDistribZkTestBase
> >
> > Right, I'm assuming you're creating a cluster somewhere.
> > Try calling (from memory) waitForRecoveriesToFinish in
> > AbstractDistribZkTestBase after creating the collection
> > to insure that the nodes are up and running before you
> > index to them.
> >
> > Shot in the dark
> > Erick
> >
> > On Mon, Oct 5, 2015 at 1:36 AM, Markus Jelsma
> >  wrote:
> > > Hello,
> > >
> > > I have several implementations of AbstractFullDistribZkTestBase of
> Solr 5.3.0. Sometimes a test fails with either "There are still nodes
> recoverying - waited for 30 seconds" or "IOException occured when talking
> to server at: https://127.0.0.1:44474/collection1";, so usually at least
> one of all test fails. These are very simple implementations such as :
> > >
> > >   @Test
> > >   @ShardsFixed(num = 2)
> > >   pu

Re: Cloud Deployment Strategy... In the Cloud

2015-10-01 Thread Mark Miller
On Wed, Sep 30, 2015 at 10:36 AM Steve Davids  wrote:

> Our project built a custom "admin" webapp that we use for various O&M
> activities so I went ahead and added the ability to upload a Zip
> distribution which then uses SolrJ to forward the extracted contents to ZK,
> this package is built and uploaded via a Gradle build task which makes life
> easy on us by allowing us to jam stuff into ZK which is sitting in a
> private network (local VPC) without necessarily needing to be on a ZK
> machine. We then moved on to creating collection (trivial), and
> adding/removing replicas. As for adding replicas I am rather confused as to
> why I would need specify a specific shard for replica placement, before
> when I threw down a core.properties file the machine would automatically
> come up and figure out which shard it should join based on reasonable
> assumptions - why wouldn't the same logic apply here?


I'd file a JIRA issue for the functionality.


> I then saw that
> a Rule-based
> Replica Placement
> <
> https://cwiki.apache.org/confluence/display/solr/Rule-based+Replica+Placement
> >
> feature was added which I thought would be reasonable but after looking at
> the tests  it appears to
> still require a shard parameter for adding a replica which seems to defeat
> the entire purpose.


I was not involved in the addReplica command, but the predefined stuff
worked that way just to make bootstrapping up a cluster really simple. I
don't see why addReplica couldn't follow the same logic if no shard was
specified.


> So after getting bummed out about that, I took a look
> at the delete replica request since we are having machines come/go we need
> to start dropping them and found that the delete replica requires a
> collection, shard, and replica name and if I have the name of the machine
> it appears the only way to figure out what to remove is by walking the
> clusterstate tree for all collections and determine which replicas are a
> candidate for removal which seems unnecessarily complicated.
>

You should not need the shard for this call. The collection and replica
core node name will be unique. Another JIRA issue?


>
> Hopefully I don't come off as complaining, but rather looking at it from a
> client perspective, the Collections API doesn't seem simple to use and
> really the only reason I am messing around with it now is because there is
> repeated threats to make "zk as truth" the default in the 5.x branch at
> some point in the future. I would personally advocate that something like
> the autoManageReplicas 
> be
> introduced to make life much simpler on clients as this appears to be the
> thing I am trying to implement externally.
>
> If anyone has happened to to build a system to orchestrate Solr for cloud
> infrastructure and have some pointers it would be greatly appreciated.
>
> Thanks,
>
> -Steve
>
>
> --
- Mark
about.me/markrmiller


Re: Ant Ivy resolve / Authenticated Proxy Issue

2015-09-16 Thread Mark Miller
You should be able to easily see where the task is hanging in ivy code.

- Mark

On Wed, Sep 16, 2015 at 1:36 PM Susheel Kumar  wrote:

> Not really. There are no lock files & even after cleaning up lock files (to
> be sure) problem still persists.  It works outside company network but
> inside it stucks.  let me try to see if jconsole can show something
> meaningful.
>
> Thanks,
> Susheel
>
> On Wed, Sep 16, 2015 at 12:17 PM, Shawn Heisey 
> wrote:
>
> > On 9/16/2015 9:32 AM, Mark Miller wrote:
> > > Have you used jconsole or visualvm to see what it is actually hanging
> on
> > to
> > > there? Perhaps it is lock files that are not cleaned up or something
> > else?
> > >
> > > You might try: find ~/.ivy2 -name "*.lck" -type f -exec rm {} \;
> >
> > If that does turn out to be the problem and deleting lockfiles fixes it,
> > then you may be running into what I believe is a bug.  It is a bug that
> > was (in theory) fixed in IVY-1388.
> >
> > https://issues.apache.org/jira/browse/IVY-1388
> >
> > I have seen the same problem even in version 2.3.0 which contains a fix
> > for IVY-1388, so I filed a new issue:
> >
> > https://issues.apache.org/jira/browse/IVY-1489
> >
> > Thanks,
> > Shawn
> >
> >
>
-- 
- Mark
about.me/markrmiller


Re: Ant Ivy resolve / Authenticated Proxy Issue

2015-09-16 Thread Mark Miller
I mention the same thing in
https://issues.apache.org/jira/browse/LUCENE-6743

They claim to have addressed this with Java delete on close stuff, but it
still happens even with 2.4.0.

Locally, I now use the nio strategy and never hit it.

- Mark

On Wed, Sep 16, 2015 at 12:17 PM Shawn Heisey  wrote:

> On 9/16/2015 9:32 AM, Mark Miller wrote:
> > Have you used jconsole or visualvm to see what it is actually hanging on
> to
> > there? Perhaps it is lock files that are not cleaned up or something
> else?
> >
> > You might try: find ~/.ivy2 -name "*.lck" -type f -exec rm {} \;
>
> If that does turn out to be the problem and deleting lockfiles fixes it,
> then you may be running into what I believe is a bug.  It is a bug that
> was (in theory) fixed in IVY-1388.
>
> https://issues.apache.org/jira/browse/IVY-1388
>
> I have seen the same problem even in version 2.3.0 which contains a fix
> for IVY-1388, so I filed a new issue:
>
> https://issues.apache.org/jira/browse/IVY-1489
>
> Thanks,
> Shawn
>
> --
- Mark
about.me/markrmiller


Re: Ant Ivy resolve / Authenticated Proxy Issue

2015-09-16 Thread Mark Miller
Have you used jconsole or visualvm to see what it is actually hanging on to
there? Perhaps it is lock files that are not cleaned up or something else?

You might try: find ~/.ivy2 -name "*.lck" -type f -exec rm {} \;

- Mark

On Wed, Sep 16, 2015 at 9:50 AM Susheel Kumar  wrote:

> Hi,
>
> Sending it to Solr group in addition to Ivy group.
>
>
> I have been building Solr trunk (
> http://svn.apache.org/repos/asf/lucene/dev/trunk/) using "ant eclipse"
> from
> quite some time but this week i am on a job where things are behind the
> firewall and a proxy is used.
>
> Issue: When not in company network then build works fine but when inside
> company network  Ivy stucks during resolve when downloading
> https://repo1.maven.org/maven2/org/apache/ant/ant/1.8.2/ant-1.8.2.jar (see
> below) I have set ANT_OPTS=-Dhttp.proxyHost=myproxyhost
> -Dhttp.proxyPort=8080 -Dhttp.proxyUser=myproxyusername
> -Dhttp.proxyPassword=myproxypassword  but that doesn't help.   Similar
> issue i run into with SVN but i was able to specify proxy & auth into
> .subversion/servers file and it worked.With Ant Ivy no idea what's
> going wrong.  I also tried -autoproxy with ant command line but no luck.
> In the meantime .ivy2 folder which got populated outside network would help
> to proceed temporarily.
>
> Machine : mac 10.10.3
> Ant : 1.9.6
> Ivy : 2.4.0
>
> Attach build.xml & ivysettings.xml
>
> kumar$ ant eclipse
>
> Buildfile: /Users/kumar/sourcecode/trunk/build.xml
>
> resolve:
>
> resolve:
>
> ivy-availability-check:
>
> ivy-fail:
>
> ivy-configure:
>
> [ivy:configure] :: Apache Ivy 2.4.0 - 20141213170938 ::
> http://ant.apache.org/ivy/ ::
>
> [ivy:configure] :: loading settings :: file =
> /Users/kumar/sourcecode/trunk/lucene/ivy-settings.xml
>
>
> resolve:
>
-- 
- Mark
about.me/markrmiller


Re: SolrCloud Admin UI shows node is Down, but state.json says it's active/up

2015-09-09 Thread Mark Miller
Perhaps there is something preventing clean shutdown. Shutdown makes a best
effort attempt to publish DOWN for all the local cores.

Otherwise, yes, it's a little bit annoying, but full state is a combination
of the state entry and whether the live node for that replica exists or not.

- Mark

On Wed, Sep 9, 2015 at 1:50 AM Arcadius Ahouansou 
wrote:

> Thank you Tomás for pointing to the JavaDoc
>
> http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/common/cloud/Replica.State.html#ACTIVE
>
> The Javadoc is quite clear. So this stale state.json is not an issue after
> all.
>
> However, it's very confusing that when a node goes down, state.json may be
> updated for 1 collection while it remains stale in the other collection.
> Also in our case, the node did not crash as per the JavaDoc... it was a
> normal server stop/shut-down.
> We may need to review our shut-down process and see whether things change.
>
> Thank you very much Erick and Tomás for your valuable help... very
> appreciated.
>
> Arcadius.
>
>
> On 8 September 2015 at 18:28, Erick Erickson 
> wrote:
>
> > bq: You were probably referring to state.json
> >
> > yep, I'm never sure whether people are on the old or new ZK versions.
> >
> > OK, With Tomás' comment, I think it's explained... although confusing.
> >
> > WDYT?
> >
> >
> > On Tue, Sep 8, 2015 at 10:03 AM, Arcadius Ahouansou
> >  wrote:
> > > Hello Erick.
> > >
> > > Yes,
> > >
> > > 1> liveNodes has N nodes listed (correctly): Correct, liveNodes is
> always
> > > right.
> > >
> > > 2> clusterstate.json has N+M nodes listed as "active":
> clusterstate.json
> > is
> > > always empty as it's no longer being "used" in 5.3. You were
> > > probably referring to state.json which is in individual collections.
> Yes,
> > > that one reflects the wrong value i.e N+M
> > >
> > > 3> using the collection API to get CLUSTERSTATUS always return the
> > correct
> > > value N
> > >
> > > 4> The Front-end code in code in cloud.js displays the right colour
> when
> > > nodes go down because it checks for the live node
> > >
> > > The problem is only with state.json under certain circumstances.
> > >
> > > Thanks.
> > >
> > > On 8 September 2015 at 17:51, Erick Erickson 
> > > wrote:
> > >
> > >> Arcadius:
> > >>
> > >> Hmmm. It may take a while for the cluster state to change, but I'm
> > >> assuming that this state persists for minutes/hours/days.
> > >>
> > >> So to recap: If dump the entire ZK node from the root, you have
> > >> 1> liveNodes has N nodes listed (correctly)
> > >> 2> clusterstate.json has N+M nodes listed as "active"
> > >>
> > >> Doesn't sound right to me, but I'll have to let people who are deep
> > >> into that code speculate from here.
> > >>
> > >> Best,
> > >> Erick
> > >>
> > >> On Tue, Sep 8, 2015 at 1:13 AM, Arcadius Ahouansou <
> > arcad...@menelic.com>
> > >> wrote:
> > >> > On Sep 8, 2015 6:25 AM, "Erick Erickson" 
> > >> wrote:
> > >> >>
> > >> >> Perhaps the browser cache? What happens if you, say, use
> > >> >> Zookeeper client tools to bring down the the cluster state in
> > >> >> question? Or perhaps just refresh the admin UI when showing
> > >> >> the cluster status
> > >> >>
> > >> >
> > >> > Hello Erick.
> > >> >
> > >> > Thank you very much for answering.
> > >> > I did use the ZooInspetor tool to check the state.json in all 5 zk
> > nodes
> > >> > and they are all out of date and identical to what I get through the
> > tree
> > >> > view in sole admin ui.
> > >> >
> > >> > Looking at the source code cloud.js that correctly display nodes as
> > >> "gone"
> > >> > in the graph view, it calls the end point /zookeeper?wt=json and
> > relies
> > >> on
> > >> > the live nodes to mark a node as down instead of status.json.
> > >> >
> > >> > Thanks.
> > >> >
> > >> >> Shot in the dark,
> > >> >> Erick
> > >> >>
> > >> >> On Mon, Sep 7, 2015 at 6:09 PM, Arcadius Ahouansou <
> > >> arcad...@menelic.com>
> > >> > wrote:
> > >> >> > We are running the latest Solr 5.3.0
> > >> >> >
> > >> >> > Thanks.
> > >>
> > >
> > >
> > >
> > > --
> > > Arcadius Ahouansou
> > > Menelic Ltd | Information is Power
> > > M: 07908761999
> > > W: www.menelic.com
> > > ---
> >
>
>
>
> --
> Arcadius Ahouansou
> Menelic Ltd | Information is Power
> M: 07908761999
> W: www.menelic.com
> ---
>
-- 
- Mark
about.me/markrmiller


Re: mapreduce job using soirj 5

2015-06-17 Thread Mark Miller
I think there is some better classpath isolation options in the works for
Hadoop. As it is, there is some harmonization that has to be done depending
on versions used, and it can get tricky.

- Mark

On Wed, Jun 17, 2015 at 9:52 AM Erick Erickson 
wrote:

> For sure there are a few rough edges here
>
> On Wed, Jun 17, 2015 at 12:28 AM, adfel70  wrote:
> > We cannot downgrade httpclient in solrj5 because its using new features
> and
> > we dont want to start altering solr code, anyway we thought about
> upgrading
> > httpclient in hadoop but as Erick said its sounds more work than just put
> > the jar in the data nodes.
> >
> > About that flag we tried it, hadoop even has an environment variable
> > HADOOP_USER_CLASSPATH_FIRST but all our tests with that flag failed.
> >
> > We thought this is an issue that is more likely that solr users will
> > encounter rather than cloudera users, so we will be glad for a more
> elegant
> > solution or workaround than to replace the httpclient jar in the data
> nodes
> >
> > Thank you all for your responses
> >
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/mapreduce-job-using-soirj-5-tp4212199p4212350.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
- Mark
about.me/markrmiller


Re: Deletion Policy in Solr Cloud

2015-06-15 Thread Mark Miller
SolrCloud does not really support any form of rollback.

On Mon, Jun 15, 2015 at 5:05 PM Aurélien MAZOYER <
aurelien.mazo...@francelabs.com> wrote:

> Hi all,
>
> Is DeletionPolicy customization still available in Solr Cloud? Is there
> a way to rollback to a previous commit point in Solr Cloud thanks to a
> specific deletion policy?
>
> Thanks,
>
>   Aurélien
>
-- 
- Mark
about.me/markrmiller


Re: Please help test the new Angular JS Admin UI

2015-06-15 Thread Mark Miller
Sure, just curious. Wasn't sure if there was other motivations around what
could be done, or the overall look and feel that could be achieved or
anything beyond it's just easier for devs to work on and maintain (which is
always good when it comes to JavaScript - I still wish it was all GWT :) ).

- Mark

On Mon, Jun 15, 2015 at 11:35 AM Upayavira  wrote:

> The current UI was written before tools like AngularJS were widespread,
> and before decent separation of concerns was easy to achieve in
> Javascript.
>
> In a sense, your paraphrase of the justification was as you described -
> to make it easier for programmer types - partly by using a tool that is
> closer to their working model, but also this rewrite has more than
> halved the number of lines of code in the UI, which will make it vastly
> more maintainable and extensible.
>
> As a case in point, I've got a working patch that I'll release at some
> point soon that gives us a "collections" version of the "core admin"
> pane. I'd love to add HDFS support to the UI if there were APIs worth
> exposing (I haven't dug into HDFS support yet).
>
> Make sense?
>
> Upayavira
>
> On Mon, Jun 15, 2015, at 07:49 AM, Mark Miller wrote:
> > I didn't really follow this issue - what was the motivation for the
> > rewrite?
> >
> > Is it entirely under: "new code should be quite a bit easier to work on
> > for
> > programmer
> > types" or are there other reasons as well?
> >
> > - Mark
> >
> > On Mon, Jun 15, 2015 at 10:40 AM Erick Erickson  >
> > wrote:
> >
> > > Gaaah, that'll teach me to type URLs late on Sunday!
> > >
> > > Thanks Upayavira!
> > >
> > > You'll notice that 5.2.1 just had the release announcement posted,
> > > so let the fun begin!
> > >
> > > Erick
> > >
> > > On Mon, Jun 15, 2015 at 4:12 AM, Upayavira  wrote:
> > > > Slight correction, the url, if running locally, would be:
> > > >
> > > > Http://localhost:8983/solr/index.html
> > > >
> > > > The reason we need your help: there is so much to the admin UI that I
> > > > cannot possibly have created the test setups to have tested it all.
> If
> > > > there are aspects of the UI you rely upon, please try them out on
> 5.2.1
> > > > - any bugs we don't find could persist long enough to be annoying and
> > > > inconvenient.
> > > >
> > > > Likewise, the sooner we can finish testing, the sooner we can do some
> > > > fun things:
> > > > * revamp the UI to be cloud friendly, e.g. Create and manage
> collections
> > > > * update schema browser to allow you to update your schema
> > > > * improve query tab to be able to prettily display your search
> results,
> > > > e.g.
> > > >- graphical explains viewer
> > > >- parsed query debugger
> > > > * and much more
> > > >
> > > > If enough people engage with testing, I will publish a zip file you
> can
> > > > unpack on top of your 5.2.1 zip to clear up any bugs that have been
> > > > found so far.
> > > >
> > > > Keep the bug reports coming!!
> > > >
> > > > Upayavira
> > > >
> > > > On Mon, Jun 15, 2015, at 01:53 AM, Erick Erickson wrote:
> > > >> And anyone who, you know, really likes working with UI code please
> > > >> help making it better!
> > > >>
> > > >> As of Solr 5.2, there is a new version of the Admin UI available,
> and
> > > >> several improvements are already in 5.2.1 (release imminent). The
> old
> > > >> admin UI is still the default, the new one is available at
> > > >>
> > > >> /admin/index.html
> > > >>
> > > >> Currently, you will see very little difference at first glance; the
> > > >> goal for this release was to have as much of the current
> functionality
> > > >> as possible ported to establish the framework. Upayavira has done
> > > >> almost all of the work getting this in place, thanks for taking that
> > > >> initiative Upayavira!
> > > >>
> > > >> Anyway, the plan is several fold:
> > > >> > Get as much testing on this as possible over the 5.2 time frame.
> > > >> > Make the new Angular JS-based code the default in 5.3
> > > >> > Make improvements/bug fixes to t

Re: Please help test the new Angular JS Admin UI

2015-06-15 Thread Mark Miller
I didn't really follow this issue - what was the motivation for the rewrite?

Is it entirely under: "new code should be quite a bit easier to work on for
programmer
types" or are there other reasons as well?

- Mark

On Mon, Jun 15, 2015 at 10:40 AM Erick Erickson 
wrote:

> Gaaah, that'll teach me to type URLs late on Sunday!
>
> Thanks Upayavira!
>
> You'll notice that 5.2.1 just had the release announcement posted,
> so let the fun begin!
>
> Erick
>
> On Mon, Jun 15, 2015 at 4:12 AM, Upayavira  wrote:
> > Slight correction, the url, if running locally, would be:
> >
> > Http://localhost:8983/solr/index.html
> >
> > The reason we need your help: there is so much to the admin UI that I
> > cannot possibly have created the test setups to have tested it all. If
> > there are aspects of the UI you rely upon, please try them out on 5.2.1
> > - any bugs we don't find could persist long enough to be annoying and
> > inconvenient.
> >
> > Likewise, the sooner we can finish testing, the sooner we can do some
> > fun things:
> > * revamp the UI to be cloud friendly, e.g. Create and manage collections
> > * update schema browser to allow you to update your schema
> > * improve query tab to be able to prettily display your search results,
> > e.g.
> >- graphical explains viewer
> >- parsed query debugger
> > * and much more
> >
> > If enough people engage with testing, I will publish a zip file you can
> > unpack on top of your 5.2.1 zip to clear up any bugs that have been
> > found so far.
> >
> > Keep the bug reports coming!!
> >
> > Upayavira
> >
> > On Mon, Jun 15, 2015, at 01:53 AM, Erick Erickson wrote:
> >> And anyone who, you know, really likes working with UI code please
> >> help making it better!
> >>
> >> As of Solr 5.2, there is a new version of the Admin UI available, and
> >> several improvements are already in 5.2.1 (release imminent). The old
> >> admin UI is still the default, the new one is available at
> >>
> >> /admin/index.html
> >>
> >> Currently, you will see very little difference at first glance; the
> >> goal for this release was to have as much of the current functionality
> >> as possible ported to establish the framework. Upayavira has done
> >> almost all of the work getting this in place, thanks for taking that
> >> initiative Upayavira!
> >>
> >> Anyway, the plan is several fold:
> >> > Get as much testing on this as possible over the 5.2 time frame.
> >> > Make the new Angular JS-based code the default in 5.3
> >> > Make improvements/bug fixes to the admin UI on the new code line,
> particularly SolrCloud functionality.
> >> > Deprecate the current code and remove it eventually.
> >>
> >> The new code should be quite a bit easier to work on for programmer
> >> types, and there are Big Plans Afoot for making the admin UI more
> >> SolrCloud-friendly. Now that the framework is in place, it should be
> >> easier for anyone who wants to volunteer to contribute, please do!
> >>
> >> So please give it a whirl. I'm sure there will be things that crop up,
> >> and any help addressing them will be appreciated. There's already an
> >> umbrella JIRA for this work, see:
> >> https://issues.apache.org/jira/browse/SOLR-7666. Please link any new
> >> issues to this JIRA so we can keep track of it all as well as
> >> coordinate efforts. If all goes well, this JIRA can be used to see
> >> what's already been reported too.
> >>
> >> Note that things may be moving pretty quickly, so trunk and 5x will
> >> always be the most current. That said looking at 5.2.1 will be much
> >> appreciated.
> >>
> >> Erick
>
-- 
- Mark
about.me/markrmiller


Re: Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered

2015-06-03 Thread Mark Miller
We will have to a find a way to deal with this long term. Browsing the code
I can see a variety of places where problem exception handling has been
introduced since this all was fixed.

- Mark

On Wed, Jun 3, 2015 at 8:19 AM Mark Miller  wrote:

> File a JIRA issue please. That OOM Exception is getting wrapped in a
> RuntimeException it looks. Bug.
>
> - Mark
>
>
> On Wed, Jun 3, 2015 at 2:20 AM Clemens Wyss DEV 
> wrote:
>
>> Context: Lucene 5.1, Java 8 on debian. 24G of RAM whereof 16G available
>> for Solr.
>>
>> I am seeing the following OOMs:
>> ERROR - 2015-06-03 05:17:13.317; [   customer-1-de_CH_1]
>> org.apache.solr.common.SolrException; null:java.lang.RuntimeException:
>> java.lang.OutOfMemoryError: Java heap space
>> at
>> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:854)
>> at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:463)
>> at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
>> at
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
>> at
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
>> at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>> at
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
>> at
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>> at
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
>> at
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
>> at
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>> at
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
>> at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>> at
>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>> at
>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
>> at
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>> at org.eclipse.jetty.server.Server.handle(Server.java:368)
>> at
>> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
>> at
>> org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
>> at
>> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
>> at
>> org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
>> at
>> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
>> at
>> org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
>> at
>> org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628)
>> at
>> org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
>> at
>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>> at
>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>> at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.lang.OutOfMemoryError: Java heap space
>> WARN  - 2015-06-03 05:17:13.319; [   customer-1-de_CH_1]
>> org.eclipse.jetty.servlet.ServletHandler; Error for
>> /solr/customer-1-de_CH_1/suggest_phrase
>> java.lang.OutOfMemoryError: Java heap space
>>
>> The full commandline is
>> /usr/local/java/bin/java -server -Xss256k -Xms16G
>> -Xmx16G -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90
>> -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
>> -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark
>> -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly
>> -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
>> -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -verbose:gc
>> -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps
>> -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution
>> -XX:+PrintGCApplicationStoppedTime -Xloggc:/opt/solr/logs/solr_gc.log
>> -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks 

Re: Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered

2015-06-03 Thread Mark Miller
File a JIRA issue please. That OOM Exception is getting wrapped in a
RuntimeException it looks. Bug.

- Mark

On Wed, Jun 3, 2015 at 2:20 AM Clemens Wyss DEV 
wrote:

> Context: Lucene 5.1, Java 8 on debian. 24G of RAM whereof 16G available
> for Solr.
>
> I am seeing the following OOMs:
> ERROR - 2015-06-03 05:17:13.317; [   customer-1-de_CH_1]
> org.apache.solr.common.SolrException; null:java.lang.RuntimeException:
> java.lang.OutOfMemoryError: Java heap space
> at
> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:854)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:463)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> at org.eclipse.jetty.server.Server.handle(Server.java:368)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
> at
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
> at
> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> at
> org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
> at
> org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628)
> at
> org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.OutOfMemoryError: Java heap space
> WARN  - 2015-06-03 05:17:13.319; [   customer-1-de_CH_1]
> org.eclipse.jetty.servlet.ServletHandler; Error for
> /solr/customer-1-de_CH_1/suggest_phrase
> java.lang.OutOfMemoryError: Java heap space
>
> The full commandline is
> /usr/local/java/bin/java -server -Xss256k -Xms16G
> -Xmx16G -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90
> -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
> -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark
> -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly
> -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
> -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -verbose:gc
> -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution
> -XX:+PrintGCApplicationStoppedTime -Xloggc:/opt/solr/logs/solr_gc.log
> -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Duser.timezone=UTC
> -Dsolr.solr.home=/opt/solr/data -Dsolr.install.dir=/usr/local/solr
> -Dlog4j.configuration=file:/opt/solr/log4j.properties
> -jar start.jar -XX:OnOutOfMemoryError=/usr/local/solr/bin/oom_solr.sh 8983
> /opt/solr/logs OPTIONS=default,rewrite
>
> So I'd expect /usr/local/solr/bin/oom_solr.sh tob e triggered. But this
> does not seem to "happen". What am I missing? Is it o to pull a heapdump
> from Solr before killing/rebooting in oom_solr.sh?
>
> Also I would like to know what query parameters were sent to
> /solr/customer-1-de_CH_1/suggest_phrase (which may be the reason fort he
> OOM ...
>
>
> --
- Mark
about.me/markrmiller


Re: Solr 5.1.0 Cloud and Zookeeper

2015-05-05 Thread Mark Miller
A bug fix version difference probably won't matter. It's best to use the
same version everyone else uses and the one our tests use, but it's very
likely 3.4.5 will work without a hitch.

- Mark

On Tue, May 5, 2015 at 9:09 AM shacky  wrote:

> Hi.
>
> I read on
> https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble
> that Solr needs to use the same ZooKeeper version it owns (at the
> moment 3.4.6).
> Debian Jessie has ZooKeeper 3.4.5
> (https://packages.debian.org/jessie/zookeeper).
>
> Are you sure that this version won't work with Solr 5.1.0?
>
> Thank you very much for your help!
> Bye
>


Re: Multiple index.timestamp directories using up disk space

2015-04-28 Thread Mark Miller
If copies of the index are not eventually cleaned up, I'd fill a JIRA to
address the issue. Those directories should be removed over time. At times
there will have to be a couple around at the same time and others may take
a while to clean up.

- Mark

On Tue, Apr 28, 2015 at 3:27 AM Ramkumar R. Aiyengar <
andyetitmo...@gmail.com> wrote:

> SolrCloud does need up to twice the amount of disk space as your usual
> index size during replication. Amongst other things, this ensures you have
> a full copy of the index at any point. There's no way around this, I would
> suggest you provision the additional disk space needed.
> On 20 Apr 2015 23:21, "Rishi Easwaran"  wrote:
>
> > Hi All,
> >
> > We are seeing this problem with solr 4.6 and solr 4.10.3.
> > For some reason, solr cloud tries to recover and creates a new index
> > directory - (ex:index.20150420181214550), while keeping the older index
> as
> > is. This creates an issues where the disk space fills up and the shard
> > never ends up recovering.
> > Usually this requires a manual intervention of  bouncing the instance and
> > wiping the disk clean to allow for a clean recovery.
> >
> > Any ideas on how to prevent solr from creating multiple copies of index
> > directory.
> >
> > Thanks,
> > Rishi.
> >
>


Re: Solr 5.0.0 and HDFS

2015-03-28 Thread Mark Miller
Hmm...can you file a JIRA issue with this info?

- Mark

On Fri, Mar 27, 2015 at 6:09 PM Joseph Obernberger 
wrote:

> I just started up a two shard cluster on two machines using HDFS. When I
> started to index documents, the log shows errors like this. They repeat
> when I execute searches.  All seems well - searches and indexing appear
> to be working.
> Possibly a configuration issue?
> My HDFS config:
>   class="solr.HdfsDirectoryFactory">
>  true
>  160
>   name="solr.hdfs.blockcache.direct.memory.allocation">true
>  16384
>  true
>  false
>  true
>  64
>  512
>  hdfs://nameservice1:8020/solr5
>  /etc/hadoop/conf.cloudera.hdfs1 str>
>  
> Thank you!
>
> -Joe
> 
>
> java.lang.IllegalStateException: file:
> BlockDirectory(HdfsDirectory@799d5a0e
> lockFactory=org.apache.solr.store.hdfs.HdfsLockFactory@49838b82) appears
> both in delegate and in cache: cache=[_25.fnm, _2d.si, _2e.nvd, _2b.si,
> _28.tvx, _2c.tvx, _1t.si, _27.nvd, _2b.tvd, _2d_Lucene50_0.pos, _23.nvd,
> _28_Lucene50_0.doc, _28_Lucene50_0.dvd, _2d.fdt, _2c_Lucene50_0.pos,
> _23.fdx, _2b_Lucene50_0.doc, _2d.nvm, _28.nvd, _23.fnm,
> _2b_Lucene50_0.tim, _2e.fdt, _2d_Lucene50_0.doc, _2b_Lucene50_0.dvd,
> _2d_Lucene50_0.dvd, _2b.nvd, _2g.tvx, _28_Lucene50_0.dvm,
> _1v_Lucene50_0.tip, _2e_Lucene50_0.dvm, _2e_Lucene50_0.pos, _2g.fdx,
> _2e.nvm, _2f.fdx, _1s.tvd, _23.nvm, _27.nvm, _1s_Lucene50_0.tip,
> _2c.fnm, _2b.fdt, _2d.fdx, _2c.fdx, _2c.nvm, _2e.fnm,
> _2d_Lucene50_0.dvm, _28.nvm, _28.fnm, _2b_Lucene50_0.tip,
> _2e_Lucene50_0.dvd, _2c.si, _2f.fdt, _2b.fnm, _2e_Lucene50_0.tip,
> _28.si, _28_Lucene50_0.tip, _2f.tvd, _2d_Lucene50_0.tim, _2f.tvx,
> _2b_Lucene50_0.pos, _2e.fdx, _28.fdx, _2c_Lucene50_0.dvd, _2g.tvd,
> _2c_Lucene50_0.tim, _2b.nvm, _23.fdt, _1s_Lucene50_0.tim,
> _28_Lucene50_0.tim, _2c_Lucene50_0.doc, _28.tvd, _2b.tvx, _2c.nvd,
> _2b.fdx, _2c_Lucene50_0.tip, _2e_Lucene50_0.doc, _2e_Lucene50_0.tim,
> _2c.fdt, _27.tvd, _2d.tvd, _2d.tvx, _28_Lucene50_0.pos,
> _2b_Lucene50_0.dvm, _2e.si, _2e.tvd, _2d.fnm, _2c.tvd, _2g.fdt, _2e.tvx,
> _28.fdt, _2d_Lucene50_0.tip, _2c_Lucene50_0.dvm,
> _2d.nvd],delegate=[_10.fdt, _10.fdx, _10.fnm, _10.nvd, _10.nvm, _10.si,
> _10.tvd, _10.tvx, _10_Lucene50_0.doc, _10_Lucene50_0.dvd,
> _10_Lucene50_0.dvm, _10_Lucene50_0.pos, _10_Lucene50_0.tim,
> _10_Lucene50_0.tip, _11.fdt, _11.fdx, _11.fnm, _11.nvd, _11.nvm, _11.si,
> _11.tvd, _11.tvx, _11_Lucene50_0.doc, _11_Lucene50_0.dvd,
> _11_Lucene50_0.dvm, _11_Lucene50_0.pos, _11_Lucene50_0.tim,
> _11_Lucene50_0.tip, _12.fdt, _12.fdx, _12.fnm, _12.nvd, _12.nvm, _12.si,
> _12.tvd, _12.tvx, _12_Lucene50_0.doc, _12_Lucene50_0.dvd,
> _12_Lucene50_0.dvm, _12_Lucene50_0.pos, _12_Lucene50_0.tim,
> _12_Lucene50_0.tip, _13.fdt, _13.fdx, _13.fnm, _13.nvd, _13.nvm, _13.si,
> _13.tvd, _13.tvx, _13_Lucene50_0.doc, _13_Lucene50_0.dvd,
> _13_Lucene50_0.dvm, _13_Lucene50_0.pos, _13_Lucene50_0.tim,
> _13_Lucene50_0.tip, _14.fdt, _14.fdx, _14.fnm, _14.nvd, _14.nvm, _14.si,
> _14.tvd, _14.tvx, _14_Lucene50_0.doc, _14_Lucene50_0.dvd,
> _14_Lucene50_0.dvm, _14_Lucene50_0.pos, _14_Lucene50_0.tim,
> _14_Lucene50_0.tip, _15.fdt, _15.fdx, _15.fnm, _15.nvd, _15.nvm, _15.si,
> _15.tvd, _15.tvx, _15_Lucene50_0.doc, _15_Lucene50_0.dvd,
> _15_Lucene50_0.dvm, _15_Lucene50_0.pos, _15_Lucene50_0.tim,
> _15_Lucene50_0.tip, _1f.fdt, _1f.fdx, _1f.fnm, _1f.nvd, _1f.nvm, _1f.si,
> _1f.tvd, _1f.tvx, _1f_Lucene50_0.doc, _1f_Lucene50_0.dvd,
> _1f_Lucene50_0.dvm, _1f_Lucene50_0.pos, _1f_Lucene50_0.tim,
> _1f_Lucene50_0.tip, _1g.fdt, _1g.fdx, _1g.fnm, _1g.nvd, _1g.nvm, _1g.si,
> _1g.tvd, _1g.tvx, _1g_Lucene50_0.doc, _1g_Lucene50_0.dvd,
> _1g_Lucene50_0.dvm, _1g_Lucene50_0.pos, _1g_Lucene50_0.tim,
> _1g_Lucene50_0.tip, _1h.fdt, _1h.fdx, _1h.fnm, _1h.nvd, _1h.nvm, _1h.si,
> _1h.tvd, _1h.tvx, _1h_Lucene50_0.doc, _1h_Lucene50_0.dvd,
> _1h_Lucene50_0.dvm, _1h_Lucene50_0.pos, _1h_Lucene50_0.tim,
> _1h_Lucene50_0.tip, _1i.fdt, _1i.fdx, _1i.fnm, _1i.nvd, _1i.nvm, _1i.si,
> _1i.tvd, _1i.tvx, _1i_Lucene50_0.doc, _1i_Lucene50_0.dvd,
> _1i_Lucene50_0.dvm, _1i_Lucene50_0.pos, _1i_Lucene50_0.tim,
> _1i_Lucene50_0.tip, _1j.fdt, _1j.fdx, _1j.fnm, _1j.nvd, _1j.nvm, _1j.si,
> _1j.tvd, _1j.tvx, _1j_Lucene50_0.doc, _1j_Lucene50_0.dvd,
> _1j_Lucene50_0.dvm, _1j_Lucene50_0.pos, _1j_Lucene50_0.tim,
> _1j_Lucene50_0.tip, _1k.fdt, _1k.fdx, _1k.fnm, _1k.nvd, _1k.nvm, _1k.si,
> _1k.tvd, _1k.tvx, _1k_Lucene50_0.doc, _1k_Lucene50_0.dvd,
> _1k_Lucene50_0.dvm, _1k_Lucene50_0.pos, _1k_Lucene50_0.tim,
> _1k_Lucene50_0.tip, _1l.fdt, _1l.fdx, _1l.fnm, _1l.nvd, _1l.nvm, _1l.si,
> _1l.tvd, _1l.tvx, _1l_Lucene50_0.doc, _1l_Lucene50_0.dvd,
> _1l_Lucene50_0.dvm, _1l_Lucene50_0.pos, _1l_Lucene50_0.tim,
> _1l_Lucene50_0.tip, _1m.fdt, _1m.fdx, _1m.fnm, _1m.nvd, _1m.nvm, _1m.si,
> _1m.tvd, _1m.tvx, _1m_Lucene50_0.doc, _1m_Lucene50_0.dvd,
> _1m_Lucene50_0.dvm, _1m_Lucene50_0.pos, _1m_Lucene50_0.tim,
>

Re: How to use ConcurrentUpdateSolrServer for Secured Solr?

2015-03-23 Thread Mark Miller
Doesn't ConcurrentUpdateSolrServer take an HttpClient in one of its
constructors?

- Mark
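
For illustration only, a minimal SolrJ 4.x sketch of that constructor-based
approach (the URL, credentials and pool sizes below are placeholders, not
values from this thread, and the exact constructor signature is assumed):

import org.apache.http.client.HttpClient;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.client.solrj.impl.HttpClientUtil;
import org.apache.solr.common.params.ModifiableSolrParams;

public class SecuredConcurrentUpdateExample {
  public static void main(String[] args) throws Exception {
    // Build an HttpClient pre-configured with basic auth credentials.
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set(HttpClientUtil.PROP_BASIC_AUTH_USER, "solr-user");
    params.set(HttpClientUtil.PROP_BASIC_AUTH_PASS, "secret");
    HttpClient httpClient = HttpClientUtil.createClient(params);

    // Pass that client into the ConcurrentUpdateSolrServer constructor
    // (url, client, queueSize, threadCount).
    ConcurrentUpdateSolrServer server = new ConcurrentUpdateSolrServer(
        "http://localhost:8983/solr/collection1", httpClient, 10, 2);
    try {
      // ... add documents here, then flush the internal queue.
      server.blockUntilFinished();
    } finally {
      server.shutdown();
    }
  }
}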

On Sun, Mar 22, 2015 at 3:40 PM Ramkumar R. Aiyengar <
andyetitmo...@gmail.com> wrote:

> Not a direct answer, but Anshum just created this..
>
> https://issues.apache.org/jira/browse/SOLR-7275
>  On 20 Mar 2015 23:21, "Furkan KAMACI"  wrote:
>
> > Is there anyway to use ConcurrentUpdateSolrServer for secured Solr as
> like
> > CloudSolrServer:
> >
> > HttpClientUtil.setBasicAuth(cloudSolrServer.getLbServer().
> getHttpClient(),
> > , );
> >
> > I see that there is no way to access HTTPClient for
> > ConcurrentUpdateSolrServer?
> >
> > Kind Regards,
> > Furkan KAMACI
> >
>


Re: 4.10.4 - nodes up, shard without leader

2015-03-08 Thread Mark Miller
Interesting bug.

First there is the already closed transaction log. That by itself deserves
a look. I'm not even positive we should be replaying the log when
reconnecting from a ZK disconnect, but even if we do, this should never
happen.

Beyond that there seems to be some race. Because of the log trouble, we try
to cancel the election - but we don't find the ephemeral election node yet
for some reason and so just assume it's fine, no node there to remove
(well, we WARN, because it is a little unexpected). Then that ephemeral
node materializes I guess, and the new leader doesn't register because the
old leader won't give up the throne. We don't try to force the new leader
because that may just hide bugs and cause data loss, so no leader is
elected.

I'd guess there are two JIRA issues to resolve here.

- Mark

On Sun, Mar 8, 2015 at 8:37 AM Markus Jelsma 
wrote:

> Hello - i stumbled upon an issue i've never seen earlier, a shard with all
> nodes up and running but no leader. This is on 4.10.4. One of the two nodes
> emits the following error log entry:
>
> 2015-03-08 05:25:49,095 WARN [solr.cloud.ElectionContext] - [Thread-136] -
> : cancelElection did not find election node to remove
> /overseer_elect/election/93434598784958483-178.21.116.
> 225:8080_solr-n_000246
> 2015-03-08 05:25:49,121 WARN [solr.cloud.ElectionContext] - [Thread-136] -
> : cancelElection did not find election node to remove
> /collections/oi/leader_elect/shard3/election/93434598784958483-178.21.116.
> 225:8080_solr_oi_h-n_43
> 2015-03-08 05:25:49,220 ERROR [solr.update.UpdateLog] - [Thread-136] - :
> Error inspecting tlog 
> tlog{file=/opt/solr/cores/oi_c/data/tlog/tlog.0001394
> refcount=2}
> java.nio.channels.ClosedChannelException
> at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:99)
> at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:679)
> at org.apache.solr.update.ChannelFastInputStream.
> readWrappedStream(TransactionLog.java:784)
> at org.apache.solr.common.util.FastInputStream.refill(
> FastInputStream.java:89)
> at org.apache.solr.common.util.FastInputStream.read(
> FastInputStream.java:125)
> at java.io.InputStream.read(InputStream.java:101)
> at org.apache.solr.update.TransactionLog.endsWithCommit(
> TransactionLog.java:218)
> at org.apache.solr.update.UpdateLog.recoverFromLog(
> UpdateLog.java:800)
> at org.apache.solr.cloud.ZkController.register(
> ZkController.java:841)
> at org.apache.solr.cloud.ZkController$1.command(
> ZkController.java:277)
> at org.apache.solr.common.cloud.ConnectionManager$1$1.run(
> ConnectionManager.java:166)
> 2015-03-08 05:25:49,225 ERROR [solr.update.UpdateLog] - [Thread-136] - :
> Error inspecting tlog 
> tlog{file=/opt/solr/cores/oi_c/data/tlog/tlog.0001471
> refcount=2}
> java.nio.channels.ClosedChannelException
> at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:99)
> at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:679)
> at org.apache.solr.update.ChannelFastInputStream.
> readWrappedStream(TransactionLog.java:784)
> at org.apache.solr.common.util.FastInputStream.refill(
> FastInputStream.java:89)
> at org.apache.solr.common.util.FastInputStream.read(
> FastInputStream.java:125)
> at java.io.InputStream.read(InputStream.java:101)
> at org.apache.solr.update.TransactionLog.endsWithCommit(
> TransactionLog.java:218)
> at org.apache.solr.update.UpdateLog.recoverFromLog(
> UpdateLog.java:800)
> at org.apache.solr.cloud.ZkController.register(
> ZkController.java:841)
> at org.apache.solr.cloud.ZkController$1.command(
> ZkController.java:277)
> at org.apache.solr.common.cloud.ConnectionManager$1$1.run(
> ConnectionManager.java:166)
> 2015-03-08 12:21:04,438 WARN [solr.cloud.RecoveryStrategy] -
> [zkCallback-2-thread-28] - : Stopping recovery for core=oi_h coreNodeName=
> 178.21.116.225:8080_solr_oi_h
>
> The other node makes a mess in the logs:
>
> 2015-03-08 05:25:46,020 WARN [solr.cloud.RecoveryStrategy] -
> [zkCallback-2-thread-20] - : Stopping recovery for core=oi_c coreNodeName=
> 194.145.201.190:
> 8080_solr_oi_c
> 2015-03-08 05:26:08,670 ERROR [solr.cloud.ShardLeaderElectionContext] -
> [zkCallback-2-thread-19] - : There was a problem trying to register as the
> leader:org.
> apache.solr.common.SolrException: Could not register as the leader
> because creating the ephemeral registration node in ZooKeeper failed
> at org.apache.solr.cloud.ShardLeaderElectionContextBase
> .runLeaderProcess(ElectionContext.java:146)
> at org.apache.solr.cloud.ShardLeaderElectionContext.
> runLeaderProcess(ElectionContext.java:317)
> at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(
> LeaderElector.java:163)
> at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(
> LeaderElector.java:125)
> at org.apache.solr

Re: Solrcloud Index corruption

2015-03-05 Thread Mark Miller
If you Google "replication can cause index corruption", there are two JIRA issues
that are the most likely cause of corruption in a SolrCloud environment.

- Mark

> On Mar 5, 2015, at 2:20 PM, Garth Grimm  
> wrote:
> 
> For updates, the document will always get routed to the leader of the 
> appropriate shard, no matter what server first receives the request.
> 
> -Original Message-
> From: Martin de Vries [mailto:mar...@downnotifier.com] 
> Sent: Thursday, March 05, 2015 4:14 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solrcloud Index corruption
> 
> Hi Erick,
> 
> Thank you for your detailed reply.
> 
> You say in our case some docs didn't make it to the node, but that's not 
> really true: the docs can be found on the corrupted nodes when I search on 
> ID. The docs are also complete. The problem is that the docs do not appear 
> when I filter on certain fields (however the fields are in the doc and have 
> the right value when I search on ID). So something seems to be corrupt in the 
> filter index. We will try the checkindex, hopefully it is able to identify 
> the problematic cores.
> 
> I understand there is not a "master" in SolrCloud. In our case we use haproxy 
> as a load balancer for every request. So when indexing every document will be 
> sent to a different solr server, immediately after each other. Maybe 
> SolrCloud is not able to handle that correctly?
> 
> 
> Thanks,
> 
> Martin
> 
> 
> 
> 
> Erick Erickson schreef op 05.03.2015 19:00:
> 
>> Wait up. There's no "master" index in SolrCloud. Raw documents are 
>> forwarded to each replica, indexed and put in the local tlog. If a 
>> replica falls too far out of synch (say you take it offline), then the 
>> entire index _can_ be replicated from the leader and, if the leader's 
>> index was incomplete then that might propagate the error.
>> 
>> The practical consequence of this is that if _any_ replica has a 
>> complete index, you can recover. Before going there though, the 
>> brute-force approach is to just re-index everything from scratch.
>> That's likely easier, especially on indexes this size.
>> 
>> Here's what I'd do.
>> 
>> Assuming you have the Collections API calls for ADDREPLICA and 
>> DELETEREPLICA, then:
>> 0> Identify the complete replicas. If you're lucky you have at least
>> one for each shard.
>> 1> Copy 1 good index from each shard somewhere just to have a backup.
>> 2> DELETEREPLICA on all the incomplete replicas
>> 2.5> I might shut down all the nodes at this point and check that all 
>> the cores I'd deleted were gone. If any remnants exist, 'rm -rf 
>> deleted_core_dir'.
>> 3> ADDREPLICA to get the ones you removed back in.
>>
>> This should copy the entire index from the leader for each replica. As you
>> do the leadership will change and after you've deleted all the 
>> incomplete replicas, one of the complete ones will be the leader and 
>> you should be OK.
>> 
>> If you don't want to/can't use the Collections API, then
>> 0> Identify the complete replicas. If you're lucky you have at least
>> one for each shard.
>> 1> Shut 'em all down.
>> 2> Copy the good index somewhere just to have a backup.
>> 3> 'rm -rf data' for all the incomplete cores.
>> 4> Bring up the good cores.
>> 5> Bring up the cores that you deleted the data dirs from.
>> 
>> What this should do is replicate the entire index from the leader. When you
>> restart the good cores (step 4 above), they'll _become_ the leader.
>> 
>> bq: Is it possible to make Solrcloud invulnerable for network problems 
>> I'm a little surprised that this is happening. It sounds like the 
>> network problems were such that some nodes weren't out of touch long 
>> enough for Zookeeper to sense that they were down and put them into 
>> recovery. Not sure there's any way to secure against that.
>> 
>> bq: Is it possible to see if a core is corrupt?
>> There's "CheckIndex", here's at least one link:
>> http://java.dzone.com/news/lucene-and-solrs-checkindex
>> What you're describing, though, is that docs just didn't make it to 
>> the node, _not_ that the index has unexpected bits, bad disk sectors 
>> and the like so CheckIndex can't detect that. How would it know what 
>> _should_ have been in the index?
>> 
>> bq: I noticed a difference in the "Gen" column on Overview - 
>> Replication. Does this mean there is something wrong?
>> You cannot infer anything from this. In particular, the merging will 
>> be significantly different between a single full-reindex and what the 
>> state of segment merges is in an incrementally built index.
>> 
>> The admin UI screen is rooted in the pre-cloud days, the Master/Slave 
>> thing is entirely misleading. In SolrCloud, since all the raw data is 
>> forwarded to all replicas, and any auto commits that happen may very 
>> well be slightly out of sync, the index size, number of segments, 
>> generations, and all that are pretty safely ignored.
>> 
>> Best,
>> Erick
>> 
>> On Thu, Mar 5, 2015 at 6:50 AM, Martin de Vries 
>> 
>> wrote:
>> 
>

Re: New leader/replica solution for HDFS

2015-02-26 Thread Mark Miller
I’ll be working on this at some point: 
https://issues.apache.org/jira/browse/SOLR-6237

- Mark

http://about.me/markrmiller

> On Feb 25, 2015, at 2:12 AM, longsan  wrote:
> 
> We used HDFS as our Solr index storage and we really have a heavy update
> load. We had met much problems with current leader/replica solution. There
> is duplicate index computing on Replilca side. And the data sync between
> leader/replica is always a problem.
> 
> As HDFS already provides data replication on data layer, could Solr provide
> just service layer replication?
> 
> My thought is that the leader and the replica all bind to the same data
> index directory. And the leader will build up index for new request, the
> replica will just keep update the index version with the leader(such as a
> soft commit periodically? ). If the leader lost then the replica will take
> the duty immediately. 
> 
> Thanks for any suggestion of this idea.
> 
> 
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/New-leader-replica-solution-for-HDFS-tp4188735.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Collections API - HTTP verbs

2015-02-18 Thread Mark Miller
Perhaps try quotes around the URL you are providing to curl. It's not
complaining about the HTTP method - Solr has historically always taken
simple GETs over HTTP - for good or bad, you pretty much only POST
documents / updates.

It's saying the name param is required and not being found, and since you
are trying to specify the name, I'm guessing something about the command is
not working. You might try just putting it in a browser URL bar as well.

- Mark
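
For example (the host below is a placeholder), quoting the URL keeps the
shell from splitting the command at the '&', so the name and val params
actually reach Solr:

curl "http://localhost:8983/solr/admin/collections?action=CLUSTERPROP&name=urlScheme&val=https"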

On Wed Feb 18 2015 at 8:56:26 PM Hrishikesh Gadre 
wrote:

> Hi,
>
> Can we please document which HTTP method is supposed to be used with each
> of these APIs?
>
> https://cwiki.apache.org/confluence/display/solr/Collections+API
>
> I am trying to invoke following API
>
> curl http://
> :8983/solr/admin/collections?action=CLUSTERPROP&name=urlScheme&
> val=https
>
> This request is failing due to following error,
>
> 2015-02-18 17:29:39,965 INFO org.apache.solr.servlet.SolrDispatchFilter:
> [admin] webapp=null path=/admin/collections params={action=CLUSTERPROP}
> status=400 QTime=20
>
> org.apache.solr.core.SolrCore: org.apache.solr.common.SolrException:
> Missing required parameter: name
>
> at
> org.apache.solr.common.params.RequiredSolrParams.get(
> RequiredSolrParams.java:49)
>
> at
> org.apache.solr.common.params.RequiredSolrParams.check(
> RequiredSolrParams.java:153)
>
> at
> org.apache.solr.handler.admin.CollectionsHandler.handleProp(
> CollectionsHandler.java:238)
>
> at
> org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(
> CollectionsHandler.java:200)
>
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(
> RequestHandlerBase.java:135)
>
> at
> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(
> SolrDispatchFilter.java:770)
>
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:271)
>
> I am using Solr 4.10.3 version.
>
> Thanks
>
> Hrishikesh
>


Re: Solrcloud (to HDFS) poor indexing performance

2015-02-03 Thread Mark Miller
What is your replication factor and doc size?

Replication can affect performance a fair amount more than it should currently.

For the number of nodes, that doesn’t sound like it matches what I’ve seen 
unless those are huge documents or you have some slow analyzer in the chain or 
something.

Without replication, with relatively small docs and decent hardware, I'd expect 
around 10,000-12,000 docs per second per node. Replication can cut that by up to 
half, by some reports. Larger doc sizes or other outliers might reduce it further 
as well.

Solr 4.4 is pretty ancient in SolrCloud terms at this point, by the way.

- Mark

http://about.me/markrmiller

> On Feb 3, 2015, at 7:47 PM, Tim Smith  wrote:
> 
> Hi,
> 
> I have a SolrCloud (Solr 4.4, writing to HDFS on CDH-5.3) collection
> configured to be populated by flume Morphlines sink. The flume agent reads
> data from Kafka and writes to the Solr collection.
> 
> The issue is that Solr indexing rate is abysmally poor (~6k docs/sec at
> best, dips to a few hundred per sec) across the cluster. The incoming
> data/document rate is about 30-40k/second.
> 
> I have gone wide/thin with 18 nodes and each with 8GB (Java) + 4GB
> (non-heap) memory and narrow/thick with current set of 5 dedicated nodes
> each with 36GB (Java) and 16GB (non-heap) memory (18 shards with the former
> config and 5 shards, right now).
> 
> On the flume side, I have gone from 5 flume instances, each with a single
> sink to 5 sinks for each flume instance. I have tweaked batchSize and
> batchDuration.
> 
> I checked ZooKeeper loads and don't see it stressed. Neither are the
> datanodes. On the Solr nodes, solr is consuming all the allocated memory
> (32GB) but I don't see solr hitting any CPU limits.
> 
> *But*, indexing rate stubbornly stays at ~6k docs/sec. When I bounce the
> flume agent, it jumps up momentarily to several hundreds of thousands but
> then comes down to ~6k/sec and the flume channels get saturated within
> seconds.
> 
> Any clues/pointers for troubleshooting will be appreciated?
> 
> 
> Thanks,
> 
> Tim



Re: replica never takes leader role

2015-01-28 Thread Mark Miller
Yes, after 45 seconds a replica should take over as leader. The logs of the
replica that should be taking over will likely explain why this is not
happening.

- Mark

On Wed Jan 28 2015 at 2:52:32 PM Joshi, Shital  wrote:

> When leader reaches 99% physical memory on the box and starts swapping
> (stops replicating), we forcefully bring down leader (first kill -15 and
> then kill -9 if kill -15 doesn't work). This is when we are looking up to
> replica to assume leader's role and it never happens.
>
> Zookeeper timeout is 45 seconds. We can increase it up to 2 minutes and
> test.
>
>  host="${host:}" hostPort="${jetty.port:8983}" 
> hostContext="${hostContext:solr}"
> zkClientTimeout="${zkClientTimeout:45000}">
>
> As per definition of zkClientTimeout, After the leader is brought down and
> it doesn't talk to zookeeper for 45 seconds, shouldn't ZK promote replica
> to leader? I am not sure how increasing zk timeout will help.
>
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Wednesday, January 28, 2015 11:42 AM
> To: solr-user@lucene.apache.org
> Subject: Re: replica never takes leader role
>
> This is not the desired behavior at all. I know there have been
> improvements in this area since 4.8, but can't seem to locate the JIRAs.
>
> I'm curious _why_ the nodes are going down though, is it happening at
> random or are you taking it down? One problem has been that the Zookeeper
> timeout used to default to 15 seconds, and occasionally a node would be
> unresponsive (sometimes due to GC pauses) and exceed the timeout. So upping
> the ZK timeout has helped some people avoid this...
>
> FWIW,
> Erick
>
> On Wed, Jan 28, 2015 at 7:11 AM, Joshi, Shital 
> wrote:
>
> > We're using Solr 4.8.0
> >
> >
> > -Original Message-
> > From: Erick Erickson [mailto:erickerick...@gmail.com]
> > Sent: Tuesday, January 27, 2015 7:47 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: replica never takes leader role
> >
> > What version of Solr? This is an ongoing area of improvements and several
> > are very recent.
> >
> > Try searching the JIRA for Solr for details.
> >
> > Best,
> > Erick
> >
> > On Tue, Jan 27, 2015 at 1:51 PM, Joshi, Shital 
> > wrote:
> >
> > > Hello,
> > >
> > > We have SolrCloud cluster (5 shards and 2 replicas) on 10 boxes and
> three
> > > zookeeper instances. We have noticed that when a leader node goes down
> > the
> > > replica never takes over as a leader, cloud becomes unusable and we
> have
> > to
> > > bounce entire cloud for replica to assume leader role. Is this default
> > > behavior? How can we change this?
> > >
> > > Thanks.
> > >
> > >
> > >
> >
>


Re: Connection Reset Errors with Solr 4.4

2015-01-27 Thread Mark Miller
Sorry, there is no great workaround. You might try raising the max idle time
for your container - perhaps that makes it less frequent.

- Mark

On Tue Jan 20 2015 at 1:56:54 PM Nishanth S  wrote:

> Thank you Mike.Sure enough,we are running into the same issue you
> mentoined.Is there a quick fix for this other than the patch.I do not see
> the tlogs getting replayed at all.It is doing a full index recovery from
> the leader and our index size is around 200G.Would lowering the autocommit
> settings help(where the replica would go for a tlog replay as the tlogs I
> see are not huge).
>
> Thanks,
> Nishanth
>
> On Tue, Jan 20, 2015 at 10:46 AM, Mike Drob  wrote:
>
> > Are we sure this isn't SOLR-6931?
> >
> > On Tue, Jan 20, 2015 at 11:39 AM, Nishanth S 
> > wrote:
> >
> > > Hello All,
> > >
> > > We are running solr cloud 4.4 with 30 shards and 3 replicas with real
> > time
> > > indexing on rhel 6.5.The indexing rate is 3K Tps now.We are running
> into
> > an
> > > issue with replicas going into recovery mode  due to connection reset
> > > errors.Soft commit time is 2 min and auto commit is set as 5 minutes.I
> > have
> > > seen that replicas do a full index recovery which takes a long
> > > time(days).Below is the error trace that  I see.I would really
> appreciate
> > > any help in this case.
> > >
> > > g.apache.solr.client.solrj.SolrServerException: IOException occured
> when
> > > talking to server at: http://xxx:8083/solr/log_pn_shard20_replica2
> > > at
> > >
> > >
> > org.apache.solr.client.solrj.impl.HttpSolrServer.request(
> HttpSolrServer.java:435)
> > > at
> > >
> > >
> > org.apache.solr.client.solrj.impl.HttpSolrServer.request(
> HttpSolrServer.java:180)
> > > at
> > >
> > >
> > org.apache.solr.update.SolrCmdDistributor$1.call(
> SolrCmdDistributor.java:401)
> > > at
> > >
> > >
> > org.apache.solr.update.SolrCmdDistributor$1.call(
> SolrCmdDistributor.java:375)
> > > at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> > > at
> > > java.util.concurrent.Executors$RunnableAdapter.
> call(Executors.java:471)
> > > at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> > > at
> > >
> > >
> > java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
> > > at
> > >
> > >
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
> > > at java.lang.Thread.run(Thread.java:745)
> > > Caused by: java.net.SocketException: Connection reset
> > > at java.net.SocketInputStream.read(SocketInputStream.java:196)
> > > at java.net.SocketInputStream.read(SocketInputStream.java:122)
> > > at
> > >
> > >
> > org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(
> AbstractSessionInputBuffer.java:166)
> > > at
> > >
> > >
> > org.apache.http.impl.io.SocketInputBuffer.fillBuffer(
> SocketInputBuffer.java:90)
> > > at
> > >
> > >
> > org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(
> AbstractSessionInputBuffer.java:281)
> > > at
> > >
> > >
> > org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(
> DefaultHttpResponseParser.java:92)
> > > at
> > >
> > >
> > org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(
> DefaultHttpResponseParser.java:62)
> > > at
> > >
> > >
> > org.apache.http.impl.io.AbstractMessageParser.parse(
> AbstractMessageParser.java:254)
> > > at
> > >
> > >
> > org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(
> AbstractHttpClientConnection.java:289)
> > > at
> > >
> > >
> > org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(
> DefaultClientConnection.java:252)
> > > at
> > >
> > >
> > org.apache.http.impl.conn.ManagedClientConnectionImpl.
> receiveResponseHeader(ManagedClientConnectionImpl.java:191)
> > > at
> > >
> > >
> > org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(
> HttpRequestExecutor.java:300)
> > > at
> > >
> > >
> > org.apache.http.protocol.HttpRequestExecutor.execute(
> HttpRequestExecutor.java:127)
> > > at
> > >
> > >
> > org.apache.http.impl.client.DefaultRequestDirector.tryExecute(
> DefaultRequestDirector.java:717)
> > > at
> > >
> > >
> > org.apache.http.impl.client.DefaultRequestDirector.execute(
> DefaultRequestDirector.java:522)
> > > at
> > >
> > >
> > org.apache.http.impl.client.AbstractHttpClient.execute(
> AbstractHttpClient.java:906)
> > > at
> > >
> > >
> > org.apache.http.impl.client.AbstractHttpClient.execute(
> AbstractHttpClient.java:805)
> > > at
> > >
> > >
> > org.apache.http.impl.client.AbstractHttpClient.execute(
> AbstractHttpClient.java:784)
> > > at
> > >
> > >
> > org.apache.solr.client.solrj.impl.HttpSolrServer.request(
> HttpSolrServer.java:365)
> > > ... 9 more
> > >
> > >
> > > Thanks,
> > > Nishanth
> > >
> >
>


Re: solr cloud replicas goes in recovery mode after update

2015-01-26 Thread Mark Miller
bq. Is this the correct approach ?

It works, but it might not be ideal. Recent versions of ZooKeeper have an
alternate config for this max limit though, and it is preferable to use
that.

See maxSessionTimeout in
http://zookeeper.apache.org/doc/r3.3.1/zookeeperAdmin.html

- Mark
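
For illustration, the ZooKeeper-side knob looks like this in zoo.cfg (values
below are placeholders; the point is to raise the session-timeout ceiling
rather than stretching tickTime):

tickTime=2000
minSessionTimeout=4000
maxSessionTimeout=60000

With that in place, a zkClientTimeout of e.g. 45000 in solr.xml stays within
the range ZooKeeper will actually grant.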

On Mon Jan 26 2015 at 4:44:41 PM Vijay Sekhri  wrote:

> Hi Erick,
>
> The older message seems to be deleted so I am sending a new one
> http://osdir.com/ml/solr-user.lucene.apache.org/2015-01/msg00773.html
>
>
> In solr.xml file I had zk timeout set to*   name="zkClientTimeout">${zkClientTimeout:45}*
> One thing that made a it a bit better now is the zk tick time and syncLimit
> settings. I set it to a higher value as below. This may not be advisable
> though.
>
> tickTime=3
> initLimit=30
> syncLimit=20
>
> Now we observed that replicas do not go in recovery that often as before.
> In the whole cluster at a given time I would have a couple of replicas in
> recovery whereas earlier it were multiple replicas from every shard .
> On the wiki https://wiki.apache.org/solr/SolrCloud it says the "The
> maximum
> is 20 times the tickTime." in the FAQ so I decided to increase the tick
> time. Is this the correct approach ?
>
> One question I have is that if auto commit settings has anything to do with
> this or not ? Does it induce extra work for the searchers because of which
> this would happen? I have tried with following settings
> *  *
> *50*
> *90*
> **
>
> **
> *20*
> *3*
> *false*
> **
>
> I have increased  the  heap size to 15GB for each JVM instance . I
> monitored during full indexing how the heap usage looks like and it never
> goes beyond 8 GB .  I don't see any Full GC happening at any point .
>
>
>  Our rate is a variable rate . It is not a sustained rate of 6000/second ,
> however there are intervals where it would reach that much and come down
> and grow again and come down.  So if I would take an average it would be
> 600/second only but that is not real rate at any given time.
> Version of solr cloud is *4.10*.  All indexers are basically java programs
> running on different host using CloudSolrServer api.
> As I mentioned it is much better now than before , however not completely
> as expected .If possible we would like to have none of them go in recovery
>
> I captured some logs before and after recovery
>
> 14:16:52,774 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [IFD][recoveryExecutor-7-thread-1]: delete
> "_5r2_1.del"
> 14:16:52,774 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [IFD][recoveryExecutor-7-thread-1]: 0 msec
> to
> checkpoint
> 14:16:52,774 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [IFD][recoveryExecutor-7-thread-1]: now
> checkpoint "_4qe(4.10.0):C4312879/1nts ; isCommit = false]
> 14:16:52,774 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [IFD][recoveryExecutor-7-thread-1]: delete
> "_5r4_1.del"
> 14:16:52,774 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [IFD][recoveryExecutor-7-thread-1]: 0 msec
> to
> checkpoint
> 14:16:52,775 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [DW][recoveryExecutor-7-thread-1]:
> recoveryExecutor-7-thread-1 finishFullFlush success=true
> 14:16:52,775 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]:
> findMerges: 34 segments
> 14:16:52,777 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]:
> seg=_554(4.10.0):C3995865/780418:delGen=23 size=3669.307 MB [skip: too
> large]
> 14:16:52,777 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]:
> seg=_4qe(4.10.0):C4312879/1370113:delGen=57 size=3506.254 MB [skip: too
> large]
> 14:16:52,777 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]:
> seg=_5co(4.10.0):C871785/93995:delGen=11 size=853.668 MB
> 14:16:52,777 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]:
> seg=_5kb(4.10.0):C424868/49572:delGen=12 size=518.704 MB
> 14:16:52,777 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]:
> seg=_5hm(4.10.0):C457977/83353:delGen=12 size=470.422 MB
> 14:16:52,778 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]:
> seg=_56u(4.10.0):C286775/11906:delGen=15 size=312.952 MB
> 14:16:52,778 INFO  [org.apache.solr.update.LoggingInfoStream]
> (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]:
> seg=_5f5(4.10.0):C116528/43621:delGen=2 size=95.529 MB
> 14

Re: Distributed unit tests and SSL doesn't have a valid keystore

2015-01-12 Thread Mark Miller
I'd have to do some digging. Hossman might know offhand. You might just
want to use @SuppressSSL on the tests :)

- Mark
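
A minimal sketch of what that looks like (assuming the 4.10.x test framework,
where the annotation lives on SolrTestCaseJ4; the class name and test body
are placeholders):

import org.apache.solr.BaseDistributedSearchTestCase;
import org.apache.solr.SolrTestCaseJ4.SuppressSSL;

@SuppressSSL
public class MyDistributedTest extends BaseDistributedSearchTestCase {
  @Override
  public void doTest() throws Exception {
    // index a few docs, then use query() to compare control vs. distributed results
  }
}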

On Mon Jan 12 2015 at 8:45:11 AM Markus Jelsma 
wrote:

> Hi - in a small Maven project depending on Solr 4.10.3, running unit tests
> that extend BaseDistributedSearchTestCase randomly fail with "SSL doesn't
> have a valid keystore", and a lot of zombie threads. We have a
> solrtest.keystore file laying around, but where to put it?
>
> Thanks,
> Markus
>


Re: leader split-brain at least once a day - need help

2015-01-12 Thread Mark Miller
bq. ClusterState says we are the leader, but locally we don't think so

Generally this is due to some bug. One bug that can lead to it was recently
fixed in 4.10.3 I think. What version are you on?

- Mark

On Mon Jan 12 2015 at 7:35:47 AM Thomas Lamy  wrote:

> Hi,
>
> I found no big/unusual GC pauses in the Log (at least manually; I found
> no free solution to analyze them that worked out of the box on a
> headless debian wheezy box). Eventually i tried with -Xmx8G (was 64G
> before) on one of the nodes, after checking allocation after 1 hour run
> time was at about 2-3GB. That didn't move the time frame where a restart
> was needed, so I don't think Solr's JVM GC is the problem.
> We're trying to get all of our node's logs (zookeeper and solr) into
> Splunk now, just to get a better sorted view of what's going on in the
> cloud once a problem occurs. We're also enabling GC logging for
> zookeeper; maybe we were missing problems there while focussing on solr
> logs.
>
> Thomas
>
>
> On 08.01.15 at 16:33, Yonik Seeley wrote:
> > It's worth noting that those messages alone don't necessarily signify
> > a problem with the system (and it wouldn't be called "split brain").
> > The async nature of updates (and thread scheduling) along with
> > stop-the-world GC pauses that can change leadership, cause these
> > little windows of inconsistencies that we detect and log.
> >
> > -Yonik
> > http://heliosearch.org - native code faceting, facet functions,
> > sub-facets, off-heap data
> >
> >
> > On Wed, Jan 7, 2015 at 5:01 AM, Thomas Lamy 
> wrote:
> >> Hi there,
> >>
> >> we are running a 3 server cloud serving a dozen
> >> single-shard/replicate-everywhere collections. The 2 biggest
> collections are
> >> ~15M docs, and about 13GiB / 2.5GiB size. Solr is 4.10.2, ZK 3.4.5,
> Tomcat
> >> 7.0.56, Oracle Java 1.7.0_72-b14
> >>
> >> 10 of the 12 collections (the small ones) get filled by DIH full-import
> once
> >> a day starting at 1am. The second biggest collection is updated using
> DIH
> >> delta-import every 10 minutes, the biggest one gets bulk json updates
> with
> >> commits once in 5 minutes.
> >>
> >> On a regular basis, we have a leader information mismatch:
> >> org.apache.solr.update.processor.DistributedUpdateProcessor; Request
> says it
> >> is coming from leader, but we are the leader
> >> or the opposite
> >> org.apache.solr.update.processor.DistributedUpdateProcessor;
> ClusterState
> >> says we are the leader, but locally we don't think so
> >>
> >> One of these pop up once a day at around 8am, making either some cores
> going
> >> to "recovery failed" state, or all cores of at least one cloud node into
> >> state "gone".
> >> This started out of the blue about 2 weeks ago, without changes to
> neither
> >> software, data, or client behaviour.
> >>
> >> Most of the time, we get things going again by restarting solr on the
> >> current leader node, forcing a new election - can this be triggered
> while
> >> keeping solr (and the caches) up?
> >> But sometimes this doesn't help, we had an incident last weekend where
> our
> >> admins didn't restart in time, creating millions of entries in
> >> /solr/oversser/queue, making zk close the connection, and leader
> re-elect
> >> fails. I had to flush zk, and re-upload collection config to get solr up
> >> again (just like in https://gist.github.com/
> isoboroff/424fcdf63fa760c1d1a7).
> >>
> >> We have a much bigger cloud (7 servers, ~50GiB Data in 8 collections,
> 1500
> >> requests/s) up and running, which does not have these problems since
> >> upgrading to 4.10.2.
> >>
> >>
> >> Any hints on where to look for a solution?
> >>
> >> Kind regards
> >> Thomas
> >>
> >> --
> >> Thomas Lamy
> >> Cytainment AG & Co KG
> >> Nordkanalstrasse 52
> >> 20097 Hamburg
> >>
> >> Tel.: +49 (40) 23 706-747
> >> Fax: +49 (40) 23 706-139
> >> Sitz und Registergericht Hamburg
> >> HRA 98121
> >> HRB 86068
> >> Ust-ID: DE213009476
> >>
>
>
> --
> Thomas Lamy
> Cytainment AG & Co KG
> Nordkanalstrasse 52
> 20097 Hamburg
>
> Tel.: +49 (40) 23 706-747
> Fax: +49 (40) 23 706-139
>
> Sitz und Registergericht Hamburg
> HRA 98121
> HRB 86068
> Ust-ID: DE213009476
>
>


Re: Garbage Collection tuning - G1 is now a good option

2015-01-02 Thread Mark Miller
bq. But tons of people on this mailing list do not recommend AggressiveOpts

It's up to you to decide - that is why it's an option. It will enable more
aggressive options that will tend to perform better. On the other hand,
these more aggressive options and optimizations have a history of being
more buggy. Depending on your needs and risk tolerance, you might make a
different choice.

You probably want to research the specific issues it has been found to
cause and whether those issues are important to you or affect the JVM you
are using. You also have to keep in mind that probably all of the issues
are not known or documented and that updates may introduce new issues. You
face some risk like that no matter what though.

- Mark

On Fri Jan 02 2015 at 8:37:47 AM Shawn Heisey  wrote:

> On 1/1/2015 6:35 PM, William Bell wrote:
> > But tons of people on this mailing list do not recommend AggressiveOpts
> >
> > Why do you recommend it?
>
> I haven't done any comparisons with and without it.  To call it a
> "recommendation" is a little bit strong.  I use it, and I am seeing good
> results.
>
> My reading indicates that AggressiveOpts basically enables settings that
> are being considered for defaults in a later Java version.  If they are
> being seriously considered for new defaults, they are probably something
> that I want to be using.  I have also heard that there are sometimes
> bugs with that option, but I have not personally had any problems.
>
> If you don't want to use that option, feel free to leave it out.  I will
> update my wiki page with a note about AggressiveOpts.
>
> Thanks,
> Shawn
>
>


[ANNOUNCE] Apache Solr 4.10.3 released

2014-12-29 Thread Mark Miller
December 2014, Apache Solr™ 4.10.3 available

The Lucene PMC is pleased to announce the release of Apache Solr 4.10.3

Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, rich document (e.g., Word, PDF)
handling, and geospatial search. Solr is highly scalable, providing
fault tolerant distributed search and indexing, and powers the search
and navigation features of many of the world's largest internet sites.

Solr 4.10.3 is available for immediate download at:

http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

Solr 4.10.3 includes 21 bug fixes, as well as Lucene 4.10.3 and its 12
bug fixes.

This release fixes the following security vulnerability that has
affected Solr since the Solr 4.0 Alpha release.

CVE-2014-3628: Stored XSS vulnerability in Solr Admin UI.

Information disclosure: The Solr Admin UI Plugin / Stats page does not
escape data values which allows an attacker to execute javascript by
executing a query that will be stored and displayed via the
'fieldvaluecache' object.

See the CHANGES.txt file included with the release for a full list of
changes and further details.

Please report any feedback to the mailing lists
(http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring
network for distributing releases. It is possible that the mirror you
are using may not have replicated the release yet. If that is the
case, please try another mirror. This also goes for Maven access.

Happy Holidays,

Mark Miller

http://www.about.me/markrmiller


Re: Dealing with bad apples in a SolrCloud cluster

2014-11-21 Thread Mark Miller
bq.  esp. since we've set max threads so high to avoid distributed
dead-lock.

We should fix this for 5.0 - add a second thread pool that is used for
internal requests. We can make it optional if necessary (simpler default
container support), but it's a fairly easy improvement I think.

- Mark

On Fri Nov 21 2014 at 1:56:51 PM Timothy Potter 
wrote:

> Just soliciting some advice from the community ...
>
> Let's say I have a 10-node SolrCloud cluster and have a single collection
> with 2 shards with replication factor 10, so basically each shard has one
> replica on each of my nodes.
>
> Now imagine one of those nodes starts getting into a bad state and starts
> to be slow about serving queries (not bad enough to crash outright though)
> ... I'm sure we could ponder any number of ways a box might slow down
> without crashing.
>
> From my calculations, about 2/10ths of the queries will now be affected
> since
>
> 1/10 queries from client apps will hit the bad apple
>   +
> 1/10 queries from other replicas will hit the bad apple (distrib=false)
>
>
> If QPS is high enough and the bad apple is slow enough, things can start to
> get out of control pretty fast, esp. since we've set max threads so high to
> avoid distributed dead-lock.
>
> What have others done to mitigate this risk? Anything we can do in Solr to
> help deal with this? It seems reasonable that nodes can identify a bad
> apple by keeping track of query times and looking for nodes that are
> significantly outside (>=2 stddev) what the other nodes are doing. Then
> maybe mark the node as being down in ZooKeeper so clients and other nodes
> stop trying to send requests to it; or maybe a simple policy of just don't
> send requests to that node for a few minutes.
>
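
A hedged sketch of the ">= 2 stddev" idea from the question above, purely for
illustration (the class, method and map names are made up; this is not Solr
code):

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class BadAppleDetector {
  // Flag nodes whose mean query time sits >= 2 standard deviations above the fleet mean.
  public static Set<String> findBadApples(Map<String, Double> meanQueryTimeMsByNode) {
    int n = meanQueryTimeMsByNode.size();
    double mean = 0;
    for (double v : meanQueryTimeMsByNode.values()) mean += v;
    mean /= n;
    double var = 0;
    for (double v : meanQueryTimeMsByNode.values()) var += (v - mean) * (v - mean);
    double stddev = Math.sqrt(var / n);
    Set<String> bad = new HashSet<>();
    for (Map.Entry<String, Double> e : meanQueryTimeMsByNode.entrySet()) {
      if (e.getValue() >= mean + 2 * stddev) bad.add(e.getKey());
    }
    return bad;
  }

  public static void main(String[] args) {
    Map<String, Double> times = new HashMap<>();
    for (int i = 1; i <= 9; i++) {
      times.put("node" + i, 20.0 + i); // healthy nodes, ~20-30 ms
    }
    times.put("node10", 480.0);        // the bad apple
    System.out.println(findBadApples(times)); // prints [node10]
  }
}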


Re: Solrcloud and remote Zookeeper ensemble

2014-11-19 Thread Mark Miller
If someone wants to file a JIRA, we really should detect that and help the
user out.

- Mark
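
For completeness, the same chroot convention applies to the SolrJ connect
string (illustrative hosts; SolrJ 4.x API assumed):

import org.apache.solr.client.solrj.impl.CloudSolrServer;

public class ChrootExample {
  public static void main(String[] args) {
    // The /solr chroot is appended once, after the whole host list, not after each host.
    CloudSolrServer server =
        new CloudSolrServer("zookeeper1:2181,zookeeper2:2181,zookeeper3:2181/solr");
    server.connect();
    server.shutdown();
  }
}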

On Wed Nov 19 2014 at 10:39:56 AM Robert Kent 
wrote:

> Yes, Alan's comment was correct.  Using the correct Zookeeper string made
> things work correctly, e.g.:
>
>  SOLR_ZK_ENSEMBLE=zookeeper1:2181,zookeeper2:2181,zookeeper3:2181/solr
> 
> From: Erick Erickson [erickerick...@gmail.com]
> Sent: 19 November 2014 14:32
> To: solr-user@lucene.apache.org
> Subject: Re: Solrcloud and remote Zookeeper ensemble
>
> Alan's comment is spot on, and it's the first thing I'd try.
>
> Beyond that, though, this forum really doesn't have much
> knowledge about various company's bundling for Solr and
> associated support tools so you might get more knowledgeable
> responses from the Cloudera support forums...
>
> Just in case there's a thundering silence ;)
>
> Best,
> Erick
>
> On Wed, Nov 19, 2014 at 5:37 AM, Alan Woodward  wrote:
> >> SOLR_ZK_ENSEMBLE=zookeeper1:2181/solr,zookeeper2:2181/
> solr,zookeeper3:2181/solr
> >
> > This is the incorrect part, it should be:
> >
> >> SOLR_ZK_ENSEMBLE=zookeeper1:2181,zookeeper2:2181,zookeeper3:2181/solr
> >
> > The chroot is only appended at the end of the connection string.  Not
> the way I would have done it, but that's how ZK works...
> >
> > Alan Woodward
> > www.flax.co.uk
> >
> >
> > On 19 Nov 2014, at 12:54, Robert Kent wrote:
> >
> >> Hi,
> >>
> >> I'm experiencing some odd behaviour with Solrcloud and Zookeeper.  I am
> running Solrcloud on one host and am running three Zookeepers on another
> three hosts.  The Zookeeper part of things works correctly, I can
> add/remove/etc nodes from Zookeeper.  I am running, or rather trying to
> run, Solrcloud on top of Hadoop.  Again, the Hadoop side of things works
> correctly, I can create/remove/etc dirs/files under Hadoop.
> >>
> >> Unfortunately, the solrctl utility bundled with Solrcloud doesn't
> appear to work correctly.  Depending on how or where I set the Zookeeper
> ensemble details I get different results.  My Zookeeper instances are used
> by other services, so I am trying to force the Solrcloud configuration to
> be created under /solr - from reading the documentation this appears to be
> the recommended appraoch.
> >>
> >> I have set the Zookeeper ensemble and Hadoop configuration in
> /etc/default/solr:
> >>
> >> SOLR_ZK_ENSEMBLE=zookeeper1:2181/solr,zookeeper2:2181/
> solr,zookeeper3:2181/solr
> >> SOLR_HDFS_HOME=hdfs://zookeeper1:8020/solr
> >> SOLR_HDFS_CONFIG=/etc/hadoop/conf
> >> SOLR_HDFS_HOME=hdfs://3xNodeHA:8020/solr
> >>
> >> If I do not specify any Zookeeper parameters for solrctl it creates it
> Zookeeper configuration under '/solr,zookeeper2:2181' and under that is
> creates  '/solr,zookeeper3:2181/solr/configs/my-data'.  This also occurs
> if I specify --zk 
> zookeeper1:2181/solr,zookeeper2:2181/solr,zookeeper3:2181/solr.
> I suspect that something somewhere is not treating the SOLR_ZK_ENSEMBLE
> variable correctly and believes it is a single connection (eg
> zookeeper1:2181) and the path is /solr,zookeeper2:2181,
> zookeeper3:2181/solr.
> >>
> >> If I run solrctl with --zk zookeeper1:2181, it creates its
> configuration under / (eg /solr.xml /configs).
> >>
> >> If I run solrctl with --zk zookeeper1:2181/solr, it creates the
> configuration under /solr
> >>
> >>
> >> If I completely ignore the Zookeeper configuration Solr works
> correctly, but as I'm using Lily I need Solr's configuration to exist under
> Zookeeper.
> >>
> >> What am I missing?  How can I specify a multi-node Zookeeper ensemble
> and have all of the configuration nodes created under /solr?  How do I
> point Tomcat towards the Solr configuration under /solr?
> >>
> >> If you would like more details, please look at the attachment as this
> explains what I did at each step and the results of that step.
> >>
> >>
> >> I'm using Cloudera's packages throughout.
> >>
> >> thanks
> >>
> >> Rob
> >>
> >> Registered name: In Practice Systems Ltd.
> >> Registered address: The Bread Factory, 1a Broughton Street, London, SW8
> 3QJ
> >> Registered Number: 1788577
> >> Registered in England
> >> Visit our Internet Web site at www.inps.co.uk
> >> The information in this internet email is confidential and is intended
> solely for the addressee. Access, copying or re-use of information in it by
> anyone else is not authorised. Any views or opinions presented are solely
> those of the author and do not necessarily represent those of INPS or any
> of its affiliates. If you are not the intended recipient please contact
> is.helpd...@inps.co.uk
> >>
> >> 
> >


Re: Log message "zkClient has disconnected".

2014-10-29 Thread Mark Miller


> On Oct 28, 2014, at 9:31 AM, Shawn Heisey  wrote:
> 
> exceed a 15 second zkClientTimeout

Which is too low even with good GC settings. Anyone with config still using 15 
or 10 seconds should move it to at least 30.

- Mark

http://about.me/markrmiller
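
For illustration, in a 4.x-style solr.xml that would look something like the
snippet below (exact placement depends on which solr.xml format you are on;
the 30-second value follows the advice above):

<solrcloud>
  <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
</solrcloud>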

Re: Recovering from Out of Mem

2014-10-14 Thread Mark Miller
Best is to pass the Java cmd line option that kills the process on OOM and 
set up a supervisor on the process to restart it. You need a somewhat recent 
release for this to work properly though. 

- Mark
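
For illustration, the flag in question (the script path is a placeholder, and
it must appear before -jar so it is treated as a JVM option rather than an
application argument):

java -Xmx8g -XX:OnOutOfMemoryError="/opt/solr/bin/oom_killer.sh %p" -jar start.jar

The %p is expanded by the JVM to the process id, which the script can kill -9
before the supervisor (e.g. a wrapper script or init system) restarts the
process.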

> On Oct 14, 2014, at 9:06 AM, Salman Akram 
>  wrote:
> 
> I know there are some suggestions to avoid OOM issue e.g. setting
> appropriate Max Heap size etc. However, what's the best way to recover from
> it as it goes into non-responding state? We are using Tomcat on back end.
> 
> The scenario is that once we face OOM issue it keeps on taking queries
> (doesn't give any error) but they just time out. So even though we have a
> fail over system implemented but we don't have a way to distinguish if
> these are real time out queries OR due to OOM.
> 
> -- 
> Regards,
> 
> Salman Akram


Re: SolrCloud: Meaning of SYNC state in ZkStateReader?

2014-10-13 Thread Mark Miller
I think it's just cruft I left in and never ended up using anywhere. You can 
ignore it. 

- Mark

> On Oct 13, 2014, at 8:42 PM, Martin Grotzke  
> wrote:
> 
> Hi,
> 
> can anybody tell me the meaning of ZkStateReader.SYNC? All other state
> related constants are clear to me, I'm only not sure about the semantics
> of SYNC.
> 
> Background: I'm working on an async solr client
> (https://github.com/inoio/solrs) and want to add SolrCloud support - for
> this I'm reusing ZkStateReader.
> 
> TIA && cheers,
> Martin
> 


Re: Scaling to large Number of Collections

2014-08-31 Thread Mark Miller
>
> so you might still end up with these out of threads issue again.


You can also generally drop the stack size (-Xss) quite a bit to handle
more threads.

Beyond that, there are some thread pools you can configure. However, until
we fix the distrib deadlock issue, you don't want to drop the container
thread pool too much. There are other control points though.

- Mark
http://about.me/markrmiller


On Sun, Aug 31, 2014 at 11:53 AM, Ramkumar R. Aiyengar <
andyetitmo...@gmail.com> wrote:

> On 31 Aug 2014 13:24, "Mark Miller"  wrote:
> >
> >
> > > On Aug 31, 2014, at 4:04 AM, Christoph Schmidt <
> christoph.schm...@moresophy.de> wrote:
> > >
> > > we see at least two problems when scaling to large number of
> collections. I would like to ask the community, if they are known and maybe
> already addressed in development:
> > > We have a SolrCloud running with the following numbers:
> > > -  5 Servers (each 24 CPUs, 128 RAM)
> > > -  13.000 Collection with 25.000 SolrCores in the Cloud
> > > The Cloud is working fine, but we see two problems, if we like to scale
> further
> > > 1.   Resource consumption of native system threads
> > > We see that each collection opens at least two threads: one for the
> zookeeper (coreZkRegister-1-thread-5154) and one for the searcher
> (searcherExecutor-28357-thread-1)
> > > We will run in "OutOfMemoryError: unable to create new native thread".
> Maybe the architecture could be changed here to use thread pools?
> > > 2.   The shutdown and the startup of one server in the SolrCloud
> takes 2 hours. So a rolling start is about 10h. For me the problem seems to
> be that leader election is "linear". The Overseer does core per core. The
> organisation of the cloud is not done parallel or distributed. Is this
> already addressed by https://issues.apache.org/jira/browse/SOLR-5473 or is
> there more needed?
> >
> > 2. No, but it should have been fixed by another issue that will be in
> 4.10.
>
> Note however that this fix will result in even more temporary thread usage
> as all leadership elections will happen in parallel, so you might still end
> up with these out of threads issue again.
>
> Quite possibly the out of threads issue is just some system soft limit
> which is kicking in. Linux certainly has a limit you can configure through
> sysctl, your OS, whatever that might be, probably does the same. May be
> worth exploring if you can bump that up.
>
> >
> > - Mark
> > http://about.me/markrmiller


Re: Scaling to large Number of Collections

2014-08-31 Thread Mark Miller

> On Aug 31, 2014, at 4:04 AM, Christoph Schmidt 
>  wrote:
> 
> we see at least two problems when scaling to large number of collections. I 
> would like to ask the community, if they are known and maybe already 
> addressed in development:
> We have a SolrCloud running with the following numbers:
> -  5 Servers (each 24 CPUs, 128 RAM)
> -  13.000 Collection with 25.000 SolrCores in the Cloud
> The Cloud is working fine, but we see two problems, if we like to scale 
> further
> 1.   Resource consumption of native system threads
> We see that each collection opens at least two threads: one for the zookeeper 
> (coreZkRegister-1-thread-5154) and one for the searcher 
> (searcherExecutor-28357-thread-1)
> We will run in "OutOfMemoryError: unable to create new native thread". Maybe 
> the architecture could be changed here to use thread pools?
> 2.   The shutdown and the startup of one server in the SolrCloud takes 2 
> hours. So a rolling start is about 10h. For me the problem seems to be that 
> leader election is "linear". The Overseer does core per core. The 
> organisation of the cloud is not done parallel or distributed. Is this 
> already addressed by https://issues.apache.org/jira/browse/SOLR-5473 or is 
> there more needed?

2. No, but it should have been fixed by another issue that will be in 4.10.


- Mark
http://about.me/markrmiller

A Fast, Generic, Solr Log Reader

2014-08-29 Thread Mark Miller
I am often asked to take a look at one to many Solr log files that are
hundreds of megabytes to gigabytes in size. "Peeking" at this amount of
logs is a bit time consuming. Anybody that does this often enough has to
build a log parsing tool eventually. One-off greps can only get you so far.
The last log reader I started putting together was after a hurricane a year
or two back while I was without power for a week. I have a 6 core machine
and a lot of RAM and I wanted to be able to blow through hundreds of
megabytes of log files in a few seconds (why not). So I started this multi
threaded single file Solr log reader that uses MappedByteBuffers. As I have
had to do a little debugging here or there over the months I have added a
bit to the program. It's my current go-to tool when faced with a dozen
gigabyte log files. And it's pretty darn fast if you have some cores to
throw at it.

At this point, to avoid losing the program and someday starting yet again,
I've shared it on GitHub and invite anyone else looking to have a really
good standard log analyzer available to help complete it or offer logs for
testing. It's still early. It's still basically in personal, hack project
phase. But as I have further needs I will continue to add to it and improve
it. It already does a bit more than the current sample output in the README.

My goal is to make it generic enough to read a variety of formats – either
by being flexible internally or through user configuration. With a little
effort, there is a lot of great information and summarization that can be
pulled out of Solr logs.

https://github.com/markrmiller/SolrLogReader
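
Purely as an illustration of the MappedByteBuffer idea described above (this
is not the SolrLogReader code; the file name and the "ERROR" pattern are
placeholders):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

public class MappedLogScan {
  public static void main(String[] args) throws Exception {
    try (RandomAccessFile raf = new RandomAccessFile("solr.log", "r");
         FileChannel channel = raf.getChannel()) {
      // Map the file read-only; files over 2GB would be mapped in chunks,
      // with one worker thread per chunk.
      MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
      byte[] bytes = new byte[buf.remaining()];
      buf.get(bytes);
      long errors = 0;
      for (String line : new String(bytes, StandardCharsets.UTF_8).split("\n")) {
        if (line.contains("ERROR")) {
          errors++;
        }
      }
      System.out.println("ERROR lines: " + errors);
    }
  }
}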


-- 
- Mark

http://about.me/markrmiller


Re: ADDREPLICA doesn't respect requested solr_port assignment, replicas can report green w/o replicating

2014-08-24 Thread Mark Miller
Sounds like you should file 3 JIRA issues. They all look like legit stuff we 
should dig into, at a glance.

-- 
Mark Miller
about.me/markrmiller

On August 24, 2014 at 12:35:13 PM, ralph tice (ralph.t...@gmail.com) wrote:
> Hi all,
> 
> Two issues, first, when I issue an ADDREPLICA call like so:
> 
> 
> http://localhost:8983/solr/admin/collections?action=ADDREPLICA&shard=myshard&collection=mycollection&createNodeSet=solr18.mycorp.com:8983_solr
>  
> 
> It does not seem to respect the 8983_solr designation in the createNodeSet
> parameter and instead places the shard on any JVM on the node. First
> attempt I got a replica on 8994_solr and second attempt to place a replica
> on 8983 got a replica on 8992_solr instead.
> 
> As an aside, is there any particular reason why DELETEREPLICA asks for the
> ZK "shard id" (node_###) instead of the same syntax as createNodeSet? I
> can't recall any other instance in which the ZK "shard id" is exposed via
> query parameter and I've only ever seen it in clusterstate.json /
> CLUSTERSTATUS calls.
> 
> The 2nd issues is as follows:
> 
> I am running Solr built off branch_4x, and thanks to some help from IRC
> we've determined that we have an incompatible index situation where we have
> indexes built with 4.9 that we can read but not index into further or
> update. Understandable, and going forward we don't intend to run off of
> master. In this situation, if I try to add a replica, this also fails,
> however, the only log output (at WARN threshold) is:
> 
> 16:21:58.156 [RecoveryThread] WARN org.apache.solr.update.PeerSync - no
> frame of reference to tell if we've missed updates
> 
> ...and the replica comes up green. I think this might indicate a missing
> integrity check on replication, but certainly IMO a replica should not report as
> green/active if it is not on the same revision as the leader, or at least
> if it has never been on the same revision as the leader.
> 
> Thanks for any assistance/validation/advice,
> 
> --Ralph
> 



Re: Why does CLUSTERSTATUS return different information than the web cloud view?

2014-08-23 Thread Mark Miller
The state is actually a combo of the state in clusterstate and the live nodes. 
If the live node is not there, it's gone regardless of the last state it 
published. 
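(If you need the combined view programmatically, a rough SolrJ sketch, assuming a
4.x ZkStateReader obtained from e.g. CloudSolrServer.getZkStateReader(); the
collection name is a placeholder:)

    import java.util.Set;
    import org.apache.solr.common.cloud.ClusterState;
    import org.apache.solr.common.cloud.Replica;
    import org.apache.solr.common.cloud.Slice;
    import org.apache.solr.common.cloud.ZkStateReader;

    public class EffectiveState {
      // zk would typically come from CloudSolrServer.getZkStateReader().
      public static void print(ZkStateReader zk, String collection) {
        ClusterState cs = zk.getClusterState();
        Set<String> live = cs.getLiveNodes();
        for (Slice slice : cs.getSlices(collection)) {
          for (Replica r : slice.getReplicas()) {
            boolean nodeLive = live.contains(r.getNodeName());
            boolean saysActive = "active".equals(r.getStr(ZkStateReader.STATE_PROP));
            // Only live + "active" is effectively active; a dead node's last
            // published state lingers in clusterstate.json.
            System.out.println(r.getName() + " -> " + (nodeLive && saysActive ? "active" : "gone"));
          }
        }
      }
    }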

- Mark

> On Aug 23, 2014, at 6:00 PM, Nathan Neulinger  wrote:
> 
> In particular, a shard being 'active' vs. 'gone'.
> 
> The web ui is clearly showing the given replicas as being in "Gone" state 
> when I shut down a server, yet the CLUSTERSTATUS says that each replica has 
> state: "active"
> 
> Is there any way to ask it for status that will reflect that the replica is 
> gone?
> 
> This is with 4.8.0.
> 
> -- Nathan
> 
> 
> Nathan Neulinger   nn...@neulinger.org
> Neulinger Consulting   (573) 612-1412


Re: Replication of full index to replica after merge index into leader not working

2014-08-19 Thread Mark Miller

On August 19, 2014 at 1:33:10 PM, Mark Miller (markrmil...@gmail.com) wrote:
> > sounds like we should write a test and make it work.

Keeping in mind that when using a shared filesystem like HDFS or especially if 
using the MapReduce contrib, you probably won’t want this new behavior.

-- 
Mark Miller
about.me/markrmiller


Re: Replication of full index to replica after merge index into leader not working

2014-08-19 Thread Mark Miller
I’d just file a JIRA. Merge, like optimize and a few other things, were never 
tested or considered in early SolrCloud days. It’s used in the HDFS stuff, but 
in that case, the index is merged to all replicas and no recovery is necessary.

If you want to make the local filesystem merge work well with SolrCloud, sounds 
like we should write a test and make it work.

--  
Mark Miller
about.me/markrmiller

On August 19, 2014 at 1:20:54 PM, Timothy Potter (thelabd...@gmail.com) wrote:
> Hi,
>  
> Using the coreAdmin mergeindexes command to merge an index into a
> leader (SolrCloud mode on 4.9.0) and the replica does not do a snap
> pull from the leader as I would have expected. The merge into the
> leader worked like a charm except I had to send a hard commit after
> that (which makes sense).
>  
> I'm guessing the replica would snap pull from the leader if I
> restarted it, but reloading the collection or core does not trigger
> the replica to pull from the leader. This seems like an oversight in
> the mergeindex interaction with SolrCloud. Seems like the simplest
> would be for the leader to send all replicas a request recovery
> command after performing the merge.
>  
> Advice?
>  
> Cheers,
> Tim
>  



Re: Any recommendation for Solr Cloud version.

2014-08-19 Thread Mark Miller


On August 19, 2014 at 2:39:32 AM, Lee Chunki (lck7...@coupang.com) wrote:
> > the sooner the better? i.e. version 4.9.0.

Yes, certainly. 

-- 
Mark Miller
about.me/markrmiller


Re: Disabling transaction logs

2014-08-13 Thread Mark Miller
That is good testing :) We should track down what is up with that 30%. Might 
open a JIRA with some logs.

It can help if you restart the overseer node last.

There are likely some improvements around this post 4.6.

-- 
Mark Miller
about.me/markrmiller

On August 13, 2014 at 12:05:27 PM, KNitin (nitin.t...@gmail.com) wrote:
> Thank you all! Yes, I want to disable it for testing purposes.
> 
> The main issue is that rolling restart of solrcloud for 1000 collections is
> extremely unreliable and slow. More than 30% of the collections fail to
> recover.
> 
> What are some good guidelines to follow while restarting a massive cluster
> like this ?
> 
> Are there any new improvements (post 4.6) in solr that helps restarts to be
> more robust ?
> 
> Thanks
> 
> On Sunday, August 10, 2014, rulinma wrote:
> 
> > good.
> >
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Disabling-transaction-logs-tp4151721p415.html
> >  
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> 



Re: Are there any performance impact of using a non-standard length UUID as the unique key of Solr?

2014-07-24 Thread Mark Miller
Some good info on unique id’s for Lucene / Solr can be found here: 
http://blog.mikemccandless.com/2014/05/choosing-fast-unique-identifier-uuid.html
-- 
Mark Miller
about.me/markrmiller

On July 24, 2014 at 9:51:28 PM, He haobo (haob...@gmail.com) wrote:

Hi,  

In our Solr collection (Solr 4.8), we have the following unique key  
definition:

<uniqueKey>id</uniqueKey>

In our external Java program, we will generate a UUID with  
UUID.randomUUID().toString() first. Then we will use a cryptographic hash to  
generate a 32-byte text and finally use it as the id.  

For now, we might need to post more than 20k Solr docs per second, and  
UUID.randomUUID() or the cryptographic hashing might take time. We might  
have a simple workaround: share one cryptographic hash across many  
Solr docs, namely by appending a sequence to the hash, such  
as 9AD0BB6DDD7AA9FE4D9EB1FF16B3BDFY00,  
9AD0BB6DDD7AA9FE4D9EB1FF16B3BDFY01,  
9AD0BB6DDD7AA9FE4D9EB1FF16B3BDFY02, etc.  
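(For illustration, the shared-hash-plus-sequence idea above could look roughly
like this in Java. MD5 hex as the 32-character hash and a two-digit zero-padded
counter are assumptions based on the examples:)

    import java.security.MessageDigest;
    import java.util.UUID;

    public class SharedHashIds {
      public static void main(String[] args) throws Exception {
        // One UUID and one hash shared by a whole batch of documents.
        String uuid = UUID.randomUUID().toString();
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest(uuid.getBytes("UTF-8"))) {
          hex.append(String.format("%02X", b));   // 32 hex characters
        }
        for (int seq = 0; seq < 3; seq++) {
          // e.g. <32-char-hash>00, <32-char-hash>01, ...
          System.out.println(hex.toString() + String.format("%02d", seq));
        }
      }
    }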


What we want to know is: if we use a 38-byte id, is there any  
performance impact for Solr inserts or queries? Or, if we use Solr's  
default automatically generated id implementation, would that be more  
efficient?  



Thanks,  
Eternal  


Re: SolrCloud replica dies under high throughput

2014-07-21 Thread Mark Miller
Looks like you probably have to raise the http client connection pool limits to 
handle that kind of load currently.

They are specified as top level config in solr.xml:

maxUpdateConnections
maxUpdateConnectionsPerHost
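(As a sketch only: the exact element nesting depends on your solr.xml style and
Solr version, so treat the layout below as an assumption and check the reference
guide for your release; the two parameter names are the part that matters.)

    <solr>
      <!-- placement is an assumption; values are examples only -->
      <int name="maxUpdateConnections">100000</int>
      <int name="maxUpdateConnectionsPerHost">100</int>
      ...
    </solr>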

--  
Mark Miller
about.me/markrmiller

On July 21, 2014 at 7:14:59 PM, Darren Lee (d...@amplience.com) wrote:
> Hi,
>  
> I'm doing some benchmarking with Solr Cloud 4.9.0. I am trying to work out 
> exactly how  
> much throughput my cluster can handle.
>  
> Consistently in my test I see a replica go into recovering state forever 
> caused by what  
> looks like a timeout during replication. I can understand the timeout and 
> failure (I  
> am hitting it fairly hard) but what seems odd to me is that when I stop the 
> heavy load it still  
> does not recover the next time it tries, it seems broken forever until I 
> manually go in,  
> clear the index and let it do a full resync.
>  
> Is this normal? Am I misunderstanding something? My cluster has 4 nodes (2 
> shards, 2 replicas)  
> (AWS m3.2xlarge). I am indexing with ~800 concurrent connections and a 10 sec 
> soft commit.  
> I consistently get this problem with a throughput of around 1.5 million 
> documents per  
> hour.
>  
> Thanks all,
> Darren
>  
>  
> Stack Traces & Messages:
>  
> [qtp779330563-627] ERROR org.apache.solr.servlet.SolrDispatchFilter – 
> null:org.apache.http.conn.ConnectionPoolTimeoutException:  
> Timeout waiting for connection from pool
> at 
> org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(PoolingClientConnectionManager.java:226)
>   
> at 
> org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnection(PoolingClientConnectionManager.java:195)
>   
> at 
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:422)
>   
> at 
> org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
>   
> at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
>   
> at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
>   
> at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
>   
> at 
> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:233)
>   
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   
> at java.lang.Thread.run(Thread.java:724)
>  
> Error while trying to recover. 
> core=assets_shard2_replica1:java.util.concurrent.ExecutionException:  
> org.apache.solr.client.solrj.SolrServerException: IOException occured when  
> talking to server at: http://xxx.xxx.15.171:8080/solr
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:188)
> at 
> org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:615)
>   
> at 
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:371)  
> at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)  
> Caused by: org.apache.solr.client.solrj.SolrServerException: IOException 
> occured  
> when talking to server at: http://xxx.xxx.15.171:8080/solr
> at 
> org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566)
>   
> at 
> org.apache.solr.client.solrj.impl.HttpSolrServer$1.call(HttpSolrServer.java:245)
>   
> at 
> org.apache.solr.client.solrj.impl.HttpSolrServer$1.call(HttpSolrServer.java:241)
>   
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.net.SocketException: Socket closed
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:152)
> at java.net.SocketInputStream.read(SocketInputStream.java:122)
> at 
> org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
>   
> at 
> org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
>   
> at 
> org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
>   
> at 
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:

Re: Parallel optimize of index on SolrCloud.

2014-07-09 Thread Mark Miller
I think that’s pretty much a search-time param, though it might end up being used 
on the update side as well. In any case, I know it doesn’t affect commit or 
optimize.

Also, to my knowledge, SolrCloud optimize support was never explicitly added or 
tested.

--  
Mark Miller
about.me/markrmiller

On July 9, 2014 at 12:00:27 PM, Shawn Heisey (s...@elyograg.org) wrote:
> > I thought a bug had been filed on the distrib=false problem,  



Re: Slow inserts when using Solr Cloud

2014-07-08 Thread Mark Miller
Updates are currently done locally before concurrently being sent to all 
replicas - so on a single update, you can expect 2x just from that.

As for your results, it sounds like perhaps there is more overhead than we 
would like in the code that sends to replicas and forwards updates? Someone 
would have to dig in to really know I think. I would doubt it’s a configuration 
issue, but you never know.

-- 
Mark Miller
about.me/markrmiller

On July 8, 2014 at 9:18:28 AM, Ian Williams (NWIS - Applications Design) 
(ian.willi...@wales.nhs.uk) wrote:

Hi  

I'm encountering a surprisingly high increase in response times when I insert 
new documents into a SolrCloud, compared with a standalone Solr instance.  

I have a SolrCloud set up for test and evaluation purposes. I have four shards, 
each with a leader and a replica, distributed over four Windows virtual 
servers. I have zookeeper running on three of the four servers. There are not 
many documents in my SolrCloud (just a few hundred). I am using composite id 
routing, specifying a prefix to my document ids which is then used by Solr to 
determine which shard the document should be stored on.  

I determine in advance which shard a document with a given id prefix will end 
up in, by trying it out in advance. I then try the following scenarios, using 
inserts without commits. E.g. I use:  
curl http://servername:port/solr/update -H "Content-Type: text/xml" 
--data-binary @test.txt  

1. Insert a document, sending it to the server hosting the correct shard, with 
replicas turned off (response time <20ms)  
I find that if I 'switch off' the replicas for my shard (by shutting down Solr 
for the replicas), and then I send the new document to the server hosting the 
leader for the correct shard, then I get a very fast response, i.e. under 10ms, 
which is similar to the performance I get when not using SolrCloud. This is 
expected, as I've removed any overhead to do with replicas or routing to the 
correct shard.  

2. Insert a document, sending it to the server hosting the correct shard, but 
with replicas turned on (response time approx 250ms)  
If I switch on the replica for that shard, then my average response time for an 
insert increases from <10ms to around 250ms. Now I expect an overhead, because 
the leader has to find out where the replica is (from Zookeeper?) and then 
forward the request to that replica, then wait for a reply - but an increase 
from <20ms to 250ms seems very high?  

3. Insert a document, sending it to a server hosting the incorrect shard, with 
replicas turned on (response time approx 500ms)  
If I do the same thing again but this time send to the server hosting a 
different shard to the shard my document will end up in, the average response 
times increase again to around 500ms. Again, I'd expect an increase because of 
the extra step of needing to forward to the correct shard, but the increase 
seems very high?  


Should I expect this much of an overhead for shard routing and replicas, or 
might this indicate a problem in my configuration?  

Many thanks  
Ian  

---  

The information included in this email and any attachments is confidential. If 
received in error, please notify the sender and delete it immediately. 
Disclosure to any party other than the addressee, whether unintentional or 
otherwise, is not intended to waive confidentiality. The NHS Wales Informatics 
Service (NWIS) may monitor and record all emails for viruses and inappropriate 
use. This e-mail and any subsequent replies or attachments may be subject to 
the Freedom of Information Act. The views expressed in this email are those of 
the sender and not necessarily of NWIS.  
---  


Re: Question about solrcloud recovery process

2014-07-03 Thread Mark Miller
I don’t know offhand about the num docs issue - are you doing NRT?

As far as being able to query the replica, I’m not sure anyone ever got around to 
making that fail if you directly query a node that is not active. It certainly 
came up, but I have no memory of anyone tackling it. Of course, in many other 
cases, information is being pulled from ZooKeeper and recovering nodes are 
ignored. If this is the issue I think it is, it should only be an issue when 
you directly query a recovering node.

The CloudSolrServer client works around this issue as well.
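(For completeness, a minimal SolrJ sketch of going through CloudSolrServer instead
of one node's URL; the ZooKeeper address and collection name are placeholders:)

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class CloudQuery {
      public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");
        server.connect();
        // Requests are routed using the live cluster state, so a node that is
        // still recovering is not queried directly.
        QueryResponse rsp = server.query(new SolrQuery("*:*"));
        System.out.println("numFound=" + rsp.getResults().getNumFound());
        server.shutdown();
      }
    }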

-- 
Mark Miller
about.me/markrmiller

On July 3, 2014 at 8:42:48 AM, Peter Keegan (peterlkee...@gmail.com) wrote:

I bring up a new Solr node with no index and watch the index being  
replicated from the leader. The index size is 12G and the replication takes  
about 6 minutes, according to the replica log (from 'Starting recovery  
process' to 'Finished recovery process). However, shortly after the  
replication begins, while the index files are being copied, I am able to  
query the index on the replica and see q=*:* find all of the documents.  
But, from the core admin screen, numDocs = 0, and in the cloud screen the  
replica is in 'recovering' mode. How can this be?  

Peter  


Re: SolrCloud multiple data center support

2014-06-23 Thread Mark Miller
We have been waiting for that issue to be finished before thinking too hard 
about how it can improve things. There have been a couple ideas (I’ve mostly 
wanted it for improving the internal zk mode situation), but no JIRAs yet that 
I know of.
-- 
Mark Miller
about.me/markrmiller

On June 23, 2014 at 10:37:27 AM, Arcadius Ahouansou (arcad...@menelic.com) 
wrote:

On 3 February 2014 22:16, Daniel Collins  wrote:  

>  
> One other option is in ZK trunk (but not yet in a release) is the ability  
> to dynamically reconfigure ZK ensembles (  
> https://issues.apache.org/jira/browse/ZOOKEEPER-107). That would give the  
> ability to create new ZK instances in the event of a DC failure, and  
> reconfigure the Solr Cloud without having to reload everything. That would  
> help to some extent.  
>  


ZOOKEEPER-107 has now been implemented.  
I checked the Solr Jira and it seems there is nothing for multi-data-center  
support.  

Do we need to create a ticket or is there already one?  

Thanks.  

Arcadius.  


Re: SolrCloud: AliasAPI-Maximum number of collections

2014-06-06 Thread Mark Miller
The main limit is the 1MB ZooKeeper znode size limit. But even that can be raised. 
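(Raising it means setting the jute.maxbuffer system property, and it has to be set
to the same value on the ZooKeeper servers and on every client JVM, Solr included.
The 10MB value below is only an example:)

    -Djute.maxbuffer=10485760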

- Mark

> On Jun 6, 2014, at 6:21 AM, Shalin Shekhar Mangar  
> wrote:
> 
> No, there's no theoretical limit.
> 
> 
>> On Fri, Jun 6, 2014 at 11:20 AM, ku3ia  wrote:
>> 
>> Hi all!
>> The question is how many collections I can put to one alias, using
>> SolrCloud
>> alias collection API
>> 
>> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api4
>> to process distributed requests? Is it limited?
>> 
>> Thanks.
>> 
>> 
>> 
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/SolrCloud-AliasAPI-Maximum-number-of-collections-tp4140305.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.


Re: Replica active during warming

2014-05-13 Thread Mark Miller
If you are sure about this, can you file a JIRA issue?
-- 
Mark Miller
about.me/markrmiller

On May 12, 2014 at 8:50:42 PM, lboutros (boutr...@gmail.com) wrote:

Dear All,  

we just finished the migration of a cluster from Solr 4.3.1 to Solr 4.6.1.  
With Solr 4.3.1 a node was not considered active before the end of the  
warming process.  

Now, with Solr 4.6.1 a replica is considered active during the warming  
process.  
This means that if you restart a replica or create a new one, queries will  
be sent to this replica and the query will hang until the end of the warming  
process (we do not use cold searchers).  

We have quite long warming queries and this is a big issue.  
Is there a parameter I do not know that could control this behavior ?  

thanks,  

Ludovic.  



-  
Jouve  
France.  
--  
View this message in context: 
http://lucene.472066.n3.nabble.com/Replica-active-during-warming-tp4135274.html 
 
Sent from the Solr - User mailing list archive at Nabble.com.  


Re: overseer queue clogged

2014-05-01 Thread Mark Miller
What version are you running? This was fixed in a recent release. It can happen 
if you hit add core with the defaults on the admin page in older versions.

-- 
Mark Miller
about.me/markrmiller

On May 1, 2014 at 11:19:54 AM, ryan.cooke (ryan.co...@gmail.com) wrote:

I saw an overseer queue clogged as well due to a bad message in the queue.  
Unfortunately this went unnoticed for a while until there were 130K messages  
in the overseer queue. Since it was a production system we were not able to  
simply stop everything and delete all Zookeeper data, so we manually deleted  
messages by issuing commands directly through the zkCli.sh tool. After all  
the messages had been cleared, some nodes were in the wrong state (e.g.  
'down' when should have been 'active'). Restarting the 'down' or 'recovery  
failed' nodes brought the whole cluster back to a stable and healthy state.  
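(The manual cleanup described above looks roughly like this in ZooKeeper's
zkCli.sh; the queue entry name is a placeholder, and deleting Overseer state on a
live cluster should be a last resort:)

    ./zkCli.sh -server zk1:2181
    ls /overseer/queue
    delete /overseer/queue/qn-0000012345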

Since it can take some digging to determine backlog in the overseer queue,  
some of the symptoms we saw were:  
Overseer throwing an exception like "Path must not end with / character"  
Random nodes throwing an exception like "ClusterState says we are the  
leader, but locally we don't think so"  
Bringing up new replicas time out when attempting to fetch shard id  



--  
View this message in context: 
http://lucene.472066.n3.nabble.com/overseer-queue-clogged-tp4047878p4134129.html
  
Sent from the Solr - User mailing list archive at Nabble.com.  


Re: DIH issues with 4.7.1

2014-04-26 Thread Mark Miller
bq. due to things like NTP, etc.

The full sentence is very important. NTP is not the only way for this to happen 
- you also have leap seconds, daylight saving time, internet clock sync, a 
whole host of things that affect currentTimeMillis and not nanoTime. nanoTime is 
without question the way to go if you even hope for monotonicity.
-- 
Mark Miller
about.me/markrmiller

On April 26, 2014 at 1:11:14 PM, Walter Underwood (wun...@wunderwood.org) wrote:

NTP works very hard to keep the clock positive monotonic. But nanoTime is 
intended for elapsed time measurement anyway, so it is the right choice.  

You can get some pretty fun clock behavior by running on virtual machines, like 
in AWS. And some system real time clocks don't tick during a leap second. And 
Windows system clocks are probably still hopeless.  

If you want to run the clock backwards, we don't need NTP, we can set it with 
"date".  

wunder  

On Apr 26, 2014, at 9:10 AM, Mark Miller  wrote:  

> My answer remains the same. I guess if you want more precise terminology, 
> nanoTime will generally be monotonic and currentTimeMillis will not be, due 
> to things like NTP, etc. You want monotonicity for measuring elapsed times.  
> --  
> Mark Miller  
> about.me/markrmiller  
>  
> On April 26, 2014 at 11:25:16 AM, Walter Underwood (wun...@wunderwood.org) 
> wrote:  
>  
> NTP should slew the clock rather than jump it. I haven't checked recently, 
> but that is how it worked in the 90's when I was organizing the NTP hierarchy 
> at HP.  
>  
> It only does step changes if the clocks is really wrong. That is most likely 
> at reboot, when other demons aren't running yet.  
>  
> wunder  
>  
> On Apr 26, 2014, at 7:30 AM, Mark Miller  wrote:  
>  
>> System.currentTimeMillis can jump around due to NTP, etc. If you are trying 
>> to count elapsed time, you don’t want to use a method that can jump around 
>> with the results.  
>> --  
>> Mark Miller  
>> about.me/markrmiller  
>>  
>> On April 26, 2014 at 8:58:20 AM, YouPeng Yang (yypvsxf19870...@gmail.com) 
>> wrote:  
>>  
>> Hi Rafał Kuć  
>> I got it,the point is many operating systems measure time in units of  
>> tens of milliseconds,and the System.currentTimeMillis() is just base on  
>> operating system.  
>> In my case,I just do DIH with a crontable, Is there any possiblity to get  
>> in that trouble?I am really can not picture what the situation may lead to  
>> the problem.  
>>  
>>  
>> Thanks very much.  
>>  
>>  
>> 2014-04-26 20:49 GMT+08:00 YouPeng Yang :  
>>  
>>> Hi Mark Miller  
>>> Sorry to get you in these discussion .  
>>> I notice that Mark Miller report this issure in  
>>> https://issues.apache.org/jira/browse/SOLR-5734 according to  
>>> https://issues.apache.org/jira/browse/SOLR-5721,but it just happened with  
>>> the zookeeper.  
>>> If I just do DIH with JDBCDataSource ,I do not think it will get the  
>>> problem.  
>>> Please give some hints  
>>>  
>>>>> Bonus,just post the last mail I send about the problem:  
>>>  
>>> I have just compare the difference between the version 4.6.0 and 4.7.1.  
>>> Notice that the time in the getConnection function is declared with the  
>>> System.nanoTime in 4.7.1 ,while System.currentTimeMillis().  
>>> Curious about the resson for the change.the benefit of it .Is it  
>>> neccessory?  
>>> I have read the SOLR-5734 ,  
>>> https://issues.apache.org/jira/browse/SOLR-5734  
>>> Do some google about the difference of currentTimeMillis and nano,but  
>>> still can not figure out it.  
>>>  
>>> Thank you very much.  
>>>  
>>>  
>>> 2014-04-26 20:31 GMT+08:00 YouPeng Yang :  
>>>  
>>> Hi  
>>>> I have just compare the difference between the version 4.6.0 and  
>>>> 4.7.1. Notice that the time in the getConnection function is declared  
>>>> with the System.nanoTime in 4.7.1 ,while System.currentTimeMillis().  
>>>> Curious about the resson for the change.the benefit of it .Is it  
>>>> neccessory?  
>>>> I have read the SOLR-5734 ,  
>>>> https://issues.apache.org/jira/browse/SOLR-5734  
>>>> Do some google about the difference of currentTimeMillis and nano,but  
>>>> still can not figure out it.  
>>>>  
>>>>  
>>>>  
>>>>  
>>>> 2014-04-26 2:24 GMT+08:00 Shawn Heisey :  
>>>>  
>>>> On 4/25/2014 11:56 AM, Hutchins, Jonathan wrote:  
>>>>&

Re: zkCli zkhost parameter

2014-04-26 Thread Mark Miller
Have you tried a comma-separated list, or are you going by the documentation? It 
should work. 
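(i.e. something along these lines, where the config name and directory are
placeholders:)

    ./zkcli.sh -zkhost zk1:2181,zk2:2181,zk3:2181 -cmd upconfig -confdir ./conf -confname myconf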
-- 
Mark Miller
about.me/markrmiller

On April 26, 2014 at 1:03:25 PM, Scott Stults 
(sstu...@opensourceconnections.com) wrote:

It looks like this only takes a single host as its value, whereas the  
zkHost environment variable for Solr takes a comma-separated list.  
Shouldn't the client also take a comma-separated list?  

k/r,  
Scott  


Re: DIH issues with 4.7.1

2014-04-26 Thread Mark Miller
My answer remains the same. I guess if you want more precise terminology, 
nanoTime will generally be monotonic and currentTimeMillis will not be, due to 
things like NTP, etc. You want monotonicity for measuring elapsed times.
-- 
Mark Miller
about.me/markrmiller

On April 26, 2014 at 11:25:16 AM, Walter Underwood (wun...@wunderwood.org) 
wrote:

NTP should slew the clock rather than jump it. I haven't checked recently, but 
that is how it worked in the 90's when I was organizing the NTP hierarchy at 
HP.  

It only does step changes if the clocks is really wrong. That is most likely at 
reboot, when other demons aren't running yet.  

wunder  

On Apr 26, 2014, at 7:30 AM, Mark Miller  wrote:  

> System.currentTimeMillis can jump around due to NTP, etc. If you are trying 
> to count elapsed time, you don’t want to use a method that can jump around 
> with the results.  
> --  
> Mark Miller  
> about.me/markrmiller  
>  
> On April 26, 2014 at 8:58:20 AM, YouPeng Yang (yypvsxf19870...@gmail.com) 
> wrote:  
>  
> Hi Rafał Kuć  
> I got it,the point is many operating systems measure time in units of  
> tens of milliseconds,and the System.currentTimeMillis() is just base on  
> operating system.  
> In my case,I just do DIH with a crontable, Is there any possiblity to get  
> in that trouble?I am really can not picture what the situation may lead to  
> the problem.  
>  
>  
> Thanks very much.  
>  
>  
> 2014-04-26 20:49 GMT+08:00 YouPeng Yang :  
>  
>> Hi Mark Miller  
>> Sorry to get you in these discussion .  
>> I notice that Mark Miller report this issure in  
>> https://issues.apache.org/jira/browse/SOLR-5734 according to  
>> https://issues.apache.org/jira/browse/SOLR-5721,but it just happened with  
>> the zookeeper.  
>> If I just do DIH with JDBCDataSource ,I do not think it will get the  
>> problem.  
>> Please give some hints  
>>  
>>>> Bonus,just post the last mail I send about the problem:  
>>  
>> I have just compare the difference between the version 4.6.0 and 4.7.1.  
>> Notice that the time in the getConnection function is declared with the  
>> System.nanoTime in 4.7.1 ,while System.currentTimeMillis().  
>> Curious about the resson for the change.the benefit of it .Is it  
>> neccessory?  
>> I have read the SOLR-5734 ,  
>> https://issues.apache.org/jira/browse/SOLR-5734  
>> Do some google about the difference of currentTimeMillis and nano,but  
>> still can not figure out it.  
>>  
>> Thank you very much.  
>>  
>>  
>> 2014-04-26 20:31 GMT+08:00 YouPeng Yang :  
>>  
>> Hi  
>>> I have just compare the difference between the version 4.6.0 and  
>>> 4.7.1. Notice that the time in the getConnection function is declared  
>>> with the System.nanoTime in 4.7.1 ,while System.currentTimeMillis().  
>>> Curious about the resson for the change.the benefit of it .Is it  
>>> neccessory?  
>>> I have read the SOLR-5734 ,  
>>> https://issues.apache.org/jira/browse/SOLR-5734  
>>> Do some google about the difference of currentTimeMillis and nano,but  
>>> still can not figure out it.  
>>>  
>>>  
>>>  
>>>  
>>> 2014-04-26 2:24 GMT+08:00 Shawn Heisey :  
>>>  
>>> On 4/25/2014 11:56 AM, Hutchins, Jonathan wrote:  
>>>>  
>>>>> I recently upgraded from 4.6.1 to 4.7.1 and have found that the DIH  
>>>>> process that we are using takes 4x as long to complete. The only odd  
>>>>> thing I notice is when I enable debug logging for the dataimporthandler  
>>>>> process, it appears that in the new version each sql query is resulting  
>>>>> in  
>>>>> a new connection opened through jdbcdatasource (log:  
>>>>> http://pastebin.com/JKh4gpmu). Were there any changes that would  
>>>>> affect  
>>>>> the speed of running a full import?  
>>>>>  
>>>>  
>>>> This is most likely the problem you are experiencing:  
>>>>  
>>>> https://issues.apache.org/jira/browse/SOLR-5954  
>>>>  
>>>> The fix will be in the new 4.8 version. The release process for 4.8 is  
>>>> underway right now. A second release candidate was required yesterday. If  
>>>> no further problems are encountered, the release should be made around the 
>>>>  
>>>> middle of next week. If problems are encountered, the release will be  
>>>> delayed.  
>>>>  
>>>> Here's something very important that has been mentioned before: Solr  
>>>> 4.8 will require Java 7. Previously, Java 6 was required. Java 7u55 (the  
>>>> current release from Oracle as I write this) is recommended as a minimum.  
>>>>  
>>>> If a 4.7.3 version is built, this is a fix that we should backport.  
>>>>  
>>>> Thanks,  
>>>> Shawn  
>>>>  
>>>>  
>>>  
>>  

--  
Walter Underwood  
wun...@wunderwood.org  





Re: DIH issues with 4.7.1

2014-04-26 Thread Mark Miller
System.currentTimeMillis can jump around due to NTP, etc. If you are trying to 
count elapsed time, you don’t want to use a method that can jump around with 
the results.
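(i.e. the pattern in question, with the timed work left as a placeholder:)

    long start = System.nanoTime();
    // ... the work being timed ...
    long elapsedMs = (System.nanoTime() - start) / 1000000L;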
-- 
Mark Miller
about.me/markrmiller

On April 26, 2014 at 8:58:20 AM, YouPeng Yang (yypvsxf19870...@gmail.com) wrote:

Hi Rafał Kuć  
I got it,the point is many operating systems measure time in units of  
tens of milliseconds,and the System.currentTimeMillis() is just base on  
operating system.  
In my case,I just do DIH with a crontable, Is there any possiblity to get  
in that trouble?I am really can not picture what the situation may lead to  
the problem.  


Thanks very much.  


2014-04-26 20:49 GMT+08:00 YouPeng Yang :  

> Hi Mark Miller  
> Sorry to get you in these discussion .  
> I notice that Mark Miller report this issure in  
> https://issues.apache.org/jira/browse/SOLR-5734 according to  
> https://issues.apache.org/jira/browse/SOLR-5721,but it just happened with  
> the zookeeper.  
> If I just do DIH with JDBCDataSource ,I do not think it will get the  
> problem.  
> Please give some hints  
>  
> >> Bonus,just post the last mail I send about the problem:  
>  
> I have just compare the difference between the version 4.6.0 and 4.7.1.  
> Notice that the time in the getConnection function is declared with the  
> System.nanoTime in 4.7.1 ,while System.currentTimeMillis().  
> Curious about the resson for the change.the benefit of it .Is it  
> neccessory?  
> I have read the SOLR-5734 ,  
> https://issues.apache.org/jira/browse/SOLR-5734  
> Do some google about the difference of currentTimeMillis and nano,but  
> still can not figure out it.  
>  
> Thank you very much.  
>  
>  
> 2014-04-26 20:31 GMT+08:00 YouPeng Yang :  
>  
> Hi  
>> I have just compare the difference between the version 4.6.0 and  
>> 4.7.1. Notice that the time in the getConnection function is declared  
>> with the System.nanoTime in 4.7.1 ,while System.currentTimeMillis().  
>> Curious about the resson for the change.the benefit of it .Is it  
>> neccessory?  
>> I have read the SOLR-5734 ,  
>> https://issues.apache.org/jira/browse/SOLR-5734  
>> Do some google about the difference of currentTimeMillis and nano,but  
>> still can not figure out it.  
>>  
>>  
>>  
>>  
>> 2014-04-26 2:24 GMT+08:00 Shawn Heisey :  
>>  
>> On 4/25/2014 11:56 AM, Hutchins, Jonathan wrote:  
>>>  
>>>> I recently upgraded from 4.6.1 to 4.7.1 and have found that the DIH  
>>>> process that we are using takes 4x as long to complete. The only odd  
>>>> thing I notice is when I enable debug logging for the dataimporthandler  
>>>> process, it appears that in the new version each sql query is resulting  
>>>> in  
>>>> a new connection opened through jdbcdatasource (log:  
>>>> http://pastebin.com/JKh4gpmu). Were there any changes that would  
>>>> affect  
>>>> the speed of running a full import?  
>>>>  
>>>  
>>> This is most likely the problem you are experiencing:  
>>>  
>>> https://issues.apache.org/jira/browse/SOLR-5954  
>>>  
>>> The fix will be in the new 4.8 version. The release process for 4.8 is  
>>> underway right now. A second release candidate was required yesterday. If  
>>> no further problems are encountered, the release should be made around the  
>>> middle of next week. If problems are encountered, the release will be  
>>> delayed.  
>>>  
>>> Here's something very important that has been mentioned before: Solr  
>>> 4.8 will require Java 7. Previously, Java 6 was required. Java 7u55 (the  
>>> current release from Oracle as I write this) is recommended as a minimum.  
>>>  
>>> If a 4.7.3 version is built, this is a fix that we should backport.  
>>>  
>>> Thanks,  
>>> Shawn  
>>>  
>>>  
>>  
>  


Re: Confusion when using go-live and MapReduceIndexerTool

2014-04-23 Thread Mark Miller
Currently, go-live is only supported when you are running Solr on HDFS.

bq. The indexes must exist on the disk of the Solr host

This does not apply when you are running Solr on HDFS. It’s a shared 
filesystem, so local does not matter here.

"no writes should be allowed on either core until the merge is complete. If 
writes are allowed, corruption may occur on the merged index.”
Doesn’t sound right to me at all.

-- 
Mark Miller
about.me/markrmiller

On April 22, 2014 at 10:38:08 AM, Brett Hoerner (br...@bretthoerner.com) wrote:

I think I'm just misunderstanding the use of go-live. From mergeindexes  
docs: "The indexes must exist on the disk of the Solr host, which may make  
using this in a distributed environment cumbersome."  

I'm guessing I'll have to write some sort of tool that pulls each completed  
index out of HDFS and onto the respective SolrCloud machines and manually  
do some kind of merge? I don't want to (can't) be running my Hadoop jobs on  
the same nodes that SolrCloud is running on...  
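(For reference, the merge that go-live performs per shard is a CoreAdmin
mergeindexes call, roughly like the following; host, core name and path are
placeholders:)

    curl 'http://solrhost:8983/solr/admin/cores?action=mergeindexes&core=mycollection_shard1_replica1&indexDir=/path/to/copied/index'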

Also confusing to me: "no writes should be allowed on either core until the  
merge is complete. If writes are allowed, corruption may occur on the  
merged index." Is that saying that Solr will block writes, or is that  
saying the end user has to ensure no writes are happening against the  
collection during a merge? That seems... risky?  


On Tue, Apr 22, 2014 at 9:29 AM, Brett Hoerner wrote:  

> Anyone have any thoughts on this?  
>  
> In general, am I expected to be able to go-live from an unrelated cluster  
> of Hadoop machines to a SolrCloud that isn't running off of HDFS?  
>  
> intput: HDFS  
> output: HDFS  
> go-live cluster: SolrCloud cluster on different machines running on plain  
> MMapDirectory  
>  
> I'm back to looking at the code but holy hell is debugging Hadoop hard. :)  
>  
>  
> On Thu, Apr 17, 2014 at 12:33 PM, Brett Hoerner 
> wrote:  
>  
>> https://gist.github.com/bretthoerner/0dc6bfdbf45a18328d4b  
>>  
>>  
>> On Thu, Apr 17, 2014 at 11:31 AM, Mark Miller wrote:  
>>  
>>> Odd - might be helpful if you can share your solrconfig.xml being used.  
>>>  
>>> --  
>>> Mark Miller  
>>> about.me/markrmiller  
>>>  
>>> On April 17, 2014 at 12:18:37 PM, Brett Hoerner (br...@bretthoerner.com)  
>>> wrote:  
>>>  
>>> I'm doing HDFS input and output in my job, with the following:  
>>>  
>>> hadoop jar /mnt/faas-solr.jar \  
>>> -D mapreduce.job.map.class=com.massrel.faassolr.SolrMapper \  
>>> --update-conflict-resolver com.massrel.faassolr.SolrConflictResolver  
>>> \  
>>> --morphline-file /mnt/morphline-ignore.conf \  
>>> --zk-host $ZKHOST \  
>>> --output-dir hdfs://$MASTERIP:9000/output/ \  
>>> --collection $COLLECTION \  
>>> --go-live \  
>>> --verbose \  
>>> hdfs://$MASTERIP:9000/input/  
>>>  
>>> Index creation works,  
>>>  
>>> $ hadoop fs -ls -R hdfs://$MASTERIP:9000/output/results/part-0  
>>> drwxr-xr-x - hadoop supergroup 0 2014-04-17 16:00 hdfs://  
>>> 10.98.33.114:9000/output/results/part-0/data  
>>> drwxr-xr-x - hadoop supergroup 0 2014-04-17 16:00 hdfs://  
>>> 10.98.33.114:9000/output/results/part-0/data/index  
>>> -rwxr-xr-x 1 hadoop supergroup 61 2014-04-17 16:00 hdfs://  
>>> 10.98.33.114:9000/output/results/part-0/data/index/_0.fdt  
>>> -rwxr-xr-x 1 hadoop supergroup 45 2014-04-17 16:00 hdfs://  
>>> 10.98.33.114:9000/output/results/part-0/data/index/_0.fdx  
>>> -rwxr-xr-x 1 hadoop supergroup 1681 2014-04-17 16:00 hdfs://  
>>> 10.98.33.114:9000/output/results/part-0/data/index/_0.fnm  
>>> -rwxr-xr-x 1 hadoop supergroup 396 2014-04-17 16:00 hdfs://  
>>> 10.98.33.114:9000/output/results/part-0/data/index/_0.si  
>>> -rwxr-xr-x 1 hadoop supergroup 67 2014-04-17 16:00 hdfs://  
>>> 10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.doc  
>>> -rwxr-xr-x 1 hadoop supergroup 37 2014-04-17 16:00 hdfs://  
>>> 10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.pos  
>>> -rwxr-xr-x 1 hadoop supergroup 508 2014-04-17 16:00 hdfs://  
>>> 10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.tim  
>>> -rwxr-xr-x 1 hadoop supergroup 305 2014-04-17 16:00 hdfs://  
>>> 10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.tip  
>>> -rwxr-xr-x 1 hadoop supergroup 120 2014-04-17 16:00 hdfs://  
>>> 10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene45_0.dvd  
>>> -rwxr-xr-x 

Re: Confusion when using go-live and MapReduceIndexerTool

2014-04-17 Thread Mark Miller
Odd - might be helpful if you can share your solrconfig.xml being used.

-- 
Mark Miller
about.me/markrmiller

On April 17, 2014 at 12:18:37 PM, Brett Hoerner (br...@bretthoerner.com) wrote:

I'm doing HDFS input and output in my job, with the following:  

hadoop jar /mnt/faas-solr.jar \  
-D mapreduce.job.map.class=com.massrel.faassolr.SolrMapper \  
--update-conflict-resolver com.massrel.faassolr.SolrConflictResolver  
\  
--morphline-file /mnt/morphline-ignore.conf \  
--zk-host $ZKHOST \  
--output-dir hdfs://$MASTERIP:9000/output/ \  
--collection $COLLECTION \  
--go-live \  
--verbose \  
hdfs://$MASTERIP:9000/input/  

Index creation works,  

$ hadoop fs -ls -R hdfs://$MASTERIP:9000/output/results/part-0  
drwxr-xr-x - hadoop supergroup 0 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data  
drwxr-xr-x - hadoop supergroup 0 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data/index  
-rwxr-xr-x 1 hadoop supergroup 61 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data/index/_0.fdt  
-rwxr-xr-x 1 hadoop supergroup 45 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data/index/_0.fdx  
-rwxr-xr-x 1 hadoop supergroup 1681 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data/index/_0.fnm  
-rwxr-xr-x 1 hadoop supergroup 396 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data/index/_0.si  
-rwxr-xr-x 1 hadoop supergroup 67 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.doc  
-rwxr-xr-x 1 hadoop supergroup 37 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.pos  
-rwxr-xr-x 1 hadoop supergroup 508 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.tim  
-rwxr-xr-x 1 hadoop supergroup 305 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene41_0.tip  
-rwxr-xr-x 1 hadoop supergroup 120 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene45_0.dvd  
-rwxr-xr-x 1 hadoop supergroup 351 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data/index/_0_Lucene45_0.dvm  
-rwxr-xr-x 1 hadoop supergroup 45 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data/index/segments_1  
-rwxr-xr-x 1 hadoop supergroup 110 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data/index/segments_2  
drwxr-xr-x - hadoop supergroup 0 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data/tlog  
-rw-r--r-- 1 hadoop supergroup 333 2014-04-17 16:00 hdfs://  
10.98.33.114:9000/output/results/part-0/data/tlog/tlog.000  

But the go-live step fails, it's trying to use the HDFS path as the remote  
index path?  

14/04/17 16:00:31 INFO hadoop.GoLive: Live merging of output shards into  
Solr cluster...  
14/04/17 16:00:31 INFO hadoop.GoLive: Live merge hdfs://  
10.98.33.114:9000/output/results/part-0 into  
http://discover8-test-1d.i.massrel.com:8983/solr  
14/04/17 16:00:31 ERROR hadoop.GoLive: Error sending live merge command  
java.util.concurrent.ExecutionException:  
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:  
directory '/mnt/solr_8983/home/hdfs:/  
10.98.33.114:9000/output/results/part-0/data/index' does not exist  
at java.util.concurrent.FutureTask.report(FutureTask.java:122)  
at java.util.concurrent.FutureTask.get(FutureTask.java:188)  
at org.apache.solr.hadoop.GoLive.goLive(GoLive.java:126)  
at  
org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:867)  
at  
org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:609)  
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)  
at  
org.apache.solr.hadoop.MapReduceIndexerTool.main(MapReduceIndexerTool.java:596) 
 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)  
at  
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)  
at  
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  
at java.lang.reflect.Method.invoke(Method.java:606)  
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)  
Caused by:  
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:  
directory '/mnt/solr_8983/home/hdfs:/  
10.98.33.114:9000/output/results/part-0/data/index' does not exist  
at  
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
  
at  
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
  
at  
org.apache.solr.client.solrj.request.CoreAdminRequest.process(CoreAdminRequest.java:493)
  
at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:100)  
at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:89)  
at java.util.concurrent.FutureTask

Re: waitForLeaderToSeeDownState when leader is down

2014-04-16 Thread Mark Miller
What version are you testing? Thought we had addressed this.
-- 
Mark Miller
about.me/markrmiller

On April 16, 2014 at 6:02:09 PM, Jessica Mallet (mewmewb...@gmail.com) wrote:

Hi Furkan,  

Thanks for the reply. I understand the intent. However, in the case I  
described, the follower is blocked on looking for a leader (throws the  
pasted exception because it can't find the leader) before it participates  
in election; therefore, it will never come up while the leader waits for it  
to come up (they're deadlocked waiting for each other). What I'm suggesting  
is that maybe the follower should just just skip waitForLeaderToSeeDownState  
when there's no leader (instead of failing with the pasted stacktrace) and  
go ahead and start participating in election. That way the leader will see  
more replicas come up, and they can sync with each other and move on.  

Thanks,  
Jessica  


On Sat, Apr 12, 2014 at 4:14 PM, Furkan KAMACI wrote:  

> Hi;  
>  
> There is an explanation as follows: "This is meant to protect the case  
> where you stop a shard or it fails and then the first node to get started  
> back up has stale data - you don't want it to just become the leader. So we  
> wait to see everyone we know about in the shard up to 3 or 5 min by  
> default. Then we know all the shards participate in the leader election and  
> the leader will end up with all updates it should have." You can check it  
> from here:  
>  
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201306.mbox/%3ccajt9wng_yykcxggentgcxguhhcjhidear-jygpgrnkaedrz...@mail.gmail.com%3E
>   
>  
> Thanks;  
> Furkan KAMACI  
>  
>  
> 2014-04-08 23:51 GMT+03:00 Jessica Mallet :  
>  
> > To clarify, when I said "leader" and "follower" I meant the old leader  
> and  
> > follower before the zookeeper session expiration. When they're recovering  
> > there's no leader.  
> >  
> >  
> > On Tue, Apr 8, 2014 at 1:49 PM, Jessica Mallet   
> > wrote:  
> >  
> > > I'm playing with dropping the cluster's connections to zookeeper and  
> then  
> > > reconnecting them, and during recovery, I always see this on the  
> leader's  
> > > logs:  
> > >  
> > > ElectionContext.java (line 361) Waiting until we see more replicas up  
> for  
> > > shard shard1: total=2 found=1 timeoutin=139902  
> > >  
> > > and then on the follower, I see:  
> > > SolrException.java (line 121) There was a problem finding the leader in  
> > > zk:org.apache.solr.common.SolrException: Could not get leader props  
> > > at  
> > >  
> org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:958)  
> > > at  
> > >  
> org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:922)  
> > > at  
> > >  
> >  
> org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1463)
>   
> > > at  
> > >  
> >  
> org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:380)
>   
> > > at  
> > > org.apache.solr.cloud.ZkController.access$100(ZkController.java:84)  
> > > at  
> > > org.apache.solr.cloud.ZkController$1.command(ZkController.java:232)  
> > > at  
> > >  
> >  
> org.apache.solr.common.cloud.ConnectionManager$2$1.run(ConnectionManager.java:179)
>   
> > > Caused by: org.apache.zookeeper.KeeperException$NoNodeException:  
> > > KeeperErrorCode = NoNode for /collections/lc4/leaders/shard1  
> > > at  
> > > org.apache.zookeeper.KeeperException.create(KeeperException.java:111)  
> > > at  
> > > org.apache.zookeeper.KeeperException.create(KeeperException.java:51)  
> > > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)  
> > > at  
> > >  
> >  
> org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:273)  
> > > at  
> > >  
> >  
> org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:270)  
> > > at  
> > >  
> >  
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)
>   
> > > at  
> > >  
> org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:270)  
> > > at  
> > >  
> org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:936)  
> > > ... 6 more  
> > >  
> > > They block each other's progress until leader decides to give up and  
> not  
> > > wait for more replicas to come up:  
> > >  
> > > ElectionContext.java (line 368) Was waiting for replicas to come up,  
> but  
> > > they are taking too long - assuming they won't come back till later  
> > >  
> > > and then recovery moves forward again.  
> > >  
> > > Should waitForLeaderToSeeDownState move on if there's no leader at the  
> > > moment?  
> > > Thanks,  
> > > Jessica  
> > >  
> >  
>  


Re: clusterstate.json does not reflect current state of down versus active

2014-04-16 Thread Mark Miller
bq.  before any of Solr gets to do its shutdown sequence
Yeah, this is kind of an open issue. There might be a JIRA for it, but I cannot 
remember. What we really need is an explicit shutdown call that can be made 
before stopping jetty so that it’s done gracefully.

-- 
Mark Miller
about.me/markrmiller

On April 16, 2014 at 2:54:15 PM, Daniel Collins (danwcoll...@gmail.com) wrote:

We actually have a similar scenario: we have 64 cores per machine, and even  
that sometimes has issues when we shut down all cores at once. We did start  
to write a "force election for Shard X" tool, but it was harder than we  
expected; it's still on our to-do list.  

Some context, we run 256 shards spread over 4 machines, and several Solr  
instances per machine (16 cores per instance, 4 instances per machine).  
Our machines regularly go down for maintenance, and shutting down the Solr  
core closes the HTTP interface (at Jetty level) before any of Solr gets to  
do its shutdown sequence: publishing as down, election, etc. Since we run  
an NRT system, that causes all kinds of backlogs in the indexing pipeline  
whilst Solr queues up indexing requests waiting for a valid leader...  
Hence the need for an API to move leadership off the instance, *before* we  
begin shutdown.  

Any insight would be appreciated, we are happy to contribute this back if  
we can get it working!  


On 16 April 2014 15:49, Shawn Heisey  wrote:  

> On 4/16/2014 8:02 AM, Rich Mayfield wrote:  
> > However there doesn’t appear to be a way to force leadership to/from a  
> > particular replica.  
>  
> I would have expected that doing a core reload on the current leader  
> would force an election and move the leader, but on my 4.2.1 SolrCloud  
> (the only version I have running at the moment) that does not appear to  
> be happening. IMHO we need a way to force a leader change on a shard.  
> An API for "move all leaders currently on this Solr instance" would  
> actually be a very useful feature.  
>  
> I can envision two issues for you to file in Jira. The first would be  
> an Improvement issue, the second would be a Bug:  
>  
> * SolrCloud: Add API to move leader off a Solr instance  
> * SolrCloud: LotsOfCollections takes a long time to stabilize  
>  
> If we can get a dev who specializes in SolrCloud to respond, perhaps  
> they'll have a recommendation about whether these are sensible issues,  
> and if not, what they'd recommend.  
>  
> Thanks,  
> Shawn  
>  
>  


Re: Distributed commits in CloudSolrServer

2014-04-15 Thread Mark Miller
Inline responses below.
-- 
Mark Miller
about.me/markrmiller

On April 15, 2014 at 2:12:31 PM, Peter Keegan (peterlkee...@gmail.com) wrote:

I have a SolrCloud index, 1 shard, with a leader and one replica, and 3 
ZKs. The Solr indexes are behind a load balancer. There is one 
CloudSolrServer client updating the indexes. The index schema includes 3 
ExternalFileFields. When the CloudSolrServer client issues a hard commit, I 
observe that the commits occur sequentially, not in parallel, on the leader 
and replica. The duration of each commit is about a minute. Most of this 
time is spent reloading the 3 ExternalFileField files. Because of the 
sequential commits, there is a period of time (1 minute+) when the index 
searchers will return different results, which can cause a bad user 
experience. This will get worse as replicas are added to handle 
auto-scaling. The goal is to keep all replicas in sync w.r.t. the user 
queries. 

My questions: 

1. Is there a reason that the distributed commits are done in sequence, not 
in parallel? Is there a way to change this behavior? 


The reason is that updates are currently done this way - it’s the only safe way 
to do it without solving some more problems. I don’t think you can easily 
change this. I think we should probably file a JIRA issue to track a better 
solution for commit handling. I think there are some complications because of 
how commits can be added on update requests, but it’s something we probably want 
to try and solve before tackling *all* updates to replicas in parallel with the 
leader.



2. If instead, the commits were done in parallel by a separate client via a 
GET to each Solr instance, how would this client get the host/port values 
for each Solr instance from zookeeper? Are there any downsides to doing 
commits this way? 

Not really, other than the extra management.
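(A rough SolrJ sketch of how such a client could discover every replica's URL from
ZooKeeper and commit to each one. The ZooKeeper address and collection name are
placeholders, error handling is omitted, and note that Solr may still distribute a
commit it receives, so treat this as a starting point rather than a guarantee of
strictly per-node commits:)

    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.cloud.Replica;
    import org.apache.solr.common.cloud.Slice;
    import org.apache.solr.common.cloud.ZkStateReader;

    public class CommitEachReplica {
      public static void main(String[] args) throws Exception {
        CloudSolrServer cloud = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        cloud.connect();
        for (Slice slice : cloud.getZkStateReader().getClusterState().getSlices("collection1")) {
          for (Replica r : slice.getReplicas()) {
            // base_url is e.g. http://host:8983/solr; "core" is the concrete core on that node.
            String url = r.getStr(ZkStateReader.BASE_URL_PROP) + "/"
                + r.getStr(ZkStateReader.CORE_NAME_PROP);
            HttpSolrServer node = new HttpSolrServer(url);
            node.commit();   // run these from separate threads if the commits should overlap
            node.shutdown();
          }
        }
        cloud.shutdown();
      }
    }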





Thanks, 
Peter 


Re: Race condition in Leader Election

2014-04-15 Thread Mark Miller
We have to fix that then.

-- 
Mark Miller
about.me/markrmiller

On April 15, 2014 at 12:20:03 PM, Rich Mayfield (mayfield.r...@gmail.com) wrote:

I see something similar where, given ~1000 shards, both nodes spend a LOT of 
time sorting through the leader election process. Roughly 30 minutes.  

I too am wondering - if I force all leaders onto one node, then shut down both, 
then start up the node with all of the leaders on it first, then start up the 
other node, then I think I would have a much faster startup sequence.  

Does that sound reasonable? And if so, is there a way to trigger the leader 
election process without taking the time to unload and recreate the shards?  

> Hi  
>  
> When restarting a node in solrcloud, i run into scenarios where both the  
> replicas for a shard get into "recovering" state and never come up causing  
> the error "No servers hosting this shard". To fix this, I either unload one  
> core or restart one of the nodes again so that one of them becomes the  
> leader.  
>  
> Is there a way to "force" leader election for a shard for solrcloud? Is  
> there a way to break ties automatically (without restarting nodes) to make  
> a node as the leader for the shard?  
>  
>  
> Thanks  
> Nitin  


Re: zookeeper reconnect failure

2014-03-30 Thread Mark Miller
We don’t currently retry, but I don’t think it would hurt much if we did - at 
least briefly.

If you want to file a JIRA issue, that would be the best way to get it in a 
future release.

-- 
Mark Miller
about.me/markrmiller

On March 28, 2014 at 5:40:47 PM, Michael Della Bitta 
(michael.della.bi...@appinions.com) wrote:

Hi, Jessica,  

We've had a similar problem when DNS resolution of our Hadoop task nodes  
has failed. They tend to take a dirt nap until you fix the problem  
manually. Are you experiencing this in AWS as well?  

I'd say the two things to do are to poll the node state via HTTP using a  
monitoring tool so you get an immediate notification of the problem, and to  
install some sort of caching server like nscd if you expect to have DNS  
resolution failures regularly.  



Michael Della Bitta  

Applications Developer  

o: +1 646 532 3062  

appinions inc.  

"The Science of Influence Marketing"  

18 East 41st Street  

New York, NY 10017  

t: @appinions <https://twitter.com/Appinions> | g+:  
plus.google.com/appinions<https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts>
  
w: appinions.com <http://www.appinions.com/>  


On Fri, Mar 28, 2014 at 4:27 PM, Jessica Mallet wrote:  

> Hi,  
>  
> First off, I'd like to give a disclaimer that this probably is a very edge  
> case issue. However, since it happened to us, I would like to get some  
> advice on how to best handle this failure scenario.  
>  
> Basically, we had some network issue where we temporarily lost connection  
> and DNS. The zookeeper client properly triggered the watcher. However, when  
> trying to reconnect, this following Exception is thrown:  
>  
> 2014-03-27 17:24:46,882 ERROR [main-EventThread] SolrException.java (line  
> 121) :java.net.UnknownHostException: : Name or  
> service not known  
> at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)  
> at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:866)  
> at  
> java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1258)  
> at java.net.InetAddress.getAllByName0(InetAddress.java:1211)  
> at java.net.InetAddress.getAllByName(InetAddress.java:1127)  
> at java.net.InetAddress.getAllByName(InetAddress.java:1063)  
> at  
>  
> org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60)
>   
> at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)  
> at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)  
> at  
> org.apache.solr.common.cloud.SolrZooKeeper.<init>(SolrZooKeeper.java:41)  
> at  
>  
> org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:53)
>   
> at  
>  
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:147)
>   
> at  
>  
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) 
>  
> at  
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)  
>  
> I tried to look at the code and it seems that there'd be no further retries  
> to connect to Zookeeper, and the node is basically left in a bad state and  
> will not recover on its own. (Please correct me if I'm reading this wrong.)  
> Thinking about it, this is probably fair, since normally you wouldn't  
> expect retries to fix an "unknown host" issue--even though in our case it  
> would have--but I'm wondering what we should do to handle this situation if  
> it happens again in the future.  
>  
> Any advice is appreciated.  
>  
> Thanks,  
> Jessica  
>  

