Solr Core initialization failure while writing junit test cases

2019-07-27 Thread Bharath Kumar
Hi All,

I am getting the exception below when trying to use EmbeddedSolrServer for my
JUnit test cases. I have given the correct paths to the config files. I get
the same core-initialization failure even when using MiniSolrCloudCluster: the
collection is created and then deleted because the core is not there. Can you
please help me with this?

Test Code:-
initCore("src/test/resources/solr/collection-name/conf/solrconfig.xml",
"src/test/resources/solr/collection-name/conf/schema.xml",
"src/test/resources/solr", "collection-name");

org.apache.solr.core.SolrCoreInitializationException: SolrCore 'collection-name' is not available due to init failure: org/restlet/resource/ResourceException
at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1593)
at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:158)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
at org.apache.solr.client.solrj.SolrClient.deleteByQuery(SolrClient.java:936)
at org.apache.solr.client.solrj.SolrClient.deleteByQuery(SolrClient.java:899)
at org.apache.solr.client.solrj.SolrClient.deleteByQuery(SolrClient.java:914)
at com.ss8.fusion.GetByIdTest.setUp(GetByIdTest.java:44)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1742)
at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:969)
at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:985)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:944)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:830)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:880)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:891)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.common.SolrException: org/restlet/resource/ResourceException
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1014)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:869)
at
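
For comparison, below is a minimal sketch of standing up an EmbeddedSolrServer directly from a solr home directory (rather than through initCore()). The paths and core name are placeholders matching the layout above, and the core will still fail to load if the org.restlet classes referenced in the init failure are missing from the test classpath:

import java.nio.file.Paths;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;

public class EmbeddedSolrSketch {
  public static void main(String[] args) throws Exception {
    // solr home must contain solr.xml plus collection-name/core.properties and conf/
    EmbeddedSolrServer server =
        new EmbeddedSolrServer(Paths.get("src/test/resources/solr"), "collection-name");
    try {
      server.deleteByQuery("collection-name", "*:*");
      server.commit("collection-name");
      long count = server.query("collection-name", new SolrQuery("*:*"))
          .getResults().getNumFound();
      System.out.println("documents: " + count);
    } finally {
      server.close();
    }
  }
}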

Re: Creating shard with core.properties

2019-02-01 Thread Bharath Kumar
Thanks Shawn for your inputs and the pointer to the documentation. Our
setup currently has 1 shard and 2 replicas for that shard, and we want to
avoid a manual step that involves creating a collection, since for SolrCloud
at least more than 50% of the shard nodes have to be up and running for that.
Also, if the ZooKeeper state goes bad for some reason we will need to
re-create the collection, whereas in legacy cloud mode manually creating
core.properties has helped us bring up SolrCloud after an upgrade even
without any known ZooKeeper state, with no additional steps.

On Wed, Jan 30, 2019 at 3:49 PM Shawn Heisey  wrote:

> On 1/30/2019 3:36 PM, Bharath Kumar wrote:
> > Thanks Erick. We cleanup the zookeeper state on every installation, so
> the
> > zookeeper states are gone. So what should we do in case of a new 7.6
> > installation where we want to manually create core.properties and use the
> > non-legacy cloud option? Is it in order to use non-legacy cloud, we
> should
> > use the collections api to create a collection first and then use the
> > manual core.properties for auto-discovery?
>
> *ALL* creations and modifications to SolrCloud collections should be
> done using the Collections API.  Creating cores directly (either with
> core.properties or the CoreAdmin API) is something that will almost
> certainly bite you hard.  Based on what Erick has said, I don't think
> you can even do it at all when legacy mode is disabled.  Even when you
> can ... don't.
>
> > Because in the legacy cloud mode we were just creating the
> core.properties
> > manually and that would update the zookeeper state when the solr boots
> up.
> > Can you please help me with this?
>
> Use the Collections API.  This is the recommendation even for experts
> who really know the code.  Creating cores manually in ANY SolrCloud
> install is a recipe for problems, even in legacy mode.
>
> There is a very large warning box (red triangle with an exclamation
> point) in this section of the documentation:
>
>
> https://lucene.apache.org/solr/guide/7_6/coreadmin-api.html#coreadmin-create
>
> One of the first things it says there in that warning box is that the
> CoreAdmin API should not be used in SolrCloud.  Manually creating
> core.properties files and restarting Solr is effectively the same thing
> as using the CoreAdmin API.
>
> Thanks,
> Shawn
>
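
As a concrete illustration of the Collections API route recommended above, creating a 1-shard, 2-replica collection from SolrJ looks roughly like the sketch below (collection name, configset name and ZooKeeper addresses are placeholders, and the builder style shown is the newer SolrJ one):

import java.util.Arrays;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class CreateCollectionSketch {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient.Builder(
            Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181"), Optional.empty()).build()) {
      // 1 shard, 2 replicas, using a configset already uploaded to ZooKeeper
      CollectionAdminRequest
          .createCollection("my-collection", "my-configset", 1, 2)
          .process(client);
    }
  }
}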


-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


Re: Creating shard with core.properties

2019-01-30 Thread Bharath Kumar
Thanks Erick. We clean up the ZooKeeper state on every installation, so the
ZooKeeper state is gone. So what should we do in the case of a new 7.6
installation where we want to manually create core.properties and use the
non-legacy cloud option? Is it that, in order to use non-legacy cloud, we
should use the Collections API to create a collection first and then use the
manual core.properties for auto-discovery?
Because in legacy cloud mode we were just creating the core.properties
manually, and that would update the ZooKeeper state when Solr boots up.
Can you please help me with this?

On Wed, Jan 30, 2019 at 9:08 AM Erick Erickson 
wrote:

> This seems very confused. When you say your zookeeper
> state is new, you mean there's no remnant of your old 6.1
> collection? Then manually creating a core.properties file
> won't do you any good as there's no collection to add it to.
>
> You cannot just create a core.properties file and expect Solr
> to reconstruct the entire collection information with legacyCloud
> false. That means that ZooKeeper is considered "the one
> source of truth" and if something on a local disk isn't reflected
> in the state.json file, it's considered invalid.
>
> This really sounds like an XY problem and your goal is to
> upgrade a cluster from 6.1 to 7.6. You should just be able
> to install 7.6 over 6.1 and fire it up. It should not be necessary
> to do anything else. By "over" here I mean
> > install 7.6
> > shut down 6.1
> > start 7.6 with the SOLR_HOME pointing to the same place
>as your 6.1 for each Solr instance.
>
> If you want to create a new cluster, say for testing purposes or
> whatever what I'd do is just create an identical collection ("Identical"
> here means same number of shards, one replica each) with the
> collections API. Then shut down your new Solr and copy the data
> directory from a replica from your 6.1 install to the corresponding
> replica in your 7.6 install. You should NOT be actively indexing at this
> time and should have issued a commit to the 6.1 or shut 6.1 down.
> "corresponding replica" here is the replica with the same "range", but
> in this case it doesn't matter since you only have one shard.
>
> Then use the collections ADDREPLICA command to  add as many
> replicas as you want.
>
> Best,
> Erick
>
> On Tue, Jan 29, 2019 at 8:14 PM Bharath Kumar 
> wrote:
> >
> > Hi All,
> >
> > I am trying to create a shard using solr 7.6.0 using just core.properties
> > file (like auto-discovering the shard) with legacyCloud set to false.
> But i
> > am getting an error message like below even though i specify the
> > coreNodeName in the core.properties file:-
> >
> > "coreNodeName " + coreNodeName + " does not exist in shard " +
> > cloudDesc.getShardId() +
> > ", ignore the exception if the replica was deleted");
> >
> > Please note my zookeeper state is new and does not have any state
> > registered earlier. Can you please help? The reason i need this is, we
> are
> > trying to migrate from 6.1 to 7.6.0 and i have a single shard with 2
> > replicas created using core.properties and not using the collection api.
> > --
> > Thanks & Regards,
> > Bharath MV Kumar
> >
> > "Life is short, enjoy every moment of it"
>
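
For the ADDREPLICA step Erick mentions at the end, the SolrJ equivalent is roughly the sketch below (collection, shard and ZooKeeper addresses are placeholders; run it once per extra replica wanted):

import java.util.Arrays;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class AddReplicaSketch {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient.Builder(
            Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181"), Optional.empty()).build()) {
      // add one more replica of shard1 to the collection
      CollectionAdminRequest.addReplicaToShard("my-collection", "shard1").process(client);
    }
  }
}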


-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


Creating shard with core.properties

2019-01-29 Thread Bharath Kumar
Hi All,

I am trying to create a shard using solr 7.6.0 using just core.properties
file (like auto-discovering the shard) with legacyCloud set to false. But i
am getting an error message like below even though i specify the
coreNodeName in the core.properties file:-

"coreNodeName " + coreNodeName + " does not exist in shard " +
cloudDesc.getShardId() +
", ignore the exception if the replica was deleted");

Please note my zookeeper state is new and does not have any state
registered earlier. Can you please help? The reason i need this is, we are
trying to migrate from 6.1 to 7.6.0 and i have a single shard with 2
replicas created using core.properties and not using the collection api.
-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


Re: SOLR as nosql database store

2017-05-10 Thread Bharath Kumar
Thanks Walter and Mike. In our use case we have the same schema on both the
source and target sites. The idea is to see if we can avoid MySQL replication
on the target site for a particular table in our MySQL schema. Currently we
index some of the fields of that table in Solr; we want to move all the
fields to Solr, indexing some of them and only storing the others.

On Wed, May 10, 2017 at 10:09 AM, Bharath Kumar <bharath.mvku...@gmail.com>
wrote:

> Yes Mike we have CDCR replication as well.
>
> On Wed, May 10, 2017 at 1:15 AM, Mike Drob <md...@apache.org> wrote:
>
>> > The searching install will be able to rebuild itself from the data
>> storage install when that
>> is required.
>>
>> Is this a use case for CDCR?
>>
>> Mike
>>
>> On Tue, May 9, 2017 at 6:39 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>>
>> > On 5/9/2017 12:58 AM, Bharath Kumar wrote:
>> > > Thanks Hrishikesh and Dave. We use SOLR cloud with 2 extra replicas,
>> > will that not serve as backup when something goes wrong? Also we use
>> latest
>> > solr 6 and from the documentation of solr, the indexing performance has
>> > been good. The reason is that we are using MySQL as the primary data
>> store
>> > and the performance might not be optimal if we write data at a very
>> rapid
>> > rate. Already we index almost half the fields that are in MySQL in solr.
>> >
>> > A replica is protection against data loss in the event of hardware
>> > failure, but there are classes of problems that it cannot protect
>> against.
>> >
>> > Although Solr (Lucene) does try *really* hard to never lose data that it
>> > hasn't been asked to delete, it is not designed to be a database.  It's
>> > a search engine.  Solr doesn't offer the same kinds of guarantees about
>> > the data it contains that software like MySQL does.
>> >
>> > I personally don't recommend trying to use Solr as a primary data store,
>> > but if that's what you really want to do, then I would suggest that you
>> > have two complete Solr installs, with multiple replicas on both.  One of
>> > them will be used for searching and have a configuration you're already
>> > familiar with, the other will be purely for data storage -- only certain
>> > fields like the uniqueKey will be indexed, but every other field will be
>> > stored only.
>> >
>> > Running with two separate Solr installs will allow you to optimize one
>> > for searching and the other for data storage.  The searching install
>> > will be able to rebuild itself from the data storage install when that
>> > is required.  If better performance is needed for the rebuild, you have
>> > the option of writing a multi-threaded or multi-process program that
>> > reads from one and writes to the other.
>> >
>> > Thanks,
>> > Shawn
>> >
>> >
>>
>
>
>
> --
> Thanks & Regards,
> Bharath MV Kumar
>
> "Life is short, enjoy every moment of it"
>
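
To make the "rebuild the search install from the storage install" idea above concrete, a single-threaded sketch using SolrJ cursorMark paging could look like the following (URLs, collection names and the uniqueKey field name "id" are placeholders; a real rebuild would batch the adds and likely run shards in parallel):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class RebuildFromStorageSketch {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient source = new HttpSolrClient.Builder(
             "http://storage-host:8983/solr/storage-collection").build();
         HttpSolrClient target = new HttpSolrClient.Builder(
             "http://search-host:8983/solr/search-collection").build()) {
      SolrQuery q = new SolrQuery("*:*").setRows(1000);
      q.setSort("id", SolrQuery.ORDER.asc);            // cursorMark needs a sort on the uniqueKey
      String cursor = CursorMarkParams.CURSOR_MARK_START;
      while (true) {
        q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
        QueryResponse rsp = source.query(q);
        for (SolrDocument doc : rsp.getResults()) {
          SolrInputDocument in = new SolrInputDocument();
          for (String field : doc.getFieldNames()) {
            if (!"_version_".equals(field)) {          // skip internal fields
              in.addField(field, doc.getFieldValue(field));
            }
          }
          target.add(in);
        }
        String next = rsp.getNextCursorMark();
        if (cursor.equals(next)) {                     // unchanged cursor means we are done
          break;
        }
        cursor = next;
      }
      target.commit();
    }
  }
}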



-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


Re: SOLR as nosql database store

2017-05-10 Thread Bharath Kumar
Yes Mike we have CDCR replication as well.

On Wed, May 10, 2017 at 1:15 AM, Mike Drob <md...@apache.org> wrote:

> > The searching install will be able to rebuild itself from the data
> storage install when that
> is required.
>
> Is this a use case for CDCR?
>
> Mike
>
> On Tue, May 9, 2017 at 6:39 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>
> > On 5/9/2017 12:58 AM, Bharath Kumar wrote:
> > > Thanks Hrishikesh and Dave. We use SOLR cloud with 2 extra replicas,
> > will that not serve as backup when something goes wrong? Also we use
> latest
> > solr 6 and from the documentation of solr, the indexing performance has
> > been good. The reason is that we are using MySQL as the primary data
> store
> > and the performance might not be optimal if we write data at a very rapid
> > rate. Already we index almost half the fields that are in MySQL in solr.
> >
> > A replica is protection against data loss in the event of hardware
> > failure, but there are classes of problems that it cannot protect
> against.
> >
> > Although Solr (Lucene) does try *really* hard to never lose data that it
> > hasn't been asked to delete, it is not designed to be a database.  It's
> > a search engine.  Solr doesn't offer the same kinds of guarantees about
> > the data it contains that software like MySQL does.
> >
> > I personally don't recommend trying to use Solr as a primary data store,
> > but if that's what you really want to do, then I would suggest that you
> > have two complete Solr installs, with multiple replicas on both.  One of
> > them will be used for searching and have a configuration you're already
> > familiar with, the other will be purely for data storage -- only certain
> > fields like the uniqueKey will be indexed, but every other field will be
> > stored only.
> >
> > Running with two separate Solr installs will allow you to optimize one
> > for searching and the other for data storage.  The searching install
> > will be able to rebuild itself from the data storage install when that
> > is required.  If better performance is needed for the rebuild, you have
> > the option of writing a multi-threaded or multi-process program that
> > reads from one and writes to the other.
> >
> > Thanks,
> > Shawn
> >
> >
>



-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


Re: SOLR as nosql database store

2017-05-10 Thread Bharath Kumar
Thanks Shawn and Rick for your suggestions. We will surely look at these
options.

On Tue, May 9, 2017 at 4:39 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 5/9/2017 12:58 AM, Bharath Kumar wrote:
> > Thanks Hrishikesh and Dave. We use SOLR cloud with 2 extra replicas,
> will that not serve as backup when something goes wrong? Also we use latest
> solr 6 and from the documentation of solr, the indexing performance has
> been good. The reason is that we are using MySQL as the primary data store
> and the performance might not be optimal if we write data at a very rapid
> rate. Already we index almost half the fields that are in MySQL in solr.
>
> A replica is protection against data loss in the event of hardware
> failure, but there are classes of problems that it cannot protect against.
>
> Although Solr (Lucene) does try *really* hard to never lose data that it
> hasn't been asked to delete, it is not designed to be a database.  It's
> a search engine.  Solr doesn't offer the same kinds of guarantees about
> the data it contains that software like MySQL does.
>
> I personally don't recommend trying to use Solr as a primary data store,
> but if that's what you really want to do, then I would suggest that you
> have two complete Solr installs, with multiple replicas on both.  One of
> them will be used for searching and have a configuration you're already
> familiar with, the other will be purely for data storage -- only certain
> fields like the uniqueKey will be indexed, but every other field will be
> stored only.
>
> Running with two separate Solr installs will allow you to optimize one
> for searching and the other for data storage.  The searching install
> will be able to rebuild itself from the data storage install when that
> is required.  If better performance is needed for the rebuild, you have
> the option of writing a multi-threaded or multi-process program that
> reads from one and writes to the other.
>
> Thanks,
> Shawn
>
>


-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


Re: SOLR as nosql database store

2017-05-09 Thread Bharath Kumar
Thanks Hrishikesh and Dave. We use SOLR cloud with 2 extra replicas; will
that not serve as a backup when something goes wrong? Also, we use the latest
Solr 6, and from the Solr documentation the indexing performance has been
good. The reason is that we are using MySQL as the primary data store, and
the performance might not be optimal if we write data at a very rapid rate.
We already index almost half the fields that are in MySQL in Solr.

On Mon, May 8, 2017 at 9:24 PM, Dave <hastings.recurs...@gmail.com> wrote:

> You will want to have both solr and a sql/nosql data storage option. They
> serve different purposes
>
>
> > On May 8, 2017, at 10:43 PM, bharath.mvkumar <bharath.mvku...@gmail.com>
> wrote:
> >
> > Hi All,
> >
> > We have a use case where we have mysql database which stores documents
> and
> > also some of the fields in the document is also indexed in solr.
> > We plan to move all those documents to solr by making solr as the nosql
> > datastore for storing those documents. The reason we plan to do this is
> > because we have to support cross center data replication for both mysql
> and
> > solr and we are in a way duplicating the same data.The number of writes
> we
> > do per second is around 10,000. Also currently we have only one shard
> and we
> > have around 70 million records and we plan to support close to 1 billion
> > records and also perform sharding.
> >
> > Using solr as the nosql database is a good choice or should we look at
> > Cassandra for our use case?
> >
> > Thanks,
> > Bharath Kumar
> >
> >
> >
> > --
> > View this message in context: http://lucene.472066.n3.
> nabble.com/SOLR-as-nosql-database-store-tp4334119.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


Recovery failed in SOLR cloud - SOLR 6

2016-11-30 Thread Bharath Kumar
Hi All,

We have an issue in production where, in our SOLR cloud with 3 nodes, one of
the nodes is failing with the below error:-

index fetcher

error recovery failed 1 of 500 attempts

no content received for file :tlog.0105987.1552

We have cdcr logging enabled, and have just the below configuration for
cdcr:-

  <lst name="buffer">
    <str name="defaultState">disabled</str>
  </lst>


Can you please help with this and let me know: if buffering is disabled,
will it cause the replicas in the same cluster to error out with the above
errors? Because by default, if CDCR is not configured, buffering is disabled
and the transaction logs will be deleted normally by the system.

-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


Re: How to re-index SOLR data

2016-08-10 Thread Bharath Kumar
Hi All,

Thanks so much for your inputs. We have a MySQL data source, and I think we
will try to re-index using the MySQL data.

I wanted something where I can export all my current data, say to an Excel
file or some other data store, and then import it on another node into the
same collection starting from empty data.
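
Erick's reply quoted below outlines a collection-aliasing approach; the CREATEALIAS step he describes can be driven from SolrJ roughly like the sketch here (alias and collection names and ZooKeeper addresses are placeholders, newer-SolrJ builder style):

import java.util.Arrays;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class SwitchAliasSketch {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient.Builder(
            Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181"), Optional.empty()).build()) {
      // point the "prod" alias at the freshly built collection; clients keep querying "prod"
      CollectionAdminRequest.createAlias("prod", "C2").process(client);
    }
  }
}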

On Tue, Aug 9, 2016 at 8:44 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Assuming you can re-index
>
> Consider "collection aliasing". Say your current collection is C1.
> Create C2 (using the same cluster, Zookeeper and the like). Go
> ahead and index to C2 (however you do that). NOTE: the physical
> machines may be _different_ than C1, or not. That's up to you. The
> critical bit is that you use the same Zookeeper.
>
> Now, when you are done you use the Collections API CREATEALIAS
> command to point a "pseudo collection" to C1 (call it "prod"). This is
> seamless to the users.
>
> The flaw in my plan so far is that you probably go at Collection C1
> directly. So what you might do is create the "prod" alias and point it at
> C1. Now change your LB (or client or whatever) to use the "prod"
> collection,
> then when indexing is complete use CREATEALIAS to point "prod" at C2
> instead.
>
> This is actually a quite well-tested process, often used when you want to
> change "atomically", e.g. when you reindex the same data nightly but want
> all the new data available in its entirety only after it has been QA'd or
> such.
>
> Best,
> Erick
>
> On Tue, Aug 9, 2016 at 2:43 PM, John Bickerstaff
> <j...@johnbickerstaff.com> wrote:
> > In my case, I've done two things  neither of them involved taking the
> > data from SOLR to SOLR...  although in my reading, I've seen that this is
> > theoretically possible (I.E. sending data from one SOLR server to another
> > SOLR server and  having the second SOLR instance re-index...)
> >
> > I haven't used the python script...  that was news to me, but it sounds
> > interesting...
> >
> > What I've done is one of the following:
> >
> > a. Get the data from the original source (database, whatever) and massage
> > it again so that i's ready for SOLR and then submit it to my new
> SolrCloud
> > for indexing.
> >
> > b. Keep a separate store of EVERY Solr document as it comes out of my
> code
> > (in xml) and store it in Kafka or a text file.  Then it's easy to push
> back
> > into another SOLR instance any time - multiple times if necessary.
> >
> > I'm guessing you don't have the data stored away as in "b"...  And if you
> > don't have a way of getting the data from some central source, then "a"
> > won't work either...  Which leaves you with the concept of sending data
> > from SOLR "A" to SOLR "B" and having "B" reindex...
> >
> > This might serve as a starting point in that case...
> > https://wiki.apache.org/solr/HowToReindex
> >
> > You'll note that there are limitations and a strong caveat against doing
> > this with SOLR, but if you have no other option, then it's the best you
> can
> > do.
> >
> > Do you have the ability to get all the data again from an authoritative
> > source?  (Relational Database or something similar?)
> >
> > On Tue, Aug 9, 2016 at 3:21 PM, Bharath Kumar <bharath.mvku...@gmail.com
> >
> > wrote:
> >
> >> Hi John,
> >>
> >> Thanks so much for your inputs. We have time to build another system. So
> >> how did you index the same data on the main SOLR node to the new SOLR
> node?
> >> Did you use the re-index python script? The new data will be indexed
> >> correctly with the new rules, but what about the old data?
> >>
> >> Our SOLR data is around 30GB with around 60 million documents. We use
> SOLR
> >> cloud with 3 solr nodes and 3 zookeepers.
> >>
> >> On Tue, Aug 9, 2016 at 2:13 PM, John Bickerstaff <
> j...@johnbickerstaff.com
> >> >
> >> wrote:
> >>
> >> > In case this helps...
> >> >
> >> > Assuming you have the resources to build a copy of your production
> >> > environment and assuming you have the time, you don't need to take
> your
> >> > production down - or even affect it's processing...
> >> >
> >> > What I've done (with admittedly smaller data sets) is build a separate
> >> > environment (usually on VM's) and once it's set up, I do the new
> indexing
> >> > according to the new "rules"  (Like your cha

Re: How to re-index SOLR data

2016-08-09 Thread Bharath Kumar
Hi John,

Thanks so much for your inputs. We have time to build another system. So how
did you get the data indexed from the main SOLR node onto the new SOLR node?
Did you use the re-index python script? The new data will be indexed
correctly with the new rules, but what about the old data?

Our SOLR data is around 30GB with around 60 million documents. We use SOLR
cloud with 3 solr nodes and 3 zookeepers.

On Tue, Aug 9, 2016 at 2:13 PM, John Bickerstaff <j...@johnbickerstaff.com>
wrote:

> In case this helps...
>
> Assuming you have the resources to build a copy of your production
> environment and assuming you have the time, you don't need to take your
> production down - or even affect it's processing...
>
> What I've done (with admittedly smaller data sets) is build a separate
> environment (usually on VM's) and once it's set up, I do the new indexing
> according to the new "rules"  (Like your change of long to string)
>
> Then, in a sense, I don't care how long it takes because it is not
> affecting Prod.
>
> When it's done, I simply switch my load balancer to point to the new
> environment and shut down the old one.
>
> To users, this could be seamless if you handle the load balancer correctly
> and have it refuse new connections to the old servers while routing all new
> connections to the new Solr servers...
>
> On Tue, Aug 9, 2016 at 3:04 PM, Bharath Kumar <bharath.mvku...@gmail.com>
> wrote:
>
> > Hi Nick and Shawn,
> >
> > Thanks so much for the pointers. I will try that out. Thank you again!
> >
> > On Tue, Aug 9, 2016 at 9:40 AM, Nick Vasilyev <nick.vasily...@gmail.com>
> > wrote:
> >
> > > Hi, I work on a python Solr Client
> > > <http://solrclient.readthedocs.io/en/latest/> library and there is a
> > > reindexing helper module that you can use if you are on Solr 4.9+. I
> use
> > it
> > > all the time and I think it works pretty well. You can re-index all
> > > documents from a collection into another collection or dump them to the
> > > filesystem as JSON. It also supports parallel execution and can run
> > > independently on each shard. There is also a way to resume if your job
> > > craps out half way through if your existing schema is set up with a
> good
> > > date field and unique id.
> > >
> > > You can read the documentation here:
> > > http://solrclient.readthedocs.io/en/latest/Reindexer.html
> > >
> > > Code is pretty short and is here:
> > > https://github.com/moonlitesolutions/SolrClient/
> blob/master/SolrClient/
> > > helpers/reindexer.py
> > >
> > > Here is sample:
> > > from SolrClient import SolrClient
> > > from SolrClient.helpers import Reindexer
> > >
> > > r = Reindexer(SolrClient('http://source_solr:8983/solr'), SolrClient('
> > > http://destination_solr:8983/solr') , source_coll='source_collection',
> > > dest_coll='destination-collection')
> > > r.reindex()
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Tue, Aug 9, 2016 at 9:56 AM, Shawn Heisey <apa...@elyograg.org>
> > wrote:
> > >
> > > > On 8/9/2016 1:48 AM, bharath.mvkumar wrote:
> > > > > What would be the best way to re-index the data in the SOLR cloud?
> We
> > > > > have around 65 million data and we are planning to change the
> schema
> > > > > by changing the unique key type from long to string. How long does
> it
> > > > > take to re-index 65 million documents in SOLR and can you please
> > > > > suggest how to do that?
> > > >
> > > > There is no magic bullet.  And there's no way for anybody but you to
> > > > determine how long it's going to take.  There are people who have
> > > > achieved over 50K inserts per second, and others who have difficulty
> > > > reaching 1000 per second.  Many factors affect indexing speed,
> > including
> > > > the size of your documents, the complexity of your analysis, the
> > > > capabilities of your hardware, and how many threads/processes you are
> > > > using at the same time when you index.
> > > >
> > > > Here's some more detailed info about reindexing, but it's probably
> not
> > > > what you wanted to hear:
> > > >
> > > > https://wiki.apache.org/solr/HowToReindex
> > > >
> > > > Thanks,
> > > > Shawn
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > Thanks & Regards,
> > Bharath MV Kumar
> >
> > "Life is short, enjoy every moment of it"
> >
>



-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


Re: Unique key field type in solr 6.1 schema

2016-08-09 Thread Bharath Kumar
> > Hi All,
> >
> > I have an issue with cross data center replication, when we delete the
> > document by id from the main site. The target site document is not
> deleted.
> > I have the id field which is a unique field for my schema which is
> > configured as "long".
> >
> > If i change the type to "string" it works fine. Is there any issue using
> > long. Because we migrated from 4.4 to 6.1, and we had the id field as
> long.
> > Can you please help me with this. Really appreciate your help.
> >
> > I see the below error on the target site:-
> >
> >  o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException:
> Invalid
> > Number:
> >   at org.apache.solr.schema.TrieField.readableToIndexed(
> > TrieField.java:537)
> > at
> > org.apache.solr.update.DeleteUpdateCommand.getIndexedId(
> > DeleteUpdateCommand.java:65)
> > at
> > org.apache.solr.update.processor.DistributedUpdateProcessor.
> versionDelete(
> > DistributedUpdateProcessor.java:1495)
> > at
> > org.apache.solr.update.processor.CdcrUpdateProcessor.versionDelete(
> > CdcrUpdateProcessor.java:85)
> >
> > Thanks,
> > Bharath Kumar
> >
> >
> >
> > --
> > View this message in context: http://lucene.472066.n3.
> > nabble.com/Unique-key-field-type-in-solr-6-1-schema-tp4290895.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>



-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


Re: Solr DeleteByQuery vs DeleteById

2016-08-09 Thread Bharath Kumar
Hi Danny and Daniel,

Thank you so much for your inputs.

Actually we do use deleteById, but because we need the CDCR solution to work
for us, we are having issues when we use deleteById. deleteById logs a
transaction in the transaction log, and when that is passed over to the
target site, the CDCR update processor is not able to process that
transaction. The issue occurs when we use the unique key "id" field type as
long; if we use it as "string", there are no problems. But we already have
data in production, and if we change the schema we need to re-index. So that
is one of the reasons we are thinking of using delete-by-query.
I opened a ticket in JIRA - https://issues.apache.org/jira/browse/SOLR-9394
as well.
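
For reference, the two forms being compared look roughly like this in SolrJ (collection name and the commit-within value are placeholders; whether the id values need quoting in the query form depends on the uniqueKey field type):

import java.util.List;
import java.util.stream.Collectors;

import org.apache.solr.client.solrj.SolrClient;

public class DeleteSketch {
  // delete a batch by id; each id is logged individually in the transaction log
  static void deleteByIds(SolrClient client, List<String> ids) throws Exception {
    client.deleteById("my-collection", ids, 1000);
  }

  // delete the same batch with a single boolean query instead
  static void deleteByIdQuery(SolrClient client, List<String> ids) throws Exception {
    String query = ids.stream()
        .map(id -> "id:" + id)
        .collect(Collectors.joining(" OR ", "(", ")"));   // e.g. (id:1 OR id:2 OR id:3)
    client.deleteByQuery("my-collection", query, 1000);
  }
}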

On Tue, Aug 9, 2016 at 8:58 AM, Daniel Collins <danwcoll...@gmail.com>
wrote:

> Seconding that point, we currently do DBQ to "tidy" some of our collections
> and time-bound them (so running "delete anything older than X").  They have
> similar issues with reordering and blocking from time to time.
>
> On 9 August 2016 at 14:20, danny teichthal <dannyt...@gmail.com> wrote:
>
> > Hi Bharath,
> > I'm no expert, but we had some major problems because of deleteByQuery (
> in
> > short DBQ).
> > We ended up replacing all of our DBQ to delete by ids.
> >
> > My suggestion is that if you don't realy need it - don't use it.
> > Especially in your case, since you already know the population of ids, it
> > is redundant to query for it.
> >
> > I don't know how CDCR works, but we have a replication factor of 2 on our
> > SolrCloud cluster.
> > Since Solr 5.x , DBQ were stuck for a long while on the replicas,
> blocking
> > all updates.
> > It appears that on the replica side, there's an overhead of reordering
> and
> > executing the same DBQ over and over again, for consistency reasons.
> > It ends up buffering many delete by queries and blocks all updates.
> > In addition there's another defect on related slowness on DBQ -
> LUCENE-7049
> >
> >
> >
> >
> >
> > On Tue, Aug 9, 2016 at 7:14 AM, Bharath Kumar <bharath.mvku...@gmail.com
> >
> > wrote:
> >
> > > Hi All,
> > >
> > > We are using SOLR 6.1 and i wanted to know which is better to use -
> > > deleteById or deleteByQuery?
> > >
> > > We have a program which deletes 10 documents every 5 minutes from
> the
> > > SOLR and we do it in a batch of 200 to delete those documents. For that
> > we
> > > now use deleteById(List ids, 1) to delete.
> > > I wanted to know if we change it to deleteByQuery(query, 1) where
> the
> > > query is like this - (id:1 OR id:2 OR id:3 OR id:4). Will this have a
> > > performance impact?
> > >
> > > We use SOLR cloud with 3 SOLR nodes in the cluster and also we have a
> > > similar setup on the target site and we use Cross Data Center
> Replication
> > > to replicate from main site.
> > >
> > > Can you please let me know if using deleteByQuery will have any
> impact? I
> > > see it opens real time searcher on all the nodes in cluster.
> > >
> > > --
> > > Thanks & Regards,
> > > Bharath MV Kumar
> > >
> > > "Life is short, enjoy every moment of it"
> > >
> >
>



-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


Re: How to re-index SOLR data

2016-08-09 Thread Bharath Kumar
Hi Nick and Shawn,

Thanks so much for the pointers. I will try that out. Thank you again!

On Tue, Aug 9, 2016 at 9:40 AM, Nick Vasilyev 
wrote:

> Hi, I work on a python Solr Client
>  library and there is a
> reindexing helper module that you can use if you are on Solr 4.9+. I use it
> all the time and I think it works pretty well. You can re-index all
> documents from a collection into another collection or dump them to the
> filesystem as JSON. It also supports parallel execution and can run
> independently on each shard. There is also a way to resume if your job
> craps out half way through if your existing schema is set up with a good
> date field and unique id.
>
> You can read the documentation here:
> http://solrclient.readthedocs.io/en/latest/Reindexer.html
>
> Code is pretty short and is here:
> https://github.com/moonlitesolutions/SolrClient/blob/master/SolrClient/helpers/reindexer.py
>
> Here is sample:
> from SolrClient import SolrClient
> from SolrClient.helpers import Reindexer
>
> r = Reindexer(SolrClient('http://source_solr:8983/solr'),
>               SolrClient('http://destination_solr:8983/solr'),
>               source_coll='source_collection',
>               dest_coll='destination-collection')
> r.reindex()
>
>
>
>
>
>
> On Tue, Aug 9, 2016 at 9:56 AM, Shawn Heisey  wrote:
>
> > On 8/9/2016 1:48 AM, bharath.mvkumar wrote:
> > > What would be the best way to re-index the data in the SOLR cloud? We
> > > have around 65 million data and we are planning to change the schema
> > > by changing the unique key type from long to string. How long does it
> > > take to re-index 65 million documents in SOLR and can you please
> > > suggest how to do that?
> >
> > There is no magic bullet.  And there's no way for anybody but you to
> > determine how long it's going to take.  There are people who have
> > achieved over 50K inserts per second, and others who have difficulty
> > reaching 1000 per second.  Many factors affect indexing speed, including
> > the size of your documents, the complexity of your analysis, the
> > capabilities of your hardware, and how many threads/processes you are
> > using at the same time when you index.
> >
> > Here's some more detailed info about reindexing, but it's probably not
> > what you wanted to hear:
> >
> > https://wiki.apache.org/solr/HowToReindex
> >
> > Thanks,
> > Shawn
> >
> >
>



-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


Solr DeleteByQuery vs DeleteById

2016-08-08 Thread Bharath Kumar
Hi All,

We are using SOLR 6.1 and I wanted to know which is better to use -
deleteById or deleteByQuery?

We have a program which deletes 10 documents every 5 minutes from Solr, and
we do it in a batch of 200 to delete those documents. For that we now use
deleteById(List<String> ids, 1) to delete.
I wanted to know: if we change it to deleteByQuery(query, 1), where the
query is like this - (id:1 OR id:2 OR id:3 OR id:4), will this have a
performance impact?

We use SOLR cloud with 3 SOLR nodes in the cluster and also we have a
similar setup on the target site and we use Cross Data Center Replication
to replicate from main site.

Can you please let me know if using deleteByQuery will have any impact? I
see it opens a real-time searcher on all the nodes in the cluster.

-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


Unique key field type in solr 6.1 schema

2016-08-07 Thread Bharath Kumar
Hi All,

I have an issue with cross data center replication: when we delete a
document by id from the main site, the target site document is not deleted.
I have the id field, which is the unique field for my schema, configured as
"long".

If I change the type to "string" it works fine. Is there any issue using
long? We migrated from 4.4 to 6.1, and we had the id field as long.
Can you please help me with this. Really appreciate your help.

*I see the below error on the target site:-*

 o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Invalid Number:
  at org.apache.solr.schema.TrieField.readableToIndexed(TrieField.java:537)
  at org.apache.solr.update.DeleteUpdateCommand.getIndexedId(DeleteUpdateCommand.java:65)
  at org.apache.solr.update.processor.DistributedUpdateProcessor.versionDelete(DistributedUpdateProcessor.java:1495)
  at org.apache.solr.update.processor.CdcrUpdateProcessor.versionDelete(CdcrUpdateProcessor.java:85)

-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


CDCR delete document issue on target site

2016-08-06 Thread Bharath Kumar
Hi All,

I am using the CDCR solution available in SOLR 6.1 and I have set up cross
data center replication on both sites. When I add and update documents on the
main site, the data is replicated to the target site with no issues. But when
I delete a document on the main site, I see the below errors. On the main
site SOLR node the document gets deleted, but on the target site we get an
error while deleting that document from the index.

*Error stacktrace on main site SOLR node:-*

org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Error from server at http://:port_number/solr/collection: Invalid Number:  ^A^@^@^@^@^@^@C$U
at org.apache.solr.client.solrj.impl.CloudSolrClient.directUpdate(CloudSolrClient.java:697)
at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1109)
at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:998)
at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:934)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:166)
at org.apache.solr.handler.CdcrReplicator.sendRequest(CdcrReplicator.java:135)
at org.apache.solr.handler.CdcrReplicator.run(CdcrReplicator.java:99)
at org.apache.solr.handler.CdcrReplicatorScheduler.lambda$null$59(CdcrReplicatorScheduler.java:80)
at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$22(ExecutorUtil.java:229)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


*Error stacktrace on the target site SOLR node leader:-*

2016-08-06 08:09:21.091 ERROR (qtp472654579-2699) [c:collection s:shard1 r:core_node3 x:collection] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Invalid Number:  ^A^@^@^@^@^@^L^K0W
at org.apache.solr.schema.TrieField.readableToIndexed(TrieField.java:537)
at org.apache.solr.update.DeleteUpdateCommand.getIndexedId(DeleteUpdateCommand.java:65)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionDelete(DistributedUpdateProcessor.java:1495)
at org.apache.solr.update.processor.CdcrUpdateProcessor.versionDelete(CdcrUpdateProcessor.java:85)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:1154)
at org.apache.solr.handler.loader.JavabinLoader.delete(JavabinLoader.java:151)
at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:112)
at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:54)
at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2036)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:657)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:518)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
at

Re: SOLR war for SOLR 6

2016-07-13 Thread Bharath Kumar
Hi All,

Thanks so much for the response. We upgraded to SOLR 6.1 and moved to using
Jetty instead of deploying the Solr war on JBoss; so far it looks good.

Danny,

I too faced the same problem with the servlet-api and fixed that, but I was
still getting a 404 afterwards, so I decided to deploy Solr standalone on
Jetty instead.

On Sun, Jun 19, 2016 at 2:17 AM, danny teichthal <dannyt...@gmail.com>
wrote:

> If you are running on tomcat you will probably have a deployment problem.
> On version 5.2.1 it worked fine for me, I manually packaged solr.war on
> build time.
> But, when trying to upgrade to Solr 5.5.1, I had problems with incompatible
> servlet-api of Solr's jetty version and my tomcat servlert-api.
> Solr code explicitly use some new methods that existed in the Jetty, but
> not in my tomcat.
> For me it was a no-go, from all the reasons that Shawn stated.
>
>
>
>
> On Sat, Jun 18, 2016 at 12:26 AM, Shawn Heisey <apa...@elyograg.org>
> wrote:
>
> > On 6/16/2016 1:20 AM, Bharath Kumar wrote:
> > > I was trying to generate a solr war out of the solr 6 source, but even
> > > after i create the war, i was not able to get it deployed correctly on
> > > jboss. Wanted to know if anyone was able to successfully generate solr
> > > war and deploy it on tomcat or jboss? Really appreciate your help on
> > > this.
> >
> > FYI: If you do this, you're running an unsupported configuration.
> > You're on your own for both getting it working AND any problems that are
> > related to the deployment rather than Solr itself.
> >
> > You actually don't need to create a war.  Just run "ant clean server" in
> > the solr directory of the source code and then install the exploded
> > webapp (found in server/solr-webapp/webapp) into the container.  There
> > should be instructions available for how to install an exploded webapp
> > into tomcat or jboss.  As already stated, you are on your own for
> > finding and following those instructions, and if Solr doesn't deploy,
> > you will need to talk to somebody who knows the container for help.
> > Once they are sure you have the config for the container right, they may
> > refer you back here ... but because it's an unsupported config, the
> > amount of support we can offer is minimal.
> >
> > https://wiki.apache.org/solr/WhyNoWar
> >
> > If you want the admin UI to work when you install into a user-supplied
> > container, then you must set the context path for the app to "/solr".
> > The admin UI in 6.x will not work if you use another path, and that is
> > not considered a bug, because the only supported container has the path
> > hardcoded to /solr.
> >
> > Thanks,
> > Shawn
> >
> >
>



-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


Frange query apostrophe issue

2016-07-13 Thread Bharath Kumar
Hi All,

I have a query - {!frange l=1}sum(product(termfreq(content,'hasn't'),1)) -
where the content field has the keyword "hasn't" (with an apostrophe), and
when I run this query I get the below SOLR exception:-

"msg":"org.apache.solr.search.SyntaxError: Expected ',' at position 35
in 'sum(product(termfreq(content,'hasn't'),1))'"

Can you please help me with this? If I don't have the apostrophe, there are
no errors.
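
A workaround that may be worth trying (an untested suggestion): the function-query parser treats the single quote as the string delimiter, so the embedded apostrophe can usually be backslash-escaped inside the quoted literal, e.g.

{!frange l=1}sum(product(termfreq(content,'hasn\'t'),1))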
-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


SOLR war for SOLR 6

2016-06-16 Thread Bharath Kumar
Hi,

I was trying to generate a solr war out of the solr 6 source, but even
after i create the war, i was not able to get it deployed correctly on
jboss.

Wanted to know if anyone was able to successfully generate solr war and
deploy it on tomcat or jboss? Really appreciate your help on this.

-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


Regarding threadPoolSize for cross center data replication

2016-06-15 Thread Bharath Kumar
Hi,

I was trying to find the best thread pool size to configure on the source
site in solrconfig.xml for cross data center replication. We have one target
replica and one shard; is it recommended to have more than one thread?

If we have more than 1 thread, can the updates arrive out of order on the
target site? Can you please let me know?

  <lst name="replicator">
    <str name="threadPoolSize">8</str>
    <str name="schedule">1000</str>
    <str name="batchSize">128</str>
  </lst>

-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


Re: Regarding CDCR SOLR 6

2016-06-14 Thread Bharath Kumar
Hi Renaud,

Thank you so much for your response. It is very helpful and it helped me
understand the need for turning on buffering.

Is it recommended to keep buffering enabled all the time on the source
cluster? If the target cluster is up and running and CDCR is started, can I
turn off the buffering on the source site?

As you have mentioned, the transaction logs are kept on the source cluster
until the data is replicated to the target cluster, once CDCR is started. Is
there a possibility that the target cluster gets out of sync with the source
cluster and we need to do a hard recovery from the source cluster to sync up
the target cluster?

Also, I have the below configuration on the source cluster to synchronize
the update logs:

  <lst name="updateLogSynchronizer">
    <str name="schedule">1000</str>
  </lst>

Regarding the monitoring of the replication, I am planning to add a script
to check the queue size, to make sure the disk does not fill up in case the
target site is down and the transaction log size keeps growing on the
source site.
Is there any other recommended approach?

Thanks again, your inputs were very helpful.
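
For the queue-size check mentioned above, one option is to script against the CDCR request handler's QUEUES action instead of watching the tlog directory; a rough SolrJ sketch (host, collection and alerting logic are placeholders) could be:

import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.GenericSolrRequest;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.NamedList;

public class CdcrQueueCheckSketch {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client = new HttpSolrClient.Builder(
             "http://source-host:8983/solr/my-collection").build()) {
      ModifiableSolrParams params = new ModifiableSolrParams();
      params.set("action", "QUEUES");
      // the response nests per-target queue sizes; dump it and let the script parse and alert
      NamedList<Object> rsp =
          client.request(new GenericSolrRequest(SolrRequest.METHOD.GET, "/cdcr", params));
      System.out.println(rsp);
    }
  }
}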

On Tue, Jun 14, 2016 at 7:10 PM, Bharath Kumar <bharath.mvku...@gmail.com>
wrote:

> Hi Renaud,
>
> Thank you so much for your response. It is very helpful and it helped me
> understand the need for turning on buffering.
>
> Is it recommended to keep the buffering enabled all the time on the source
> cluster? If the target cluster is up and running and the cdcr is started,
> can i turn off the buffering on the source site?
>
> As you have mentioned, the transaction logs are kept on the source
> cluster, until the data is replicated on the target cluster, once the cdcr
> is started, is there a possibility that if on the target cluster
>
>
>
> On Tue, Jun 14, 2016 at 6:50 AM, Davis, Daniel (NIH/NLM) [C] <
> daniel.da...@nih.gov> wrote:
>
>> I must chime in to clarify something - in case 2, would the source
>> cluster eventually start a log reader on its own?   That is, would the CDCR
>> heal over time, or would manual action be required?
>>
>> -Original Message-
>> From: Renaud Delbru [mailto:renaud@siren.solutions]
>> Sent: Tuesday, June 14, 2016 4:51 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Regarding CDCR SOLR 6
>>
>> Hi Bharath,
>>
>> The buffer is useful when you need to buffer updates on the source
>> cluster before starting cdcr, if the source cluster might receive updates
>> in the meanwhile and you want to be sure to not miss them.
>>
>> To understand this better, you need to understand how cdcr clean
>> transaction logs. Cdcr when started (with the START action) will
>> instantiate a log reader for each target cluster. The position of the log
>> reader will indicate cdcr which transaction logs it can clean. If all the
>> log readers are beyond a certain point, then cdcr can clean all the
>> transaction logs up to this point.
>>
>> However, there might be cases when the source cluster will be up without
>> any log readers instantiated:
>> 1) The source cluster is started, but cdcr is not started yet
>> 2) the source cluster is started, cdcr is started, but the target cluster
>> was not accessible when cdcr was started. In this case, cdcr will not be
>> able to instantiate a log reader for this cluster.
>>
>> In these two scenarios, if updates are received by the source cluster,
>> then they might be cleaned out from the transaction log as per the normal
>> update log cleaning procedure.
>> That is where the buffer becomes useful. When you know that while
>> starting up your clusters and cdcr, you will be in one of these two
>> scenarios, then you can activate the buffer to be sure to not miss updates.
>> Then when the source and target clusters are properly up and cdcr
>> replication is properly started, you can turn off this buffer.
>>
>> --
>> Renaud Delbru
>>
>> On 14/06/16 06:41, Bharath Kumar wrote:
>> > Hi,
>> >
>> > I have setup cross data center replication using solr 6, i want to
>> > know why the buffer needs to be enabled on the source cluster? Even if
>> > the buffer is not enabled, i am able to replicate the data between
>> > source and target sites. What is the advantages of enabling the buffer
>> > on the source site? If i enable the buffer, the transaction logs are
>> > never deleted and over a period of time we are running out of disk.
>> > Can you please let me know why the buffer enabling is required?
>> >
>>
>>
>
>
> --
> Thanks & Regards,
> Bharath MV Kumar
>
> "Life is short, enjoy every moment of it"
>



-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


Re: Regarding CDCR SOLR 6

2016-06-14 Thread Bharath Kumar
Hi Renaud,

Thank you so much for your response. It is very helpful and it helped me
understand the need for turning on buffering.

Is it recommended to keep the buffering enabled all the time on the source
cluster? If the target cluster is up and running and the cdcr is started,
can i turn off the buffering on the source site?

As you have mentioned, the transaction logs are kept on the source cluster,
until the data is replicated on the target cluster, once the cdcr is
started, is there a possibility that if on the target cluster



On Tue, Jun 14, 2016 at 6:50 AM, Davis, Daniel (NIH/NLM) [C] <
daniel.da...@nih.gov> wrote:

> I must chime in to clarify something - in case 2, would the source cluster
> eventually start a log reader on its own?   That is, would the CDCR heal
> over time, or would manual action be required?
>
> -Original Message-
> From: Renaud Delbru [mailto:renaud@siren.solutions]
> Sent: Tuesday, June 14, 2016 4:51 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Regarding CDCR SOLR 6
>
> Hi Bharath,
>
> The buffer is useful when you need to buffer updates on the source cluster
> before starting cdcr, if the source cluster might receive updates in the
> meanwhile and you want to be sure to not miss them.
>
> To understand this better, you need to understand how cdcr clean
> transaction logs. Cdcr when started (with the START action) will
> instantiate a log reader for each target cluster. The position of the log
> reader will indicate cdcr which transaction logs it can clean. If all the
> log readers are beyond a certain point, then cdcr can clean all the
> transaction logs up to this point.
>
> However, there might be cases when the source cluster will be up without
> any log readers instantiated:
> 1) The source cluster is started, but cdcr is not started yet
> 2) the source cluster is started, cdcr is started, but the target cluster
> was not accessible when cdcr was started. In this case, cdcr will not be
> able to instantiate a log reader for this cluster.
>
> In these two scenarios, if updates are received by the source cluster,
> then they might be cleaned out from the transaction log as per the normal
> update log cleaning procedure.
> That is where the buffer becomes useful. When you know that while starting
> up your clusters and cdcr, you will be in one of these two scenarios, then
> you can activate the buffer to be sure to not miss updates. Then when the
> source and target clusters are properly up and cdcr replication is properly
> started, you can turn off this buffer.
>
> --
> Renaud Delbru
>
> On 14/06/16 06:41, Bharath Kumar wrote:
> > Hi,
> >
> > I have setup cross data center replication using solr 6, i want to
> > know why the buffer needs to be enabled on the source cluster? Even if
> > the buffer is not enabled, i am able to replicate the data between
> > source and target sites. What is the advantages of enabling the buffer
> > on the source site? If i enable the buffer, the transaction logs are
> > never deleted and over a period of time we are running out of disk.
> > Can you please let me know why the buffer enabling is required?
> >
>
>


-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


Regarding CDCR SOLR 6

2016-06-13 Thread Bharath Kumar
Hi,

I have set up cross data center replication using Solr 6, and I want to know
why the buffer needs to be enabled on the source cluster. Even if the buffer
is not enabled, I am able to replicate the data between the source and target
sites. What are the advantages of enabling the buffer on the source site? If
I enable the buffer, the transaction logs are never deleted and over a period
of time we run out of disk. Can you please let me know why enabling the
buffer is required?

-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"