Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0

2019-08-29 Thread Arnold Bronley
@Andrea: Yeah, I would rather avoid getting that information from
System.getProperty. I am also looking for some class that will give this
information.

@Erick: Is there any way to get the current Solr endpoint or ZK ensemble
information from inside StatelessScriptUpdateProcessorFactory so that I can
make that HTTP request?

Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0

2019-08-29 Thread Andrea Gazzarini
As far as I remember, the ZK coordinates (hosts, ports, and root) are set as
system properties on Solr nodes (open the admin console to see their names).
So it would just be a matter of

System.getProperty(ZK ensemble coordinates|root)

Before going in that direction: I don't know/remember whether there's some
Solr-specific ZK class that can be asked for them. If such a class exists, it
would be a better way; otherwise, you can go with the system property approach.

Andrea
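A minimal Java sketch of the system-property approach Andrea describes. It assumes the ZK connect string is exposed as a JVM system property (the name "zkHost" used in the test is an assumption; verify the real name in the admin console) and splits a string such as zk1:2181,zk2:2181/solr into the host list and the optional chroot (the "root"). ZkCoordinates is a hypothetical helper, not a Solr class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical helper (not a Solr class): holds the ZooKeeper hosts and the
 * optional chroot parsed from a connect string like "zk1:2181,zk2:2181/solr".
 */
final class ZkCoordinates {
    final List<String> hosts;       // e.g. [zk1:2181, zk2:2181]
    final Optional<String> chroot;  // e.g. Optional[/solr], or empty

    private ZkCoordinates(List<String> hosts, Optional<String> chroot) {
        this.hosts = hosts;
        this.chroot = chroot;
    }

    /** The chroot, if any, starts at the first '/' of the connect string. */
    static ZkCoordinates parse(String connectString) {
        String s = connectString.trim();
        int slash = s.indexOf('/');
        String hostPart = slash < 0 ? s : s.substring(0, slash);
        Optional<String> chroot =
                slash < 0 ? Optional.empty() : Optional.of(s.substring(slash));
        return new ZkCoordinates(Arrays.asList(hostPart.split(",")), chroot);
    }

    /**
     * Reads the connect string from a JVM system property. The property name
     * is an assumption supplied by the caller; verify it in the admin console.
     * Returns empty when the property is unset, e.g. on a stand-alone node.
     */
    static Optional<ZkCoordinates> fromSystemProperty(String propertyName) {
        String value = System.getProperty(propertyName);
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(parse(value));
    }
}
```

A host list plus an optional chroot is, if we read the SolrJ 8.x javadocs correctly, the shape CloudSolrClient.Builder accepts. Inside a running node, CoreContainer.getZkController() is the Solr-side object worth checking before falling back to system properties.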

Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0

2019-08-29 Thread Erick Erickson
Just like any other SolrCloud request. The simplest approach is to fire an HTTP
request from the update processor, just as you would from a browser.
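Erick's suggestion can be sketched with the JDK's own HTTP client (Java 11+). This is a minimal sketch, not code from the thread: it assumes Solr's real-time get handler at /get, and the base URL (e.g. http://localhost:8983/solr), the collection name collectionB, and the field name popularity are placeholders. Because the request names the collection rather than a core, SolrCloud should route the lookup to the shard that holds the ID, so the caller never needs shard or replica names.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

/**
 * Sketch of a get-by-id request against a collection (not a core) using
 * Solr's real-time get handler. Host, collection, and field names are
 * placeholders supplied by the caller.
 */
final class RealTimeGet {
    // One shared client; java.net.http.HttpClient is safe to reuse across threads.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(5))
            .build();

    /** Builds e.g. http://localhost:8983/solr/collectionB/get?id=doc1&fl=popularity&wt=json */
    static URI buildUri(String solrBaseUrl, String collection, String id, String fieldList) {
        String base = solrBaseUrl.endsWith("/")
                ? solrBaseUrl.substring(0, solrBaseUrl.length() - 1)
                : solrBaseUrl;
        // Collection names are assumed to be URL-safe; query values are encoded.
        String query = "id=" + encode(id) + "&fl=" + encode(fieldList) + "&wt=json";
        return URI.create(base + "/" + collection + "/get?" + query);
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    /** Sends the request and returns the raw JSON body; parsing is left to the caller. */
    static String fetch(URI uri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        HttpResponse<String> response =
                CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode() + " from " + uri);
        }
        return response.body();
    }
}
```

For example, buildUri("http://localhost:8983/solr", "collectionB", "doc1", "popularity") yields http://localhost:8983/solr/collectionB/get?id=doc1&fl=popularity&wt=json. A blocking call per document adds latency to every update, which is the overhead Erick warns about under heavy load.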



Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0

2019-08-29 Thread Arnold Bronley
@Andrea: I agree with you. Do you know if there is a way to initialize a
CloudSolrClient directly from some information that I get from the
SolrQueryRequest or AddUpdateCommand object?

@Erick: Thank you for the information about
StatelessScriptUpdateProcessorFactory.

"In your situation, add this _before_ the update is distributed and instead of
coreB, ask for collectionB."

Right, but how do I ask for collectionB?

"Next, you want to get the value from “coreB”. Don’t do that, get it from
_collection_ B."

Right, but how do I get the value from _collection_ B?



Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0

2019-08-29 Thread Erick Erickson
Have you looked at using one of the update processors?

Consider StatelessScriptUpdateProcessorFactory, for instance. You can do anything
you’d like to do in a script (Groovy, JavaScript, Python I think, and others).
See ./example/files/conf/update-script.js for one example.

You put it in your solrconfig file in the update handler, put the script in your
conf directory, push it to ZK, and the rest is automagical.

There are a bunch of other update processors you can use that are also pretty
much configuration-only, but the one I referenced is the most general-purpose.

In your situation, add this _before_ the update is distributed and, instead of
coreB, ask for collectionB.

Distributed updates go like this:
1. The doc gets routed to the leader of a shard.
2. The doc gets forwarded to each replica.

Now, depending on where you put the update processor (and you’ll have to dig a
bit; much of this distribution logic is implicit, but you can define it
explicitly in solrconfig.xml), this happens either _before_ the docs are sent
to the rest of the replicas or _after_ the docs arrive at each replica. From
what you’ve described, you want to do this before distribution so that all
copies have the new field. You don’t care which replica is the leader. You
don’t care how many other replicas exist or where they are. You don’t even care
whether any replica of this particular collection is hosted on the node that
does this; it happens before distribution.

Next, you want to get the value from “coreB”. Don’t do that; get it from
_collection_ B. Since you have the doc ID (presumably the ), using get-by-id
instead of a standard query will be very efficient. I can imagine that under
very heavy load this might introduce too much overhead, but it’s where I’d
start.

Best,
Erick
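Why the position in the chain matters can be illustrated with a toy pipeline in plain Java. Stage, EnrichStage, and DistributeStage are hypothetical stand-ins, not Solr's UpdateRequestProcessor API, and the lookup function stands in for a get-by-id against collection B. Because enrichment runs before the distribute stage copies the document, every replica copy carries the new field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Toy stand-in for one step of an update chain (not Solr's API). */
interface Stage {
    void processAdd(Map<String, Object> doc);
}

/** Looks up a value by document ID, adds it as a field, then passes the doc on. */
final class EnrichStage implements Stage {
    private final Function<String, Optional<Object>> lookup; // e.g. get-by-id on collection B
    private final String fieldName;
    private final Stage next;

    EnrichStage(Function<String, Optional<Object>> lookup, String fieldName, Stage next) {
        this.lookup = lookup;
        this.fieldName = fieldName;
        this.next = next;
    }

    @Override
    public void processAdd(Map<String, Object> doc) {
        Object id = doc.get("id");
        if (id != null) {
            lookup.apply(id.toString()).ifPresent(value -> doc.put(fieldName, value));
        }
        next.processAdd(doc); // pass it up the chain
    }
}

/** Stand-in for distribution: each replica receives its own copy of the doc. */
final class DistributeStage implements Stage {
    final List<Map<String, Object>> replicaCopies = new ArrayList<>();
    private final int replicaCount;

    DistributeStage(int replicaCount) {
        this.replicaCount = replicaCount;
    }

    @Override
    public void processAdd(Map<String, Object> doc) {
        for (int i = 0; i < replicaCount; i++) {
            replicaCopies.add(new HashMap<>(doc)); // snapshot taken at distribution time
        }
    }
}
```

In Solr terms, as we understand it, the equivalent is declaring the custom processor before solr.DistributedUpdateProcessorFactory in the updateRequestProcessorChain in solrconfig.xml; processors placed after it run on each replica instead.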

Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0

2019-08-29 Thread Andrea Gazzarini
"Client" and "server" side depend only on perspective. In my opinion it is not
black and white and can take different shapes. In your case, I believe your
component, which is on the Solr side, can play both roles (i.e., a "server"
component for collection A and a client component for collection B).

Andrea

On Thu, 29 Aug 2019, 19:46 Arnold Bronley,  wrote:

> I can't use CloudSolrClient because I need to intercept the incoming
> indexing request and then add one more field to it. All this happens on the
> Solr side, not the client side.
>
> On Thu, Aug 29, 2019 at 1:05 PM Andrea Gazzarini
> wrote:
>
> > Hi Arnold,
> > why don't you use SolrJ (in this case a CloudSolrClient) instead of dealing
> > with such low-level details? The actual location of the document you are
> > looking for would be completely abstracted.
> >
> > Best,
> > Andrea
> >
> > On Thu, 29 Aug 2019, 18:50 Arnold Bronley, 
> > wrote:
> >
> > > So, here is the problem that I am trying to solve. I am moving from a Solr
> > > master-slave architecture to a SolrCloud architecture. I have one custom
> > > Solr plugin that does the following:
> > >
> > > 1. When a document (say, a document with unique id doc1) is getting indexed
> > > to a core, say core A, this plugin adds one more field to the indexing
> > > request. It fetches this new field from core B. Core B in our case
> > > maintains a popularity score field for each document, which gets
> > > calculated in a different project. The plugin fetches the popularity
> > > score for doc1 from core B and adds it to the indexing request.
> > > 2. In the following code, dataInfo.dataSource is the name of core B.
> > >
> > > I can use the name of core B, like collection_shard1_replica_n21, and it
> > > works. But it is not a good solution. What if I had multiple shards for
> > > core B? In that case, the doc1 that I am trying to find might not be
> > > present in collection_shard1_replica_n21.
> > >
> > > So is there something like
> > >
> > > SolrCollection dataCollection = getCollection(dataInfo.dataSource);
> > >
> > > @Override
> > > public void processAdd(AddUpdateCommand cmd) throws IOException {
> > >   SolrInputDocument doc = cmd.getSolrInputDocument();
> > >   String uniqueId = getUniqueId(doc);
> > >
> > >   SolrCore dataCore =
> > >       req.getCore().getCoreContainer().getCore(dataInfo.dataSource);
> > >
> > >   if (dataCore == null) {
> > >     LOG.error("Solr core '{}' to use as data source could not be found! "
> > >         + "Please check if it is loaded.", dataInfo.dataSource);
> > >   } else {
> > >     try {
> > >       Document sourceDoc = getSourceDocument(dataCore, uniqueId);
> > >       if (sourceDoc != null) {
> > >         populateDocToBeAddedFromSourceDoc(doc, sourceDoc);
> > >       }
> > >     } finally {
> > >       // getCore() increments the core's reference count; release it
> > >       dataCore.close();
> > >     }
> > >   }
> > >
> > >   // pass it up the chain
> > >   super.processAdd(cmd);
> > > }
> > >
> > >
> > > On Wed, Aug 28, 2019 at 6:15 PM Erick Erickson <erickerick...@gmail.com>
> > > wrote:
> > >
> > > > No, you cannot just use the collection name. Replicas are just cores.
> > > > You can host many replicas of a single collection on a single Solr node
> > > > in a single CoreContainer (there’s only one per Solr JVM). If you just
> > > > specified a collection name, how would the code have any clue which
> > > > of the possibilities to return?
> > > >
> > > > The name is in the form collection_shard1_replica_n21.
> > > >
> > > > How do you know where the doc you’re working on lives? Put the ID
> > > > through the hashing mechanism.
> > > >
> > > > This isn’t the same at all if you’re running stand-alone; then there’s
> > > > only one name.
> > > >
> > > > But as I indicated above, your request to just use the collection name
> > > > isn’t going to work, by definition.
> > > >
> > > > So perhaps this is an XY problem. You’re asking about getCore, which is
> > > > a very specific, low-level concept. What are you trying to do at a
> > > > higher level? Why do you think you need to get a core? What do you want
> > > > to _do_ with the doc that requires the core it resides in?
> > > >
> > > > Best,
> > > > Erick
> > > >
> > > > > On Aug 28, 2019, at 5:28 PM, Arnold Bronley <arnoldbron...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > Wait, would I need to use a core name like collection1_shard1_replica_n4,
> > > > > etc.? Can't I use the collection name? If I have multiple shards, how
> > > > > would I know where the document that I am working with currently lives?
> > > > > I would rather use the collection name and expect the core
> > > > > information to be abstracted away that way.
> > > > >
> > > > > On Wed, Aug 28, 2019 at 5:13 PM Erick Erickson <erickerick...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> Hmmm, should work. What is your core_name? There are strings like
> > > > >> collection1_shard1_replica_n4 and core_node6. Are you sure you’re
> > > > >> using the
> > > > >> 
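Erick's "put the ID through the hashing mechanism" can be sketched as hash-range routing: the 32-bit hash space is split into equal, contiguous ranges, one per shard, and a document belongs to the shard whose range contains the hash of its ID. This is a simplified model under stated assumptions: String.hashCode() is swapped in as the hash, whereas Solr's default compositeId router uses MurmurHash3 and treats IDs with a "prefix!" specially, so the shard this picks for a given ID will generally differ from Solr's. HashRouting is a hypothetical helper.

```java
/**
 * Simplified model of hash-range routing (hypothetical helper, not Solr code).
 * The signed 32-bit hash space is split into numShards equal, contiguous
 * ranges; shard1 owns the lowest range. String.hashCode() is a stand-in:
 * Solr's compositeId router uses MurmurHash3, so real assignments differ.
 */
final class HashRouting {
    static int hash(String id) {
        return id.hashCode(); // stand-in, NOT the hash Solr uses
    }

    /** 0-based index of the range containing hash: floor(offset * n / 2^32). */
    static int shardIndex(int hash, int numShards) {
        if (numShards < 1) {
            throw new IllegalArgumentException("numShards must be >= 1");
        }
        long offset = (long) hash - Integer.MIN_VALUE; // shifts into 0 .. 2^32 - 1
        return (int) ((offset * numShards) >>> 32);
    }

    /** Shard name in the "shardN" style used in core names such as collection_shard1_replica_n21. */
    static String shardFor(String id, int numShards) {
        return "shard" + (shardIndex(hash(id), numShards) + 1);
    }
}
```

The model also shows why a bare collection name cannot identify a single core: the collection spans every shard, and each shard can have several replicas. Asking the collection itself (for example with a get-by-id request) lets Solr do this computation instead of the plugin.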

Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0

2019-08-29 Thread Arnold Bronley
I can't use CloudSolrClient because I need to intercept the incoming
indexing request and then add one more field to it. All this happens on the
Solr side, not the client side.


Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0

2019-08-29 Thread Andrea Gazzarini
Hi Arnold,
why don't you use SolrJ (in this case a CloudSolrClient) instead of dealing
with such low-level details? The actual location of the document you are
looking for would be completely abstracted away.

Best,
Andrea


Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0

2019-08-29 Thread Arnold Bronley
So, here is the problem that I am trying to solve. I am moving from a Solr
master-slave architecture to a SolrCloud architecture. I have one custom Solr
plugin that does the following:

1. When a document (say, a document with unique id doc1) is getting indexed to
a core, say core A, this plugin adds one more field to the indexing
request. It fetches this new field from core B. Core B in our case
maintains a popularity score field for each document, which gets calculated in
a different project. The plugin fetches the popularity score for doc1 from core B
and adds it to the indexing request.
2. In the following code, dataInfo.dataSource is the name of core B.

I can use the name of core B, like collection_shard1_replica_n21, and it
works. But it is not a good solution. What if I had multiple shards for
core B? In that case, the doc1 that I am trying to find might not be
present in collection_shard1_replica_n21.

So, is there something like the following?

SolrCollection dataCollection = getCollection(dataInfo.dataSource);

This is my current code:

@Override
public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();
    String uniqueId = getUniqueId(doc);

    // CoreContainer.getCore() returns the core with its reference count
    // incremented, so it must be closed again; try-with-resources does that
    // (and skips close() if getCore() returned null).
    try (SolrCore dataCore =
             req.getCore().getCoreContainer().getCore(dataInfo.dataSource)) {
        if (dataCore == null) {
            LOG.error("Solr core '{}' to use as data source could not be found! "
                    + "Please check if it is loaded.", dataInfo.dataSource);
        } else {
            Document sourceDoc = getSourceDocument(dataCore, uniqueId);
            if (sourceDoc != null) {
                populateDocToBeAddedFromSourceDoc(doc, sourceDoc);
            }
        }
    }

    // pass it up the chain
    super.processAdd(cmd);
}
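Read through the code above, the enrichment step itself does not care which core holds the score; only the lookup does. A minimal, Solr-free sketch of that separation follows. The class PopularityEnricher and the field names id and popularity are hypothetical, and documents are modeled as plain maps rather than SolrInputDocument. In the single-core setup the lookup would read from core B; in SolrCloud it would have to be something collection-aware, which is the open question in this thread.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical, Solr-free model of the enrichment step: the lookup decides
// where the score comes from; the enricher only copies it into the document.
final class PopularityEnricher {
    private final String idField;
    private final String scoreField;
    private final Function<String, Optional<Float>> scoreLookup;

    PopularityEnricher(String idField, String scoreField,
                       Function<String, Optional<Float>> scoreLookup) {
        this.idField = idField;
        this.scoreField = scoreField;
        this.scoreLookup = scoreLookup;
    }

    // Adds the score to doc if the lookup finds one; returns whether it did.
    boolean enrich(Map<String, Object> doc) {
        Object id = doc.get(idField);
        if (id == null) {
            return false; // no unique id, nothing to look up
        }
        Optional<Float> score = scoreLookup.apply(id.toString());
        score.ifPresent(s -> doc.put(scoreField, s));
        return score.isPresent();
    }
}
```

Keeping the lookup behind a Function means the single-core version and a future collection-aware version can be swapped without touching the field-copy logic.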


Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0

2019-08-28 Thread Erick Erickson
No, you cannot just use the collection name. Replicas are just cores.
You can host many replicas of a single collection on a single Solr node
in a single CoreContainer (there’s only one per Solr JVM). If you just
specified a collection name, how would the code have any clue which
of the possibilities to return?

Core names are in the form collection_shard1_replica_n21.

How do you know where the doc you’re working on lives? Put the ID through
the hashing mechanism.
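The hashing hint can be sketched as a small standalone model. It assumes the default compositeId router and a plain id without a "!" routing prefix: the id's UTF-8 bytes are hashed with 32-bit MurmurHash3 (seed 0), and the shard whose hash range contains the result holds the document. shardIndexFor is a simplification that splits the hash space into equal ranges; the actual range of each shard is stored in the collection's cluster state (and shard splits change it), so inside a plugin it is safer to ask the collection's DocRouter than to recompute ranges. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Standalone model (hypothetical class, not Solr code) of how the default
// compositeId router maps a plain document id (no "!" prefix) to a shard.
final class CompositeIdRoutingSketch {

    // MurmurHash3, x86 32-bit variant, over a byte array.
    static int murmur3x86_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = seed;
        int roundedEnd = data.length & ~3; // length of the 4-byte-block part

        for (int i = 0; i < roundedEnd; i += 4) {
            // little-endian load of one 4-byte block
            int k1 = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16) | (data[i + 3] << 24);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // remaining 1-3 tail bytes, if any
        int k1 = 0;
        switch (data.length & 3) {
            case 3:
                k1 = (data[roundedEnd + 2] & 0xff) << 16;
                // fall through
            case 2:
                k1 |= (data[roundedEnd + 1] & 0xff) << 8;
                // fall through
            case 1:
                k1 |= data[roundedEnd] & 0xff;
                k1 *= c1;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= c2;
                h1 ^= k1;
                break;
            default:
                break;
        }

        // finalization mix
        h1 ^= data.length;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // Hash of the id's UTF-8 bytes with seed 0.
    static int hashId(String id) {
        return murmur3x86_32(id.getBytes(StandardCharsets.UTF_8), 0);
    }

    // Simplified: split the signed 32-bit hash space into numShards equal,
    // contiguous ranges (lowest range first) and return the 0-based index of
    // the range containing hash. Real shard ranges come from the cluster state.
    static int shardIndexFor(int hash, int numShards) {
        long offset = (long) hash - Integer.MIN_VALUE; // 0 .. 2^32 - 1
        return (int) ((offset * numShards) >>> 32);
    }
}
```

For example, shardIndexFor(hashId("doc1"), 2) returns 0 or 1, corresponding to shard1 or shard2 of a two-shard collection whose ranges have not been modified.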

This isn’t the same at all if you’re running stand-alone; then there’s only
one name.

But as I indicated above, asking to use just the collection name isn’t
going to work by definition.

So perhaps this is an XY problem. You’re asking about getCore, which is
a very specific, low-level concept. What are you trying to do at a higher
level? Why do you think you need to get a core? What do you want to _do_
with the doc that requires the core it resides in?

Best,
Erick


Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0

2019-08-28 Thread Arnold Bronley
Wait, would I need to use a core name like collection1_shard1_replica_n4,
etc.? Can't I use the collection name? What if I have multiple shards? How
would I know where the document that I am working with currently lives?
I would prefer to use the collection name and expect the core
information to be abstracted away that way.


Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0

2019-08-28 Thread Erick Erickson
Hmmm, should work. What is your core_name? There are strings like
collection1_shard1_replica_n4 and core_node6. Are you sure you’re using the
right one?
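The difference between a core name and other identifiers can be shown with a toy model of one node's core registry (hypothetical class, not Solr's CoreContainer). A by-name lookup only succeeds for an exact core name such as collection1_shard1_replica_n4; the collection name, or a core_node name like core_node6, yields null, which matches the symptom in the original question. Listing a collection's cores means checking each core's collection (in Solr that information is available from a core's CloudDescriptor), and even then only the replicas hosted on this node are visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model (hypothetical, not Solr's CoreContainer) of one node's core registry.
// Cores are keyed by core name; each entry remembers its collection and shard.
final class CoreRegistrySketch {

    static final class CoreInfo {
        final String collection;
        final String shard;

        CoreInfo(String collection, String shard) {
            this.collection = collection;
            this.shard = shard;
        }
    }

    private final Map<String, CoreInfo> coresByName = new TreeMap<>();

    void register(String coreName, String collection, String shard) {
        coresByName.put(coreName, new CoreInfo(collection, shard));
    }

    // By-name lookup: only an exact core name matches, anything else gives null.
    CoreInfo getCore(String coreName) {
        return coresByName.get(coreName);
    }

    // Names of the cores on this node that belong to the given collection,
    // in sorted order. Shards hosted on other nodes are not visible here.
    List<String> localCoresFor(String collection) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, CoreInfo> e : coresByName.entrySet()) {
            if (e.getValue().collection.equals(collection)) {
                names.add(e.getKey());
            }
        }
        return names;
    }
}
```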


req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0

2019-08-28 Thread Arnold Bronley
Hi,

In a custom Solr plugin,
req.getCore().getCoreContainer().getCore(core_name) is returning null even
though a core named core_name is loaded and up in Solr. req is an object
of the SolrQueryRequest class. I am using Solr 8.2.0 in SolrCloud mode.

Any ideas on why this might be the case?