Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0
@Andrea: Yes, I would rather avoid reading that information from System.getProperty. I am also looking for a class that exposes it.

@Erick: Is there any way to get the current Solr endpoint or ZooKeeper ensemble information from inside StatelessScriptUpdateProcessorFactory, so that I can make that HTTP request?

On Thu, Aug 29, 2019 at 5:18 PM Andrea Gazzarini wrote:

> I remember that the ZK coordinates (hosts, ports and root) are set as system properties on Solr nodes (please open the admin console to see their names). So it would just be a matter of
>
> System.getProperty(ZK ensemble coordinates|root)
>
> Before going in that direction: I don't know/remember whether there is some Solr-specific ZK class that can be asked for them. If such a class exists, it would be the better way; otherwise you can go with the system property approach.
>
> Andrea
Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0
I remember that the ZK coordinates (hosts, ports and root) are set as system properties on Solr nodes (please open the admin console to see their names). So it would just be a matter of

System.getProperty(ZK ensemble coordinates|root)

Before going in that direction: I don't know/remember whether there is some Solr-specific ZK class that can be asked for them. If such a class exists, it would be the better way; otherwise you can go with the system property approach.

Andrea

On Thu, 29 Aug 2019, 21:32 Arnold Bronley, wrote:

> @Andrea: I agree with you. Do you know if there is a way to initialize CloudSolrClient directly from information that I can get from the SolrQueryRequest or AddUpdateCommand object?
>
> @Erick: Thank you for the information about StatelessScriptUpdateProcessorFactory.
>
> "In your situation, add this _before_ the update is distributed and instead of coreB, ask for collectionB."
>
> Right, but how do I ask for collectionB?
>
> "Next, you want to get the value from “coreB”. Don’t do that, get it from _collection_ B."
>
> Right, but how do I get the value from _collection_ B?
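A minimal sketch of the system-property approach described above, in plain Java. It assumes the node was started in SolrCloud mode with the ensemble passed as a `zkHost` system property in the usual `host:port,host:port/chroot` form; the property name is an assumption and should be checked in the admin console, as Andrea suggests. The class and method names (`ZkCoordinates`, `parse`, `fromSystemProperty`) are hypothetical and not part of Solr.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

/** Hypothetical holder for ZooKeeper hosts and the optional chroot ("root"). */
final class ZkCoordinates {
    final List<String> hosts;      // e.g. ["zk1:2181", "zk2:2181"]
    final Optional<String> chroot; // e.g. "/solr"; empty when no root is configured

    private ZkCoordinates(List<String> hosts, Optional<String> chroot) {
        this.hosts = Collections.unmodifiableList(hosts);
        this.chroot = chroot;
    }

    /** Splits "host:port,host:port/chroot" into the host list and the chroot. */
    static ZkCoordinates parse(String zkHost) {
        String value = zkHost.trim();
        int slash = value.indexOf('/');
        String hostPart = slash < 0 ? value : value.substring(0, slash);
        Optional<String> chroot =
                slash < 0 ? Optional.empty() : Optional.of(value.substring(slash));
        List<String> hosts = new ArrayList<>();
        for (String host : hostPart.split(",")) {
            if (!host.trim().isEmpty()) {
                hosts.add(host.trim());
            }
        }
        return new ZkCoordinates(hosts, chroot);
    }

    /** Reads the assumed "zkHost" system property; empty when it is not set. */
    static Optional<ZkCoordinates> fromSystemProperty() {
        String value = System.getProperty("zkHost"); // assumption: property name
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(parse(value));
    }

    public static void main(String[] args) {
        ZkCoordinates zk = parse("zk1:2181,zk2:2181,zk3:2181/solr");
        System.out.println(zk.hosts + " " + zk.chroot);
    }
}
```

The hosts and the chroot are kept separate because, as far as I know, that is the shape a SolrJ `CloudSolrClient.Builder` takes (a list of ZooKeeper hosts plus an optional chroot); verify this against the SolrJ version in use.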
Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0
Just like any other SolrCloud request. The simplest case is to fire an HTTP request from the update processor, just like you would from a browser.

> On Aug 29, 2019, at 3:31 PM, Arnold Bronley wrote:
>
> @Andrea: I agree with you. Do you know if there is a way to initialize CloudSolrClient directly from information that I can get from the SolrQueryRequest or AddUpdateCommand object?
>
> @Erick: Thank you for the information about StatelessScriptUpdateProcessorFactory.
>
> "In your situation, add this _before_ the update is distributed and instead of coreB, ask for collectionB."
>
> Right, but how do I ask for collectionB?
>
> "Next, you want to get the value from “coreB”. Don’t do that, get it from _collection_ B."
>
> Right, but how do I get the value from _collection_ B?
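A minimal sketch of the "fire an HTTP request" suggestion, combined with the get-by-id advice from earlier in the thread. It builds a real-time get URL against the collection (not a core) and reads the response as a string. The base URL `http://localhost:8983/solr`, the collection name `collectionB`, and the class and method names are assumptions for illustration; `/get` is assumed to be the collection's real-time get handler.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.util.Scanner;

/** Hypothetical helper: get-by-id against a collection over plain HTTP. */
final class RealTimeGetUrl {

    /** Builds e.g. http://localhost:8983/solr/collectionB/get?id=doc1&wt=json */
    static String build(String solrBaseUrl, String collection, String docId) {
        String base = solrBaseUrl.endsWith("/")
                ? solrBaseUrl.substring(0, solrBaseUrl.length() - 1)
                : solrBaseUrl;
        try {
            // The id is URL-encoded so that spaces or slashes in it stay intact.
            return base + "/" + collection + "/get?id="
                    + URLEncoder.encode(docId, "UTF-8") + "&wt=json";
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    /** Performs the GET and returns the raw response body (JSON text). */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(2000);
        conn.setReadTimeout(2000);
        try (InputStream in = conn.getInputStream();
             Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            return scanner.hasNext() ? scanner.next() : "";
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr", "collectionB", "doc1"));
    }
}
```

Because the request names the collection, Solr decides which shard and replica answer it, so the caller never needs a core name such as collection_shard1_replica_n21.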
Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0
@Andrea: I agree with you. Do you know if there is a way to initialize CloudSolrClient directly from information that I can get from the SolrQueryRequest or AddUpdateCommand object?

@Erick: Thank you for the information about StatelessScriptUpdateProcessorFactory.

"In your situation, add this _before_ the update is distributed and instead of coreB, ask for collectionB."

Right, but how do I ask for collectionB?

"Next, you want to get the value from “coreB”. Don’t do that, get it from _collection_ B."

Right, but how do I get the value from _collection_ B?

On Thu, Aug 29, 2019 at 2:17 PM Erick Erickson wrote:

> Have you looked at using one of the update processors?
>
> Consider StatelessScriptUpdateProcessorFactory, for instance. You can do anything you’d like to do in a script (Groovy, JavaScript, Python I think, and others). See ./example/files/conf/update-script.js for one example.
>
> You put it in your solrconfig file in the update handler, then put the script in your conf directory and push it to ZK, and the rest is automagical.
>
> There are a bunch of other update processors that you can use that are also pretty much driven by configuration, but the one I referenced is the most general-purpose.
>
> In your situation, add this _before_ the update is distributed and instead of coreB, ask for collectionB.
>
> Distributed updates go like this:
> 1. the doc gets routed to a leader for a shard
> 2. the doc gets forwarded to each replica.
>
> Now, depending on where you put the update processor (and you’ll have to dig a bit; much of this distribution logic is implicit, but you can explicitly define it in solrconfig.xml), this either happens _before_ the docs are sent to the rest of the replicas or _after_ the docs arrive at each replica. From what you’ve described, you want to do this before distribution so all copies have the new field. You don’t care what replica is the leader. You don’t care how many other replicas exist or where they are. You don’t even care if there’s any replica hosting this particular collection on the node that does this; it happens before distribution.
>
> Next, you want to get the value from “coreB”. Don’t do that, get it from _collection_ B. Since you have the doc ID (presumably the <uniqueKey>), using get-by-id instead of a standard query will be very efficient. I can imagine under very heavy load this might introduce too much overhead, but it’s where I’d start.
>
> Best,
> Erick
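A toy model of the ordering Erick describes, to show why the enrichment should run before distribution. This is not Solr's update processor API: the maps stand in for documents, the lookup table stands in for collection B, and every name here (`UpdateChainSketch`, `addPopularity`, `index`) is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Toy model: steps that run before distribution are applied once, then copied. */
final class UpdateChainSketch {

    /** Enrichment step; scoresById stands in for a get-by-id on collection B. */
    static UnaryOperator<Map<String, Object>> addPopularity(Map<String, Integer> scoresById) {
        return doc -> {
            Map<String, Object> enriched = new LinkedHashMap<>(doc);
            Integer score = scoresById.get(String.valueOf(doc.get("id")));
            if (score != null) {
                enriched.put("popularity", score);
            }
            return enriched;
        };
    }

    /** Runs the pre-distribution steps once, then hands a copy to each replica. */
    static List<Map<String, Object>> index(Map<String, Object> doc,
                                           List<UnaryOperator<Map<String, Object>>> preDistribution,
                                           int replicaCount) {
        Map<String, Object> current = doc;
        for (UnaryOperator<Map<String, Object>> step : preDistribution) {
            current = step.apply(current);
        }
        List<Map<String, Object>> copies = new ArrayList<>();
        for (int i = 0; i < replicaCount; i++) {
            copies.add(new LinkedHashMap<>(current)); // every replica sees the new field
        }
        return copies;
    }

    public static void main(String[] args) {
        Map<String, Integer> scores = new HashMap<>();
        scores.put("doc1", 42);
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "doc1");
        List<UnaryOperator<Map<String, Object>>> steps = new ArrayList<>();
        steps.add(addPopularity(scores));
        System.out.println(index(doc, steps, 3));
    }
}
```

In this model the lookup happens once per document, regardless of how many replicas exist, which matches the point that the enriching node does not need to care where the replicas are.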
Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0
Have you looked at using one of the update processors?

Consider StatelessScriptUpdateProcessorFactory, for instance. You can do anything you’d like to do in a script (Groovy, JavaScript, Python I think, and others). See ./example/files/conf/update-script.js for one example.

You put it in your solrconfig file in the update handler, then put the script in your conf directory and push it to ZK, and the rest is automagical.

There are a bunch of other update processors that you can use that are also pretty much driven by configuration, but the one I referenced is the most general-purpose.

In your situation, add this _before_ the update is distributed and instead of coreB, ask for collectionB.

Distributed updates go like this:
1. the doc gets routed to a leader for a shard
2. the doc gets forwarded to each replica.

Now, depending on where you put the update processor (and you’ll have to dig a bit; much of this distribution logic is implicit, but you can explicitly define it in solrconfig.xml), this either happens _before_ the docs are sent to the rest of the replicas or _after_ the docs arrive at each replica. From what you’ve described, you want to do this before distribution so all copies have the new field. You don’t care what replica is the leader. You don’t care how many other replicas exist or where they are. You don’t even care if there’s any replica hosting this particular collection on the node that does this; it happens before distribution.

Next, you want to get the value from “coreB”. Don’t do that, get it from _collection_ B. Since you have the doc ID (presumably the <uniqueKey>), using get-by-id instead of a standard query will be very efficient. I can imagine under very heavy load this might introduce too much overhead, but it’s where I’d start.

Best,
Erick

> On Aug 29, 2019, at 1:45 PM, Arnold Bronley wrote:
>
> I can't use CloudSolrClient because I need to intercept the incoming indexing request and then add one more field to it. All of this happens on the Solr side, not the client side.
Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0
"Client" and "server" side depend just on the perspective. In my opinion it is not black and white, and it can take different shapes. In your case, I believe your component, which is on the Solr side, can play both roles (i.e. a "server" component for collection A and a client component for collection B).

Andrea

On Thu, 29 Aug 2019, 19:46 Arnold Bronley, wrote:

> I can't use CloudSolrClient because I need to intercept the incoming indexing request and then add one more field to it. All of this happens on the Solr side, not the client side.
Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0
I can't use CloudSolrClient because I need to intercept the incoming indexing request and then add one more field to it. All of this happens on the Solr side, not the client side.

On Thu, Aug 29, 2019 at 1:05 PM Andrea Gazzarini wrote:

> Hi Arnold,
> why don't you use solrj (in this case a CloudSolrClient) instead of dealing with such low-level details? The actual location of the document you are looking for would be completely abstracted.
>
> Best,
> Andrea
>
> On Thu, 29 Aug 2019, 18:50 Arnold Bronley, wrote:
>
> > So, here is the problem that I am trying to solve. I am moving from Solr master-slave architecture to SolrCloud architecture. I have one custom Solr plugin that does the following:
> >
> > 1. When a document (say, a document with unique id doc1) is being indexed to a core, say core A, this plugin adds one more field to the indexing request. It fetches this new field from core B. Core B in our case maintains a popularity score field for each document, which gets calculated in a different project. The plugin fetches the popularity score from core B for doc1 and adds it to the indexing request.
> > 2. In the following code, dataInfo.dataSource is the name of core B.
> >
> > I can use the name of core B, like collection_shard1_replica_n21, and it works. But it is not a good solution. What if I had multiple shards for core B? In that case the doc1 that I am trying to find might not be present in collection_shard1_replica_n21.
> >
> > So is there something like,
> >
> > SolrCollection dataCollection = getCollection(dataInfo.dataSource);
> >
> > @Override
> > public void processAdd(AddUpdateCommand cmd) throws IOException {
> >    SolrInputDocument doc = cmd.getSolrInputDocument();
> >    String uniqueId = getUniqueId(doc);
> >
> >    SolrCore dataCore = req.getCore().getCoreContainer().getCore(dataInfo.dataSource);
> >
> >    if (dataCore == null){
> >        LOG.error("Solr core '{}' to use as data source could not be found! "
> >                + "Please check if it is loaded.", dataInfo.dataSource);
> >    } else{
> >
> >       Document sourceDoc = getSourceDocument(dataCore, uniqueId);
> >
> >       if (sourceDoc != null){
> >
> >           populateDocToBeAddedFromSourceDoc(doc,sourceDoc);
> >       }
> >    }
> >
> >    // pass it up the chain
> >    super.processAdd(cmd);
> > }
> >
> > On Wed, Aug 28, 2019 at 6:15 PM Erick Erickson wrote:
> >
> > > No, you cannot just use the collection name. Replicas are just cores. You can host many replicas of a single collection on a single Solr node in a single CoreContainer (there’s only one per Solr JVM). If you just specified a collection name, how would the code have any clue which of the possibilities to return?
> > >
> > > The name is in the form collection_shard1_replica_n21
> > >
> > > How do you know where the doc you’re working on lives? Put the ID through the hashing mechanism.
> > >
> > > This isn’t the same at all if you’re running stand-alone; then there’s only one name.
> > >
> > > But as I indicated above, your ask for just using the collection name isn’t going to work by definition.
> > >
> > > So perhaps this is an XY problem. You’re asking about getCore, which is a very specific, low-level concept. What are you trying to do at a higher level? Why do you think you need to get a core? What do you want to _do_ with the doc that you need the core it resides in?
> > >
> > > Best,
> > > Erick
> > >
> > > > On Aug 28, 2019, at 5:28 PM, Arnold Bronley wrote:
> > > >
> > > > Wait, would I need to use a core name like collection1_shard1_replica_n4 etc.? Can't I use the collection name? What if I have multiple shards; how would I know where the document that I am working with currently lives?
> > > > I would prefer to use the collection name and expect the core information to be abstracted away.
> > > >
> > > > On Wed, Aug 28, 2019 at 5:13 PM Erick Erickson <erickerick...@gmail.com> wrote:
> > > >
> > > > > Hmmm, should work. What is your core_name? There are strings like collection1_shard1_replica_n4 and core_node6. Are you sure you’re using the right one?
> > > > >
> > > > > > On Aug 28, 2019, at 3:56 PM, Arnold Bronley <arnoldbron...@gmail.com> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > In custom Solr plugin code, req.getCore().getCoreContainer().getCore(core_name) is returning null even though a core named core_name is loaded and up in Solr. req is an object of the SolrQueryRequest class. I am using Solr 8.2.0 in SolrCloud mode.
> > > > > >
> > > > > > Any ideas on why this might be the case?
Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0
Hi Arnold,

why don't you use solrj (in this case a CloudSolrClient) instead of dealing with such low-level details? The actual location of the document you are looking for would be completely abstracted.

Best,
Andrea

On Thu, 29 Aug 2019, 18:50 Arnold Bronley wrote:

> So, here is the problem that I am trying to solve. I am moving from Solr
> master-slave architecture to SolrCloud architecture. I have one custom Solr
> plugin that does following:
> [...]
> I can use the name of the core B like collection_shard1_replica_n21 and it
> works. But it is not a good solution. What if I had a multiple shards for
> core B?
> [...]
Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0
So, here is the problem that I am trying to solve. I am moving from a Solr master-slave architecture to a SolrCloud architecture. I have one custom Solr plugin that does the following:

1. When a document (say, a document with unique id doc1) is being indexed to a core, say core A, this plugin adds one more field to the indexing request. It fetches this new field from core B. Core B in our case maintains a popularity score field for each document, which gets calculated in a different project. The plugin fetches the popularity score for doc1 from core B and adds it to the indexing request.
2. In the following code, dataInfo.dataSource is the name of core B.

I can use the name of core B, like collection_shard1_replica_n21, and it works. But it is not a good solution. What if I had multiple shards for core B? In that case the doc1 that I am trying to find might not be present in collection_shard1_replica_n21.

So is there something like,

    SolrCollection dataCollection = getCollection(dataInfo.dataSource);

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        String uniqueId = getUniqueId(doc);

        // Looks the data source up by core name, which is what breaks in SolrCloud
        SolrCore dataCore =
            req.getCore().getCoreContainer().getCore(dataInfo.dataSource);

        if (dataCore == null) {
            LOG.error("Solr core '{}' to use as data source could not be found! "
                + "Please check if it is loaded.", dataInfo.dataSource);
        } else {
            try {
                Document sourceDoc = getSourceDocument(dataCore, uniqueId);

                if (sourceDoc != null) {
                    populateDocToBeAddedFromSourceDoc(doc, sourceDoc);
                }
            } finally {
                // getCore() increments the core's reference count, so release it
                dataCore.close();
            }
        }

        // pass it up the chain
        super.processAdd(cmd);
    }

On Wed, Aug 28, 2019 at 6:15 PM Erick Erickson wrote:

> [...]
> So perhaps this is an XY problem. You’re asking about getCore, which is
> a very specific, low-level concept. What are you trying to do at a higher
> level? Why do you think you need to get a core? What do you want to _do_
> with the doc that you need the core it resides in?
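For readers following along, here is a minimal sketch of the enrichment step described above, written against a collection-level lookup rather than a specific core. Everything in it is hypothetical: `CollectionLookup`, `PopularityEnricherSketch`, the collection name `collectionB` and the field name `popularity` are illustrative names, not Solr APIs, and plain maps stand in for Solr documents. In a real plugin the lookup would have to be backed by something that can resolve a collection, for example a SolrJ client as suggested elsewhere in this thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Illustrative only: copies one field from a "source" collection into a document being indexed. */
final class PopularityEnricherSketch {

    /** Hypothetical abstraction: fetch a document by id from a collection, wherever it lives. */
    interface CollectionLookup {
        Optional<Map<String, Object>> getById(String collection, String id);
    }

    private final CollectionLookup lookup;
    private final String sourceCollection;
    private final String fieldName;

    PopularityEnricherSketch(CollectionLookup lookup, String sourceCollection, String fieldName) {
        this.lookup = lookup;
        this.sourceCollection = sourceCollection;
        this.fieldName = fieldName;
    }

    /** Adds the field to doc if the source collection has it; returns true when a value was added. */
    boolean enrich(String uniqueId, Map<String, Object> doc) {
        Optional<Map<String, Object>> source = lookup.getById(sourceCollection, uniqueId);
        if (!source.isPresent() || !source.get().containsKey(fieldName)) {
            return false; // nothing to copy; the document is indexed unchanged
        }
        doc.put(fieldName, source.get().get(fieldName));
        return true;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the collection that holds the popularity scores
        Map<String, Object> stored = new HashMap<>();
        stored.put("popularity", 42);
        CollectionLookup inMemory = (collection, id) -> {
            if ("collectionB".equals(collection) && "doc1".equals(id)) {
                return Optional.of(stored);
            }
            return Optional.empty();
        };

        PopularityEnricherSketch enricher =
            new PopularityEnricherSketch(inMemory, "collectionB", "popularity");
        Map<String, Object> incoming = new HashMap<>();
        incoming.put("id", "doc1");
        enricher.enrich("doc1", incoming);
        System.out.println(incoming);
    }
}
```

The point of the indirection is that the processor only names a collection and an id; which shard or replica actually holds the document is the lookup's concern.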
Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0
No, you cannot just use the collection name. Replicas are just cores. You can host many replicas of a single collection on a single Solr node in a single CoreContainer (there’s only one per Solr JVM). If you just specified a collection name, how would the code have any clue which of the possibilities to return?

The name is in the form collection_shard1_replica_n21.

How do you know where the doc you’re working on lives? Put the ID through the hashing mechanism.

This isn’t the same at all if you’re running stand-alone; then there’s only one name.

But as I indicated above, your ask for just using the collection name isn’t going to work by definition.

So perhaps this is an XY problem. You’re asking about getCore, which is a very specific, low-level concept. What are you trying to do at a higher level? Why do you think you need to get a core? What do you want to _do_ with the doc that you need the core it resides in?

Best,
Erick

> On Aug 28, 2019, at 5:28 PM, Arnold Bronley wrote:
>
> Wait, would I need to use core name like collection1_shard1_replica_n4
> etc/? Can't I use collection name? What if I have multiple shards, how
> would I know where does the document that I am working with lives in
> currently.
> I would rather prefer to use collection name and expect the core
> information to be abstracted out that way.
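To make "put the ID through the hashing mechanism" a little more concrete, here is a simplified sketch of the principle: hash the document id to a 32-bit value and pick the shard whose hash range covers it. This is not Solr's routing code. The assumptions are that the hash range is split evenly across shards and that `String.hashCode()` can stand in for the hash function; Solr's router uses its own hash and the ranges recorded in the cluster state, so the shard computed here will not match a real cluster. `ShardRoutingSketch`, `ShardRange`, `evenRanges` and `shardFor` are made-up names.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: maps a document id to a shard via hash ranges. */
final class ShardRoutingSketch {

    /** One shard and the inclusive range of 32-bit hash values it owns. */
    static final class ShardRange {
        final String shard;
        final int min;
        final int max;

        ShardRange(String shard, int min, int max) {
            this.shard = shard;
            this.min = min;
            this.max = max;
        }

        boolean covers(int hash) {
            return hash >= min && hash <= max;
        }
    }

    /** Splits the full signed 32-bit range into numShards contiguous slices (assumed layout). */
    static List<ShardRange> evenRanges(int numShards) {
        List<ShardRange> ranges = new ArrayList<>();
        long total = 1L << 32;
        long start = Integer.MIN_VALUE;
        for (int i = 0; i < numShards; i++) {
            // The last slice absorbs any remainder so the whole range is covered
            long end = (i == numShards - 1) ? Integer.MAX_VALUE : start + total / numShards - 1;
            ranges.add(new ShardRange("shard" + (i + 1), (int) start, (int) end));
            start = end + 1;
        }
        return ranges;
    }

    /** Returns the shard whose range contains the hash of docId. */
    static String shardFor(String docId, List<ShardRange> ranges) {
        int hash = docId.hashCode(); // stand-in hash, NOT the function Solr uses
        for (ShardRange range : ranges) {
            if (range.covers(hash)) {
                return range.shard;
            }
        }
        throw new IllegalStateException("ranges do not cover hash " + hash);
    }

    public static void main(String[] args) {
        List<ShardRange> ranges = evenRanges(2);
        System.out.println("doc1 -> " + shardFor("doc1", ranges));
    }
}
```

Knowing the shard is only half of the answer: each shard can have several replicas on different nodes, which is another reason a bare collection name cannot be resolved to a single core.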
Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0
Wait, would I need to use a core name like collection1_shard1_replica_n4, etc.? Can't I use the collection name? If I have multiple shards, how would I know where the document that I am working with currently lives? I would prefer to use the collection name and have the core information abstracted away.

On Wed, Aug 28, 2019 at 5:13 PM Erick Erickson wrote:

> Hmmm, should work. What is your core_name? There’s strings like
> collection1_shard1_replica_n4 and core_node6. Are you sure you’re using the
> right one?
Re: req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0
Hmmm, should work. What is your core_name? There’s strings like collection1_shard1_replica_n4 and core_node6. Are you sure you’re using the right one?

> On Aug 28, 2019, at 3:56 PM, Arnold Bronley wrote:
>
> Hi,
>
> In a custom Solr plugin code,
> req.getCore().getCoreContainer().getCore(core_name) is returning null even
> if core by name core_name is loaded and up in Solr. req is object
> of SolrQueryRequest class. I am using Solr 8.2.0 in SolrCloud mode.
>
> Any ideas on why this might be the case?
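As a quick way to check which kind of string is being passed to getCore, here is a small sketch that splits a SolrCloud-style core name into its parts. The pattern is an assumption based only on the names quoted in this thread (`<collection>_<shard>_replica_<letter><number>`, with shards named `shardN`); custom shard names would not match it. `CoreNameParts` is a made-up helper, not a Solr class.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: splits a SolrCloud-style core name into collection, shard and replica. */
final class CoreNameParts {
    // Assumed naming pattern, taken from examples such as collection1_shard1_replica_n4
    private static final Pattern CORE_NAME =
        Pattern.compile("^(.+)_(shard\\d+)_replica_([a-z])(\\d+)$");

    final String collection;
    final String shard;
    final String replica;

    private CoreNameParts(String collection, String shard, String replica) {
        this.collection = collection;
        this.shard = shard;
        this.replica = replica;
    }

    /** Returns the parts if the name follows the assumed pattern, otherwise empty. */
    static Optional<CoreNameParts> parse(String coreName) {
        Matcher m = CORE_NAME.matcher(coreName);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new CoreNameParts(m.group(1), m.group(2), m.group(3) + m.group(4)));
    }

    public static void main(String[] args) {
        String[] names = {"collection1_shard1_replica_n4", "collection1", "core_node6"};
        for (String name : names) {
            String described = parse(name)
                .map(p -> p.collection + " / " + p.shard + " / " + p.replica)
                .orElse("not a replica core name");
            System.out.println(name + " -> " + described);
        }
    }
}
```

A bare collection name such as `collection1` does not have this shape, which fits the null result reported in this thread when something other than a full core name is passed to getCore.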
req.getCore().getCoreContainer().getCore(core_name) is returning null - Solr 8.2.0
Hi,

In custom Solr plugin code, req.getCore().getCoreContainer().getCore(core_name) is returning null even though a core named core_name is loaded and up in Solr. req is a SolrQueryRequest object. I am using Solr 8.2.0 in SolrCloud mode.

Any ideas on why this might be the case?