Re: Allow Join over two sharded collection
This doesn't appear to be actively pursued, so it's anybody's guess. Depending on your use case, the streaming capabilities may be an out-of-the-box (OOB) solution.

Best,
Erick

On Wed, Feb 6, 2019 at 1:22 AM mganeshs wrote:
Re: Allow Join over two sharded collection
All,

Any idea whether this will be taken care of or addressed in the near future?

https://issues.apache.org/jira/browse/SOLR-8297

Regards,

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Allow Join over two sharded collection
How are you planning to route manually? What key(s) are you thinking of using?

Second, the link I shared was about collection aliasing, and if you use that you will end up with multiple collections. I just want to clarify, since you said above "...manual routing and creating alias".

Again, until the join feature is available across shards, you can still continue with one shard (and replicas if needed). 20M docs plus 1M per month shouldn't be a big deal.

Thanks,
Susheel

On Mon, Jul 3, 2017 at 11:16 PM, mganeshs <mgane...@live.in> wrote:
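Susheel's fallback of staying on a single shard (with replicas for query capacity) can be expressed as a Collections API CREATE call. The sketch below only builds the request URL; the base URL `http://localhost:8983/solr` and the collection name `docs` are assumptions, and other CREATE parameters (such as the configset) are left out.

```python
from urllib.parse import urlencode

def single_shard_create_url(base_url: str, name: str, replicas: int) -> str:
    """Build a Collections API CREATE request for a one-shard collection.

    With numShards=1 every document lives in the same shard, so a plain
    {!join} never needs data from another shard. Extra replicas add query
    capacity and redundancy without splitting the index.
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": 1,
        "replicationFactor": replicas,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"

if __name__ == "__main__":
    # Assumed local Solr node and hypothetical collection name.
    print(single_shard_create_url("http://localhost:8983/solr", "docs", 2))
```

Replicas scale query throughput, but all indexing still lands on the one shard's leader, which is the concern raised in the reply below.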
Re: Allow Join over two sharded collection
Hi Susheel,

To make use of joins, the only option is to go for manual routing. If I go for manual routing based on time, we lose the benefit of distributing the load while indexing: all indexing will happen in the newly created shard. We feel this is not an efficient approach and it degrades indexing performance, since we have a lot of JVMs running but all indexing still goes to one single shard, and we are also expecting 1M+ docs per month in the coming days.

As for your question on whether we will query old documents: mostly we won't. Given the querying pattern, it's clear we should go for manual routing and creating an alias. But when it comes to indexing, in order to distribute the load, we felt default routing is the best option, but then joins will not work. That's the reason for asking when this feature will be in place.

Regards,

--
View this message in context: http://lucene.472066.n3.nabble.com/Allow-Join-over-two-sharded-collection-tp4343443p4344098.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Allow Join over two sharded collection
Unsubscribe

Sent from my iPhone

On Jul 1, 2017, at 8:02 PM, Susheel Kumar <susheel2...@gmail.com> wrote:
Re: Allow Join over two sharded collection
Depending on your use case, people also use collection aliasing for time-series data. See below:

https://blog.cloudera.com/blog/2013/10/collection-aliasing-near-real-time-search-for-really-big-data/

On Sat, Jul 1, 2017 at 7:13 PM, Susheel Kumar <susheel2...@gmail.com> wrote:
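The aliasing pattern Susheel links to can be sketched as two aliases over time-based collections: a read alias spanning recent months and a write alias pointing at the current month. This is a minimal sketch that only builds CREATEALIAS URLs; the monthly naming scheme, the alias names `docs_read` and `docs_write`, and the base URL are assumptions. Note that an alias does not by itself make a join span collections.

```python
from typing import List, Tuple
from urllib.parse import urlencode

def monthly_collection(prefix: str, year: int, month: int) -> str:
    # Hypothetical naming scheme: one collection per calendar month.
    return f"{prefix}_{year:04d}_{month:02d}"

def last_n_months(year: int, month: int, n: int) -> List[Tuple[int, int]]:
    """Return (year, month) pairs for the n most recent months, oldest first."""
    months = []
    for _ in range(n):
        months.append((year, month))
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return list(reversed(months))

def create_alias_url(base_url: str, alias: str, collections: List[str]) -> str:
    # CREATEALIAS maps one name onto one or more collections; issuing it again
    # with the same alias name re-points the alias.
    params = {"action": "CREATEALIAS", "name": alias, "collections": ",".join(collections)}
    return f"{base_url}/admin/collections?{urlencode(params)}"

if __name__ == "__main__":
    base = "http://localhost:8983/solr"
    recent = [monthly_collection("docs", y, m) for y, m in last_n_months(2017, 7, 3)]
    # Queries go to a read alias spanning recent months...
    print(create_alias_url(base, "docs_read", recent))
    # ...while indexing goes to a write alias on the current month only.
    print(create_alias_url(base, "docs_write", recent[-1:]))
```

Each month, a new collection is created and both aliases are re-pointed; dropping the oldest collection from the read alias ages out old data.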
Re: Allow Join over two sharded collection
As Erick said, 1M docs/month isn't a big deal. I have 45+ million docs in one shard, but YMMV depending on other factors.

Also, there is a lot of confusion in the terminology. The default routing is compositeId routing. The implicit routing that Erick mentioned is the manual routing. https://issues.apache.org/jira/browse/SOLR-6630

Which routing are you suggesting to use? Can you clarify again? Also, what's your exact use case? Do you query old documents, or do you not need to, with most or all of your queries going to the shard with the newer documents?

Thanks,
Susheel

On Sat, Jul 1, 2017 at 12:14 PM, Erick Erickson <erickerick...@gmail.com> wrote:
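To make the terminology point concrete: with the default compositeId router, a document id can carry a route-key prefix separated by `!`, and documents sharing that prefix are co-located in one shard, while ids without a prefix are hashed as a whole and spread across all shards. A small sketch of building and parsing such ids; the key `customerA` is hypothetical, and the code only manipulates strings, it does not call Solr.

```python
from typing import Optional

ROUTE_SEPARATOR = "!"

def composite_id(route_key: str, doc_id: str) -> str:
    """Build an id for the compositeId router: routeKey!docId.

    Documents that share a route key are routed to the same shard, while
    different route keys are still spread across all shards by hash.
    """
    if ROUTE_SEPARATOR in route_key:
        raise ValueError("route key must not contain '!'")
    return f"{route_key}{ROUTE_SEPARATOR}{doc_id}"

def route_key_of(doc_id: str) -> Optional[str]:
    """Return the first-level route key, or None for a plain id.

    A plain id without '!' is hashed as a whole, which is the default
    compositeId behaviour most collections rely on.
    """
    key, sep, _ = doc_id.partition(ROUTE_SEPARATOR)
    return key if sep else None

if __name__ == "__main__":
    ids = [composite_id("customerA", "1"), composite_id("customerA", "2"), "plain-99"]
    for i in ids:
        print(i, "->", route_key_of(i))
```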
Re: Allow Join over two sharded collection
1M docs/month shouldn't make Solr break a sweat. If it really worries you and you're indexing in a big batch, index during off hours. At very worst, if you're ingesting them all at once, you might have to throttle the indexing a bit.

Frankly, most of the time acquiring the documents from the system of record is where the bottleneck is, and Solr easily handles the indexing load.

The other advantage is that if you use implicit routing rather than compositeId, you can add shards to your collection one at a time as required; for time-series data that's an elegant way to "age out" old documents.

Best,
Erick

On Sat, Jul 1, 2017 at 8:57 AM, mganeshs <mgane...@live.in> wrote:
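Erick's "add shards one at a time" approach with the implicit router can be sketched as three Collections API calls: create the collection with one shard, add a shard per period with CREATESHARD, and drop the oldest with DELETESHARD. The sketch only builds URLs; the base URL, the `shard_YYYY_MM` naming, and the routing field `shard_name` are assumptions, and other CREATE parameters (configset, replicationFactor) are omitted.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8983/solr"  # assumed Solr node

def collections_api(params: dict) -> str:
    return f"{BASE_URL}/admin/collections?{urlencode(params)}"

def month_shard(year: int, month: int) -> str:
    # Hypothetical shard naming: one shard per month.
    return f"shard_{year:04d}_{month:02d}"

def create_implicit_collection(name: str, first_shard: str, route_field: str) -> str:
    # router.name=implicit: Solr does not hash ids; each document is sent to
    # the shard named in its router.field value.
    return collections_api({
        "action": "CREATE",
        "name": name,
        "router.name": "implicit",
        "shards": first_shard,
        "router.field": route_field,
    })

def add_shard(collection: str, shard: str) -> str:
    # CREATESHARD only works for collections using the implicit router.
    return collections_api({"action": "CREATESHARD", "collection": collection, "shard": shard})

def drop_shard(collection: str, shard: str) -> str:
    # Deleting the oldest shard "ages out" all of its documents in one step.
    return collections_api({"action": "DELETESHARD", "collection": collection, "shard": shard})

if __name__ == "__main__":
    print(create_implicit_collection("docs", month_shard(2017, 7), "shard_name"))
    print(add_shard("docs", month_shard(2017, 8)))
    print(drop_shard("docs", month_shard(2016, 8)))
```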
Re: Allow Join over two sharded collection
Hi Susheel,

Currently we have around 20M documents, and from now on we expect 1M documents every month. The reason we don't want to go for time-based implicit routing is that all new documents will end up in the most recent shard, so indexing will be heavy on the new shard, whereas older shards will be used just for queries. With default sharding, the indexing load is distributed across all the shards. That's why we would like to stick to default sharding. But joins are the issue here when default sharding is used :-(

--
View this message in context: http://lucene.472066.n3.nabble.com/Allow-Join-over-two-sharded-collection-tp4343443p4343803.html
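mganeshs's load concern can be checked with a toy simulation: route one month's worth of new documents either by a hash of the id (as compositeId does) or by month (time-based implicit routing), and count documents per shard. This is an illustration only: it uses MD5 modulo a shard count as a stand-in for Solr's MurmurHash3 hash ranges, and the shard count of 4 is an assumption.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical shard count

def hash_shard(doc_id: str, num_shards: int = NUM_SHARDS) -> str:
    # Stand-in for compositeId routing: Solr actually uses MurmurHash3 and
    # per-shard hash ranges, but any uniform hash shows the same spreading.
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return f"shard{int.from_bytes(digest[:4], 'big') % num_shards + 1}"

def time_shard(year: int, month: int) -> str:
    # Time-based implicit routing: the document's month names its shard.
    return f"shard_{year:04d}_{month:02d}"

def indexing_load(doc_ids, year: int, month: int):
    """Count where one month's new documents go under each scheme."""
    by_hash = Counter(hash_shard(d) for d in doc_ids)
    by_time = Counter(time_shard(year, month) for _ in doc_ids)
    return by_hash, by_time

if __name__ == "__main__":
    new_docs = [f"doc{i}" for i in range(10_000)]
    by_hash, by_time = indexing_load(new_docs, 2017, 7)
    print("hash-based:", dict(by_hash))
    print("time-based:", dict(by_time))
```

Under hash routing each of the four shards receives roughly a quarter of the writes, while time-based routing sends every new document to the current month's shard, which is the trade-off discussed in the thread.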
Re: Allow Join over two sharded collection
How many documents do you have currently, and how many will there be after growing drastically? Either you can add hardware and keep one shard until joins are fully available, or you can shard and distribute using the compositeId router. That's still better, even though some shard(s) may get higher load, compared to having one single shard/node taking all the load, right?

On Fri, Jun 30, 2017 at 2:19 AM, Mikhail Khludnev <m...@apache.org> wrote:
Re: Allow Join over two sharded collection
Probably in November or December.

--
Sincerely yours
Mikhail Khludnev

On Thu, Jun 29, 2017 at 5:44 PM, mganeshs <mgane...@live.in> wrote:
Re: Allow Join over two sharded collection
Joins will work with shards as long as the docs you're joining from/to are in the same shard. Why not use compositeId routing (either id=key!docId or router.field)? Is there no key which would distribute documents randomly? Depending on your use case, you may need to put the same ACL docs in all shards.

On 30 June 2017 at 12:57, mganeshs <mgane...@live.in> wrote:
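The co-location suggestion can be sketched by routing each content document on the id of the ACL that governs it, so the compositeId router puts the content and its ACL document in the same shard and a local join works. This assumes each content document has exactly one ACL; the field names (`acl_id`, `principals`, `type`, `title`) are hypothetical. If a document is governed by several ACLs, the alternative from the reply (copying ACL docs into every shard) applies instead.

```python
from typing import Dict, List

def route_prefix(doc_id: str) -> str:
    # The part before the first "!" is what compositeId uses to pick the shard.
    return doc_id.split("!", 1)[0]

def acl_doc(acl_id: str, principals: List[str]) -> Dict:
    # Hypothetical ACL document, routed on its own acl_id.
    return {"id": f"{acl_id}!acl", "type": "acl", "acl_id": acl_id,
            "principals": list(principals)}

def content_doc(acl_id: str, doc_id: str, title: str) -> Dict:
    # Route the content document on the id of the ACL that governs it, so it
    # lands in the same shard as that ACL document.
    return {"id": f"{acl_id}!{doc_id}", "type": "content", "acl_id": acl_id,
            "title": title}

if __name__ == "__main__":
    acl = acl_doc("acl7", ["user42", "group3"])
    doc = content_doc("acl7", "d1", "Q3 report")
    print(route_prefix(acl["id"]), route_prefix(doc["id"]))
```

Adding or removing users only touches the ACL document, so content is not re-indexed; moving a document to a different ACL changes its id and does require re-indexing that document.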
Re: Allow Join over two sharded collection
Hi Erick,

Initially I also thought of using Streaming for joins, but it looks like joins with Streaming are not meant for high-QPS queries, and that's my use case. Currently things are working fine with a normal join for us, as we have only one shard. But in the coming days the number of documents to be indexed is going to increase drastically, so we need to split shards. Once I split shards, I can't use joins.

We thought of going with implicit routing for sharding. But if we go with implicit routing, indexing will not be distributed, so one shard could get more load, which we don't want. So we badly need the default join to work across shards.

As I have posted in other questions in this forum (and you have replied too), our joins are between real documents and their ACL documents. An ACL document has a multi-valued field whose values are users or groups. Why do we keep ACLs separately instead of in the real document itself? Because an ACL can grow to 1L (100,000) users or even more, and for every change in an ACL or its permissions we don't want to re-index the real document as well.

Do you think there is any better alternative? Or is the way we have kept ACLs wrong?

Regards,

--
View this message in context: http://lucene.472066.n3.nabble.com/Allow-Join-over-two-sharded-collection-tp4343443p4343582.html
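The ACL model described here maps onto Solr's join query parser: select the ACL documents whose multi-valued principals field contains the user or one of the user's groups, then join to the content documents that reference those ACLs. A minimal sketch of building that filter query, assuming hypothetical field names `acl_id` and `principals`. `fromIndex` is only needed when ACL documents live in a separate collection; in SolrCloud that collection generally has to be a single shard with replicas alongside the target collection's, which is the limitation SOLR-8297 is about.

```python
from typing import Optional, Sequence

def quote_term(value: str) -> str:
    # Phrase-quote a value for the standard query parser, escaping \ and ".
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

def acl_join_fq(principals: Sequence[str],
                from_field: str = "acl_id",
                to_field: str = "acl_id",
                principal_field: str = "principals",
                from_index: Optional[str] = None) -> str:
    """Filter query: documents whose ACL lists any of the given principals.

    The inner query selects ACL documents; {!join} then keeps documents whose
    to_field matches the from_field value of any selected ACL document.
    """
    if not principals:
        raise ValueError("need at least one user or group")
    local_params = f"from={from_field} to={to_field}"
    if from_index:
        # Only needed when ACL docs live in a separate collection.
        local_params += f" fromIndex={from_index}"
    clause = " OR ".join(quote_term(p) for p in principals)
    return f"{{!join {local_params}}}{principal_field}:({clause})"

if __name__ == "__main__":
    # A user plus the groups it belongs to (hypothetical values).
    print(acl_join_fq(["user42", "group7"]))
```

If ACL and content documents share one collection, the joined set also contains the ACL documents themselves (they carry the same `acl_id`), so an extra filter on a hypothetical `type:content` field would be needed.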
Re: Allow Join over two sharded collection
Probably won't be in 7.0. In fact, it appears to have lost momentum, so I don't know if it'll ever be committed. I don't know that it _won't_, but there's no way to say.

There's been a lot of work in the Solr Streaming world to do joins, and it's quite possible that will do what you need.

Best,
Erick

On Thu, Jun 29, 2017 at 7:44 AM, mganeshs <mgane...@live.in> wrote:
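Erick's pointer to Streaming can be sketched as a `hashJoin` streaming expression submitted to the `/stream` handler. The collection names (`docs`, `acls`), fields, and user value are hypothetical, and the sketch assumes the principal contains no quotes or query syntax. Both `search()` calls use `qt="/export"`, which requires the sort and returned fields to have docValues. As mganeshs points out elsewhere in the thread, this exports full result sets and may not suit high-QPS filtering.

```python
from urllib.parse import urlencode

def search_expr(collection: str, q: str, fl: str, sort: str) -> str:
    # qt="/export" streams the whole result set rather than the top N rows.
    return f'search({collection}, q="{q}", fl="{fl}", sort="{sort}", qt="/export")'

def acl_hash_join(docs_coll: str, acl_coll: str, principal: str) -> str:
    docs = search_expr(docs_coll, "*:*", "id,acl_id", "acl_id asc")
    acls = search_expr(acl_coll, f"principals:{principal}", "acl_id", "acl_id asc")
    # hashJoin reads the "hashed" stream into memory, then emits each tuple of
    # the first stream whose acl_id matches a hashed tuple.
    return f'hashJoin({docs}, hashed={acls}, on="acl_id")'

def stream_url(base_url: str, collection: str, expr: str) -> str:
    # Expressions are sent as the "expr" parameter of the /stream handler.
    return f"{base_url}/{collection}/stream?{urlencode({'expr': expr})}"

if __name__ == "__main__":
    expr = acl_hash_join("docs", "acls", "user42")
    print(expr)
    print(stream_url("http://localhost:8983/solr", "docs", expr))
```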
Allow Join over two sharded collection
All,

Any idea when this ticket will be addressed?

https://issues.apache.org/jira/browse/SOLR-8297

One of the comments says by Solr 7.0. Can we expect it by 7.0?

Regards,

--
View this message in context: http://lucene.472066.n3.nabble.com/Allow-Join-over-two-sharded-collection-tp4343443.html