Re: list of collections

2018-07-17 Thread Kudrettin Güleryüz
Thanks. I saw older references to a CloudSolrServer.getCollectionsList()
which confused me a little.
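
For reference, CloudSolrServer was replaced by CloudSolrClient in later SolrJ
releases, and the LIST action of the Collections API is the supported route.
A minimal SolrJ sketch (assuming Solr 7.x; the ZooKeeper address below is a
placeholder):

import java.util.Collections;
import java.util.List;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class ListCollectionsExample {
    public static void main(String[] args) throws Exception {
        // Connect through ZooKeeper; the host:port here is a placeholder.
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Collections.singletonList("zkhost.example.com:2181"),
                Optional.empty()).build()) {
            // Returns the collection names known to the cluster -- the same
            // list the Collections API LIST action reports.
            List<String> collections = CollectionAdminRequest.listCollections(client);
            collections.forEach(System.out::println);
        }
    }
}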

On Tue, Jul 17, 2018 at 2:28 PM Webster Homer 
wrote:

> use the Solrcloud Collections API
> https://lucene.apache.org/solr/guide/7_3/collections-api.html#list
>
> On Tue, Jul 17, 2018 at 12:12 PM, Kudrettin Güleryüz 
> wrote:
>
> > Hi,
> >
> > What is the suggested way to get list of collections from a solr Cloud
> with
> > a ZKhost?
> >
> > Thank you
> >
>


Re: list of collections

2018-07-17 Thread Webster Homer
Use the SolrCloud Collections API:
https://lucene.apache.org/solr/guide/7_3/collections-api.html#list

On Tue, Jul 17, 2018 at 12:12 PM, Kudrettin Güleryüz 
wrote:

> Hi,
>
> What is the suggested way to get list of collections from a solr Cloud with
> a ZKhost?
>
> Thank you
>



Re: List all Collections together with number of records

2015-06-08 Thread Zheng Lin Edwin Yeo
We're thinking of writing a custom request handler to do that, although the
handler would still query all the collections on the backend.

Will this lead to a faster response for the user?

Regards,
Edwin


On 8 June 2015 at 00:06, Erick Erickson erickerick...@gmail.com wrote:

 bq: we still need those information to be stored in a separate collection
 for security reasons.

 Not necessarily. I've seen lots of installations where auth tokens are
 embedded in the document (say groups that can see this doc). Then
 the front-end simply attaches fq=auth_field:(groups each user belongs to)
 to every query to restrict access.

 That said, some organizations aren't comfortable with this and demand
 separate collections, in which case you're stuck.

 You've defined an architecture though, and one of the consequences
 of that is if you have many collections, you'll have to fire off many
 queries (perhaps in parallel, but still). There's no magic to get around
 that. And it really doesn't matter, because in what you've described
 what has to happen is one query has to be fired to each collection.
 It doesn't matter whether Solr does that for you or you spawn a bunch
 of threads on the client, the same work has to happen somewhere.

 You also have to figure out how to present the results to the user,
 if it's simple count you're OK. But scores will _not_ be comparable
 across the various collections so the presentation will be challenging.

 Best,
 Erick

 On Sun, Jun 7, 2015 at 6:29 AM, Zheng Lin Edwin Yeo
 edwinye...@gmail.com wrote:
  The reasons we want to have different collections is that each of the
  collections have different fields, and that some collections will contain
  information that are more sensitive than others.
 
  As such, we may need to restrict access to certain collections for some
  users. Although the restriction will be done on the front end client
 side,
  but we still need those information to be stored in a separate collection
  for security reasons..
 
  Regards,
  Edwin
 
 
  On 7 June 2015 at 12:23, Erick Erickson erickerick...@gmail.com wrote:
 
  bq: Yup this information will need to be collected each time the user
  search
  for a query, as we want to show the number of records that matches the
  search query in each of the collections.
 
  You're looking at something akin to federated search. About all you
 can
  do is send out parallel queries to each collection.
 
  This is an interesting requirement, and I really question whether
 it's a
  wise
  thing to insist on. I'd really think about going back to the design.
  For instance,
  could you consolidate all these collections into a single one, with
 perhaps
  a collection_id? Then the problem is relatively simple, use field
  collapsing
  (aka grouping).
 
  Best,
  Erick
 
  On Sat, Jun 6, 2015 at 6:40 PM, Zheng Lin Edwin Yeo
  edwinye...@gmail.com wrote:
   Yup this information will need to be collected each time the user
 search
   for a query, as we want to show the number of records that matches the
   search query in each of the collections.
  
   Currently I only have 6 collections, but it could increase to
 hundreds of
   collections in the future. So I'm worried that it could slow down the
   system a lot if we have to pass hundreds of queries for each search
  request.
  
   Regards,
   Edwin
  
  
   On 5 June 2015 at 21:00, Upayavira u...@odoko.co.uk wrote:
  
   I'm not so sure this is as bad as it sounds. When your collection is
   sharded, no single node knows about the documents in other
 shards/nodes,
   so to find the total number, a query will need to go to every node.
  
   Trying to work out something to do a single request to every node,
   combine their collection statistics and aggregate them into a single
   result sounds very complicated, and likely overkill.
  
   Are you needing to collect this information often? Do you have a lot
 of
   collections?
  
   Upayavira
  
  
   On Fri, Jun 5, 2015, at 06:29 AM, Zheng Lin Edwin Yeo wrote:
I'm trying to write a SolrJ program in Java to read and consolidate
  all
the
information into a JSON file, The client will just need to call
 this
SolrJ
program and read this JSON file to get the details. But the problem
  is we
are still querying the Solr once for each collection, just that
 this
  time
it is done in the SolrJ program in a for-loop, while previously
 it's
  done
on the client side. Not sure will this lead to performance
  improvement?
   
For your suggestion on spawning a bunch of threads, does it mean
 the
  same
thing as I did?
   
Regards,
Edwin
   
   
On 5 June 2015 at 12:03, Erick Erickson erickerick...@gmail.com
  wrote:
   
 Have you considered spawning a bunch of threads, one per
 collection
 and having them all run in parallel?

 Best,
 Erick

 On Thu, Jun 4, 2015 at 4:52 PM, Zheng Lin Edwin Yeo
 edwinye...@gmail.com wrote:
  The reason we wanted to 

Re: List all Collections together with number of records

2015-06-07 Thread Erick Erickson
bq: we still need those information to be stored in a separate collection
for security reasons.

Not necessarily. I've seen lots of installations where auth tokens are
embedded in the document (say groups that can see this doc). Then
the front-end simply attaches fq=auth_field:(groups each user belongs to)
to every query to restrict access.
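
A small sketch of attaching that kind of filter from SolrJ (the field name
auth_field and the group values are illustrative placeholders, not anything
Solr defines):

import org.apache.solr.client.solrj.SolrQuery;

public class AuthFilter {
    // Restrict results to documents carrying one of the user's group tokens.
    static SolrQuery restrict(String userQuery, String groupsClause) {
        SolrQuery q = new SolrQuery(userQuery);
        // groupsClause would be something like "groupA OR groupB"
        q.addFilterQuery("auth_field:(" + groupsClause + ")");
        return q;
    }
}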

That said, some organizations aren't comfortable with this and demand
separate collections, in which case you're stuck.

You've defined an architecture though, and one of the consequences
of that is if you have many collections, you'll have to fire off many
queries (perhaps in parallel, but still). There's no magic to get around
that. And it really doesn't matter, because in what you've described
what has to happen is one query has to be fired to each collection.
It doesn't matter whether Solr does that for you or you spawn a bunch
of threads on the client, the same work has to happen somewhere.

You also have to figure out how to present the results to the user;
if it's a simple count you're OK. But scores will _not_ be comparable
across the various collections, so the presentation will be challenging.

Best,
Erick

On Sun, Jun 7, 2015 at 6:29 AM, Zheng Lin Edwin Yeo
edwinye...@gmail.com wrote:
 The reasons we want to have different collections is that each of the
 collections have different fields, and that some collections will contain
 information that are more sensitive than others.

 As such, we may need to restrict access to certain collections for some
 users. Although the restriction will be done on the front end client side,
 but we still need those information to be stored in a separate collection
 for security reasons..

 Regards,
 Edwin


 On 7 June 2015 at 12:23, Erick Erickson erickerick...@gmail.com wrote:

 bq: Yup this information will need to be collected each time the user
 search
 for a query, as we want to show the number of records that matches the
 search query in each of the collections.

 You're looking at something akin to federated search. About all you can
 do is send out parallel queries to each collection.

 This is an interesting requirement, and I really question whether it's a
 wise
 thing to insist on. I'd really think about going back to the design.
 For instance,
 could you consolidate all these collections into a single one, with perhaps
 a collection_id? Then the problem is relatively simple, use field
 collapsing
 (aka grouping).

 Best,
 Erick

 On Sat, Jun 6, 2015 at 6:40 PM, Zheng Lin Edwin Yeo
 edwinye...@gmail.com wrote:
  Yup this information will need to be collected each time the user search
  for a query, as we want to show the number of records that matches the
  search query in each of the collections.
 
  Currently I only have 6 collections, but it could increase to hundreds of
  collections in the future. So I'm worried that it could slow down the
  system a lot if we have to pass hundreds of queries for each search
 request.
 
  Regards,
  Edwin
 
 
  On 5 June 2015 at 21:00, Upayavira u...@odoko.co.uk wrote:
 
  I'm not so sure this is as bad as it sounds. When your collection is
  sharded, no single node knows about the documents in other shards/nodes,
  so to find the total number, a query will need to go to every node.
 
  Trying to work out something to do a single request to every node,
  combine their collection statistics and aggregate them into a single
  result sounds very complicated, and likely overkill.
 
  Are you needing to collect this information often? Do you have a lot of
  collections?
 
  Upayavira
 
 
  On Fri, Jun 5, 2015, at 06:29 AM, Zheng Lin Edwin Yeo wrote:
   I'm trying to write a SolrJ program in Java to read and consolidate
 all
   the
   information into a JSON file, The client will just need to call this
   SolrJ
   program and read this JSON file to get the details. But the problem
 is we
   are still querying the Solr once for each collection, just that this
 time
   it is done in the SolrJ program in a for-loop, while previously it's
 done
   on the client side. Not sure will this lead to performance
 improvement?
  
   For your suggestion on spawning a bunch of threads, does it mean the
 same
   thing as I did?
  
   Regards,
   Edwin
  
  
   On 5 June 2015 at 12:03, Erick Erickson erickerick...@gmail.com
 wrote:
  
Have you considered spawning a bunch of threads, one per collection
and having them all run in parallel?
   
Best,
Erick
   
On Thu, Jun 4, 2015 at 4:52 PM, Zheng Lin Edwin Yeo
edwinye...@gmail.com wrote:
 The reason we wanted to do a single call is to improve on the
performance,
 as our application requires to list the total number of records in
  each
of
 the collections, and the number of records that matches the query
  each of
 the collections.

 Currently we are querying each collection one by one to retrieve
 the
 numFound value and display them, but this can slow down the system
 significantly when 

Re: List all Collections together with number of records

2015-06-07 Thread Zheng Lin Edwin Yeo
The reason we want to have different collections is that each of the
collections has different fields, and some collections will contain
information that is more sensitive than others.

As such, we may need to restrict access to certain collections for some
users. Although the restriction will be done on the front-end client side,
we still need that information to be stored in a separate collection
for security reasons.

Regards,
Edwin


On 7 June 2015 at 12:23, Erick Erickson erickerick...@gmail.com wrote:

 bq: Yup this information will need to be collected each time the user
 search
 for a query, as we want to show the number of records that matches the
 search query in each of the collections.

 You're looking at something akin to federated search. About all you can
 do is send out parallel queries to each collection.

 This is an interesting requirement, and I really question whether it's a
 wise
 thing to insist on. I'd really think about going back to the design.
 For instance,
 could you consolidate all these collections into a single one, with perhaps
 a collection_id? Then the problem is relatively simple, use field
 collapsing
 (aka grouping).

 Best,
 Erick

 On Sat, Jun 6, 2015 at 6:40 PM, Zheng Lin Edwin Yeo
 edwinye...@gmail.com wrote:
  Yup this information will need to be collected each time the user search
  for a query, as we want to show the number of records that matches the
  search query in each of the collections.
 
  Currently I only have 6 collections, but it could increase to hundreds of
  collections in the future. So I'm worried that it could slow down the
  system a lot if we have to pass hundreds of queries for each search
 request.
 
  Regards,
  Edwin
 
 
  On 5 June 2015 at 21:00, Upayavira u...@odoko.co.uk wrote:
 
  I'm not so sure this is as bad as it sounds. When your collection is
  sharded, no single node knows about the documents in other shards/nodes,
  so to find the total number, a query will need to go to every node.
 
  Trying to work out something to do a single request to every node,
  combine their collection statistics and aggregate them into a single
  result sounds very complicated, and likely overkill.
 
  Are you needing to collect this information often? Do you have a lot of
  collections?
 
  Upayavira
 
 
  On Fri, Jun 5, 2015, at 06:29 AM, Zheng Lin Edwin Yeo wrote:
   I'm trying to write a SolrJ program in Java to read and consolidate
 all
   the
   information into a JSON file, The client will just need to call this
   SolrJ
   program and read this JSON file to get the details. But the problem
 is we
   are still querying the Solr once for each collection, just that this
 time
   it is done in the SolrJ program in a for-loop, while previously it's
 done
   on the client side. Not sure will this lead to performance
 improvement?
  
   For your suggestion on spawning a bunch of threads, does it mean the
 same
   thing as I did?
  
   Regards,
   Edwin
  
  
   On 5 June 2015 at 12:03, Erick Erickson erickerick...@gmail.com
 wrote:
  
Have you considered spawning a bunch of threads, one per collection
and having them all run in parallel?
   
Best,
Erick
   
On Thu, Jun 4, 2015 at 4:52 PM, Zheng Lin Edwin Yeo
edwinye...@gmail.com wrote:
 The reason we wanted to do a single call is to improve on the
performance,
 as our application requires to list the total number of records in
  each
of
 the collections, and the number of records that matches the query
  each of
 the collections.

 Currently we are querying each collection one by one to retrieve
 the
 numFound value and display them, but this can slow down the system
 significantly when the number of collection grows. So we are
  thinking of
 ways to improve the speed in this area.

 Any other methods which you can suggest that we can do to overcome
  this
 speed problem?

 Regards,
 Edwin
 On 5 Jun 2015 00:16, Erick Erickson erickerick...@gmail.com
  wrote:

 Not in a single call that I know of. These are really orthogonal
 concepts. Getting the cluster status merely involves reading the
 Zookeeper clusterstate whereas getting the total number of docs
 for
 each would involve querying each collection, i.e. going to the
 Solr
 nodes themselves. I'd guess it's unlikely to be combined.

 Best,
 Erick

 On Thu, Jun 4, 2015 at 7:47 AM, Zheng Lin Edwin Yeo
 edwinye...@gmail.com wrote:
  Hi,
 
  Would like to check, are we able to use the Collection API or
 any
other
  method to list all the collections in the cluster together with
  the
 number
  of records in each of the collections in one output?
 
  Currently, I only know of the List Collections
  /admin/collections?action=LIST. However, this only list the
 names
  of
the
  collections that are in the cluster, but not the number of
  records.
 

Re: List all Collections together with number of records

2015-06-06 Thread Erick Erickson
bq: Yup this information will need to be collected each time the user search
for a query, as we want to show the number of records that matches the
search query in each of the collections.

You're looking at something akin to federated search. About all you can
do is send out parallel queries to each collection.

This is an interesting requirement, and I really question whether it's a wise
thing to insist on. I'd really think about going back to the design.
For instance,
could you consolidate all these collections into a single one, with perhaps
a collection_id? Then the problem is relatively simple, use field collapsing
(aka grouping).
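
A sketch of what the grouped query could look like from SolrJ, assuming a
single consolidated collection named "combined" with a collection_id field
(both names are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.Group;
import org.apache.solr.client.solrj.response.GroupCommand;
import org.apache.solr.client.solrj.response.QueryResponse;

public class GroupedCounts {
    // Print the number of matching documents per collection_id value.
    static void printCounts(CloudSolrClient client, String userQuery) throws Exception {
        SolrQuery q = new SolrQuery(userQuery);
        q.set("group", true);
        q.set("group.field", "collection_id");
        // With grouping enabled, rows is the number of groups returned, so it
        // must be at least the number of distinct collection_id values.
        q.setRows(200);

        QueryResponse rsp = client.query("combined", q);
        for (GroupCommand cmd : rsp.getGroupResponse().getValues()) {
            for (Group g : cmd.getValues()) {
                // Each group's numFound is the total matches for that collection_id.
                System.out.println(g.getGroupValue() + ": " + g.getResult().getNumFound());
            }
        }
    }
}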

Best,
Erick

On Sat, Jun 6, 2015 at 6:40 PM, Zheng Lin Edwin Yeo
edwinye...@gmail.com wrote:
 Yup this information will need to be collected each time the user search
 for a query, as we want to show the number of records that matches the
 search query in each of the collections.

 Currently I only have 6 collections, but it could increase to hundreds of
 collections in the future. So I'm worried that it could slow down the
 system a lot if we have to pass hundreds of queries for each search request.

 Regards,
 Edwin


 On 5 June 2015 at 21:00, Upayavira u...@odoko.co.uk wrote:

 I'm not so sure this is as bad as it sounds. When your collection is
 sharded, no single node knows about the documents in other shards/nodes,
 so to find the total number, a query will need to go to every node.

 Trying to work out something to do a single request to every node,
 combine their collection statistics and aggregate them into a single
 result sounds very complicated, and likely overkill.

 Are you needing to collect this information often? Do you have a lot of
 collections?

 Upayavira


 On Fri, Jun 5, 2015, at 06:29 AM, Zheng Lin Edwin Yeo wrote:
  I'm trying to write a SolrJ program in Java to read and consolidate all
  the
  information into a JSON file, The client will just need to call this
  SolrJ
  program and read this JSON file to get the details. But the problem is we
  are still querying the Solr once for each collection, just that this time
  it is done in the SolrJ program in a for-loop, while previously it's done
  on the client side. Not sure will this lead to performance improvement?
 
  For your suggestion on spawning a bunch of threads, does it mean the same
  thing as I did?
 
  Regards,
  Edwin
 
 
  On 5 June 2015 at 12:03, Erick Erickson erickerick...@gmail.com wrote:
 
   Have you considered spawning a bunch of threads, one per collection
   and having them all run in parallel?
  
   Best,
   Erick
  
   On Thu, Jun 4, 2015 at 4:52 PM, Zheng Lin Edwin Yeo
   edwinye...@gmail.com wrote:
The reason we wanted to do a single call is to improve on the
   performance,
as our application requires to list the total number of records in
 each
   of
the collections, and the number of records that matches the query
 each of
the collections.
   
Currently we are querying each collection one by one to retrieve the
numFound value and display them, but this can slow down the system
significantly when the number of collection grows. So we are
 thinking of
ways to improve the speed in this area.
   
Any other methods which you can suggest that we can do to overcome
 this
speed problem?
   
Regards,
Edwin
On 5 Jun 2015 00:16, Erick Erickson erickerick...@gmail.com
 wrote:
   
Not in a single call that I know of. These are really orthogonal
concepts. Getting the cluster status merely involves reading the
Zookeeper clusterstate whereas getting the total number of docs for
each would involve querying each collection, i.e. going to the Solr
nodes themselves. I'd guess it's unlikely to be combined.
   
Best,
Erick
   
On Thu, Jun 4, 2015 at 7:47 AM, Zheng Lin Edwin Yeo
edwinye...@gmail.com wrote:
 Hi,

 Would like to check, are we able to use the Collection API or any
   other
 method to list all the collections in the cluster together with
 the
number
 of records in each of the collections in one output?

 Currently, I only know of the List Collections
 /admin/collections?action=LIST. However, this only list the names
 of
   the
 collections that are in the cluster, but not the number of
 records.

 Is there a way to show the number of records in each of the
   collections
as
 well?

 Regards,
 Edwin
   
  



Re: List all Collections together with number of records

2015-06-06 Thread Zheng Lin Edwin Yeo
The query for *:* with rows=0 is only for the initial startup. When there's
a search query and filter, these need to be added to the query, as we
want to display the total number of records in each of the collections
with respect to the query and filter.

Regards,
Edwin


On 5 June 2015 at 21:14, Shawn Heisey apa...@elyograg.org wrote:

 On 6/5/2015 7:00 AM, Upayavira wrote:
  I'm not so sure this is as bad as it sounds. When your collection is
  sharded, no single node knows about the documents in other shards/nodes,
  so to find the total number, a query will need to go to every node.
 
  Trying to work out something to do a single request to every node,
  combine their collection statistics and aggregate them into a single
  result sounds very complicated, and likely overkill.
 
  Are you needing to collect this information often? Do you have a lot of
  collections?

 A query for *:* with rows=0 is quite fast on any Solr version, unless
 RAM is too tight.  If your commits are infrequent, subsequent queries
 for that information will even faster because they will be served from
 Solr caches.

 There's no reason to have user code talk to all the shards and aggregate
 the document count for the collection -- let SolrCloud handle it and
 just query the collection with q=*:*&rows=0.  The numFound value in the
 response will cover the entire collection, and Solr will optimize the
 query as much as it possibly can be optimized.

 Thanks,
 Shawn




Re: List all Collections together with number of records

2015-06-06 Thread Zheng Lin Edwin Yeo
Yup, this information will need to be collected each time the user searches,
as we want to show the number of records that match the search query in
each of the collections.

Currently I only have 6 collections, but it could increase to hundreds of
collections in the future. So I'm worried that it could slow down the
system a lot if we have to issue hundreds of queries for each search request.

Regards,
Edwin


On 5 June 2015 at 21:00, Upayavira u...@odoko.co.uk wrote:

 I'm not so sure this is as bad as it sounds. When your collection is
 sharded, no single node knows about the documents in other shards/nodes,
 so to find the total number, a query will need to go to every node.

 Trying to work out something to do a single request to every node,
 combine their collection statistics and aggregate them into a single
 result sounds very complicated, and likely overkill.

 Are you needing to collect this information often? Do you have a lot of
 collections?

 Upayavira


 On Fri, Jun 5, 2015, at 06:29 AM, Zheng Lin Edwin Yeo wrote:
  I'm trying to write a SolrJ program in Java to read and consolidate all
  the
  information into a JSON file, The client will just need to call this
  SolrJ
  program and read this JSON file to get the details. But the problem is we
  are still querying the Solr once for each collection, just that this time
  it is done in the SolrJ program in a for-loop, while previously it's done
  on the client side. Not sure will this lead to performance improvement?
 
  For your suggestion on spawning a bunch of threads, does it mean the same
  thing as I did?
 
  Regards,
  Edwin
 
 
  On 5 June 2015 at 12:03, Erick Erickson erickerick...@gmail.com wrote:
 
   Have you considered spawning a bunch of threads, one per collection
   and having them all run in parallel?
  
   Best,
   Erick
  
   On Thu, Jun 4, 2015 at 4:52 PM, Zheng Lin Edwin Yeo
   edwinye...@gmail.com wrote:
The reason we wanted to do a single call is to improve on the
   performance,
as our application requires to list the total number of records in
 each
   of
the collections, and the number of records that matches the query
 each of
the collections.
   
Currently we are querying each collection one by one to retrieve the
numFound value and display them, but this can slow down the system
significantly when the number of collection grows. So we are
 thinking of
ways to improve the speed in this area.
   
Any other methods which you can suggest that we can do to overcome
 this
speed problem?
   
Regards,
Edwin
On 5 Jun 2015 00:16, Erick Erickson erickerick...@gmail.com
 wrote:
   
Not in a single call that I know of. These are really orthogonal
concepts. Getting the cluster status merely involves reading the
Zookeeper clusterstate whereas getting the total number of docs for
each would involve querying each collection, i.e. going to the Solr
nodes themselves. I'd guess it's unlikely to be combined.
   
Best,
Erick
   
On Thu, Jun 4, 2015 at 7:47 AM, Zheng Lin Edwin Yeo
edwinye...@gmail.com wrote:
 Hi,

 Would like to check, are we able to use the Collection API or any
   other
 method to list all the collections in the cluster together with
 the
number
 of records in each of the collections in one output?

 Currently, I only know of the List Collections
 /admin/collections?action=LIST. However, this only list the names
 of
   the
 collections that are in the cluster, but not the number of
 records.

 Is there a way to show the number of records in each of the
   collections
as
 well?

 Regards,
 Edwin
   
  



Re: List all Collections together with number of records

2015-06-05 Thread Upayavira
I'm not so sure this is as bad as it sounds. When your collection is
sharded, no single node knows about the documents in other shards/nodes,
so to find the total number, a query will need to go to every node.

Trying to work out something that does a single request to every node,
combines their collection statistics, and aggregates them into a single
result sounds very complicated, and likely overkill.

Are you needing to collect this information often? Do you have a lot of
collections?

Upayavira


On Fri, Jun 5, 2015, at 06:29 AM, Zheng Lin Edwin Yeo wrote:
 I'm trying to write a SolrJ program in Java to read and consolidate all
 the
 information into a JSON file, The client will just need to call this
 SolrJ
 program and read this JSON file to get the details. But the problem is we
 are still querying the Solr once for each collection, just that this time
 it is done in the SolrJ program in a for-loop, while previously it's done
 on the client side. Not sure will this lead to performance improvement?
 
 For your suggestion on spawning a bunch of threads, does it mean the same
 thing as I did?
 
 Regards,
 Edwin
 
 
 On 5 June 2015 at 12:03, Erick Erickson erickerick...@gmail.com wrote:
 
  Have you considered spawning a bunch of threads, one per collection
  and having them all run in parallel?
 
  Best,
  Erick
 
  On Thu, Jun 4, 2015 at 4:52 PM, Zheng Lin Edwin Yeo
  edwinye...@gmail.com wrote:
   The reason we wanted to do a single call is to improve on the
  performance,
   as our application requires to list the total number of records in each
  of
   the collections, and the number of records that matches the query each of
   the collections.
  
   Currently we are querying each collection one by one to retrieve the
   numFound value and display them, but this can slow down the system
   significantly when the number of collection grows. So we are thinking of
   ways to improve the speed in this area.
  
   Any other methods which you can suggest that we can do to overcome this
   speed problem?
  
   Regards,
   Edwin
   On 5 Jun 2015 00:16, Erick Erickson erickerick...@gmail.com wrote:
  
   Not in a single call that I know of. These are really orthogonal
   concepts. Getting the cluster status merely involves reading the
   Zookeeper clusterstate whereas getting the total number of docs for
   each would involve querying each collection, i.e. going to the Solr
   nodes themselves. I'd guess it's unlikely to be combined.
  
   Best,
   Erick
  
   On Thu, Jun 4, 2015 at 7:47 AM, Zheng Lin Edwin Yeo
   edwinye...@gmail.com wrote:
Hi,
   
Would like to check, are we able to use the Collection API or any
  other
method to list all the collections in the cluster together with the
   number
of records in each of the collections in one output?
   
Currently, I only know of the List Collections
/admin/collections?action=LIST. However, this only list the names of
  the
collections that are in the cluster, but not the number of records.
   
Is there a way to show the number of records in each of the
  collections
   as
well?
   
Regards,
Edwin
  
 


Re: List all Collections together with number of records

2015-06-05 Thread Shawn Heisey
On 6/5/2015 7:00 AM, Upayavira wrote:
 I'm not so sure this is as bad as it sounds. When your collection is
 sharded, no single node knows about the documents in other shards/nodes,
 so to find the total number, a query will need to go to every node.
 
 Trying to work out something to do a single request to every node,
 combine their collection statistics and aggregate them into a single
 result sounds very complicated, and likely overkill.
 
 Are you needing to collect this information often? Do you have a lot of
 collections?

A query for *:* with rows=0 is quite fast on any Solr version, unless
RAM is too tight.  If your commits are infrequent, subsequent queries
for that information will be even faster because they will be served from
Solr caches.

There's no reason to have user code talk to all the shards and aggregate
the document count for the collection -- let SolrCloud handle it and
just query the collection with q=*:*&rows=0.  The numFound value in the
response will cover the entire collection, and Solr will optimize the
query as much as it possibly can be optimized.
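
A minimal SolrJ sketch of that per-collection count (the collection name is
supplied by the caller; the user's query and filters can be substituted for
*:* when per-query match counts are wanted instead of totals):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;

public class CollectionCount {
    // numFound for the whole collection; SolrCloud fans the query out to the
    // shards and aggregates the count for us.
    static long totalDocs(SolrClient client, String collection) throws Exception {
        SolrQuery q = new SolrQuery("*:*");
        q.setRows(0);  // counts only, no documents returned
        return client.query(collection, q).getResults().getNumFound();
    }
}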

Thanks,
Shawn



Re: List all Collections together with number of records

2015-06-04 Thread Erick Erickson
Have you considered spawning a bunch of threads, one per collection
and having them all run in parallel?
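
A sketch of that parallel approach, with one task per collection returning
that collection's numFound for the user's query (assumes a shared
CloudSolrClient -- SolrJ clients are generally safe to share across threads --
and placeholder names throughout):

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;

public class ParallelCounts {
    // Query every collection in parallel and gather numFound per collection.
    static Map<String, Long> countMatches(CloudSolrClient client,
                                          List<String> collections,
                                          String userQuery) throws Exception {
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, collections.size()));
        try {
            Map<String, Future<Long>> futures = new LinkedHashMap<>();
            for (String collection : collections) {
                futures.put(collection, pool.submit(() -> {
                    SolrQuery q = new SolrQuery(userQuery);
                    q.setRows(0);  // counts only
                    return client.query(collection, q).getResults().getNumFound();
                }));
            }
            Map<String, Long> counts = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Long>> e : futures.entrySet()) {
                counts.put(e.getKey(), e.getValue().get());
            }
            return counts;
        } finally {
            pool.shutdown();
        }
    }
}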

Best,
Erick

On Thu, Jun 4, 2015 at 4:52 PM, Zheng Lin Edwin Yeo
edwinye...@gmail.com wrote:
 The reason we wanted to do a single call is to improve on the performance,
 as our application requires to list the total number of records in each of
 the collections, and the number of records that matches the query each of
 the collections.

 Currently we are querying each collection one by one to retrieve the
 numFound value and display them, but this can slow down the system
 significantly when the number of collection grows. So we are thinking of
 ways to improve the speed in this area.

 Any other methods which you can suggest that we can do to overcome this
 speed problem?

 Regards,
 Edwin
 On 5 Jun 2015 00:16, Erick Erickson erickerick...@gmail.com wrote:

 Not in a single call that I know of. These are really orthogonal
 concepts. Getting the cluster status merely involves reading the
 Zookeeper clusterstate whereas getting the total number of docs for
 each would involve querying each collection, i.e. going to the Solr
 nodes themselves. I'd guess it's unlikely to be combined.

 Best,
 Erick

 On Thu, Jun 4, 2015 at 7:47 AM, Zheng Lin Edwin Yeo
 edwinye...@gmail.com wrote:
  Hi,
 
  Would like to check, are we able to use the Collection API or any other
  method to list all the collections in the cluster together with the
 number
  of records in each of the collections in one output?
 
  Currently, I only know of the List Collections
  /admin/collections?action=LIST. However, this only list the names of the
  collections that are in the cluster, but not the number of records.
 
  Is there a way to show the number of records in each of the collections
 as
  well?
 
  Regards,
  Edwin



Re: List all Collections together with number of records

2015-06-04 Thread Zheng Lin Edwin Yeo
I'm trying to write a SolrJ program in Java to read and consolidate all the
information into a JSON file. The client will just need to call this SolrJ
program and read this JSON file to get the details. But the problem is that
we are still querying Solr once for each collection; it's just that this
time it is done in the SolrJ program in a for-loop, whereas previously it
was done on the client side. I'm not sure whether this will lead to a
performance improvement.

For your suggestion on spawning a bunch of threads, does that mean the same
thing as what I did?

Regards,
Edwin


On 5 June 2015 at 12:03, Erick Erickson erickerick...@gmail.com wrote:

 Have you considered spawning a bunch of threads, one per collection
 and having them all run in parallel?

 Best,
 Erick

 On Thu, Jun 4, 2015 at 4:52 PM, Zheng Lin Edwin Yeo
 edwinye...@gmail.com wrote:
  The reason we wanted to do a single call is to improve on the
 performance,
  as our application requires to list the total number of records in each
 of
  the collections, and the number of records that matches the query each of
  the collections.
 
  Currently we are querying each collection one by one to retrieve the
  numFound value and display them, but this can slow down the system
  significantly when the number of collection grows. So we are thinking of
  ways to improve the speed in this area.
 
  Any other methods which you can suggest that we can do to overcome this
  speed problem?
 
  Regards,
  Edwin
  On 5 Jun 2015 00:16, Erick Erickson erickerick...@gmail.com wrote:
 
  Not in a single call that I know of. These are really orthogonal
  concepts. Getting the cluster status merely involves reading the
  Zookeeper clusterstate whereas getting the total number of docs for
  each would involve querying each collection, i.e. going to the Solr
  nodes themselves. I'd guess it's unlikely to be combined.
 
  Best,
  Erick
 
  On Thu, Jun 4, 2015 at 7:47 AM, Zheng Lin Edwin Yeo
  edwinye...@gmail.com wrote:
   Hi,
  
   Would like to check, are we able to use the Collection API or any
 other
   method to list all the collections in the cluster together with the
  number
   of records in each of the collections in one output?
  
   Currently, I only know of the List Collections
   /admin/collections?action=LIST. However, this only list the names of
 the
   collections that are in the cluster, but not the number of records.
  
   Is there a way to show the number of records in each of the
 collections
  as
   well?
  
   Regards,
   Edwin
 



Re: List all Collections together with number of records

2015-06-04 Thread Zheng Lin Edwin Yeo
The reason we wanted to do a single call is to improve performance,
as our application needs to list the total number of records in each of
the collections, and the number of records that match the query in each of
the collections.

Currently we are querying each collection one by one to retrieve the
numFound value and display it, but this can slow down the system
significantly as the number of collections grows. So we are thinking of
ways to improve the speed in this area.

Are there any other methods you can suggest to overcome this
speed problem?

Regards,
Edwin
On 5 Jun 2015 00:16, Erick Erickson erickerick...@gmail.com wrote:

 Not in a single call that I know of. These are really orthogonal
 concepts. Getting the cluster status merely involves reading the
 Zookeeper clusterstate whereas getting the total number of docs for
 each would involve querying each collection, i.e. going to the Solr
 nodes themselves. I'd guess it's unlikely to be combined.

 Best,
 Erick

 On Thu, Jun 4, 2015 at 7:47 AM, Zheng Lin Edwin Yeo
 edwinye...@gmail.com wrote:
  Hi,
 
  Would like to check, are we able to use the Collection API or any other
  method to list all the collections in the cluster together with the
 number
  of records in each of the collections in one output?
 
  Currently, I only know of the List Collections
  /admin/collections?action=LIST. However, this only list the names of the
  collections that are in the cluster, but not the number of records.
 
  Is there a way to show the number of records in each of the collections
 as
  well?
 
  Regards,
  Edwin



Re: List all Collections together with number of records

2015-06-04 Thread Erick Erickson
Not in a single call that I know of. These are really orthogonal
concepts. Getting the cluster status merely involves reading the
Zookeeper clusterstate whereas getting the total number of docs for
each would involve querying each collection, i.e. going to the Solr
nodes themselves. I'd guess it's unlikely to be combined.

Best,
Erick

On Thu, Jun 4, 2015 at 7:47 AM, Zheng Lin Edwin Yeo
edwinye...@gmail.com wrote:
 Hi,

 Would like to check, are we able to use the Collection API or any other
 method to list all the collections in the cluster together with the number
 of records in each of the collections in one output?

 Currently, I only know of the List Collections
 /admin/collections?action=LIST. However, this only list the names of the
 collections that are in the cluster, but not the number of records.

 Is there a way to show the number of records in each of the collections as
 well?

 Regards,
 Edwin