Re: MongoDB plugin to Spark - too many open cursors

2020-10-26 Thread Daniel Stojanov

Hi,

Thanks.

I believe that this error message is coming from the MongoDB server 
itself. There are multiple instances of my application running at the 
same time. With a single application, or a small number of them, there 
are never issues; the problem only appears once a sufficient number of 
applications are running concurrently.


I am not aware of how the MongoDB client manages connections. For 
example, does it leave connections hanging (rather than closing them) 
after it pulls data from MongoDB? I also do not know whether there is a 
way to tell each running application to limit the number of active 
connections it holds to the database. The database instance is running 
on Amazon DocumentDB, so the only way to allow additional cursors is to 
upgrade to a larger instance type. That seems unnecessary, since my 
concern is only the number of open cursors, not the performance of the 
hardware itself.
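
Something I have been considering (untested, and assuming the connector 
passes standard MongoDB connection-string options such as maxPoolSize 
through to the underlying driver) is capping the connection pool of each 
application in the input URI. A rough sketch, with a placeholder 
DocumentDB endpoint and credentials:

    import org.apache.spark.sql.SparkSession

    // Placeholder host, credentials, database and collection. maxPoolSize=10 would
    // cap the connection pool of each MongoClient the connector creates, if the
    // option is honoured when the URI is passed through to the driver.
    val spark = SparkSession.builder()
      .appName("mongo-read-capped-pool")
      .config("spark.mongodb.input.uri",
        "mongodb://user:pass@docdb-host:27017/mydb.mycollection?maxPoolSize=10")
      .getOrCreate()

    val df = spark.read
      .format("mongo") // org.mongodb.spark:mongo-spark-connector_2.12:2.4.1
      .load()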



Regards,




On 26/10/20 1:52 pm, lec ssmi wrote:

Is the configured MongoDB connection pool full?

Daniel Stojanov wrote on Mon, 26 Oct 2020 at 10:28 am:


Hi,


I receive an error message from the MongoDB server if there are too many
Spark applications trying to access the database at the same time (about
3 or 4): "Cannot open a new cursor since too many cursors are already
opened." I am not sure how to remedy this, and I am not sure how the
plugin behaves while it is pulling data.

It appears that a given running application will open many connections
to the database. The total number of cursors open on the database is far
greater than the number of read operations occurring in Spark.

Does the plugin keep a connection/cursor open to the database even after
it has pulled out the data into a dataframe?

Why are there so many open cursors for a single read operation?

Does catching the exception, sleeping for a while, then trying again
make sense? If cursors are kept open throughout the life of the
application, this would not make sense.


Plugin version: org.mongodb.spark:mongo-spark-connector_2.12:2.4.1






Re: MongoDB plugin to Spark - too many open cursors

2020-10-25 Thread lec ssmi
Is the configured MongoDB connection pool full?

Daniel Stojanov wrote on Mon, 26 Oct 2020 at 10:28 am:

> Hi,
>
>
> I receive an error message from the MongoDB server if there are too many
> Spark applications trying to access the database at the same time (about
> 3 or 4), "Cannot open a new cursor since too many cursors are already
> opened." I am not too sure of how to remedy this. I am not sure how the
> plugin behaves when it's pulling data.
>
> It appears that a given running application will open many connections
> to the database. The total number of cursors in the database's setting
> is many more than the number of read operations occurring in Spark.
>
>
> Does the plugin keep a connection/cursor open to the database even after
> it has pulled out the data into a dataframe?
>
> Why are there so many open cursors for a single read operation?
>
> Does catching the exception, sleeping for a while, then trying again
> make sense? If cursors are kept open throughout the life of the
> application this would not make sense.
>
>
> Plugin version: org.mongodb.spark:mongo-spark-connector_2.12:2.4.1
>
>


MongoDB plugin to Spark - too many open cursors

2020-10-25 Thread Daniel Stojanov

Hi,


I receive an error message from the MongoDB server if there are too many 
Spark applications trying to access the database at the same time (about 
3 or 4): "Cannot open a new cursor since too many cursors are already 
opened." I am not sure how to remedy this, and I am not sure how the 
plugin behaves while it is pulling data.


It appears that a given running application will open many connections 
to the database. The total number of cursors open on the database is far 
greater than the number of read operations occurring in Spark.



Does the plugin keep a connection/cursor open to the database even after 
it has pulled out the data into a dataframe?


Why are there so many open cursors for a single read operation?
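
My working guess (unconfirmed) is that the connector splits a single read 
into many partitions and that each Spark task then opens its own cursor, 
so one read can account for many cursors at once. If I am reading the 2.x 
connector documentation correctly, the partitioner options below should 
produce fewer, larger partitions, and therefore fewer simultaneous cursors:

    import org.apache.spark.sql.SparkSession

    // Placeholder endpoint, database and collection.
    val spark = SparkSession.builder()
      .appName("mongo-read-fewer-partitions")
      .config("spark.mongodb.input.uri",
        "mongodb://docdb-host:27017/mydb.mycollection")
      .getOrCreate()

    // partitionSizeMB defaults to 64; raising it should mean fewer partitions,
    // and (presumably) fewer cursors open against the database at the same time.
    val df = spark.read
      .format("mongo")
      .option("partitioner", "MongoSamplePartitioner")
      .option("partitionerOptions.partitionSizeMB", "256")
      .load()

    println(s"Spark partitions for this read: ${df.rdd.getNumPartitions}")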

Does catching the exception, sleeping for a while, then trying again 
make sense? If cursors are kept open throughout the life of the 
application, this would not make sense.
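
For what it is worth, the retry I have in mind would look roughly like 
the sketch below (untested; it assumes spark.mongodb.input.uri is already 
set on the session and that the "too many cursors" error reaches the 
application as an exception. Whether it surfaces at load() or only later 
when an action runs depends on when the connector opens its cursors, 
which I do not know):

    import org.apache.spark.sql.{DataFrame, SparkSession}

    // Retry with exponential backoff: wait for other applications to finish and
    // release their cursors, then try the read again.
    def loadWithRetry(spark: SparkSession,
                      attemptsLeft: Int = 5,
                      waitMs: Long = 30000L): DataFrame =
      try {
        spark.read.format("mongo").load()
      } catch {
        case _: Exception if attemptsLeft > 1 =>
          Thread.sleep(waitMs)
          loadWithRetry(spark, attemptsLeft - 1, waitMs * 2)
      }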



Plugin version: org.mongodb.spark:mongo-spark-connector_2.12:2.4.1


-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org