Hi Brian,
Sorry for the delayed response! The likely cause of your errors is that
the servers are being overloaded by the clients, so the I/O operations
take long enough that the clients cancel them once a timeout is
reached. If you want to run load tests of this kind, you can raise the
timeouts by modifying the configuration options in the PVFS config
file. Check out:
http://www.pvfs.org/cvs/pvfs-2-7-branch-docs/doc//pvfs-config-options.php#ClientJobFlowTimeoutSecs
http://www.pvfs.org/cvs/pvfs-2-7-branch-docs/doc//pvfs-config-options.php#ClientJobBMITimeoutSecs
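As a rough sketch, assuming both options go in the <Defaults> section of
your fs.conf (that is where I would expect them; please check the pages
above), the change could look like the fragment below. The value 600 is
only an illustration, not a recommendation; I believe both options
default to 300 seconds, so pick whatever your load test needs.

```
<Defaults>
    # ... keep your existing entries here unchanged ...
    # 600 is an illustrative value, not a recommended setting
    ClientJobBMITimeoutSecs 600
    ClientJobFlowTimeoutSecs 600
</Defaults>
```

You will most likely need to make the same change in the config file on
every server, then restart the servers and remount the clients before
the new values take effect.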
-sam
On Oct 10, 2008, at 10:37 AM, <[EMAIL PROTECTED]>
<[EMAIL PROTECTED]> wrote:
Hello,
I am trying to do some "load tests" with pvfs2, but find the following
in the logs (I produced them with 'pvfs2-set-debugmask -m /mnt/test
"network,server,client"'):
Client:
[D 11:34:10.421223] [INFO]: Mapping pointer 0x2b875cb28000 for I/O.
[D 11:34:10.433532] [INFO]: Mapping pointer 0x6a9000 for I/O.
[E 11:40:02.941501] job_time_mgr_expire: job time out: cancelling bmi
operation, job_id: 31963.
Server01:
[D 10/08 11:40] BMI_tcp_post_send_generic: Sent: 24 bytes of data.
[D 10/08 11:40] [BMI CONTROL]: BMI_set_info: set_info: 7570864
option: 6
[D 10/08 11:40] [BMI CONTROL]: BMI_set_info: searching for ref 7570864
[D 10/08 11:40] [BMI CONTROL]: BMI_set_info: decremented ref 7570864
to: 0
[D 10/08 11:40] server_state_machine_complete 0x2aaab4022030
[D 10/08 11:40] server_state_machine_terminate 0x2aaab4022030
[D 10/08 11:40] Error: bmi_tcp: Connection reset by peer
[D 10/08 11:40] BMI_testcontext completing: 46912585631680
[E 10/08 11:40] handle_io_error: flow proto error cleanup started on
0x2aaab0008690: Connection reset by peer
[E 10/08 11:40] handle_io_error: flow proto 0x2aaab0008690 canceled 0
operations, will clean up.
[E 10/08 11:40] handle_io_error: flow proto 0x2aaab0008690 error
cleanup finished: Connection reset by peer
[D 10/08 11:40] [BMI CONTROL]: BMI_set_info: set_info: 7811296
option: 6
[D 10/08 11:40] [BMI CONTROL]: BMI_set_info: searching for ref 7811296
[D 10/08 11:40] [BMI CONTROL]: BMI_set_info: decremented ref 7811296
to: 0
[D 10/08 11:40] [BMI CONTROL]: bmi_addr_drop: bmi discarding address:
7811296
[D 10/08 11:40] server_state_machine_complete 0x2aaab40381d0
The cluster configuration is as follows:
- three hosts, each with a ~400 GB ext3 slice mounted from a SAN via FC,
acting as metadata servers, I/O servers and clients;
- two hosts acting as clients only;
- Debian 4.0, kernel 2.6.24, pvfs2 module 2.7.1.
The hosts are connected to each other by gigabit Ethernet. I am
mounting the filesystem on each client-only host from a different
server: is this correct? What is the difference between mounting from
different servers and using one server for all clients?
Each server/client host instead uses itself as server. Again, would it
be better to use other hosts as servers?
Last, but not least: have you got any clues on the possible cause of
the error? I checked all the other logs, and they are perfectly clean.
Also, pvfs2-ping doesn't report anything wrong.
Please forgive me if the above questions have already been answered: I
tried searching the mailing list archives but without success...
Thank you very much for your kind attention!
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users