I also ran the same code (the same version of libMesh, the same application
code, the same number of CPUs, and the same version of PETSc) on another
cluster. The environment there is 32-bit Intel, Red Hat Enterprise, GCC 3.2,
and MPICH 1.2.7p1. The timing table is below. The total time is about 88 s
(versus 3895 s in the previous case), and "find_global_indices()" takes very
little time. I don't know where the problem is. Could you give me some help?
Thanks a lot.
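
As a rough aid to reading the two timing tables in this thread, the per-event
times can be compared directly. The sketch below is illustrative only: the
names `slow`, `fast`, and `slowdown` are hypothetical, and the numbers are
"Total Time w/o Sub" values copied by hand from the two tables (x86_64 run and
32-bit run) for a handful of events.

```python
# "Total Time w/o Sub" in seconds, copied from the two tables in this thread.
# slow = AMD x86_64 run (3895 s total), fast = Intel 32-bit run (88 s total).
slow = {
    "find_global_indices()": 3172.3789,
    "parallel_sort()": 158.8466,
    "compute_sparsity()": 66.9598,
    "assemble()": 57.2052,
    "probe()": 56.6142,
    "solve()": 27.1758,
}
fast = {
    "find_global_indices()": 0.3963,
    "parallel_sort()": 0.1184,
    "compute_sparsity()": 2.8864,
    "assemble()": 13.0582,
    "probe()": 0.0155,
    "solve()": 8.0743,
}


def slowdown(slow_times, fast_times):
    """Return (event, slow/fast ratio) pairs, largest ratio first."""
    ratios = [
        (name, slow_times[name] / fast_times[name])
        for name in slow_times
        if name in fast_times and fast_times[name] > 0.0
    ]
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, ratio in slowdown(slow, fast):
        print(f"{name:26s} {ratio:10.1f}x slower")
```

Events whose ratio is far above that of solve() and assemble() are the ones
that do not scale like ordinary computation between the two machines.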

Regards,
Yujie

 
 -------------------------------------------------------------------------------------------------------------
| libMesh Performance: Alive time=94.0735, Active time=88.2455                                                |
 -------------------------------------------------------------------------------------------------------------
| Event                           nCalls    Total Time  Avg Time    Total Time  Avg Time    % of Active Time  |
|                                           w/o Sub     w/o Sub     With Sub    With Sub    w/o S    With S   |
|-------------------------------------------------------------------------------------------------------------|
|                                                                                                             |
|                                                                                                             |
| DofMap                                                                                                      |
|   add_neighbors_to_send_list()  3         0.2863      0.095427    0.3427      0.114217    0.32     0.39     |
|   build_constraint_matrix()     33576     0.3030      0.000009    0.3030      0.000009    0.34     0.34     |
|   cnstrn_elem_mat_vec()         33576     0.2127      0.000006    0.2127      0.000006    0.24     0.24     |
|   compute_sparsity()            3         2.8864      0.962134    3.7640      1.254653    3.27     4.27     |
|   create_dof_constraints()      3         0.4083      0.136117    0.8024      0.267480    0.46     0.91     |
|   distribute_dofs()             3         0.6631      0.221018    1.4217      0.473890    0.75     1.61     |
|   dof_indices()                 426112    3.7301      0.000009    3.7301      0.000009    4.23     4.23     |
|   enforce_constraints_exactly() 2         0.0124      0.006191    0.0124      0.006191    0.01     0.01     |
|   old_dof_indices()             67152     0.5640      0.000008    0.5640      0.000008    0.64     0.64     |
|   prepare_send_list()           3         0.0102      0.003389    0.0102      0.003389    0.01     0.01     |
|   reinit()                      3         0.7196      0.239869    0.7196      0.239869    0.82     0.82     |
|                                                                                                             |
| FE                                                                                                          |
|   compute_affine_map()          162423    6.5415      0.000040    6.5415      0.000040    7.41     7.41     |
|   compute_face_map()            64735     3.1745      0.000049    3.1745      0.000049    3.60     3.60     |
|   compute_shape_functions()     162423    7.4944      0.000046    7.4944      0.000046    8.49     8.49     |
|   init_face_shape_functions()   54151     1.2186      0.000023    1.2186      0.000023    1.38     1.38     |
|   init_shape_functions()        115651    6.6277      0.000057    6.6277      0.000057    7.51     7.51     |
|   inverse_map()                 521467    7.4752      0.000014    7.4752      0.000014    8.47     8.47     |
|                                                                                                             |
| GMVIO                                                                                                       |
|   write_nodal_data()            1         0.5900      0.590033    0.5900      0.590033    0.67     0.67     |
|                                                                                                             |
| JumpErrorEstimator                                                                                          |
|   estimate_error()              2         7.4065      3.703241    30.4493     15.224661   8.39     34.51    |
|                                                                                                             |
| LocationMap                                                                                                 |
|   find()                        50456     0.2714      0.000005    0.2714      0.000005    0.31     0.31     |
|   init()                        4         0.0948      0.023701    0.0948      0.023701    0.11     0.11     |
|                                                                                                             |
| Mesh                                                                                                        |
|   contract()                    2         0.1155      0.057756    0.1519      0.075933    0.13     0.17     |
|   find_neighbors()              3         4.9775      1.659166    4.9798      1.659928    5.64     5.64     |
|   read()                        1         3.4427      3.442706    3.4427      3.442706    3.90     3.90     |
|   renumber_nodes_and_elem()     8         0.1274      0.015928    0.1274      0.015928    0.14     0.14     |
|                                                                                                             |
| MeshCommunication                                                                                           |
|   broadcast_bcs()               1         0.0030      0.002954    0.0092      0.009157    0.00     0.01     |
|   broadcast_mesh()              1         0.1153      0.115250    0.1212      0.121190    0.13     0.14     |
|   compute_hilbert_indices()     4         1.1428      0.285697    1.1428      0.285697    1.30     1.30     |
|   find_global_indices()         4         0.3963      0.099081    1.7243      0.431078    0.45     1.95     |
|   parallel_sort()               4         0.1184      0.029595    0.1558      0.038955    0.13     0.18     |
|                                                                                                             |
| MeshRefinement                                                                                              |
|   _coarsen_elements()           4         0.0447      0.011166    0.0481      0.012035    0.05     0.05     |
|   _refine_elements()            4         0.5717      0.142932    1.3206      0.330145    0.65     1.50     |
|   add_point()                   50456     0.3634      0.000007    0.6900      0.000014    0.41     0.78     |
|   make_coarsening_compatible()  11        0.7757      0.070514    0.7757      0.070514    0.88     0.88     |
|   make_refinement_compatible()  11        0.1117      0.010153    0.1164      0.010585    0.13     0.13     |
|                                                                                                             |
| MetisPartitioner                                                                                            |
|   partition()                   3         1.9354      0.645147    3.3210      1.106993    2.19     3.76     |
|                                                                                                             |
| Parallel                                                                                                    |
|   allgather()                   16        0.0124      0.000772    0.0124      0.000772    0.01     0.01     |
|   broadcast()                   13        0.0119      0.000912    0.0119      0.000912    0.01     0.01     |
|   gather()                      3         0.0001      0.000044    0.0001      0.000044    0.00     0.00     |
|   max()                         267       0.0468      0.000175    0.0468      0.000175    0.05     0.05     |
|   min()                         467       0.5519      0.001182    0.5519      0.001182    0.63     0.63     |
|   probe()                       26        0.0155      0.000595    0.0155      0.000595    0.02     0.02     |
|   receive()                     26        0.0095      0.000365    0.0250      0.000962    0.01     0.03     |
|   send()                        26        0.0052      0.000201    0.0052      0.000201    0.01     0.01     |
|   send_receive()                34        0.0041      0.000121    0.0350      0.001030    0.00     0.04     |
|   sum()                         20        0.0889      0.004443    0.0889      0.004443    0.10     0.10     |
|   wait()                        26        0.0005      0.000021    0.0005      0.000021    0.00     0.00     |
|                                                                                                             |
| Partitioner                                                                                                 |
|   set_node_processor_ids()      3         0.5467      0.182219    0.5731      0.191044    0.62     0.65     |
|   set_parent_processor_ids()    3         0.0542      0.018076    0.0542      0.018076    0.06     0.06     |
|                                                                                                             |
| PetscLinearSolver                                                                                           |
|   solve()                       3         8.0743      2.691427    8.0764      2.692117    9.15     9.15     |
|                                                                                                             |
| ProjectVector                                                                                               |
|   operator()                    2         0.5417      0.270829    1.2560      0.627999    0.61     1.42     |
|                                                                                                             |
| System                                                                                                      |
|   assemble()                    3         13.0582     4.352722    26.0303     8.676766    14.80    29.50    |
|   project_vector()              2         0.2916      0.145808    1.8486      0.924310    0.33     2.09     |
 -------------------------------------------------------------------------------------------------------------
| Totals:                         1743206   88.2455                                         100.00            |
 -------------------------------------------------------------------------------------------------------------


On Tue, Jan 26, 2010 at 2:22 PM, Yujie <[email protected]> wrote:

> Dear libMesh developers,
>
> In previous emails I described a problem that seemed to be related to data
> communication between nodes. However, I have now run the code on the master
> node alone with 2 CPUs, so there is no communication between nodes, and the
> problem is still there. The timing table is below. You can see that
> "find_global_indices()" takes a very long time.
>
> I am using AMD x86_64, Red Hat Enterprise, GCC 4.0, and MPICH 1.2.7p1. Could
> you give me some advice? Thanks a lot.
>
>  
>  -------------------------------------------------------------------------------------------------------------
> | libMesh Performance: Alive time=3921.43, Active time=3895.64                                                |
>  -------------------------------------------------------------------------------------------------------------
> | Event                           nCalls    Total Time  Avg Time    Total Time  Avg Time    % of Active Time  |
> |                                           w/o Sub     w/o Sub     With Sub    With Sub    w/o S    With S   |
> |-------------------------------------------------------------------------------------------------------------|
> |                                                                                                             |
> |                                                                                                             |
> | DofMap                                                                                                      |
> |   add_neighbors_to_send_list()  3         1.8758      0.625277    2.0094      0.669790    0.05     0.05     |
> |   build_constraint_matrix()     36160     1.1394      0.000032    1.1394      0.000032    0.03     0.03     |
> |   cnstrn_elem_mat_vec()         36160     1.0369      0.000029    1.0369      0.000029    0.03     0.03     |
> |   compute_sparsity()            3         66.9598     22.319928   69.6673     23.222438   1.72     1.79     |
> |   create_dof_constraints()      3         1.9375      0.645828    2.7731      0.924369    0.05     0.07     |
> |   distribute_dofs()             3         5.9749      1.991641    11.1858     3.728586    0.15     0.29     |
> |   dof_indices()                 451121    10.4046     0.000023    10.4046     0.000023    0.27     0.27     |
> |   enforce_constraints_exactly() 2         0.0860      0.043013    0.0860      0.043013    0.00     0.00     |
> |   old_dof_indices()             72320     1.6502      0.000023    1.6502      0.000023    0.04     0.04     |
> |   prepare_send_list()           3         1.3888      0.462939    1.3888      0.462939    0.04     0.04     |
> |   reinit()                      3         4.4227      1.474239    4.4227      1.474239    0.11     0.11     |
> |                                                                                                             |
> | FE                                                                                                          |
> |   compute_affine_map()          166087    28.4779     0.000171    28.4779     0.000171    0.73     0.73     |
> |   compute_face_map()            65290     12.9293     0.000198    12.9293     0.000198    0.33     0.33     |
> |   compute_shape_functions()     166087    53.0771     0.000320    53.0771     0.000320    1.36     1.36     |
> |   init_face_shape_functions()   54525     6.8743      0.000126    6.8743      0.000126    0.18     0.18     |
> |   init_shape_functions()        116731    41.2603     0.000353    41.2603     0.000353    1.06     1.06     |
> |   inverse_map()                 528671    15.4390     0.000029    15.4390     0.000029    0.40     0.40     |
> |                                                                                                             |
> | GMVIO                                                                                                       |
> |   write_nodal_data()            1         2.0390      2.038986    2.0390      2.038986    0.05     0.05     |
> |                                                                                                             |
> | JumpErrorEstimator                                                                                          |
> |   estimate_error()              2         20.5754     10.287681   126.0162    63.008090   0.53     3.23     |
> |                                                                                                             |
> | LocationMap                                                                                                 |
> |   find()                        69104     0.9536      0.000014    0.9536      0.000014    0.02     0.02     |
> |   init()                        4         0.4922      0.123059    0.4922      0.123059    0.01     0.01     |
> |                                                                                                             |
> | Mesh                                                                                                        |
> |   contract()                    2         0.6653      0.332647    1.1356      0.567810    0.02     0.03     |
> |   find_neighbors()              3         30.7003     10.233423   30.8006     10.266880   0.79     0.79     |
> |   read()                        1         5.7922      5.792197    5.7922      5.792197    0.15     0.15     |
> |   renumber_nodes_and_elem()     8         1.9422      0.242779    1.9422      0.242779    0.05     0.05     |
> |                                                                                                             |
> | MeshCommunication                                                                                           |
> |   broadcast_bcs()               1         0.0604      0.060440    0.0743      0.074266    0.00     0.00     |
> |   broadcast_mesh()              1         1.0069      1.006910    1.0271      1.027131    0.03     0.03     |
> |   compute_hilbert_indices()     4         4.1264      1.031604    4.1264      1.031604    0.11     0.11     |
> |   find_global_indices()         4         3172.3789   793.094713  3412.5255   853.131373  81.43    87.60    |
> |   parallel_sort()               4         158.8466    39.711649   161.0895    40.272380   4.08     4.14     |
> |                                                                                                             |
> | MeshRefinement                                                                                              |
> |   _coarsen_elements()           4         0.4822      0.120546    0.4828      0.120710    0.01     0.01     |
> |   _refine_elements()            4         2.7347      0.683675    5.5430      1.385758    0.07     0.14     |
> |   add_point()                   69104     1.3920      0.000020    2.5913      0.000037    0.04     0.07     |
> |   make_coarsening_compatible()  12        7.9254      0.660450    7.9254      0.660450    0.20     0.20     |
> |   make_refinement_compatible()  12        1.3526      0.112718    1.3618      0.113480    0.03     0.03     |
> |                                                                                                             |
> | MetisPartitioner                                                                                            |
> |   partition()                   3         9.6725      3.224183    2854.3422   951.447412  0.25     73.27    |
> |                                                                                                             |
> | Parallel                                                                                                    |
> |   allgather()                   16        0.4336      0.027100    0.4336      0.027100    0.01     0.01     |
> |   broadcast()                   13        0.0327      0.002513    0.0327      0.002513    0.00     0.00     |
> |   gather()                      3         0.0007      0.000229    0.0007      0.000229    0.00     0.00     |
> |   max()                         275       0.5796      0.002108    0.5796      0.002108    0.01     0.01     |
> |   min()                         482       38.8301     0.080560    38.8301     0.080560    1.00     1.00     |
> |   probe()                       26        56.6142     2.177470    56.6142     2.177470    1.45     1.45     |
> |   receive()                     26        0.0334      0.001284    56.6479     2.178767    0.00     1.45     |
> |   send()                        26        18.5027     0.711642    18.5027     0.711642    0.47     0.47     |
> |   send_receive()                34        0.0077      0.000225    75.1605     2.210604    0.00     1.93     |
> |   sum()                         20        2.8493      0.142467    2.8493      0.142467    0.07     0.07     |
> |   wait()                        26        0.0016      0.000061    0.0016      0.000061    0.00     0.00     |
> |                                                                                                             |
> | Partitioner                                                                                                 |
> |   set_node_processor_ids()      3         3.7781      1.259356    4.5113      1.503763    0.10     0.12     |
> |   set_parent_processor_ids()    3         0.5565      0.185501    0.5565      0.185501    0.01     0.01     |
> |                                                                                                             |
> | PetscLinearSolver                                                                                           |
> |   solve()                       3         27.1758     9.058608    27.1827     9.060906    0.70     0.70     |
> |                                                                                                             |
> | ProjectVector                                                                                               |
> |   operator()                    2         2.2219      1.110940    4.1955      2.097752    0.06     0.11     |
> |                                                                                                             |
> | System                                                                                                      |
> |   assemble()                    3         57.2052     19.068412   125.2392    41.746413   1.47     3.21     |
> |   project_vector()              2         8.7408      4.370384    13.9501     6.975064    0.22     0.36     |
>  -------------------------------------------------------------------------------------------------------------
> | Totals:                         1832413   3895.6373                                       100.00            |
>  -------------------------------------------------------------------------------------------------------------
> Regards,
> Yujie
>
_______________________________________________
Libmesh-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/libmesh-users
