Re: [Lustre-discuss] Lustre::LFS + Lustre::Info (inc. lustre-info.pl) available on the CPAN
Hi Frederik,

> 'lustre-info.pl --monitor=io-size' seems to sit at "collecting data,

`io-size' reads its data from the 'disk I/O size' part of brw_stats:

                           read      |     write
disk I/O size          ios   % cum % |  ios   % cum %

..and in your case there are no stats (for reasons unknown to me...), which is why lustre-info.pl cannot display anything.

Otherwise the file looks fine: e.g. `--monitor=io-time' should work.

Regards,
 Adrian

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
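To check which brw_stats sections actually carry data, something like the following might help. It is only a rough sketch and not the parser used by lustre-info.pl: the function name parse_brw_stats is made up for illustration, and the sample text is a shortened excerpt in the layout posted in this thread, ending with a 'disk I/O size' header that has no rows (the situation described above).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split brw_stats text into sections keyed by the
# histogram title. Returns { title => [ [bucket, read_count, write_count], ... ] }.
sub parse_brw_stats {
    my ($text) = @_;
    my (%sections, $current);
    foreach my $line (split /\n/, $text) {
        if ($line =~ /^(.+?)\s+(?:rpcs|ios)\s+%\s+cum %\s+\|/) {
            # Header line, e.g. "disk I/O size   ios % cum % | ios % cum %"
            $current = $1;
            $sections{$current} = [];
        }
        elsif (defined $current
               && $line =~ /^\s*(\w+):\s*(\d+)\s+\d+\s+\d+\s*\|\s*(\d+)\s+\d+\s+\d+/) {
            # Data row: bucket, then count/%/cum% for reads and for writes
            push @{ $sections{$current} }, [ $1, $2, $3 ];
        }
    }
    return \%sections;
}

# Shortened sample; the I/O time rows are taken from the file posted in this thread.
my $sample = <<'END';
snapshot_time:         1280423140.470719 (secs.usecs)

                           read      |     write
disk I/Os in flight    ios   % cum % |  ios   % cum %
                           read      |     write
I/O time (1/1000s)     ios   % cum % |  ios   % cum %
1:                     865   0   0   | 331676  92  92
2:                   10821   1   1   | 26464   7  99
                           read      |     write
disk I/O size          ios   % cum % |  ios   % cum %
END

my $stats = parse_brw_stats($sample);
foreach my $title ('I/O time (1/1000s)', 'disk I/Os in flight', 'disk I/O size') {
    my $rows  = $stats->{$title};
    my $state = !defined $rows ? 'section missing'
              : !@$rows        ? 'section present but empty'
              :                  scalar(@$rows) . ' bucket(s)';
    print "$title: $state\n";
}
```

A monitor mode that waits for rows in an empty section would have nothing to show, which matches the symptom reported for io-size, while io-time has data to work with.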
Re: [Lustre-discuss] Lustre::LFS + Lustre::Info (inc. lustre-info.pl) available on the CPAN
good job, I'll download and learn from them.

On Wed, Jul 28, 2010 at 9:38 PM, Adrian Ulrich adr...@blinkenlights.ch wrote:
> [snip]
Re: [Lustre-discuss] Lustre::LFS + Lustre::Info (inc. lustre-info.pl) available on the CPAN
Hi Adrian,

thanks for sharing these with us.

Adrian Ulrich wrote:
> I uploaded two lustre-related modules to the CPAN:
>
> #1: Lustre::Info provides easy access to information located
>     at /proc/fs/lustre, it also comes with a 'performance monitoring'
>     script called 'lustre-info.pl'

I did have a bit of a play with the lustre-info.pl script on our test file system and it seems to work nicely. If you've got a lot of OSTs on your server you need a wide monitor for some of the options like --monitor=ost-patterns for all OSTs...

We are currently running Lustre 1.6.7.2 (+ a few patches) on our OSTs, in case this makes a difference for my issues below.

[snip]
> Examples and details:
>
> Lustre::Info and lustre-info.pl
> ---
[snip]
> The module also includes a script called 'lustre-info.pl' that can be used
> to gather some live performance statistics:
>
> Use `--ost-stats' to get a quick overview on what's going on:
>
> $ lustre-info.pl --ost-stats

In our case this looks like this (on a very quiet file system):

play01-OST (@ /dev/sdb) : write= 0.000 MB/s, read= 0.000 MB/s, create= 0.0 R/s, destroy= 0.0 R/s, setattr= 0.0 R/s, preprw= 0.0 R/s
play01-OST0001 (@ /dev/sdc) : write= 0.000 MB/s, read= 0.000 MB/s, create= 0.0 R/s, destroy= 0.0 R/sUse of uninitialized value in division (/) at /usr/local/bin/lustre-info.pl line 187.
, setattr= 0.0 R/s, preprw= 0.0 R/s
play01-OST0002 (@ /dev/sdd) : write= 0.000 MB/s, read= 0.000 MB/s, create= 0.0 R/s, destroy= 0.0 R/s, setattr= 0.0 R/s, preprw= 0.0 R/s
play01-OST0003 (@ /dev/sde) : write= 0.000 MB/s, read= 0.000 MB/s, create= 0.0 R/s, destroy= 0.0 R/s, setattr= 0.0 R/s, preprw= 0.0 R/s
play01-OST0004 (@ /dev/sdf) : write= 0.000 MB/s, read= 0.000 MB/s, create= 0.0 R/s, destroy= 0.0 R/sUse of uninitialized value in division (/) at /usr/local/bin/lustre-info.pl line 187.
, setattr= 0.0 R/s, preprw= 0.0 R/s
play01-OST0005 (@ /dev/sdg) : write= 0.000 MB/s, read= 0.000 MB/s, create= 0.0 R/s, destroy= 0.0 R/sUse of uninitialized value in division (/) at /usr/local/bin/lustre-info.pl line 187.
, setattr= 0.0 R/s, preprw= 0.0 R/s

Note the 'Use of uninitialized value in division...' errors. Looking at the code it seems the value for 'setattr' is missing from the stats file for some of our OSTs. Looking at the stats file, indeed the setattr line is missing for some OSTs. Has anyone seen this before? What could have caused this?

> You can also get client-ost details via `--monitor=MODE'
>
> $ lustre-info.pl --monitor=ost --as-list  # this will only show clients where read+write >= 1MB/s
>
> client nid        | lustre1-OST0006 | lustre1-OST000e | lustre1-OST0016 | lustre1-OST001e | +++ TOTALS +++ (MB/s)
> 10.201.46...@o2ib | r= 0.0, w= 0.0  | r= 0.0, w= 0.0  | r= 0.0, w= 0.0  | r= 0.0, w= 1.1  | read= 0.0, write= 1.1
> 10.201.47...@o2ib | r= 0.0, w= 0.0  | r= 0.0, w= 1.2  | r= 0.0, w= 2.0  | r= 0.0, w= 0.0  | read= 0.0, write= 3.2

'lustre-info.pl --monitor=io-size' seems to sit at "collecting data, please wait..." for a very long time until I killed it; I have not had the time to debug this yet.

Kind regards,
Frederik

--
Frederik Ferner
Computer Systems Administrator          phone: +44 1235 77 8624
Diamond Light Source Ltd.               mob:   +44 7917 08 5110
(Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.)
Re: [Lustre-discuss] Lustre::LFS + Lustre::Info (inc. lustre-info.pl) available on the CPAN
On 2010-07-29, at 09:08, Frederik Ferner wrote:
> Note the 'Use of uninitialized value in division...' errors. Looking at
> the code it seems the value for 'setattr' is missing from the stats
> file for some of our OSTs. Looking at the stats file, indeed the setattr
> line is missing for some OSTs. Has anyone seen this before? What could
> have caused this?

The statistics code for OBD devices is generic. For operations that have never been done on a particular target there is no stats value printed. Otherwise, the stats file would be 60 lines long and mostly be filled with counters that are all 0.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
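A reader of the stats file therefore has to treat an absent line as "never happened" rather than as an error. The sketch below illustrates this under an assumption: that each counter line has the form "name count samples [unit] ...". The helper name parse_stats and all sample values are hypothetical and not taken from a real OST.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: parse "name count samples [unit] ..." lines
# (assumed format) into a hash of { name => count }.
sub parse_stats {
    my ($text) = @_;
    my %count;
    foreach my $line (split /\n/, $text) {
        # Lines without a "<count> samples" field (e.g. snapshot_time) are skipped
        next unless $line =~ /^(\w+)\s+(\d+)\s+samples\b/;
        $count{$1} = $2;
    }
    return \%count;
}

# Made-up sample: this target has never seen a setattr, so there is no such line.
my $sample = <<'END';
snapshot_time 1280423140.470719 secs.usecs
create 12 samples [reqs]
destroy 3 samples [reqs]
preprw 40 samples [reqs]
END

my $stats = parse_stats($sample);
foreach my $type (qw(create destroy setattr preprw)) {
    printf "%-8s %s\n", $type,
        exists $stats->{$type} ? $stats->{$type} : '(no line in stats file)';
}
```

Here 'setattr' is simply absent from the hash, so any arithmetic on $stats->{setattr} operates on undef and triggers the warning quoted above.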
Re: [Lustre-discuss] Lustre::LFS + Lustre::Info (inc. lustre-info.pl) available on the CPAN
Thanks Andreas, that explains that.

So the warning can be made to go away by changing line 183 in lustre-info.pl from

    printf(", %s=%5.1f R/s",$type,$stats->{$type}/$slice)

to be

    if (exists $stats->{$type})
    { printf(", %s=%5.1f R/s",$type,$stats->{$type}/$slice); }

(patch attached)

Tina

Andreas Dilger wrote:
> On 2010-07-29, at 09:08, Frederik Ferner wrote:
>> Note the 'Use of uninitialized value in division...' errors. Looking at
>> the code it seems the value for 'setattr' is missing from the stats
>> file for some of our OSTs. Looking at the stats file, indeed the setattr
>> line is missing for some OSTs. Has anyone seen this before? What could
>> have caused this?
>
> The statistics code for OBD devices is generic. For operations that
> have never been done on a particular target there is no stats value
> printed. Otherwise, the stats file would be 60 lines long and mostly
> be filled with counters that are all 0.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Technical Lead
> Oracle Corporation Canada Inc.

--
Tina Friedrich, Computer Systems Administrator, Diamond Light Source Ltd
Diamond House, Harwell Science and Innovation Campus - 01235 77 8442

--- lustre-info.pl 2010-07-29 17:45:10.0 +0100
+++ lustre-info.pl_new 2010-07-29 17:45:54.0 +0100
@@ -180,7 +180,8 @@ sub loop_ost_stats {
 # Add some 'metadata' info
 foreach my $type (qw(create destroy setattr preprw)) {
- printf(", %s=%5.1f R/s",$type,$stats->{$type}/$slice);
+ if (exists $stats->{$type})
+ { printf(", %s=%5.1f R/s",$type,$stats->{$type}/$slice); }
 }
 print "\n";
 }
Re: [Lustre-discuss] Lustre::LFS + Lustre::Info (inc. lustre-info.pl) available on the CPAN
Hi Frederik,

> If you've got a lot of OSTs on your server you need a wide monitor for
> some of the options like --monitor=ost-patterns for all OSTs...

The output format is not ideal, but it's a good reason to upgrade your workstation to a dualhead configuration ;-)

> Looking at the code it seems the value for 'setattr' is missing from the stats
> file for some of our OSTs. Looking at the stats file, indeed the setattr
> line is missing for some OSTs.

As Andreas already said: If 'setattr' is missing, there was no setattr operation (yet).

Changing

    printf(", %s=%5.1f R/s",$type,$stats->{$type}/$slice);

into

    printf(", %s=%5.1f R/s",$type,(($stats->{$type}||0)/$slice) );

should fix the warning. (The totals are ok, because in perl undef/$x == 0/$x)

> 'lustre-info.pl --monitor=io-size' seems to sit at "collecting data,
> please wait..." for a very long time until I killed it, I have not had
> the time to debug this yet.

I never tested it with anything other than 1.8.1.1, but this should be trivial to fix: Could you mail me the output of

    /proc/fs/lustre/obdfilter/##SOME_OST##/exports/##A_RANDOM_NID##/brw_stats

?

Regards,
 Adrian

--
RFC 1925:
  (11) Every old idea will be proposed again with a different name and
       a different presentation, regardless of whether it works.
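For illustration, the `||0` variant can be tried in isolation. This is a minimal sketch, not code from lustre-info.pl: the helper name format_rate and the values in %stats and $slice are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: format one ", name= rate R/s" field,
# treating a counter that is missing from the stats hash as zero.
sub format_rate {
    my ($stats, $type, $slice) = @_;
    return sprintf(", %s=%5.1f R/s", $type, (($stats->{$type} || 0) / $slice));
}

my %stats = (create => 12, destroy => 3, preprw => 40);  # made-up counters, no 'setattr'
my $slice = 2;                                            # made-up sampling interval in seconds

print join('', map { format_rate(\%stats, $_, $slice) } qw(create destroy setattr preprw)), "\n";
```

With this variant every column is always printed, so the output keeps the same shape on all OSTs, whereas the `exists` check suggested earlier in the thread omits the field entirely when the counter is missing. Which behaviour is preferable depends on whether anything parses the output by column position.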
Re: [Lustre-discuss] Lustre::LFS + Lustre::Info (inc. lustre-info.pl) available on the CPAN
Adrian Ulrich wrote:
>> If you've got a lot of OSTs on your server you need a wide monitor for
>> some of the options like --monitor=ost-patterns for all OSTs...
>
> The output format is not ideal, but it's a good reason to upgrade
> your workstation to a dualhead configuration ;-)

;-)

[snip]

>> 'lustre-info.pl --monitor=io-size' seems to sit at "collecting data,
>> please wait..." for a very long time until I killed it, I have not had
>> the time to debug this yet.
>
> I never tested it with anything else than 1.8.1.1 but this should be trivial to fix:
> Could you mail me the output of
> /proc/fs/lustre/obdfilter/##SOME_OST##/exports/##A_RANDOM_NID##/brw_stats ?

See attached.

Thanks!
Frederik

--
Frederik Ferner
Computer Systems Administrator          phone: +44 1235 77 8624
Diamond Light Source Ltd.               mob:   +44 7917 08 5110
(Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.)

snapshot_time: 1280423140.470719 (secs.usecs)

                      read    |        write
pages per bulk r/w
         rpcs   % cum % |     rpcs   % cum %
1:        657   0   0 |     2533   0   0
2:        119   0   0 |     2571   0   1
4:        146   0   0 |    44921  12  13
8:        107   0   0 |   280002  77  91
16:       112   0   0 |        6   0  91
32:        37   0   0 |       12   0  91
64:        19   0   0 |       10   0  91
128:       26   0   0 |       21   0  91
256:   727565  99 100 |    28991   8 100

                      read    |        write
discontiguous pages
         rpcs   % cum % |     rpcs   % cum %
0:     726505  99  99 |   359067 100 100
1:          0   0  99 |        0   0 100
2:         15   0  99 |        0   0 100
3:          0   0  99 |        0   0 100
4:          1   0  99 |        0   0 100
5:          2   0  99 |        0   0 100
6:       2243   0  99 |        0   0 100
7:         22   0 100 |        0   0 100

                      read    |        write
discontiguous blocks
         rpcs   % cum % |     rpcs   % cum %
0:     725512  99  99 |   359067 100 100
1:        541   0  99 |        0   0 100
2:        315   0  99 |        0   0 100
3:        134   0  99 |        0   0 100
4:         14   0  99 |        0   0 100
5:          3   0  99 |        0   0 100
6:       2247   0  99 |        0   0 100
7:         22   0 100 |        0   0 100

                      read    |        write
disk fragmented I/Os
          ios   % cum % |      ios   % cum %
0:         18   0   0 |        0   0   0
1:     719295  98  98 |   357926  99  99
2:       6200   0  99 |     1141   0 100
3:        540   0  99 |        0   0 100
4:         94   0  99 |        0   0 100
5:        221   0  99 |        0   0 100
6:         35   0  99 |        0   0 100
7:         68   0  99 |        0   0 100
8:         15   0  99 |        0   0 100
9:         16   0  99 |        0   0 100
10:         6   0  99 |        0   0 100
11:         4   0  99 |        0   0 100
12:         0   0  99 |        0   0 100
13:         3   0  99 |        0   0 100
14:         1   0  99 |        0   0 100
15:         0   0  99 |        0   0 100
16:         0   0  99 |        0   0 100
17:         0   0  99 |        0   0 100
18:         0   0  99 |        0   0 100
19:         0   0  99 |        0   0 100
20:         0   0  99 |        0   0 100
21:         1   0  99 |        0   0 100
22:         0   0  99 |        0   0 100
23:         0   0  99 |        0   0 100
24:         1   0  99 |        0   0 100
25:         0   0  99 |        0   0 100
26:         0   0  99 |        0   0 100
27:         0   0  99 |        0   0 100
28:         0   0  99 |        0   0 100
29:         0   0  99 |        0   0 100
30:         0   0  99 |        0   0 100
31:      2270   0 100 |        0   0 100

                      read    |        write
disk I/Os in flight
          ios   % cum % |      ios   % cum %

                      read    |        write
I/O time (1/1000s)
          ios   % cum % |      ios   % cum %
1:        865   0   0 |   331676  92  92
2:      10821   1   1 |    26464   7  99
4:     221868  30  32 |      875   0  99
8:     391101  53  85 |       16   0  99
16:     66126   9  94 |        1   0  99
32:     31806   4  99 |        7   0  99
64:      2475   0  99 |       17   0  99
128:     1288
[Lustre-discuss] Lustre::LFS + Lustre::Info (inc. lustre-info.pl) available on the CPAN
First: Sorry for the shameless self advertising, but...

I uploaded two lustre-related modules to the CPAN:

#1: Lustre::Info provides easy access to information located
    at /proc/fs/lustre, it also comes with a 'performance monitoring'
    script called 'lustre-info.pl'

#2: Lustre::LFS offers IO::Dir and IO::File-like filehandles but
    with additional lustre-specific features ($dir_fh->set_stripe...)

Examples and details:

Lustre::Info and lustre-info.pl
---

Lustre::Info provides a Perl-OO interface to Lustre's procfs information.

(confusing) example code to get the blockdevice of all OSTs:

    my $l = Lustre::Info->new;
    print join("\n", map( { $l->get_ost($_)->get_name.": ".$l->get_ost($_)->get_blockdevice }
                          @{$l->get_ost_list}), '' ) if $l->is_ost;

..output:

    $ perl test.pl
    lustre1-OST001e: /dev/md17
    lustre1-OST0016: /dev/md15
    lustre1-OST000e: /dev/md13
    lustre1-OST0006: /dev/md11

The module also includes a script called 'lustre-info.pl' that can be used to gather some live performance statistics:

Use `--ost-stats' to get a quick overview on what's going on:

$ lustre-info.pl --ost-stats
lustre1-OST0006 (@ /dev/md11) : write= 5.594 MB/s, read= 0.000 MB/s, create= 0.0 R/s, destroy= 0.0 R/s, setattr= 0.0 R/s, preprw= 6.0 R/s
lustre1-OST000e (@ /dev/md13) : write= 3.997 MB/s, read= 0.000 MB/s, create= 0.0 R/s, destroy= 0.0 R/s, setattr= 0.0 R/s, preprw= 4.0 R/s
lustre1-OST0016 (@ /dev/md15) : write= 5.502 MB/s, read= 0.000 MB/s, create= 0.0 R/s, destroy= 0.0 R/s, setattr= 0.0 R/s, preprw= 6.0 R/s
lustre1-OST001e (@ /dev/md17) : write= 5.905 MB/s, read= 0.000 MB/s, create= 0.0 R/s, destroy= 0.0 R/s, setattr= 0.0 R/s, preprw= 6.7 R/s

You can also get client-ost details via `--monitor=MODE'

$ lustre-info.pl --monitor=ost --as-list  # this will only show clients where read+write >= 1MB/s

client nid        | lustre1-OST0006 | lustre1-OST000e | lustre1-OST0016 | lustre1-OST001e | +++ TOTALS +++ (MB/s)
10.201.46...@o2ib | r= 0.0, w= 0.0  | r= 0.0, w= 0.0  | r= 0.0, w= 0.0  | r= 0.0, w= 1.1  | read= 0.0, write= 1.1
10.201.47...@o2ib | r= 0.0, w= 0.0  | r= 0.0, w= 1.2  | r= 0.0, w= 2.0  | r= 0.0, w= 0.0  | read= 0.0, write= 3.2

There are many more options, check out `lustre-info.pl --help' for details!

Lustre::LFS::Dir and Lustre::LFS::File
---

These two packages behave like IO::File and IO::Dir, but both of them add some lustre-only features to the returned filehandle.

Quick example:

    my $fh = Lustre::LFS::File->new;   # $fh is a normal IO::File-like FH
    $fh->open("> test") or die;
    print $fh "Foo Bar!\n";
    my $stripe_info = $fh->get_stripe or die "Not on a lustre filesystem?!\n";

Keep in mind that both Lustre modules are far from being complete: Lustre::Info really needs some MDT support and Lustre::LFS is just a wrapper for /usr/bin/lfs: An XS-Version would be much better.

But I'd love to hear some feedback if someone decides to play around with these modules + lustre-info.pl :-)

Cheers,
 Adrian