Re: [Lustre-discuss] df running slow

2014-11-26 Thread Alexander I Kulyavtsev
Try
/usr/sbin/lctl set_param osc.lustre-OST0001-*.active=0

as a workaround on the client host, substituting the proper filesystem and OST
names for all retired OSTs.

We had 'df' hanging on clients after we retired some OSTs on a 1.8.9 system, and
we now keep this mantra in rc.local.
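For reference, the rc.local snippet is essentially a loop over the retired OSTs,
something like this (the filesystem name and OST indexes below are placeholders):

  # deactivate retired OSTs on the client so 'df'/statfs does not hang
  for ost in OST0001 OST0005; do
      /usr/sbin/lctl set_param osc.lustre-${ost}-*.active=0
  done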

What client version do you have,  2.5.3  or 1.8.x?

Alex.

On Nov 26, 2014, at 6:59 AM, Jon Tegner <teg...@foi.se> wrote:

Hi!

I recently got some help regarding removing an OSS/OST from the file system. 
The last thing I did was to permanently remove it with (on the MDS):

lctl conf_param ost_name.osc.active=0

This all seems to be working, and on the clients, the command


lctl dl

indicates the OSS/OST is inactive. However, after a reboot of the clients the
same command no longer shows the removed OSS/OST as inactive. Besides,
the command


df


takes almost 2 minutes to complete.

I'm probably doing something really stupid here, and would be happy if someone 
could tell me what it is ;-)

Running 2.5.3.

Regards,

/jon
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss



Re: [lustre-discuss] MDT partition getting full

2015-04-22 Thread Alexander I Kulyavtsev
Before you remount it as ldiskfs, what is the output of
  mount -t lustre
  lfs df -hi
  lfs df -h

The first command is to verify the fs is actually mounted as lustre.
Alex.

On Apr 22, 2015, at 4:23 PM, Colin Faber <cfa...@gmail.com> wrote:

You could look at your MDT partition directly: either unmount it and remount it as
ldiskfs and examine where your space is going, or use debugfs to do the same while
it is mounted.
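For illustration, a rough sketch of that approach (the device path and mount points
are made up; stop the MDT before the ldiskfs mount):

  umount /mnt/mdt                                   # stop the MDT
  mount -t ldiskfs /dev/vg_mdt/lv_mdt /mnt/mdt_fs
  du -sh /mnt/mdt_fs/*                              # ROOT, O, OI.*, ... - see what is growing
  umount /mnt/mdt_fs
  # or read-only with debugfs while the MDT stays mounted:
  debugfs -c -R 'ls -l /ROOT' /dev/vg_mdt/lv_mdt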


On Wed, Apr 22, 2015 at 11:57 AM, Radu Popescu <radu.pope...@amoma.com> wrote:
Hi,

changelog is not enabled. I’ve checked 
/proc/fs/lustre/mdd/NAMEOFMDT/changelog_users and got:

current index: 0
IDindex

Thanks,
Radu


On 22 Apr 2015, at 19:52, Colin Faber <cfa...@gmail.com> wrote:

Do you have changelogs enabled?

On Wed, Apr 22, 2015 at 2:14 AM, Radu Popescu <radu.pope...@amoma.com> wrote:
Hi,

I have the following Lustre setup:

- servers
- number: 9
- Lustre version: 2.5.3
- OS: CentOS 6.6
- RPM URL: 
https://downloads.hpdd.intel.com/public/lustre/lustre-2.5.3/el6/server/RPMS/

- clients
- number: 90
- Lustre version: 2.5.56
- OS: Debian Wheezy
- Packages were manually created from sources
- all clients have all 9 Lustre mountpoints

Lustre setup:

MGS + MDT + OST all live on a single LUN, which has a 160GB VG created with 3 LVs,
one for each of the partitions, all mounted on each server:

MGS - 4GB
MDT - 78.12GB
OST - 78.14GB

(I’ve chosen a comparable size for the MDT and OST because of the small file sizes.)
- The total number of files is around 16 million, with sizes between <1K and 1.7MB.
They are not equally spread across all mountpoints, so let’s say I have a maximum of
2M files on any one Lustre volume.

My problem is that the MDT partition is getting full. Inodes are fine, only 3%
used, which is ok, but more than 50% of the space is used and the free space is
constantly dropping. So I think that within a week, I’ll be out of storage on all
MDT partitions. And I didn’t specify any special options when creating the MDT
partitions, so bytes per inode should be 16K (the default setting).

Anyone has any ideas?

Thanks,
Radu

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org






Re: [lustre-discuss] OST partition sizes

2015-04-29 Thread Alexander I Kulyavtsev
What range of record sizes did you use for IOR? This is more important than
the file size.
100MB is small; the overall data size (number of files times file size) should be
twice the memory, to defeat caching.
I ran a series of tests with small record sizes on raidz2 10+2; I will re-run some
tests after upgrading to 0.6.4.1.

Single file performance differs substantially from file per process.

Alex.

On Apr 29, 2015, at 9:38 AM, Scott Nolin <scott.no...@ssec.wisc.edu> wrote:

I used IOR, singlefile, 100MB files. That's the most important workload for us. 
I tried several different file sizes, but 100MB seemed a reasonable compromise 
for what I see the most. We rarely or never do file striping.

I remember I did see a difference between 10+2 and 8+2. Especially at smaller
numbers of clients and threads, the 8+2 performance numbers were more
consistent and made a smoother curve. With 10+2 and not a lot of threads, the
performance was more variable.

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] OST partition sizes

2015-04-29 Thread Alexander I Kulyavtsev
ior/bin/IOR.mpiio.mvapich2-2.0b -h

 -t N  transferSize -- size of transfer in bytes (e.g.: 8, 4k, 2m, 1g)

IOR reports it in the log :

Command line used: 
/home/aik/lustre/benchmark/git/ior/bin/IOR.mpiio.mvapich2-2.0b -v -a MPIIO -i5 
-g -e -w -r -b 16g -C -t 8k -o 
/mnt/lfs/admin/iotest/ior/stripe_2/ior-testfile.ssf
...
Summary:

api= MPIIO (version=3, subversion=0)
test filename  = /mnt/lfs/admin/iotest/ior/stripe_2/ior-testfile.ssf
access = single-shared-file, independent
pattern= segmented (1 segment)
ordering in a file = sequential offsets
ordering inter file=constant task offsets = 1
clients= 32 (8 per node)
repetitions= 5
xfersize   = 8192 bytes
blocksize  = 16 GiB
aggregate filesize = 512 GiB

Here we have an xfersize of 8k; each of the 32 clients writes 16GB, so the aggregate
file size is 512GB.

I would expect record sizes to be ~1MB for our workloads.
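For a file-per-process run at Scott's sizes, something like this should do it (same
placeholder paths/binary as above; -F is file per process, -b the per-task file size,
-t the transfer/record size):

  mpirun -np 32 ior/bin/IOR.mpiio.mvapich2-2.0b -v -a MPIIO -i 5 -g -e -w -r \
      -F -b 100m -t 1m -o /mnt/lfs/admin/iotest/ior/fpp/ior-testfile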

Best regards, Alex.

On Apr 29, 2015, at 11:07 AM, Scott Nolin <scott.no...@ssec.wisc.edu> wrote:

Ok I looked up my notes.

I'm not really sure what you mean by record size. I assumed that when I do a file
per process, the block size = file size, and that's what I see dropped onto the
filesystem.

I did -F -b 

With block sizes 1MB, 20MB, 100MB, 200MB, 500MB

2, 4, 8, 16 threads on 1 to 4 clients.

I assumed 2 threads on 1 client looks a lot like a client writing or reading 2 
files. I didn't bother looking at 1 thread.

Later I just started doing 100MB tests since it's a very common file size for
us. Plus I didn't see a real big difference once the size gets bigger than that.

Scott


On 4/29/2015 10:24 AM, Alexander I Kulyavtsev wrote:
What range of record sizes did you use for IOR? This is more important
than file size.
100MB is small, overall data size (# of files) shall be twice as memory.
I ran series of test for small record size for raidz2 10+2; will re-run
some tests after upgrading to 0.6.4.1 .

Single file performance differs substantially from file per process.

Alex.

On Apr 29, 2015, at 9:38 AM, Scott Nolin <scott.no...@ssec.wisc.edu> wrote:

I used IOR, singlefile, 100MB files. That's the most important
workload for us. I tried several different file sizes, but 100MB
seemed a reasonable compromise for what I see the most. We rarely or
never do file striping.

I remember I did see a difference between 10+2 and 8+2. Especially at
smaller numbers of clients and threads, the 8+2 performance numbers
were more consistent, made a smoother curve. 10+2 with not a lot of
threads the performance was more variable.



___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



Re: [lustre-discuss] zfs -- mds/mdt -- ssd model / type recommendation

2015-05-05 Thread Alexander I Kulyavtsev
How much space is used per inode on the MDT in a production installation?
What is the recommended size of the MDT?

I'm presently at about 10 KB/inode, which seems too high compared with ldiskfs.

I ran out of inodes on a zfs MDT in my tests and zfs got "locked": the MDT zpool had
all of its space used.

We have the zpool created as a stripe of mirrors (mirror s0 s1 mirror s3 s3). Total
size is ~940 GB; it got stuck at about 97 million files.
zfs v0.6.4.1, default 128 KB recordsize. Fragmentation went to 83% when things
got locked at 98% capacity; now I'm at 62% fragmentation after I removed some
files (down to 97% space capacity).

Shall we use a smaller ZFS recordsize on the MDT, say 8KB or 16KB? If an inode is ~10KB
and the zfs record is 128KB, we are dropping caches and reading data we do not need.
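What I have in mind is roughly this (pool/dataset names are placeholders; recordsize
only applies to data written after the change):

  zfs get recordsize,xattr zp-mdt/mdt0
  zfs set recordsize=16K zp-mdt/mdt0
  zfs set xattr=sa zp-mdt/mdt0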

 Alex.

On May 5, 2015, at 10:43 AM, Stearman, Marc  wrote:

> We are using the HGST S842 line of 2.5" SSDs.  We have them configured as a 
> raid10 setup in ZFS.  We started with SAS drives and found them to be too 
> slow, and were bottlenecked on the drives, so we upgraded to SSDs.  The nice 
> thing with ZFS is that it's not just a two device mirror.  You can do an 
> n-way mirror, so we added the SSDs to each of the vdevs with the SAS drives, 
> let them resilver online, and then removed the SAS drives.  Users did not 
> have to experience any downtime.
> 
> We have about 100PB of Lustre spread over 10 file systems.  All of them are 
> using SSDs.  We have a couple using OCZ SSDs, but I'm not a fan of their RMA 
> policies.  That has changed since they were bought by Toshiba, but I still 
> prefer the HGST drives.
> 
> We configure them as 10 mirror pairs (20 drives total), spread across two 
> JBODs so we can lose an entire JBOD and still have the pool up.
> 
> -Marc
> 
> 
> D. Marc Stearman
> Lustre Operations Lead
> stearm...@llnl.gov
> 925.423.9670
> 
> 
> 
> 
> On May 4, 2015, at 11:18 AM, Kevin Abbey  wrote:
> 
>> Hi,
>> 
>> For a single node OSS I'm planning to use a combined MGS/MDS.  Can anyone 
>> recommend an enterprise ssd designed for this workload?  I'd like to create 
>> a raid10  with 4x ssd using zfs as the backing fs.
>> 
>> Are there any published/documented systems using zfs in raid 10 using ssd?
>> 
>> Thanks,
>> Kevin
>> 
>> 
>> -- 
>> Kevin Abbey
>> Systems Administrator
>> Rutgers University
>> 
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> 
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] zfs -- mds/mdt -- ssd model / type recommendation

2015-05-05 Thread Alexander I Kulyavtsev
I checked a lustre 1.8.8 ldiskfs MDT: 106*10^6 inodes take 610GB on the MDT, or 3.5
KB/inode. I thought it was less.
So the zfs MDT size is just a factor of three more compared to the old ldiskfs.
How many files do you plan to have?
Alex.

On May 5, 2015, at 12:16 PM, Alexander I Kulyavtsev <a...@fnal.gov> wrote:

I'm presently at about 10 KB/inode which seems too high compared with ldiskfs.

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] OpenSFS / EOFS Presentations index

2015-05-11 Thread Alexander I Kulyavtsev
DocDB can be handy to manage documents.
http://docdb-v.sourceforge.net/

Check "public" instance here to see examples:
https://cd-docdb.fnal.gov/

Alex.

On May 11, 2015, at 8:46 PM, Scott Nolin <scott.no...@ssec.wisc.edu> wrote:

It would be really convenient if all the presentations for various LUG, LAD, 
and similar meetings were available in one page.

Ideally there would also be some kind of keywords for each presentation for 
easy searches, but even just having a comprehensive list of links would be 
valuable I think.

Scott
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



Re: [lustre-discuss] "tag" feature of ltop

2015-05-21 Thread Alexander I Kulyavtsev
It may make sense to keep tagging.

I marked OSS dslustre15 and then switched to the OST view.  All OSTs on the
marked OSS are highlighted:

005c F dslustre13  10160 1 0 16503001   95   80
005d F dslustre14  10160 0 0 07472001   95   79
005e F dslustre13  10160 0 0 08880001   95   75
005f F dslustre14  10160 0 0 07161001   95   77
0060 F dslustre15  10160 0 0 0747800   48   96   76
0061 F dslustre16  10160 0 0 07543001   96   78
0062 F dslustre15  10160 0 0 0690700   48   96   78
0063 F dslustre16  10160 0 0 07410001   96   75
0064 F dslustre15  10160 0 0 0611300   48   96   80
0065 F dslustre16  10160 0 0 06833001   96   78
0066 F dslustre15  10160 1 0 1654500   48   96   78
0067 F dslustre16  10160 1 0 17190001   96   78

I was about to say 'we do not use it' yesterday; tracking some issue today.
Thanks, Alex.


On May 18, 2015, at 7:29 PM, Faaland, Olaf P. <faala...@llnl.gov> wrote:

Hello,

I am working on updating ltop, the text client within LMT 
(https://github.com/chaos/lmt/wiki).  I am adding support for DNE (multiple 
active MDT's within a single filesystem).

In the interest of keeping the tool free of cruft, I am asking the community 
about their usage.

Currently, ltop allows for the user to "tag" an OST or an OSS, which causes the 
row(s) for that OSS (or OST's on that OSS) to be underlined so that they stand 
out visually.  Presumably this is so that one can follow an OST as it bounces 
around the table, when the table is sorted by something that changes 
dynamically like CPU usage or lock count.

Does anyone use this feature?  The first few people I polled do not use it, but 
if others use it I will extend it to the MDT's.  If no one uses it, then I'll 
remove it entirely.

Thanks,

Olaf P. Faaland
Livermore Computing
Lawrence Livermore National Lab
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



Re: [lustre-discuss] "tag" feature of ltop

2015-05-21 Thread Alexander I Kulyavtsev
As you said in an earlier mail, when I sort by IO rate, locks, etc., the selected OSTs
are jumping around or just sit in different rows, as seen below.

Anyway, this is not a strong desire in the long term: I may feed the cerebro output
to a web page.

Best regards, Alex.

On May 21, 2015, at 5:42 PM, Faaland, Olaf P. <faala...@llnl.gov> wrote:

Alexander,

Thanks for your reply.

ltop also lets you sort by OSS, so that the OSTs sharing an OSS are all next to 
each other.  Do you find tagging more helpful than that?

Olaf P. Faaland
LLNL

From: Alexander I Kulyavtsev [a...@fnal.gov]
Sent: Thursday, May 21, 2015 2:59 PM
To: Faaland, Olaf P.
Cc: Alexander I Kulyavtsev; lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] "tag" feature of ltop

It may have sense to keep tagging.

I marked OSS dslustre15 and then switched to OST view.  I have all OSTs on 
marked OSS highlighted:

005c F dslustre13  10160 1 0 16503001   95   80
005d F dslustre14  10160 0 0 07472001   95   79
005e F dslustre13  10160 0 0 08880001   95   75
005f F dslustre14  10160 0 0 07161001   95   77
0060 F dslustre15  10160 0 0 0747800   48   96   76
0061 F dslustre16  10160 0 0 07543001   96   78
0062 F dslustre15  10160 0 0 0690700   48   96   78
0063 F dslustre16  10160 0 0 07410001   96   75
0064 F dslustre15  10160 0 0 0611300   48   96   80
0065 F dslustre16  10160 0 0 06833001   96   78
0066 F dslustre15  10160 1 0 1654500   48   96   78
0067 F dslustre16  10160 1 0 17190001   96   78

I was about to say 'we do not use it' yesterday; tracking some issue today.
Thanks, ALex.


On May 18, 2015, at 7:29 PM, Faaland, Olaf P. <faala...@llnl.gov> wrote:

Hello,

I am working on updating ltop, the text client within LMT 
(https://github.com/chaos/lmt/wiki).  I am adding support for DNE (multiple 
active MDT's within a single filesystem).

In the interesting of keeping the tool free of cruft, I am asking the community 
about their usage.

Currently, ltop allows for the user to "tag" an OST or an OSS, which causes the 
row(s) for that OSS (or OST's on that OSS) to be underlined so that they stand 
out visually.  Presumably this is so that one can follow an OST as it bounces 
around the table, when the table is sorted by something that changes 
dynamically like CPU usage or lock count.

Does anyone use this feature?  The first few people I polled do not use it, but 
if others use it I will extend it to the MDT's.  If no one uses it, then I'll 
remove it entirely.

Thanks,

Olaf P. Faaland
Livermore Computing
Lawrence Livermore National Lab
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



[lustre-discuss] lustre manual formatting error

2015-06-18 Thread Alexander I Kulyavtsev
I believe the path /proc/fs/lustre/obdfilter/*/brw_stats got broken in this
manual subsection:


https://build.hpdd.intel.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.xhtml


> 25.3.4.2. Visualizing Results
> 
> ... skip ...
> 
> It is also useful to monitor and record average disk I/O sizes during each 
> test using the 'disk io size' histogram in the file 
> /proc/fs/lustre/obdfilter/ (see Section 32.3.5, “Monitoring the OST Block I/O 
> Stream” for details). These numbers help identify problems in the system when 
> full-sized I/Os are not submitted to the underlying disk. This may be caused 
> by problems in the device driver or Linux block layer.
> 
>  */brw_stats

shall be

> It is also useful to monitor and record average disk I/O sizes during each 
> test using the 'disk io size' histogram in the file 
> /proc/fs/lustre/obdfilter/*/brw_stats 
> (see Section 32.3.5, “Monitoring the OST Block I/O Stream” for details). ...

Alex.
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] lustre 2.5.3 ost not draining

2015-07-10 Thread Alexander I Kulyavtsev
Hi Kurt,

To keep traffic away from an almost-full OST, we usually set the OST into degraded
mode, as described in the manual:

> Handling Degraded OST RAID Arrays

> To mark the OST as degraded, use:
> lctl set_param obdfilter.{OST_name}.degraded=1
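In practice, on the OSS serving that OST, something like the following (the OST name
is a placeholder; the flag only biases new allocations away from the OST, it does not
deactivate it):

  lctl set_param obdfilter.lustre-OST0007.degraded=1
  lctl get_param obdfilter.lustre-OST0007.degraded     # verify
  lctl set_param obdfilter.lustre-OST0007.degraded=0   # clear once space is freed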

Alex.

On Jul 10, 2015, at 10:13 AM, Kurt Strosahl  wrote:

> No, I'm aware of why the ost is getting new writes... it is because I had to 
> set the qos_threshold_rr to 100 due to 
> https://jira.hpdd.intel.com/browse/LU-5778  (I have an ost that has to be 
> ignored due to terrible write performance...)
> 
> w/r,
> Kurt
> 
> - Original Message -
> From: "Sean Brisbane" 
> To: "Kurt Strosahl" 
> Cc: "Patrick Farrell" , "lustre-discuss@lists.lustre.org" 
> 
> Sent: Friday, July 10, 2015 11:04:27 AM
> Subject: RE: [lustre-discuss] lustre 2.5.3 ost not draining
> 
> Dear Kurt,
> 
> Apologies.  After leaving it some number of days it did *not* clean itself 
> up, but I feel that some number of days is long enough to verify that it is a 
> problem.
> 
> Sounds like you have another issue if the OST is not being marked as full and 
> writes are not being re-allocated to other OSTS .  I also have that second 
> issue on my system as well and I have only workarounds to offer you for the 
> problem.
> 
> Thanks,
> Sean
> 
> -Original Message-
> From: Kurt Strosahl [mailto:stros...@jlab.org] 
> Sent: 10 July 2015 16:01
> To: Sean Brisbane
> Cc: Patrick Farrell; lustre-discuss@lists.lustre.org
> Subject: Re: [lustre-discuss] lustre 2.5.3 ost not draining
> 
> The problem there is that I cannot afford to leave it "some number of 
> days"... it is at 97% full, so new writes are going to it faster then it can 
> clean itself off.
> 
> w/r,
> Kurt
> 
> - Original Message -
> From: "Sean Brisbane" 
> To: "Patrick Farrell" , "Kurt Strosahl" 
> Cc: lustre-discuss@lists.lustre.org
> Sent: Friday, July 10, 2015 10:44:39 AM
> Subject: RE: [lustre-discuss] lustre 2.5.3 ost not draining
> 
> Hi,
> 
> The 'space not freed' issue also happened to me and I left it 'some number of 
> days'  I don't recall how many, it was a while back.
> 
> Cheers,
> Sean
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] lustre 2.5.3 ost not draining

2015-07-10 Thread Alexander I Kulyavtsev
I think so; try it.
We do set the OST degraded on 1.8 when an OST nears 95%, and we migrate the data to
another OST.
On 1.8, lfs_migrate uses 'rm' and the objects are indeed deallocated.
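The migration itself is roughly the manual's recipe (fsname, OST index and mount
point below are placeholders):

  # move every file that has objects on the full OST, then check its usage
  lfs find --obd lustre-OST0007_UUID /mnt/lustre | lfs_migrate -y
  lfs df | grep OST0007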

Alex

On Jul 10, 2015, at 10:55 AM, Kurt Strosahl  wrote:

> Will that let deletes happen against it?
> 
> w/r,
> Kurt
> 
> - Original Message -
> From: "aik" 
> To: "Kurt Strosahl" 
> Cc: "aik" , "Sean Brisbane" , 
> lustre-discuss@lists.lustre.org
> Sent: Friday, July 10, 2015 11:52:00 AM
> Subject: Re: [lustre-discuss] lustre 2.5.3 ost not draining
> 
> Hi Kurt,
> 
> to keep traffic from almost full OST we usually set ost in degraded mode like 
> described in manual:
> 
>> Handling Degraded OST RAID Arrays
> 
>> To mark the OST as degraded, use:
>> lctl set_param obdfilter.{OST_name}.degraded=1
> 
> Alex.
> 
> On Jul 10, 2015, at 10:13 AM, Kurt Strosahl  wrote:
> 
>> No, I'm aware of why the ost is getting new writes... it is because I had to 
>> set the qos_threshold_rr to 100 due to 
>> https://jira.hpdd.intel.com/browse/LU-5778  (I have an ost that has to be 
>> ignored due to terrible write performance...)
>> 
>> w/r,
>> Kurt
>> 
>> - Original Message -
>> From: "Sean Brisbane" 
>> To: "Kurt Strosahl" 
>> Cc: "Patrick Farrell" , "lustre-discuss@lists.lustre.org" 
>> 
>> Sent: Friday, July 10, 2015 11:04:27 AM
>> Subject: RE: [lustre-discuss] lustre 2.5.3 ost not draining
>> 
>> Dear Kurt,
>> 
>> Apologies.  After leaving it some number of days it did *not* clean itself 
>> up, but I feel that some number of days is long enough to verify that it is 
>> a problem.
>> 
>> Sounds like you have another issue if the OST is not being marked as full 
>> and writes are not being re-allocated to other OSTS .  I also have that 
>> second issue on my system as well and I have only workarounds to offer you 
>> for the problem.
>> 
>> Thanks,
>> Sean
>> 
>> -Original Message-
>> From: Kurt Strosahl [mailto:stros...@jlab.org] 
>> Sent: 10 July 2015 16:01
>> To: Sean Brisbane
>> Cc: Patrick Farrell; lustre-discuss@lists.lustre.org
>> Subject: Re: [lustre-discuss] lustre 2.5.3 ost not draining
>> 
>> The problem there is that I cannot afford to leave it "some number of 
>> days"... it is at 97% full, so new writes are going to it faster then it can 
>> clean itself off.
>> 
>> w/r,
>> Kurt
>> 
>> - Original Message -
>> From: "Sean Brisbane" 
>> To: "Patrick Farrell" , "Kurt Strosahl" 
>> Cc: lustre-discuss@lists.lustre.org
>> Sent: Friday, July 10, 2015 10:44:39 AM
>> Subject: RE: [lustre-discuss] lustre 2.5.3 ost not draining
>> 
>> Hi,
>> 
>> The 'space not freed' issue also happened to me and I left it 'some number 
>> of days'  I don't recall how many, it was a while back.
>> 
>> Cheers,
>> Sean
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] zfs ? Re: interrupted tar archive of an mdt ldiskfs

2015-07-13 Thread Alexander I Kulyavtsev
What about zfs MDT backup/restore in lustre 2.5.3?

I took a look at the referenced manual pages - they say nothing about zfs MDT backup.
I believe we would just use zfs send/receive in this case. Do I need to fix the OI /
FID mapping?
Shall I run offline lfsck and wait???
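The zfs send/receive part itself would be something like this sketch (pool/dataset
names are made up); whether the OI/FID mappings then need fixing is exactly my question:

  zfs snapshot mdtpool/mdt0@backup1
  zfs send mdtpool/mdt0@backup1 | zfs receive backuppool/mdt0
  # later, incremental:
  zfs snapshot mdtpool/mdt0@backup2
  zfs send -i backup1 mdtpool/mdt0@backup2 | zfs receive backuppool/mdt0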

Alex.


On Jul 13, 2015, at 2:09 PM, Henwood, Richard  wrote:

> On Mon, 2015-07-13 at 11:20 -0700, John White wrote:
>> Yea, I’m benchmarking rsync right now, it doesn’t seem much faster than the 
>> initial tar was at all.
>> 
>> Can you elaborate on the risk on 2.x systems?..  
>> 
> 
> Backing up a 2.x MDT (or OST) is described in manual for:
> 
> file level (MDT only supported since 2.3):
> https://build.hpdd.intel.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.xhtml#dbdoclet.50438207_21638
> 
> device level:
> https://build.hpdd.intel.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.xhtml#dbdoclet.50438207_71633
> 
> I, personally, think it does an OK job of describing the limitations of
> file and device backups - but there is always room for improvement:
> https://wiki.hpdd.intel.com/display/PUB/Making+changes+to+the+Lustre+Manual
> 
> cheers,
> Richard
> -- 
> richard.henw...@intel.com
> Tel: +1 512 410 9612
> Intel High Performance Data Division
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] lustre 2.5.3 ost not draining

2015-07-13 Thread Alexander I Kulyavtsev
Hi Kurt,

The situation with "mount/unmount is necessary to trigger the cleanup" is 
similar to described at zfs bug 1548:
https://github.com/zfsonlinux/zfs/issues/1548
Reportedly it was fixed in zfs 0.6.3 ; the update to 0.6.4.1 is recommended;  
and  0.6.4.2 was recently released.
The bug is related to xattr reference count and cleanup; xattr=sa setting is 
recommended. But: it is effective for the new files. Once you created the file, 
xattr type stays.

At the ticket, the one of the failure scenarios refers to the case when the 
space is not released after unmount/mount. Search for  "None of objects X, X1 
nor X2 are freed" on bug #1548 webpage. I'm afraid you will need to transfer 
data from ost pool and reformat ost.

The entry on Jan 6 on issue 1548 suggests to drop vm caches. 

You may want to check zfs version in use, xattr setting for zpool, zfs 
(xattr=sa). What version of zfs was in use when you wrote files you can not 
delete now?
What is ashift and reported fragmentation on the zpool/zfs?
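Roughly the checks I mean (pool/dataset names are placeholders):

  modinfo zfs | grep -w version            # zfs version in use
  zfs get xattr,recordsize ostpool/ost0
  zpool get ashift,fragmentation ostpool   # fragmentation needs zfs >= 0.6.4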

Best regards,
Alex.

On Jul 12, 2015, at 4:13 AM, Sean Brisbane  
wrote:

> Hi Kurt,
> 
> I was following the recommendation that the OST be active to allow the 
> deletion to happen, hence the reactivation followed by mount/unmount is 
> necessary to trigger the cleanup. The OST was therefore active during the 
> mount/unmount.  
> 
> Best,
> Sean
> 
> From: Kurt Strosahl [stros...@jlab.org]
> Sent: 12 July 2015 02:03
> To: Sean Brisbane
> Cc: Shawn Hall; lustre-discuss@lists.lustre.org
> Subject: Re: [lustre-discuss] lustre 2.5.3 ost not draining
> 
> Thanks,
> 
>   I'll have to see if I can run this test myself.  Did you notice if the 
> "inactive" status persisted through the unmount/remount?
> 
> w/r,
> Kurt
> 
> - Original Message -
> From: "Sean Brisbane" 
> To: "Kurt Strosahl" , "Shawn Hall" 
> Cc: lustre-discuss@lists.lustre.org
> Sent: Saturday, July 11, 2015 4:29:42 AM
> Subject: RE: [lustre-discuss] lustre 2.5.3 ost not draining
> 
> Dear Kurt,
> 
> I have the same issue as you in that deleted files on deactivated OST could 
> not be cleaned up even after re-activation. It was on my todo list to work 
> out at some point how to get around this. I was told that an unmount/mount 
> cycle on the servers will trigger a clean-up.
> 
> I have just performed the experiment and it was in fact the MDT not the OST 
> which needed to be unmounted and re-mounted in my case.
> 
> Unmounting and remounting the OST during this process appeared to make no 
> difference either way.
> 
> All the best,
> Sean
> 
> 
> 
> From: Kurt Strosahl [stros...@jlab.org]
> Sent: 10 July 2015 19:53
> To: Shawn Hall
> Cc: Sean Brisbane; lustre-discuss@lists.lustre.org
> Subject: Re: [lustre-discuss] lustre 2.5.3 ost not draining
> 
> Yes, there are quite a few issues with lustre 2.5.3 (it would be sad if it 
> wasn't so frustrating... 1.8.x was solid).
> 
> The full osts have a higher index then the one that broke the weighted round 
> robin... plus all the ones above the most recent are exceptionally full 
> (>=80%).  I'm not sure how I'm going to go forward, I've heard that maybe an 
> unmount / mount of the osts would push a purge. I'm also compiling a list of 
> all the files on the ost... the idea being that I could then enable it, and 
> launch multiple lfs_migrates... trying to race everyone else using the file 
> system.  I think I'd have the advantage, as my moves would be targeted 
> directly to the ost, while the other writes would just land where ever they 
> could.
> 
> w/r,
> Kurt
> 
> - Original Message -
> From: "Shawn Hall" 
> To: "Kurt Strosahl" , "Sean Brisbane" 
> 
> Cc: lustre-discuss@lists.lustre.org
> Sent: Friday, July 10, 2015 11:49:06 AM
> Subject: Re: [lustre-discuss] lustre 2.5.3 ost not draining
> 
> It sounds like you have a couple of issues that are working against each 
> other then.  You’ll probably need to fight one at a time.
> 
> 
> 
> My recommendation of clearing up file system space still stands.  I don’t 
> have scientific proof, but giving Lustre more space to work with definitely 
> helps.
> 
> Does your full OST have a lower index than your slow OST?  Then you could 
> disable the slow one (and because of the bug everything above it) and let 
> space clear up on the full one.
> 
> Beyond that you might have to get creative and try something similar to 
> Tommy.  Migrate data but manually specify stripe offsets.
> 
> Shawn
> 
> On 7/10/15, 11:13 AM, "lustre-discuss on behalf of Kurt Strosahl" 
>  
> wrote:
> 
>> No, I'm aware of why the ost is getting new writes... it is because I had to 
>> set the qos_threshold_rr to 100 due to 
>> https://jira.hpdd.intel.com/browse/LU-5778  (I have an ost that has to be 
>> ignored due to terrible write performance...)
>> 
>> w/r,
>> Kurt
>> 
>> - Original Message -
>> From: "Sean Brisbane" 
>> To: "Kurt Strosahl" 
>> Cc: "Patrick Farrell" , "l

Re: [lustre-discuss] lustre 2.5.3 ost not draining

2015-07-14 Thread Alexander I Kulyavtsev
Since ZFS on Linux 0.6.4:

[root@lfsa ~]# zpool get fragmentation,leaked zpla
NAME  PROPERTY   VALUE   SOURCE
zpla  fragmentation  0%  -
zpla  leaked 0   default

or do "get all ..." and look for fragmentation entry.

Alex.

On Jul 14, 2015, at 7:23 AM, Kurt Strosahl  wrote:

> Is there an easy way to show fragmentation?

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] FIEMAP support for Lustre

2015-08-24 Thread Alexander I Kulyavtsev
Hi Oleg,
does ZFS-based lustre support FIEMAP?

We have lustre 2.5 with zfs installed. Otherwise we will need to set up a separate
test system with ldiskfs.

But please see my separate reply; I think this can be addressed through
multirail, NRS, and file striping.

Best regards, Alex.

On Aug 24, 2015, at 11:06 AM, Drokin, Oleg  wrote:

> Hello!
> 
> On Aug 24, 2015, at 11:57 AM, Wenji Wu wrote:
> 
>> Hello, everybody,
>> 
>> I understand that ext2/3/4 support FIEMAP to get file extent mapping. 
>> 
>> Does Lustre supports similar feature like FIEMAP? Can Lustre client gets 
>> FIEMAP-like information on a Luster file system?
> 
> Yes, Lustre does support fiemap.
> You can see patched ext4progs and the filefrag included there works on top of 
> Lustre too, as an example.
> 
> lustre/tests/checkfiemap.c in the lustre source tree is another example user 
> of this functionality that you can consult.
> 
> Bye,
>Oleg
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] FIEMAP support for Lustre

2015-08-24 Thread Alexander I Kulyavtsev
Wenji,
you may take a look at 
1.3.  Lustre File System Storage and I/O 
and 
1.3.1.  Lustre File System and Striping
Commands 
lfs getstripe
lfs setstripe
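For example (the path, stripe count and stripe size below are illustrative):

  lfs setstripe -c 4 -S 1m /mnt/lfs/testfile   # new file striped over 4 OSTs, 1MB stripes
  lfs getstripe /mnt/lfs/testfile              # show the OST objects backing the file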

Lustre Network Request Scheduler

https://build.hpdd.intel.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.xhtml#dbdoclet.nrstuning

Lustre multirail:

http://cdn.opensfs.org/wp-content/uploads/2013/04/LUG13-Presentation-ihara-final-rev4.pdf

http://cdn.opensfs.org/wp-content/uploads/2012/12/900-930_Diego_Moreno_LUG_Bull_2011.pdf
These are actually server side. IIRC you are looking at the client side.

Best regards, Alex.

On Aug 24, 2015, at 11:06 AM, Drokin, Oleg  wrote:

> Hello!
> 
> On Aug 24, 2015, at 11:57 AM, Wenji Wu wrote:
> 
>> Hello, everybody,
>> 
>> I understand that ext2/3/4 support FIEMAP to get file extent mapping. 
>> 
>> Does Lustre supports similar feature like FIEMAP? Can Lustre client gets 
>> FIEMAP-like information on a Luster file system?
> 
> Yes, Lustre does support fiemap.
> You can see patched ext4progs and the filefrag included there works on top of 
> Lustre too, as an example.
> 
> lustre/tests/checkfiemap.c in the lustre source tree is another example user 
> of this functionality that you can consult.
> 
> Bye,
>Oleg
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] free space on ldiskfs vs. zfs

2015-08-24 Thread Alexander I Kulyavtsev
Same question here.

6TB/65TB is 11%. In our case about the same fraction was "missing."

My speculation was that it may happen if, at some point between zpool and Linux, a
value reported in TB is interpreted as TiB and then converted back to TB, or an
unneeded MB-to-MiB conversion is done twice, etc.

Here are my numbers:
We have 12 * 4TB drives per pool, which is 48 TB (decimal).
The zpool is created as raidz2 10+2.
zpool reports 43.5T.
The pool size should be 48T = 4T*12, or 40T = 4T*10 (depending on whether zpool shows
it before raiding or after raiding).
From the Oracle ZFS documentation, "zpool list" returns the total space
without overheads, so 48 TB should be reported by zpool instead of 43.5TB.

In my case, it looked like a conversion/interpretation error between TB and
TiB:

48*1000*1000*1000*1000/1024/1024/1024/1024 = 43.65574568510055541992


At disk level:

~/sas2ircu 0 display

Device is a Hard disk
  Enclosure # : 2
  Slot #  : 12
  SAS Address : 5003048-0-015a-a918
  State   : Ready (RDY)
  Size (in MB)/(in sectors)   : 3815447/7814037167
  Manufacturer: ATA 
  Model Number: HGST HUS724040AL
  Firmware Revision   : AA70
  Serial No   : PN2334PBJPW14T
  GUID: 5000cca23de6204b
  Protocol: SATA
  Drive Type  : SATA_HDD

One disk size is about 4 TB (decimal):

3815447*1024*1024 = 4000786153472
7814037167*512  = 4000787029504

vdev presents whole disk to zpool. There is some overhead, some space left on 
sdq9 .

[root@lfs1 scripts]# head -4 /etc/zfs/vdev_id.conf
alias s0  /dev/disk/by-path/pci-:03:00.0-sas-0x50030480015aa90c-lun-0
alias s1  /dev/disk/by-path/pci-:03:00.0-sas-0x50030480015aa90d-lun-0
alias s2  /dev/disk/by-path/pci-:03:00.0-sas-0x50030480015aa90e-lun-0
alias s3  /dev/disk/by-path/pci-:03:00.0-sas-0x50030480015aa90f-lun-0
...
alias s12  /dev/disk/by-path/pci-:03:00.0-sas-0x50030480015aa918-lun-0
...

[root@lfs1 scripts]# ls -l  /dev/disk/by-path/
...
lrwxrwxrwx 1 root root  9 Jul 23 16:27 
pci-:03:00.0-sas-0x50030480015aa918-lun-0 -> ../../sdq
lrwxrwxrwx 1 root root 10 Jul 23 16:27 
pci-:03:00.0-sas-0x50030480015aa918-lun-0-part1 -> ../../sdq1
lrwxrwxrwx 1 root root 10 Jul 23 16:27 
pci-:03:00.0-sas-0x50030480015aa918-lun-0-part9 -> ../../sdq9

Pool report:

[root@lfs1 scripts]# zpool list
NAMESIZE  ALLOC   FREE  EXPANDSZ   FRAGCAP  DEDUP  HEALTH  ALTROOT
zpla-  43.5T  10.9T  32.6T -16%24%  1.00x  ONLINE  -
zpla-0001  43.5T  11.0T  32.5T -17%25%  1.00x  ONLINE  -
zpla-0002  43.5T  10.8T  32.7T -17%24%  1.00x  ONLINE  -
[root@lfs1 scripts]# 

[root@lfs1 ~]# zpool list -v zpla-0001
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAGCAP  DEDUP  HEALTH  ALTROOT
zpla-0001  43.5T  11.0T  32.5T -17%25%  1.00x  ONLINE  -
  raidz2  43.5T  11.0T  32.5T -17%25%
s12  -  -  - -  -  -
s13  -  -  - -  -  -
s14  -  -  - -  -  -
s15  -  -  - -  -  -
s16  -  -  - -  -  -
s17  -  -  - -  -  -
s18  -  -  - -  -  -
s19  -  -  - -  -  -
s20  -  -  - -  -  -
s21  -  -  - -  -  -
s22  -  -  - -  -  -
s23  -  -  - -  -  -
[root@lfs1 ~]# 

[root@lfs1 ~]# zpool get all zpla-0001
NAME   PROPERTYVALUE   SOURCE
zpla-0001  size43.5T   -
zpla-0001  capacity25% -
zpla-0001  altroot -   default
zpla-0001  health  ONLINE  -
zpla-0001  guid547290297520142 default
zpla-0001  version -   default
zpla-0001  bootfs  -   default
zpla-0001  delegation  on  default
zpla-0001  autoreplace off default
zpla-0001  cachefile   -   default
zpla-0001  failmodewaitdefault
zpla-0001  listsnapshots   off default
zpla-0001  autoexpand  off default
zpla-0001  dedupditto  0   default
zpla-0001  dedupratio  1.0

Re: [lustre-discuss] free space on ldiskfs vs. zfs

2015-08-24 Thread Alexander I Kulyavtsev
Hmm,
I was assuming the question was about total space, as I struggled for some time
to understand why I have 99 TB of total available space per OSS after
installing zfs lustre, while the ldiskfs OSTs have 120 TB on the same hardware. The
20% difference was partially (10%) accounted for by the different raid6 / raidz2
configurations, but I was not able to explain the other 10%.

For the question in the original post, I can not make 24 TB from the "available"
field of the df output:
207 KiB "available" on his zfs lustre, 198 KiB on ldiskfs lustre.
At the same time, the difference of the total space is
233548424256 - 207693153280 = 25855270976 KiB = 24.09 TB.

Götz, could you please tell us what you meant by "available"?

Also,
in my case the output of Linux df on the OSS for the zfs pool looks strange:
the zpool size is reported as 25T (why?), and the formatted OST taking all the space
on this pool shows 33T:

[root@lfs1 ~]# df -h  /zpla-  /mnt/OST
Filesystem Size  Used Avail Use% Mounted on
zpla-   25T  256K   25T   1% /zpla-
zpla-/OST   33T  8.3T   25T  26% /mnt/OST
[root@lfs1 ~]# 

in bytes:

[root@lfs1 ~]# df --block-size=1  /zpla-  /mnt/OST
Filesystem 1B-blocks  Used  Available Use% Mounted on
zpla- 26769344561152262144 26769344299008   1% /zpla-
zpla-/OST 35582552834048 9093386076160 26489164660736  26% /mnt/OST

same ost reported by lustre:
[root@lfsa scripts]# lfs df 
UUID   1K-blocksUsed   Available Use% Mounted on
lfs-MDT_UUID   974961920  275328   974684544   0% /mnt/lfsa[MDT:0]
lfs-OST_UUID 34748586752  8880259840 25868324736  26% /mnt/lfsa[OST:0]
...

Compare:

[root@lfs1 ~]# zpool list
NAMESIZE  ALLOC   FREE  EXPANDSZ   FRAGCAP  DEDUP  HEALTH  ALTROOT
zpla-  43.5T  10.9T  32.6T -16%24%  1.00x  ONLINE  -
zpla-0001  43.5T  11.0T  32.5T -17%25%  1.00x  ONLINE  -
zpla-0002  43.5T  10.8T  32.7T -17%24%  1.00x  ONLINE  -
I realize zfs reports raw disk space including parity blocks (48TB = 43.5 TiB),
and everything else (like metadata and space for xattr inodes).

I can not explain the difference between 40 TB (decimal) of data space (10 * 4TB
drives) and the 35,582,552,834,048 bytes shown by df for the OST.

Best regards, Alex.

On Aug 24, 2015, at 7:52 PM, Christopher J. Morrone  wrote:

> I could be wrong, but I don't think that the original poster was asking 
> why the SIZE field of zpool list was wrong, but rather why the AVAIL 
> space in zfs list was lower than he expected.
> 
> I would find it easier to answer the question if I knew his drive count 
> and drive size.
> 
> Chris
> 
> On 08/24/2015 02:12 PM, Alexander I Kulyavtsev wrote:
>> Same question here.
>> 
>> 6TB/65TB is 11% . In our case about the same fraction was "missing."
>> 
>> My speculation was, It may happen if at some point between zpool and linux 
>> the value reported in TB is interpreted as in TiB, and then converted to TB. 
>> Or  unneeded conversion MB to MiB done twice, etc.
>> 
>> Here is my numbers:
>> We have 12* 4TB drives per pool, it is 48 TB (decimal).
>> zpool created as raidz2 10+2.
>> zpool reports  43.5T.
>> Pool size shall be 48T=4T*12, or 40T=4T*10 (depending what zpool shows, 
>> before raiding or after raiding).
>>> From the Oracle ZFS documentation, "zpool list" returns the total space 
>>> without overheads, thus 48 TB shall be reported by zpool instead of 43.5TB.
>> 
>> In my case, it looked like conversion error/interpretation issue between TB 
>> and TiB:
>> 
>> 48*1000*1000*1000*1000/1024/1024/1024/1024 = 43.65574568510055541992
>> 
>> 
>> At disk level:
>> 
>> ~/sas2ircu 0 display
>> 
>> Device is a Hard disk
>>   Enclosure # : 2
>>   Slot #  : 12
>>   SAS Address : 5003048-0-015a-a918
>>   State   : Ready (RDY)
>>   Size (in MB)/(in sectors)   : 3815447/7814037167
>>   Manufacturer: ATA
>>   Model Number: HGST HUS724040AL
>>   Firmware Revision   : AA70
>>   Serial No   : PN2334PBJPW14T
>>   GUID: 5000cca23de6204b
>>   Protocol: SATA
>>   Drive Type  : SATA_HDD
>> 
>> One disk size is about 4 TB (decimal):
>> 
>> 3815447*1024*1024 = 4000786153472
>> 7814037167*512  = 4000787029504
>> 
>> vdev presents whole disk to zpool. There is some overhea

Re: [lustre-discuss] zfs and lustre 2.5.3.90

2016-01-15 Thread Alexander I Kulyavtsev
Frederick,
thanks for the patch list! 
It is nice to know which patch sets are actually running in production.
We have been at zfs/spl 0.6.4.1 in production for the last six months with the last
2.5.3 GA release (Sept '14).

Is tag 2.5.3.90 considered stable?
I was cautious about using 2.5.3.90 as there can be critical patches before the final
release.

There are not many differences between Intel's 2.5.3.90 and 2.5.3-llnl, if we
disregard the ldiskfs patches, which we do not use - unless there are 'collateral'
changes in a patch that are not reflected in the commit message.
We did try the 2.5.3-llnl build; it was working fine with 2.5.3 clients (or 1.8.9
only). There were client crashes when we mounted both the old 1.8.9 lustre and the
new lustre 2.5.3 on the same 1.8.9 client. We need that for the transitional period,
as worker nodes need to access both the 2.5 and 1.8 systems during migration. The
last Intel 2.5.3 GA release does not have this issue (+ one stability patch).
We moved most of the data to the new system and reconfigured most OSS/OSTs to the
new 2.5.3 lustre. No "in-place" conversions. Thus the issue of compatibility
with the 1.8.9 client will not be relevant once we complete the migration and
upgrade the clients.

What client version shall we use with 2.5.3 servers?
2.5.3 is obvious. 
Reportedly the 2.6 client has performance improvements. I built a 2.7.0 client and
rebalanced several TB with crc checks; it seems OK.
Is 2.7.0 stable?
Shall we look at the 2.8.0 client, or is it too early?

Alex.

On Jan 15, 2016, at 9:28 AM, Frederick Lefebvre 
 wrote:

> If you think you need a more recent version of ZFS, we have run Lustre 2.5.3 
> with ZFS up to 0.6.5.3 by building Lustre with patches from the following 
> jiras:
> https://jira.hpdd.intel.com/browse/LU-6152
> https://jira.hpdd.intel.com/browse/LU-6459
> https://jira.hpdd.intel.com/browse/LU-6816
> 
> Regards,
> 
> Frederick
> 
> On Fri, Jan 15, 2016 at 7:21 AM Dilger, Andreas  
> wrote:
> On 2016/01/12, 14:21, "lustre-discuss on behalf of Kurt Strosahl"
> 
> wrote:
> 
> >Hello,
> >
> >What is the highest version of zfs supported by lustre 2.5.3.90?
> 
> Looks like 0.6.3, according to the "lbuild" script's SPLZFSVER.  In the
> master branch of Lustre we now add this information into lustre/ChangeLog
> along with the kernel versions and such.
> 
> Cheers, Andreas
> --
> Andreas Dilger
> 
> Lustre Principal Architect
> Intel High Performance Data Division
> 
> 
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] strange lustre issues following removal of an OST

2016-01-19 Thread Alexander I Kulyavtsev
LU-642 has a similar assert message. It also reports:

Lustre: setting import flintfs-OST_UUID INACTIVE by administrator request

Do you have any deactivated OSTs?

Alex.

On Jan 19, 2016, at 4:09 PM, Kurt Strosahl <stros...@jlab.org> wrote:

All,

  On Monday morning we had to remove an OST due to the failure of the 
underlying zpool.  I set the lazystatfs option on the mds, and everything 
seemed to be ok.

However now we, after rebooting a node, are seeing the below errors:
Jan 19 16:59:42 ifarm1401 kernel: LustreError: 
6962:0:(sec.c:379:import_sec_validate_get()) import 8810744f7800 (NEW) with 
no sec
Jan 19 16:59:42 ifarm1401 kernel: LustreError: 
6962:0:(sec.c:379:import_sec_validate_get()) Skipped 663 previous similar 
messages

The system then kernel panics with the following...
Jan 19 17:07:43 ifarm1401 kernel: LustreError: 
7703:0:(osc_lock.c:606:osc_lock_blocking()) ASSERTION( olck->ols_lock == 
dlmlock ) failed:
Jan 19 17:07:43 ifarm1401 kernel: LustreError: 
7703:0:(osc_lock.c:606:osc_lock_blocking()) LBUG

We are running the stock 2.5.3 server and client.

w/r,
Kurt J. Strosahl
System Administrator
Scientific Computing Group, Thomas Jefferson National Accelerator Facility
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



Re: [lustre-discuss] Inactivated ost still showing up on the mds

2016-01-26 Thread Alexander I Kulyavtsev
Hi Kurt,
probably too late if you have already unlinked the files:
did you take a zfs snapshot of the MDT and the damaged OST before removing the files?
If so, it may be possible to mount the OST zfs as a regular zfs and pull out the
objects corresponding to the files, using an MDT zfs snapshot to get the FIDs.
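Roughly what I mean (pool/dataset names are made up; the object path assumes the
usual O/<seq>/d<N>/<objid> layout):

  zfs snapshot mdtpool/mdt0@before-cleanup
  zfs snapshot ostpool/ost7@before-cleanup
  # later, expose the OST objects read-only without touching the live dataset:
  zfs clone -o readonly=on -o mountpoint=/mnt/ost7-snap ostpool/ost7@before-cleanup ostpool/ost7-view
  ls /mnt/ost7-snap/O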
Alex.

On Jan 22, 2016, at 7:39 AM, Kurt Strosahl  wrote:

> Good Morning,
> 
>   The real issue here is that the OST was decomissioned because the zpool on 
> which it resided died, which left about 30TB of data (and possibly several 
> million files) to be scrubbed.
> 
>   The steps I took were as follows... I set active=0 on the mds, and then set 
> lazystatfs=1 on the mds and the clients so that df commands wouldn't hang.
> 
>   I don't see in the documentation where you have to set the ost to active=0 
> on every client, did I miss that?  Also that is a marked change from 1.8, 
> where deactivating an OST just required active=0 on the mds.
> 
> w/r,
> Kurt
> 
> - Original Message -
> From: "Sean Brisbane" 
> To: "Kurt Strosahl" , "Chris Hunter" 
> Cc: lustre-discuss@lists.lustre.org
> Sent: Friday, January 22, 2016 4:33:41 AM
> Subject: RE: Inactivated ost still showing up on the mds
> 
> Dear Kurt,
> 
> Im not sure if this is exactly what you were trying to do, but when I 
> decommission an OST I also deactivate the OST on the client, which means that 
> nothing on the OST will be accessible but the filesystem will carry on 
> happily.  
> 
> lctl set_param osc.lustresystem-OST00NN-osc*.active=0
> 
> Thanks,
> Sean
> 
> From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of 
> Kurt Strosahl [stros...@jlab.org]
> Sent: 21 January 2016 18:09
> To: Chris Hunter
> Cc: lustre-discuss@lists.lustre.org
> Subject: Re: [lustre-discuss] Inactivated ost still showing up on the mds
> 
> Good Afternoon Chris,
> 
>   I have already run the active=0 command on the mds, is there another step?  
> From my testing under 2.5.3 the clients will hang indefinitely without using 
> the lazystatfs=1.
> 
>   Our major issue at present is that when the OST died it had a fair amount 
> of data on in (closing in on 2M files lost), and it seems like the client 
> gets into a bad state when calls re made repeatedly to files that are lost 
> (but still have their ost index information).  As the crawl has unlinked 
> files the number of errors has dropped, as have client crashes.
> 
> w/r,
> Kurt
> 
> - Original Message -
> From: "Chris Hunter" 
> To: lustre-discuss@lists.lustre.org
> Cc: "Kurt Strosahl" 
> Sent: Thursday, January 21, 2016 12:50:03 PM
> Subject: [lustre-discuss] Inactivated ost still showing up on the mds
> 
> Hi Kurt,
> For reference when an underlying OST object is missing, this is the
> error message generated on our MDS (lustre 2.5):
>> Lustre: 12752:0:(mdd_object.c:1983:mdd_dir_page_build()) build page failed: 
>> -5!
> 
> I suspect until you update the MGS info the MDS will still connect to
> the deactive OST.
> 
> My experience is sometimes the recipe to deactivate an OST works
> flawlessly sometimes other times the clients hang on "df" command and
> timeout on file access. I guess the order which you run the commands
> (ie. client vs server) is important.
> 
> regards,
> chris hunter
> 
>> From: Kurt Strosahl 
>> To: lustre-discuss@lists.lustre.org
>> Subject: [lustre-discuss] Inactivated ost still showing up on the mds
>> 
>> All,
>> 
>>   Continuing the issues that I reported yesterday...  I found that by 
>> unlinking lost files that I was able to stop the below error from occurring, 
>> this gives me hope that systems will stop crashing once all the lost files 
>> are scrubbed.
>> 
>> LustreError: 7676:0:(sec.c:379:import_sec_validate_get()) import 
>> 880623098800 (NEW) with no sec
>> LustreError: 7971:0:(sec.c:379:import_sec_validate_get()) import 
>> 880623098800 (NEW) with no sec
>> 
>>   I do note that the inactivated ost doesn't seem to ever REALLY go away.  
>> After I removed an ost from my test system I noticed that the mds still 
>> showed it...
>> 
>> On a client hooked up to the test system...
>> client: lfs df
>> UUID   1K-blocksUsed   Available Use% Mounted on
>> testL-MDT_UUID1819458432   10112  1819446272   0% 
>> /testlustre[MDT:0]
>> testL-OST_UUID   57914433152   12672 57914418432   0% 
>> /testlustre[OST:0]
>> testL-OST0001_UUID   57914433408   12672 57914418688   0% 
>> /testlustre[OST:1]
>> testL-OST0002_UUID   57914433408   12672 57914418688   0% 
>> /testlustre[OST:2]
>> OST0003 : inactive device
>> testL-OST0004_UUID   57914436992  144896 57914290048   0% 
>> /testlustre[OST:4
>> 
>> on the mds it still shows as up when I do lctl dl:
>> mds: lctl dl | grep OST0003
>> 22 UP osp testL-OST0003-osc-MDT testL-MDT-mdtlov_UUID 5
>> 
>> So I stopped the test system, ran lctl dl again (getting no results), and 
>> restarted it.  Once the system was back up I still saw OST3 ma

[lustre-discuss] lustre 1.8.9 client with LLNL server 2.5.3 LBUG

2016-01-27 Thread Alexander I Kulyavtsev
Does anyone have experience running lustre 1.8.9 client with LLNL server 2.5.3 
(zfs)?

I was almost instantly getting LBUG related to IGIF FID assertion after the 
mount:

dsg0515 kernel: LustreError: 30899:0:(mdc_fid.c:334:fid_le_to_cpu()) 
ASSERTION(fid_is_igif(dst) || fid_ver(dst) == 0) failed: 
[0x293e75006ada:0x70f8b3:0xa721a500]
(full stack dump at the end of email).

This happened only when I tried to mount two lustre file systems (the old 1.8.9
servers and the new 2.5.3 servers) on the same 1.8.9 client during tests last
summer. The new 2.5.3 system was freshly formatted and a few files had been written
from a 2.5.3 client.
I would like to try the llnl 2.5.3 server with a 1.8.9 client again.

Apparently I'm missing something obvious.
I realize it is not a supported or "tested" configuration, but we have been
successfully running a similar configuration with the last Intel GA release 2.5.3
server for more than half a year, with HPC clusters doing IO on both lustres and a
few nodes doing 'cp' between the old and new lustres, plus checksumming and stats.

We still need to have double mount (1.8 and 2.5) for another month till we 
finish migration. We will need to run 1.8.9 clients for six months more. I'm 
trying to reassess if I can use 2.5.3 llnl lustre on reinstalled servers in 
this configuration.

A lustre 1.8.9 (or 2.5.3) client with the LLNL server 2.5.3 only - runs fine.
A lustre 1.8.9 client mounting both 1.8 servers and Intel's 2.5.3 servers - runs
fine.
A lustre 1.8.9 client mounting both 1.8 servers and llnl 2.5.3 servers - crashes
after the mount or after a few operations.
I was able to make it last longer by mounting in a certain order and doing
"ls" on a few existing files, but it crashes some time later during IO.

The reported FIDs look real, but there are also
[0xdead00100100 :0x200200 :0xdead]
[0x5a5a5a5a5a5a5a5a :0x5a5a5a5a :0x5a5a5a5a]

which correspond to
CONFIG_ILLEGAL_POINTER_VALUE
# define LI_POISON ((int)0x5a5a5a5a)  or the like

I tried to compare the 2.5.3-llnl branch with the whamcloud 2_5 branch at tag 2.5.3,
and also at tag 2.5.3.90.
I did not find commit messages related to IGIF FIDs in the commits which differ,
though I guess there could be code changes not related to the commit message in a
patch I missed.

I would appreciate any hints on where to look to make it work and on what difference
is causing this LBUG.

Thanks in advance, Alex.


Jun  1 15:01:10 dsg0515 kernel: LustreError: 
4541:0:(mdc_fid.c:334:fid_le_to_cpu()) ASSERTION(fid_is_igif(dst) || 
fid_ver(dst) == 0) failed: [0x60005:0x7:0x]

Jun  1 15:01:10 dsg0515 kernel: LustreError: 
4541:0:(mdc_fid.c:334:fid_le_to_cpu()) LBUG

Jun  1 15:01:10 dsg0515 kernel: Pid: 4541, comm: ls

Jun  1 15:01:10 dsg0515 kernel:

Jun  1 15:01:10 dsg0515 kernel: Call Trace:

Jun  1 15:01:10 dsg0515 kernel:  [] ? 
generic_permission+0x24/0xc0

Jun  1 15:01:10 dsg0515 kernel:  [] 
libcfs_debug_dumpstack+0x57/0x80 [libcfs]

Jun  1 15:01:10 dsg0515 kernel:  [] lbug_with_loc+0x76/0xe0 
[libcfs]

Jun  1 15:01:10 dsg0515 kernel:  [] fid_le_to_cpu+0xa5/0xb0 
[mdc]

Jun  1 15:01:10 dsg0515 kernel:  [] ll_readdir+0x935/0xb00 
[lustre]

Jun  1 15:01:10 dsg0515 kernel:  [] ? 
nameidata_to_filp+0x57/0x70

Jun  1 15:01:10 dsg0515 kernel:  [] ? 
__inc_zone_state+0x9/0x70

Jun  1 15:01:10 dsg0515 kernel:  [] ? __lru_cache_add+0x9/0x70

Jun  1 15:01:10 dsg0515 kernel:  [] ? 
lru_cache_add_lru+0x19/0x40

Jun  1 15:01:10 dsg0515 kernel:  [] ? filldir+0x0/0xf0

Jun  1 15:01:10 dsg0515 kernel:  [] ? filldir+0x0/0xf0

Jun  1 15:01:10 dsg0515 kernel:  [] vfs_readdir+0xac/0xd0

Jun  1 15:01:10 dsg0515 kernel:  [] sys_getdents+0x86/0xe0

Jun  1 15:01:10 dsg0515 kernel:  [] ? page_fault+0x1f/0x30

Jun  1 15:01:10 dsg0515 kernel:  [] 
system_call_fastpath+0x16/0x1b

Jun  1 15:01:10 dsg0515 kernel:

Jun  1 15:01:10 dsg0515 kernel: LustreError: dumping log to 
/tmp/lustre-log.1433188870.4541




___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Questions about migrate OSTs from ldiskfs to zfs

2016-03-01 Thread Alexander I Kulyavtsev
Please see inlined.

On Feb 26, 2016, at 6:36 PM, Dilger, Andreas <andreas.dil...@intel.com> wrote:

On Feb 23, 2016, at 05:24, Fernando Perez <fpe...@icm.csic.es> wrote:

Hi all.
... snip...

- Do you recommend doing a lustre update before replacing the OSTs with the new
zfs OSTs?

Lustre 2.4.1 is very old.  It makes sense to use a newer version than this, 
especially if you are using ZFS.

However, it is generally bad sysadmin practice to do major hardware and 
software updates at the same time, since it becomes very difficult to isolate 
any problems that appear afterward. I would recommend to upgrade to a newer 
version of Lustre (2.5.3, or what is recommended from your support provider) at 
least on the servers and run that for a week or two before doing the hardware 
upgrade.
is tag 2.5.3.90 considered stable?


- I have read on the list that there are problems with the last zfsonlinux
release and lustre only works with zfsonlinux 0.6.3. Is this right?

Newer versions of Lustre work well with ZFS 0.6.4.3, I don't remember anymore 
what ZFS versions were tested with 2.4.1. We had some problems under heavy load 
with 0.6.5.3 and have moved back to 0.6.4.3 for the Lustre 2.8.0 release until 
that is fixed.
We are at 0.6.4.2, the last release before 0.6.5.1.
What is zfs 0.6.4.3?

Cheers, Andreas

Best regards, Alex.

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Questions about migrate OSTs from ldiskfs to zfs

2016-03-01 Thread Alexander I Kulyavtsev
Is there a way to run a 2.5.3-llnl server with a 1.8.9 client?
Or do you have a hint as to what could be causing the LBUG when mounting two Lustre filesystems (the old 1.8.8 and the new 2.5.3) from a 1.8.9 client? There is no ldiskfs on the new system, ZFS only.

I'm copying my previous posting at the end of this mail for the specific error. Is it related to the FID format changes and FID namespace ranges in 1.8 / 2.4 / 2.5?

Thanks, Alex.

Subject: lustre 1.8.9 client with LLNL server 2.5.3  LBUG
Date: Wed, 27 Jan 2016 15:29:24 -0600

Does anyone have experience running a Lustre 1.8.9 client with the LLNL 2.5.3 server (ZFS)?

I was almost instantly getting an LBUG related to an IGIF FID assertion after the mount:

dsg0515 kernel: LustreError: 30899:0:(mdc_fid.c:334:fid_le_to_cpu()) 
ASSERTION(fid_is_igif(dst) || fid_ver(dst) == 0) failed: 
[0x293e75006ada:0x70f8b3:0xa721a500]
(full stack dump at the end of email).

This happened only when I tried to mount two Lustre file systems (the old 1.8.9 servers and the new 2.5.3 servers) on the same 1.8.9 client during tests last summer. The new 2.5.3 system was freshly formatted, with a small amount of data written from a 2.5.3 client.
I would like to try the LLNL 2.5.3 server with a 1.8.9 client again.

Apparently I'm missing something obvious.
I realize it is not a supported or "tested" configuration, but we have been successfully running a similar configuration with Intel's last GA release, 2.5.3, on the servers for more than half a year, with HPC clusters doing IO on both Lustre filesystems and a few nodes doing 'cp' between the old and new filesystems, plus checksumming and stats.

We still need the double mount (1.8 and 2.5) for another month, until we finish the migration. We will need to run 1.8.9 clients for six more months. I'm trying to reassess whether I can use the LLNL 2.5.3 Lustre on the reinstalled servers in this configuration.

A Lustre 1.8.9 (or 2.5.3) client with the LLNL 2.5.3 server only - runs fine.
A Lustre 1.8.9 client mounting both the 1.8 servers and Intel's 2.5.3 servers - runs fine.
A Lustre 1.8.9 client mounting both the 1.8 servers and the LLNL 2.5.3 servers - crashes after the mount or a few operations.
I was able to make it last longer by mounting in a certain order and doing "ls" on a few existing files, but it crashes some time later during IO.

The reported FIDs look real, but I also see
[0xdead00100100 :0x200200 :0xdead]
[0x5a5a5a5a5a5a5a5a :0x5a5a5a5a :0x5a5a5a5a]

which corresponds to
CONFIG_ILLEGAL_POINTER_VALUE
# define LI_POISON ((int)0x5a5a5a5a) or the like.

I tried to compare the 2.5.3-llnl branch against the whamcloud b2_5 branch at tag 2.5.3, and also at tag 2.5.3.90.
I did not find any commit messages related to IGIF FIDs among the commits that differ, though I guess a patch I missed could contain a code change not reflected in its commit message.

I would appreciate any hints on where to look to make it work, and on what difference is causing this LBUG.

Thanks in advance, Alex.


Jun  1 15:01:10 dsg0515 kernel: LustreError: 
4541:0:(mdc_fid.c:334:fid_le_to_cpu()) ASSERTION(fid_is_igif(dst) || 
fid_ver(dst) == 0) failed: [0x60005:0x7:0x]

Jun  1 15:01:10 dsg0515 kernel: LustreError: 
4541:0:(mdc_fid.c:334:fid_le_to_cpu()) LBUG

Jun  1 15:01:10 dsg0515 kernel: Pid: 4541, comm: ls

Jun  1 15:01:10 dsg0515 kernel:

Jun  1 15:01:10 dsg0515 kernel: Call Trace:

Jun  1 15:01:10 dsg0515 kernel:  [] ? 
generic_permission+0x24/0xc0

Jun  1 15:01:10 dsg0515 kernel:  [] 
libcfs_debug_dumpstack+0x57/0x80 [libcfs]

Jun  1 15:01:10 dsg0515 kernel:  [] lbug_with_loc+0x76/0xe0 
[libcfs]

Jun  1 15:01:10 dsg0515 kernel:  [] fid_le_to_cpu+0xa5/0xb0 
[mdc]

Jun  1 15:01:10 dsg0515 kernel:  [] ll_readdir+0x935/0xb00 
[lustre]

Jun  1 15:01:10 dsg0515 kernel:  [] ? 
nameidata_to_filp+0x57/0x70

Jun  1 15:01:10 dsg0515 kernel:  [] ? 
__inc_zone_state+0x9/0x70

Jun  1 15:01:10 dsg0515 kernel:  [] ? __lru_cache_add+0x9/0x70

Jun  1 15:01:10 dsg0515 kernel:  [] ? 
lru_cache_add_lru+0x19/0x40

Jun  1 15:01:10 dsg0515 kernel:  [] ? filldir+0x0/0xf0

Jun  1 15:01:10 dsg0515 kernel:  [] ? filldir+0x0/0xf0

Jun  1 15:01:10 dsg0515 kernel:  [] vfs_readdir+0xac/0xd0

Jun  1 15:01:10 dsg0515 kernel:  [] sys_getdents+0x86/0xe0

Jun  1 15:01:10 dsg0515 kernel:  [] ? page_fault+0x1f/0x30

Jun  1 15:01:10 dsg0515 kernel:  [] 
system_call_fastpath+0x16/0x1b

Jun  1 15:01:10 dsg0515 kernel:

Jun  1 15:01:10 dsg0515 kernel: LustreError: dumping log to 
/tmp/lustre-log.1433188870.4541


On Mar 1, 2016, at 3:44 PM, Drokin, Oleg 
mailto:oleg.dro...@intel.com>> wrote:


On Mar 1, 2016, at 4:14 PM, Christopher J. Morrone wrote:

On 03/01/2016 09:18 AM, Alexander I Kulyavtsev wrote:

is tag 2.5.3.90 considered stable?

No.  Generally speaking you do not want to use anything with number 50
or greater for the fourth number unless you are helping out with testing
during the development process.

I think you are mixing up things and it is the 3rd number at 50 or above
that is the development code.


2.5.3 was the last official release on branch b2_5 before it was
discontinued

Re: [lustre-discuss] Error on a zpool underlying an OST

2016-03-11 Thread Alexander I Kulyavtsev

You lost only one "file":
0x2c90f

I would take a zfs snapshot on the OST, mount it as ZFS, and try to find the Lustre FID of the file.

If that does not work, I guess zdb at a high verbosity level can help to pinpoint the broken ZFS object, as in "zdb: Examining ZFS At Point-Blank Range," and show what it is (a plain ZFS file or something else).
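For example, a sketch only (the dataset name is taken from your zpool status output below; zdb wants the object id in decimal, 0x2c90f = 182543):

zfs snapshot ost-007/ost0030@debug
zdb -dddd ost-007/ost0030 182543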

Knowing the ZFS version would also be helpful.

Alex

On Mar 11, 2016, at 7:19 PM, Bob Ball mailto:b...@umich.edu>> 
wrote:

errors: Permanent errors have been detected in the following files:

   ost-007/ost0030:<0x2c90f>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre 2.8.0 released

2016-03-19 Thread Alexander I Kulyavtsev
You do not need to rebuild the kernel for a "pure" ZFS system; the few server kernel patches are for ldiskfs optimizations. You still need to rebuild ZFS and the Lustre server and/or Lustre client.

Client nodes may have different kernel versions; you need to rebuild the client for the specific kernel version of each client.
We use(d) a mix of Scientific Linux (5)6 on head nodes and SLF (5)6 with a rebuilt upstream kernel on worker nodes.

Alex.

From: lustre-discuss  on behalf of 
Patrick Farrell 
Sent: Friday, March 18, 2016 10:34 AM
To: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Lustre 2.8.0 released

Lustre servers should, generally, be kept on exactly the supported
kernel as they patch the kernel and it's dicey whether or not those
patches will work on a newer kernel.  You're welcome to try it, of course.

Lustre clients are not intended to have kernel version dependencies and
in general can be built for any kernel version you'd like. Because some
interfaces are changed from time to time, this compatibility isn't
absolute.  But Lustre 2.8.0 should be buildable for essentially any
kernel in the last few years, unless something has changed from recent
history.  (There is also no need to build your own kernel in that case.
Just get the kernel-devel package for your kernel and build the Lustre
client against that.)

- Patrick

On 03/18/2016 10:30 AM, Jon Tegner wrote:
> This is great!
>
> Will start working with this new version as soon as possible.
>
> At this point I have a few questions:
>
> the lustre server kernel is based on
>
> kernel-3.10.0-327.3.1.el7.x86_64.rpm
>
> surely this means that the clients should be on the same version of
> the kernel (i.e., 3.1, but the standard one)?
>
> And if I want to use a later kernel I have to build it from source
> (something I have done on one of the release candidates, but didn't
> have time to test)?
>
> Would it actually be recommended to build lustre on a newer kernel (if
> you don't have any general issues with this older kernel)?
>
> Regards, and thanks again
>
> On 03/16/2016 11:13 PM, Jones, Peter A wrote:
>> We are pleased to announce that the Lustre 2.8.0 Release has been
>> declared GA and is available
>> for
>> download 
>> 
>> . You can also grab the source
>> from
>> git
>>
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Rebuild server

2016-03-19 Thread Alexander I Kulyavtsev
Here are the configuration files we keep in git, in addition to the scripts for Lustre installation and MDT/OST formatting.

/etc/ldev.conf  -- this file is common for all 
servers
/etc/sysconfig/lustre  -- common for all servers; changed zfs 
mountpoint

/etc/modprobe.d/lustre.conf   -- and/or other modprobe.d files with 
lustre/lnet/lnd drivers configuration; likely common

/etc/zfs/vdev_id.conf-- zfs
/etc/zfs/zfswatcher.conf-- zfs monitoring;  "from ="

Alex.


From: lustre-discuss  on behalf of 
Peter Bortas 
Sent: Wednesday, March 16, 2016 9:06 AM
To: Jon Tegner
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Rebuild server

Hi Jon,

Just for extra reinsurance from the real world: NSC reinstalls the
same system image on all MDSs and OSSs every time we reboot our
servers. So there is no special magic in the OS that needs to be
preserved with the exception of the /etc/ldev.conf file. In our case
we have fixed that difference between servers by having an init script
that knows how to recreate it.

Regards,
--
Peter Bortas
National Supercomputer Centre
Sweden


On Fri, Mar 11, 2016 at 12:41 PM, Jon Tegner  wrote:
> Thanks! Much appreciated!
>
> Was quite stressed when I noticed the server was down (data is backed up,
> but still). Our servers are managed/provisioned by kickstart and saltstack -
> so it should be easy to bring up new ones with the same configuration.
>
> Thanks again,
>
> /jon
>
> On 03/11/2016 07:05 AM, Cowe, Malcolm J wrote:
>>
>> So, in summary: rebuild the root disks (maybe use a provisioning system
>> like kickstart for repeatability), restore the network config, restore LNet
>> config, maybe restore the HA software, restore the identity management (e.g.
>> LDAP, passwd, group) then mount the storage as before.
>
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Issue with installing zfs on lustre

2016-05-03 Thread Alexander I Kulyavtsev
Install zfs and spl from zfsonlinux.org

Alex

On May 4, 2016, at 12:40 AM, sohamm mailto:soh...@gmail.com>> 
wrote:

Downloading packages:
kmod-zfs-3.10.0-327.13.1.el7_l FAILED
http://build.hpdd.intel.com/job/lustre-master/arch=x86_64%2Cbuild_type=server%2Cdistro=el7%2Cib_stack=inkernel/lastSuccessfulBuild/artifact/artifacts/RPMS/x86_64/kmod-zfs-3.10.0-327.13.1.el7_lustre.x86_64-0.6.4.2-1.el7.x86_64.rpm:
 [Errno 14] HTTPS Error 401 - Unauthorized
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] stripe count recommendation, and proposal for auto-stripe tool

2016-05-19 Thread Alexander I Kulyavtsev
"1) More even space usage across OSTs (mostly relevant for *really* big files, 
..."


When OSTs are almost full and a user writes a large file, it can overfill an OST. Having files striped over several OSTs somewhat mitigates this issue.


2) bandwidth ...

It is better to benchmark the application.


There is a third aspect ("write multiplication effect") of wide striping that we have faced a few times when retiring/emptying OSTs.

Say I need to empty 10 TB out of a retiring or failed OST, and the files have stripe count N=16. I get the list of files on this OST to migrate, but the total size I need to rebalance is N times bigger, 160 TB in this example.
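For reference, the per-OST file list plus the migration can be done with something like this (a sketch only; the mount point and OST UUID are placeholders):

lfs find /mnt/lfs --obd lfs-OST0012_UUID -type f | lfs_migrate -y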


This could be addressed by (a) copying out only the selected stripes of the files and leaving the other stripes sparse in the new file, then (b) switching the layout of the selected stripes. Neither (a) nor (b) is implemented.


Best regards, Alex.


From: lustre-discuss  on behalf of 
Patrick Farrell 
Sent: Wednesday, May 18, 2016 2:22:11 PM
To: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] stripe count recommendation, and proposal for 
auto-stripe tool

Nathan,

This *is* excellent fodder for discussion.

A few thoughts from a developer perspective.  When you stripe a file to 
multiple OSTs, you're spreading the data out across multiple targets, which (to 
my mind) has two purposes:
1) More even space usage across OSTs (mostly relevant for *really* big files, 
since in general, singly striped files are distributed across OSTs anyway)
2) Better bandwidth/parallelism for accesses to the file.

The first one lends itself well to a file size based heuristic, but I'm not 
sure the second one does.  That's more about access patterns.  I'm not sure 
that you see much bandwidth benefit from striping with a single client, at 
least as long as an individual OST is fast relative to a client (increasingly 
common, I think, with flash and larger RAID arrays).  So then, whatever the 
file size, if it's accessed from one client, it should probably be single 
striped.

Also, for shared files, client count relative to stripe count has a huge impact 
on write performance.  Assuming strided I/O patterns, anything more than 1 
client per stripe/OST is actually worse than 1 client.  (See my lock ahead 
presentation at LUG'15 for more on this.)  Read performance doesn't share this 
weirdness, though.

All that's to say that for case 2 above, at least for writing, it's access 
pattern/access parallelism, not size, which matters.  I'm sure there's some 
correlation between file size and how parallel the access pattern is, but it 
might be very loose, and at least write performance doesn't scale linearly with 
stripe size.  Instead, the behavior is complex.

So in order to pick an ideal striping with case 2 in mind, you really need to 
understand the application access pattern.  I can't see another way to do that 
goal justice.  (The Lustre ADIO in the MPI I/O library does this, partly by 
controlling the I/O pattern through I/O aggregation for collective I/Os.)

So I think your tool can definitely help with case 1, not so sure about case 2.

- Patrick

On 05/18/2016 12:22 PM, Nathan Dauchy - NOAA Affiliate wrote:
Greetings All,

I'm looking for your experience and perhaps some lively discussion regarding 
"best practices" for choosing a file stripe count.  The Lustre manual has good 
tips on "Choosing a Stripe Size", and in practice the default 1M rarely causes 
problems on our systems. Stripe Count on the other hand is far more difficult 
to chose a single value that is efficient for a general purpose and multi-use 
site-wide file system.

Since there is the "increased overhead" of striping, and weather applications 
do unfortunately write MANY tiny files, we usually keep the filesystem default 
stripe count at 1.  Unfortunately, there are several users who then write very 
large and shared-access files with that default.  I would like to be able to 
tell them to restripe... but without digging into the specific application and 
access pattern it is hard to know what count to recommend.  Plus there is the 
"stripe these but not those" confusion... it is common for users to have a few 
very large data files and many small log or output image files in the SAME 
directory.

What do you all recommend as a reasonable rule of thumb that works for "most" 
user's needs, where stripe count can be determined based only on static data 
attributes (such as file size)?  I have heard a "stripe per GB" idea, but some 
have said that escalates to too many stripes too fast.  ORNL has a knowledge 
base article that says use a stripe count of "File size / 100 GB", but does 
that make sense for smaller, non-DOE sites?  Would stripe count = 
Log2(size_in_GB)+1 be more generally reasonable?  For a 1 TB file, that 
actually works out to be similar to ORNL, only gets there more gradually:
https://www.olcf.ornl.gov/kb_articles/lustre-basics/#Stripe_Count

Ideally, I would like to have a t

Re: [lustre-discuss] lnet router lustre rpm compatibility

2016-06-20 Thread Alexander I Kulyavtsev
On Jun 20, 2016, at 4:00 PM, Jessica Otey  wrote:

> All,
> I am in the process of preparing to upgrade a production lustre system 
> running 1.8.9 to 2.4.3.
I would like to know the router compatibility matrix too, and to have it published together with the client/server/IB compatibility matrix.
It would be nice to have a separate LNet rpm, or more precisely the set of rpms to be installed on routers.

Having said that, we migrated a petabyte-size 1.8.9 Lustre file system with ~150 million files to two 2.5.3 systems (2.5.3 plus two or three patches). Essentially the migration was "cp ..." between the two systems mounted on a 1.8 client. Hardware was moved from the old system to the new one as space was released and rebalanced. This holistic approach took on the order of a year on a live system, without shutdowns.
We did not risk an in-place upgrade, so I cannot comment on that.

There is no "official" recommendation on 1.8 client / 2.5 server compatibility, so I cannot tell whether this migration path will work or is safe for you.
You may search the reports on this list; JLAB hit hardware issues during a similar migration and IIRC they needed to disable an OST on the new system, which led to a space imbalance there (data was not written to the broken OST).

We are still running 1.8.9 clients and will switch them to 2.5 or a later version after the last dataset is moved and we have a chance to pause production.
At some point I set up a 2.8.0 client on one of the nodes and used it to rebalance OSTs on the 2.5.3 system. We do checksumming, and it looks like things went well.
The servers run Intel GA 2.5.3 with a few patches; LLNL's 2.5.3 release gave edif errors on a 1.8.9 client mounting both 1.8 and 2.5.
OSTs and MDTs are all ZFS.

> This current system has 2 lnet routers.
> Our plan is to perform the upgrade in 2 stages:
> 1) Upgrade the MDS and OSSes to 2.4.3, leaving clients in 1.8.9.
> 2) Upgrade all clients to 2.4.3 (thus allowing the entire system to be 
> subsequently updated to 2.5 and beyond)
> 
> The problem is that I can't seem to locate any information about the version 
> requirements (if any) for lnet routers. Does the lnet router have to be the 
> same as mds, or the same as the clients? Or both?
We still run 1.8.9 routers between the 2.5.3 servers and the 1.8.9 clients, and will upgrade the routers together with the clients.
We use the routers to access Lustre from a cluster in a remote room, through ethernet over fiber.

The first thing you may run into: take a look at the privileged port settings; the default is different in 2.5.3:
options ko2iblnd require_privileged_port=0
options ko2iblnd use_privileged_port=0

You may also need to change the default number of credits/buffers. The defaults differ between 1.8 and 2.5, but they must be the same on all nodes of the same LNet.
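For example, in a modprobe.d file (a sketch only; the values are placeholders and must match on every node of that LNet):

options ko2iblnd credits=256 peer_credits=8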

Alex.
> 
> Any light shed on this would be most helpful.
> 
> Thanks,
> Jessica
> 
> -- 
> Jessica Otey
> System Administrator II
> North American ALMA Science Center (NAASC)
> National Radio Astronomy Observatory (NRAO)
> Charlottesville, Virginia (USA)
> 
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] lnet router lustre rpm compatibility

2016-06-22 Thread Alexander I Kulyavtsev
Servers should be upgraded first, or together with the clients. Alex.

On Jun 22, 2016, at 11:01 AM, E.S. Rosenberg 
mailto:esr+lus...@mail.hebrew.edu>> wrote:

I always understood the recommendation was to update the clients (and LNET 
Routers) before the servers and not the other way around?

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] ZFS backed OSS out of memory

2016-06-23 Thread Alexander I Kulyavtsev
1) https://github.com/zfsonlinux/zfs/issues/2581
suggests a few things to monitor in /proc. Searching for OOM at https://github.com/zfsonlinux/zfs/issues gives more hints on where to look.

I guess the OOM is not necessarily caused by zfs/spl.
Do you have Lustre mounted on the OSS, with some process writing to it? (memory pressure)

2)
> http://lustre.ornl.gov/ecosystem-2016/documents/tutorials/Stearman-LLNL-ZFS.pdf
Last three pages.
2a) it may be worth setting, in /etc/modprobe.d/zfs.conf:
   options zfs zfs_prefetch_disable=1

2b) did you set metaslab_debug_unload? It increases memory consumption.

Can you correlate the OOMs with some type of activity (read, write, scrub, snapshot delete)?
Do you actually re-read the same data? ARC only helps on the second read.
Having 64 GB of in-memory ARC seems like a lot together with an L2ARC on SSD.
Lustre does not use the ZFS slog, IIRC.
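If memory stays tight, you can also cap the ARC explicitly; a sketch for /etc/modprobe.d/zfs.conf (the 32 GiB value is only an example, tune it for your node):

options zfs zfs_arc_max=34359738368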

3) do you have the option to upgrade ZFS?

4) you may set up monitoring and feed ZFS and Lustre stats to InfluxDB (on a monitoring node) with telegraf (on each OSS); both are at influxdata.com. I keep the DB on SSD. Plot the data with Grafana, or query InfluxDB directly.
> # fgrep plugins /etc/opt/telegraf/telegraf.conf
> ...
> [plugins]
> [[plugins.cpu]]
> [[plugins.disk]]
> [[plugins.io]]
> [[plugins.mem]]
> [[plugins.swap]]
> [[plugins.system]]
> [[plugins.zfs]]
> [[plugins.lustre2]]


5) drop caches with echo 3 > /proc/sys/vm/drop_caches. If it helps, add it to cron to avoid the OOM kills.
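A hypothetical cron entry for that (the interval is a guess; tune it to your workload):

# /etc/cron.d/drop-caches
*/30 * * * * root /bin/sync; /bin/echo 3 > /proc/sys/vm/drop_caches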

Alex.

> Folks,
> 
> I've done my fair share of googling and run across some good information on 
> ZFS backed Lustre tuning including this:
> 
> http://lustre.ornl.gov/ecosystem-2016/documents/tutorials/Stearman-LLNL-ZFS.pdf
> 
> and various discussions around how to limit (or not) the ARC and clear it if 
> needed.
> 
> That being said, here is my configuration.
> 
> RHEL 6 
> Kernel 2.6.32-504.3.3.el6.x86_64
> ZFS 0.6.3
> Lustre 2.5.3 with a couple of patches
> Single OST per OSS with 4 x RAIDZ2 4TB SAS drives
> Log and Cache on separate SSDs
> These OSSes are beefy with 128GB of memory and Dual E5-2630 v2 CPUs
> 
> About 30 OSSes in all serving mostly a standard HPC cluster over FDR IB with 
> a sprinkle of 10G
> 
> # more /etc/modprobe.d/lustre.conf
> options lnet networks=o2ib9,tcp9(eth0)
> 
> ZFS backed MDS with same software stack.
> 
> The problem I am having is the OOM killer is whacking away at system 
> processes on a few of the OSSes. 
> 
> "top" shows all my memory is in use with very little Cache or Buffer usage.
> 
> Tasks: 1429 total,   5 running, 1424 sleeping,   0 stopped,   0 zombie
> Cpu(s):  0.0%us,  2.9%sy,  0.0%ni, 94.0%id,  3.1%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:  132270088k total, 131370888k used,   899200k free, 1828k buffers
> Swap: 61407100k total, 7940k used, 61399160k free,10488k cached
> 
>  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
>   47 root  RT   0 000 S 30.0  0.0 372:57.33 migration/11
> 
> I had done zero tuning so I am getting the default ARC size of 1/2 the memory.
> 
> [root@lzfs18b ~]# arcstat.py 1
>time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz c
> 09:11:50 0 0  0 00 00 0063G   63G
> 09:11:51  6.2K  2.6K 41   2066  2.4K   71 0063G   63G
> 09:11:52   21K  4.0K 18   3052  3.7K   3418063G   63G
> 
> The question is, if I have 128GB of RAM and ARC is only taking 63, where did 
> the rest go and how can I get it back so that the OOM killer stops killing me?
> 
> Thanks!
> 
> Tim
> 
> 
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] MDT 100% full

2016-07-26 Thread Alexander I Kulyavtsev
Brian,
Do you have ZFS 'frozen'? It can lock up when you have zero bytes left, and you cannot do much with ZFS after that.
You will need to remove a file on ZFS itself, or remove a snapshot, to unfreeze ZFS, so do not wait until it fills up completely.
To avoid having to delete MDT objects after ZFS locks up, I create a few ZFS files with known names in an extra ZFS filesystem on the same pool, and I also set a space reservation.
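Something like this, as a sketch (the pool and dataset names are placeholders):

zfs create -o mountpoint=none -o reservation=1G mdt-pool/reserved
# when the MDT fills up, release the reserved space:
zfs set reservation=none mdt-pool/reserved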

I saw large space consumption per inode on a test system last summer, about 10 KB per inode.
I had formatted the SSDs with ashift=12; I thought zpool ashift=12 would be a better fit for 4K-sector SSDs.

The issue was resolved when I set ashift=9 (emulated 512-byte sectors) and, at the same time, set xattr=sa (it was not the default then).

Right now I have 82M files taking 126 GB on the production system (with thousands of snapshots), or about 1.5 KB per "inode." With ashift=12 it was about 10 KB, IIRC.
This can be reproduced with "createmany."

Here is the histogram I got during tests with ashift=9.
You can see the bin with n=11 is heavily populated (2**11 = 2048). On the similar histogram for a zpool formatted with ashift=12 the distribution starts at n=12.
That is, a lot of objects with size <= 2048 will be created as 4K objects on a zpool with ashift=12, leading to more space consumed on the Lustre MDT per Lustre file (OI objects, hidden attributes, ...).


zdb -M zpl


vdev  0 metaslabs  119  fragmentation  5%

  9:  21553 *

 10: 186445 **

 11: 1337626 

 12: 603282 ***

 13: 235246 

 14: 104615 

 15:  50365 **

 16:  22656 *

 17:   9997 *

 18:   3482 *

 19:   1198 *

 20:481 *

Alex.


On Jul 26, 2016, at 7:57 PM, Rick Wagner 
mailto:rpwag...@sdsc.edu>> wrote:

Hi Brian,

On Jul 26, 2016, at 5:45 PM, Andrus, Brian Contractor 
mailto:bdand...@nps.edu>> wrote:

All,

Ok, I thought 100GB would be sufficient for an MDT.
I have 2 MDTs as well, BUT…

MDT0 is 100% full and now I cannot write anything to my lustre filesystem.
The MDT is on a ZFS backing filesystem.

So, what is the proper way to grow my MDT using ZFS? Do I need to shut the 
filesystem down completely? Can I just add a disk or space to the pool and 
Lustre will see it?

Any advice or direction is appreciated.

We just did this successfully on the two MDTs backing one of our Lustre file 
systems and everything happened at the ZFS layer. We added drives to the pool 
and Lustre immediately saw the additional capacity. Whether you take down the 
file system or do it live is a question of your architecture, skills, and 
confidence. Having a test file system is also worthwhile for going over the steps.
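As a rough sketch of what that can look like at the ZFS layer (the pool name, vdev type, and device names are placeholders, and a new vdev should match the redundancy of the existing ones):

zpool add mdt-pool mirror /dev/disk/by-id/diskA /dev/disk/by-id/diskB
zpool list mdt-pool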

--Rick




Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] client server communication half-lost, read-out?

2016-08-01 Thread Alexander I Kulyavtsev
I used this:

# lctl get_param printk

lnet.printk=warning error emerg console

# lctl set_param printk=+neterror

(debug)

# lctl set_param printk=-neterror


Take a look at  "Diagnostic and Debugging Tools" chapter at lustre manual.

# lctl debug_list subs
Subsystems: all_subs, undefined, mdc, mds, osc, ost, class, log, llite, rpc, 
mgmt, lnet, lnd, pinger, filter, echo, ldlm, lov, lquota, osd, lfsck, lmv, sec, 
gss, mgc, mgs, fid, fld

# lctl debug_list types
Types: all_types, trace, inode, super, ext2, malloc, cache, info, ioctl, 
neterror, net, warning, buffs, other, dentry, nettrace, page, dlmtrace, error, 
emerg, ha, rpctrace, vfstrace, reada, mmap, config, console, quota, sec, lfsck, 
hsm

You may take a look at neterror, net.
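For a quick capture, something like this (a sketch; the dump path is arbitrary):

lctl set_param debug=+neterror
lctl set_param debug=+net
# reproduce the problem, then dump and clear the kernel debug log:
lctl dk /tmp/lustre-debug.txt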

The list of parameters (it is long; this is on an OSS):
# lctl list_param -R | less
# lctl list_param -R | wc -l
error: list_param: /proc/fs/lustre/llite/*: Found no match
error: list_param: /proc/fs/lustre/qmt/*: Found no match
15761



Alex.


On Aug 1, 2016, at 11:28 AM, Thomas Roth  wrote:

> Hi all,
> 
> we are suffering from "temporarily unavailable" OSTs, where "temporarily" is 
> a slight understatement, should read "permanently".
> 
> Evicting the client on the OSS manually or deactivating+activating the OSTs 
> in question works without problems.
> 
> However, the clients are never evicted on Lustre's own. Obviously, the 
> communication between client and OST is not dead entirely.
> 
> 
> Is there a way to read out the communication between such an osc - obdfilter 
> pair? As the network people do with wireshark et al.?
> 
> Cheers,
> Thomas
> 
> -- 
> 
> Thomas Roth
> Department: Informationstechnologie
> Location: SB3 1.250
> Phone: +49-6159-71 1453  Fax: +49-6159-71 2986
> 
> GSI Helmholtzzentrum für Schwerionenforschung GmbH
> Planckstraße 1
> 64291 Darmstadt
> www.gsi.de
> 
> Gesellschaft mit beschränkter Haftung
> Sitz der Gesellschaft: Darmstadt
> Handelsregister: Amtsgericht Darmstadt, HRB 1528
> 
> Geschäftsführung: Ursula Weyrich
> Professor Dr. Karlheinz Langanke
> Jörg Blaurock
> 
> Vorsitzende des Aufsichtsrates: St Dr. Georg Schütte
> Stellvertreter: Ministerialdirigent Dr. Rolf Bernhardt
> 
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] two lustre fs on same lnet was: Re: lustre clients cannot access different OSS groups on TCP and infiniband at same time

2016-08-03 Thread Alexander I Kulyavtsev
Hi Andreas,
the network names need to be unique if the same clients are connecting to both 
filesystems.
What are the complications of having two Lustre filesystems on the same LNet on the same IB fabric? Does it have a performance impact (broadcasts, credits, buffers)?

We have two (three) Lustre filesystems facing clusters on the same LNet. I'm wondering whether I need to change that - we have a service window right now.

Initially I set up separate LNets for each Lustre filesystem, but since we were doing an ethernet 'bridge' lnet1(ib)-rtr-eth-rtr-lnet2(ib) to a remote cluster between IB networks, the routing got kind of complicated. As a practical matter, we were able to move 1 PB of data between the two Lustre filesystems, plus handle IO from/to the compute cluster, in this configuration.

We have one lnet per IB fabric:

 router(eth) -- tcp11...tcp14 -- (eth)routers - o2ib2 -- cluster2
 |
   +-- lustre1 +
 |   |
cluster0 -{o2ib0} -- lustre2 --{o2ib1} - cluster1
 |   |
 +-- lustre3 +



Right now we are merging clusters 1 and 2 and retiring lustre1;
it could be a good time to reconsider and split the LNets, e.g. o2ib0 -> (o2ib20, o2ib30) and o2ib1 -> (o2ib30, o2ib31).

What would be a reason for such an LNet split?

Alex.

On Jul 13, 2016, at 8:15 PM, Dilger, Andreas 
mailto:andreas.dil...@intel.com>> wrote:

It sounds like you have two different filesystems, each using the same LNet 
networks "tcp0" and "o2ib0".  While "tcp" is a shorthand for network "tcp0", 
the network names need to be unique if the same clients are connecting to both 
filesystems.  One of the filesystems will need to regenerate the configuration 
to use "tcp1" and "o2ib1" (or whatever) to allow the clients to distinguish 
between the different networks.

Cheers, Andreas

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] ZFS not freeing disk space

2016-08-10 Thread Alexander I Kulyavtsev
It could be a ZFS snapshot holding space on the OST.

Or it could be a ZFS issue, with ZFS not releasing space until a reboot; check the ZFS bugs on the ZFS wiki.
Lustre normally shows the change in OST used space right away.
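To check whether snapshots are holding the space (a sketch; the pool name is a placeholder):

zfs list -t snapshot -r ost-pool
zfs get -r usedbysnapshots ost-pool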

ZFS 0.6.3 is pretty old. We are using 0.6.4.1 with Lustre 2.5.3 (there is also 0.6.4.2).
You may need to patch Lustre 2.5.3 to work with some ZFS 0.6.5.x; the patches were listed on this mailing list.
Alex.

On Aug 10, 2016, at 12:57 PM, Thomas Roth  wrote:

> Hi all,
> 
> one of our ((Lustre 2.5.3, ZFS 0.6.3) OSTs got filled up to >90%, so I 
> deactivated it and am now migrating files off of that OST.
> 
> Checking the list of files I am currently using, I can verify that the 
> migration is working: Lustre tells me that the top of the list is already on 
> some other OSTs, the bottom of the list still resides on the OST in question.
> 
> But when I do either 'lfs df' or 'df' on the OSS, and don't see any change in 
> terms of bytes, while the migrated files already sum up to several GB.
> 
> Is this a special feature of ZFS, or just a symptom of a broken OST?
> 
> 
> I think I have seen this behavior before, and the "df" result shrank to an 
> expected value after the server had been rebooted. In that case, this seems 
> more like a too persistent caching effect -?
> 
> Cheers,
> Thomas
> 
> -- 
> 
> Thomas Roth
> Department: Informationstechnologie
> Location: SB3 1.250
> Phone: +49-6159-71 1453  Fax: +49-6159-71 2986
> 
> GSI Helmholtzzentrum für Schwerionenforschung GmbH
> Planckstraße 1
> 64291 Darmstadt
> www.gsi.de
> 
> Gesellschaft mit beschränkter Haftung
> Sitz der Gesellschaft: Darmstadt
> Handelsregister: Amtsgericht Darmstadt, HRB 1528
> 
> Geschäftsführung: Ursula Weyrich
> Professor Dr. Karlheinz Langanke
> Jörg Blaurock
> 
> Vorsitzende des Aufsichtsrates: St Dr. Georg Schütte
> Stellvertreter: Ministerialdirigent Dr. Rolf Bernhardt
> 
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] ZFS not freeing disk space

2016-08-10 Thread Alexander I Kulyavtsev
"Deleting Files Doesn't Free Space"
   https://github.com/zfsonlinux/zfs/issues/1188

"Deleting Files Doesn't Free Space, unless I unmount the filesystem"
https://github.com/zfsonlinux/zfs/issues/1548

There are more references listed on pages above.

Since 0.6.3 a possible workaround is to set zfs xattr=sa, but that does not help with existing files on the OST.
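For the record, the setting itself is just the following (the dataset name is a placeholder; it only affects newly written files):

zfs set xattr=sa ost-pool/ost0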

Alex.
P.S. Apparently there's more than one way to leak the space.


On Aug 10, 2016, at 6:07 PM, Alexander I Kulyavtsev 
mailto:a...@fnal.gov>> wrote:

It can be zfs snapshot holding space on ost.

Or, it can be zfs issue, zfs not releasing space until reboot. Check zfs bugs 
on zfs wiki.
Lustre shows change in OST used space right away.

zfs 0.6.3 is pretty old. We are using 0.6.4.1  with lustre 2.5.3. (there is  
zfs 0.6.4.2)
You may need to patch lustre 2.5.3 to go with some zfs 0.6.5.x;  patches were 
listed on this mail list.
Alex.

On Aug 10, 2016, at 12:57 PM, Thomas Roth mailto:t.r...@gsi.de>> 
wrote:

Hi all,

one of our ((Lustre 2.5.3, ZFS 0.6.3) OSTs got filled up to >90%, so I 
deactivated it and am now migrating files off of that OST.

Checking the list of files I am currently using, I can verify that the 
migration is working: Lustre tells me that the top of the list is already on 
some other OSTs, the bottom of the list still resides on the OST in question.

But when I do either 'lfs df' or 'df' on the OSS, and don't see any change in 
terms of bytes, while the migrated files already sum up to several GB.

Is this a special feature of ZFS, or just a symptom of a broken OST?


I think I have seen this behavior before, and the "df" result shrank to an 
expected value after the server had been rebooted. In that case, this seems 
more like a too persistent caching effect -?

Cheers,
Thomas

--

Thomas Roth
Department: Informationstechnologie
Location: SB3 1.250
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1
64291 Darmstadt
www.gsi.de<http://www.gsi.de>

Gesellschaft mit beschränkter Haftung
Sitz der Gesellschaft: Darmstadt
Handelsregister: Amtsgericht Darmstadt, HRB 1528

Geschäftsführung: Ursula Weyrich
Professor Dr. Karlheinz Langanke
Jörg Blaurock

Vorsitzende des Aufsichtsrates: St Dr. Georg Schütte
Stellvertreter: Ministerialdirigent Dr. Rolf Bernhardt

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org




[lustre-discuss] how to get objid in zfs?

2016-09-16 Thread Alexander I Kulyavtsev
What is the simplest way to find a corrupted file reported in a Lustre error like:
Sep 12 19:12:29 lfs7 kernel: LustreError: 
10704:0:(ldlm_resource.c:1188:ldlm_resource_get()) lfs-OST0012: lvbo_init 
failed for resource 0x51b94b:0x0: rc = -2

0x51b94b looks like an object id; how do I find the FID, or the file, corresponding to it?
I can scan the underlying ZFS with zdb as in zfsobj2fid.py. Where should I look for the objid?
What is the format of trusted.lma in Lustre on ZFS?

The OST reported errors like these and some objects were orphaned.
A few files were reported by a user as giving an ls error,
-? ? ?  ?  ??  filename
and I was able to identify the file with lfs getstripe and unlink it.
 
Name to objid was easy; now I need to translate in the opposite direction.
I would like to identify a dozen other objects reported in the logs, and I am trying to avoid a full scan of the namespace.
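Once a FID is recovered from a zdb dump (as zfsobj2fid.py does), it can be mapped back to a path from a client with fid2path; a sketch, with a made-up FID and mount point:

lfs fid2path /mnt/lfs "[0x200000401:0x7:0x0]"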

Lustre 2.5.3
zfs-0.6.4.1


Thank you in advance, 
Alex.



___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] migrating from a stand-alone mgs to a merged mgs

2016-09-23 Thread Alexander I Kulyavtsev
Do you want to put the MGS and MDT on the same node, or on the same partition?

You may have the MGS and MDT on the same node on different partitions; that may be easier to do. The MGS partition is small.

Alex.

On Sep 23, 2016, at 3:07 PM, John White  wrote:

> Good Afternoon,
>   I’m trying to figure out how to take an existing file system with a 
> distinct mgs and merge that into the (existing) mdt.  Is this possible?
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] LustreError on ZFS volumes

2016-12-13 Thread Alexander I Kulyavtsev
It may be worth taking a ZFS snapshot on the OST before mass changes are made on it, both to investigate the original issue and in case things get worse if the underlying ZFS metadata is broken.
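A sketch (the pool and dataset names are placeholders):

zfs snapshot ost-pool/ost0@pre-migrate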


Did you scrub the pool (or a snapshot/clone) before migrating the files out? It will not fix the data, but it may fix metadata and may point to the corruption.


Chris M. published on LU-5155 a link to his script zfsobj2fid, which dumps ZFS objects and converts them to FIDs. You may take a look at the dump the script generates, and at the script itself. It may be worth starting by checking a known-good file.


Alex.



From: lustre-discuss  on behalf of 
Jesse Stroik 
Sent: Tuesday, December 13, 2016 1:15:28 PM
To: Crowe, Tom
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] LustreError on ZFS volumes

We discussed a course of action this morning and decided that we'd start
by migrating the files off of the OST. Testing suggests files that
cannot be completely read will be left on OST0002.

Due to the nature of the corruption - faulty hardware raid controller -
it seems unlikely we'll be able to meaningfully save any files that were
corrupted. This is something we may evaluate more closely once the
lfs_migrate is complete and we have our file list.

We'll then share the list of corrupted files with our users and find out
the cost of the lost data. If it's reasonably reproducible, we'll
reinitialize the RAID array and reformat the vdev.

Thanks for your help, Tom!

Best,
Jesse Stroik



On 12/12/2016 03:51 PM, Crowe, Tom wrote:
> Hi Jessie,
>
> In regards to you seeing 370 objects with errors form ‘zpool status’, but 
> having over 400 files with “access issues”, I would suggest running the 
> ‘zpool scrub’ to identify all the ZFS objects in the pool that are reporting 
> permanent errors.
>
> It would be very important to have a complete list of files w/issues, before 
> replicating the VDEV(s) in question.
>
> You may also want to dump the zdb information for the source VDEV(s) with the 
> following:
>
> zdb -dd source_pool/source_vdev > /some/where/with/room
>
> For example, if the zpool was named pool-01, and the VDEV was named 
> lustre-0001 and you had free space in a filesystem named /home:
>
> zdb -dd pool-01/lustre-0001 > /home/zdb_pool-01_0001_20161212.out
>
> There is a great wealth of data zdb can share about your files. Having the 
> output may prove helpful down the road.
>
> Thanks,
> Tom
>
>> On Dec 12, 2016, at 4:39 PM, Jesse Stroik  wrote:
>>
>> Thanks for taking the time to respond, Tom,
>>
>>
>>> For clarification, it sounds like you are using hardware based RAID-6, and 
>>> not ZFS raid? Is this correct? Or was the faulty card simply an HBA?
>>
>>
>> You are correct. This particular file system is still using hardware RAID6.
>>
>>
>>> At the bottom of the ‘zpool status -v pool_name’ output, you may see paths 
>>> and/or zfs object ID’s of the damaged/impacted files. This would be good to 
>>> take note of.
>>
>>
>> Yes, I output this to files at a few different times and we've had no chance 
>> since replacing the RAID controller, which makes me feel reasonably 
>> comfortable leaving the file system in production.
>>
>> There are 370 objects listed by zpool status -v but I am unable to access at 
>> least 400 files. Almost all of our files are single stripe.
>>
>>
>>> Running a ‘zpool scrub’ is a good idea. If the zpool is protected with "ZFS 
>>> raid", the scrub may be able to repair some of the damage. If the zpool is 
>>> not protected with "ZFS raid", the scrub will identify any other errors, 
>>> but likely NOT repair any of the damage.
>>
>>
>> We're not protected with ZFS RAID, just hardware raid6. I could run a patrol 
>> on the hardware controller and then a ZFS scrub if that makes the most sense 
>> at this point. This file system is scheduled to run a scrub the third week 
>> of every month so it would run one this weekend otherwise.
>>
>>
>>
>>> If you have enough disk space on hardware that is behaving properly (and 
>>> free space in the source zpool), you may want to replicate the VDEV’s (OST) 
>>> that are reporting errors. Having a replicated VDEV can afford you the 
>>> ability to examine the data without fear of further damage. You may also 
>>> want to extract certain files from the replicated VDEV(s) which are 
>>> producing IO errors on the source VDEV.
>>>
>>> Something like this for replication should work:
>>>
>>> zfs snap source_pool/source_ost@timestamp_label
>>> zfs send -Rv source_pool/source_ost@timestamp_label | zfs receive 
>>> destination_pool/source_oat_replicated
>>>
>>> You will need to set zfs_send_corrupt_data to 1 in 
>>> /sys/module/zfs/parameters or the ‘zfs send’ will error and fail when 
>>> sending a VDEV with read and/or checksum errors.
>>> Enabling zfs_send_corrupt_data allows the zfs send operation to complete. 
>>> Any blocks that are damaged on the source sid

Re: [lustre-discuss] Building against kmod spl/zfs

2017-01-13 Thread Alexander I Kulyavtsev
Hi Brian,

do you use an rpm-based system or something else?

I do not yet use kmod zfs with Lustre (we use dkms), but I use kmod zfs on another ZFS appliance.

In the case of an rpm-based system you need to install the zfs-release-1-5 rpm to configure yum.

Yum can use prebuilt kmod modules; RHEL-based systems have kABI compatibility, so you do not need to rebuild ZFS for each minor kernel update.

Yum needs the proper repository enabled when installing the rpms (zfs-kmod for kmod vs. zfs for dkms); the sources are in zfs-source.

So you may install ZFS from the zfs-kmod repo and the sources from zfs-source, then build Lustre against the zfs-source headers.
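A sketch of that path (the zfs-release rpm comes from zfsonlinux.org; the repo ids below are as I remember them and may differ per release):

# install the zfs-release rpm from zfsonlinux.org first, then:
yum-config-manager --disable zfs
yum-config-manager --enable zfs-kmod
yum install zfs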


You may find more information on zfsonlinux.org .


Alex.


From: lustre-discuss  on behalf of 
Andrus, Brian Contractor 
Sent: Thursday, January 12, 2017 11:14:50 PM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] Building against kmod spl/zfs

All,

I am starting to try and build lustre using just kmod instead of dkms. Now the 
trouble I am seeing right off is that lustre wants the spl and zfs source 
(which is part of the dkms packages) just to configure it. So what would be the 
appropriate way to try and make kmod-lustre? Should we obtain the spl source 
and just point to it? Or perhaps there is no kmod for lustre server as yet?
My simple config line is:

./configure  --enable-server --disable-ldiskfs 
--with-linux=/usr/src/kernels/$(uname -r)

Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Odd quota behavior with Lustre/ZFS

2017-02-09 Thread Alexander I Kulyavtsev
Yes, in Lustre 2.5.3, after doing chgrp on a large subtree. IIRC it was for three groups; the counts were small, different "negative" numbers, not 21.
I can get more details tomorrow.

Alex

> On Feb 9, 2017, at 5:14 PM, Mohr Jr, Richard Frank (Rick Mohr) 
>  wrote:
> 
> Has anyone else encountered this “off by 21” problem before?  I didn’t see 
> anything online, but perhaps I missed something.
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Odd quota behavior with Lustre/ZFS

2017-02-16 Thread Alexander I Kulyavtsev
Hi Rick,

Last November I had 'negative' file counts reported by quota for two groups.

Checking it now, the count for one of them has gone 'positive.' I guess the user just wrote more files.


The file counts reported by robinhood were 1,634 and 3,924 files.


11/9/16


[root@lfsb ~]# lfs quota  -qh -g  thermog /mnt/lfsa

  /mnt/lfsa  0k  0k  0k   - 18446744073709550831   0
   0   -


[root@lfsb ~]# lfs quota  -qh -g  c51 /mnt/lfsa

  /mnt/lfsa  0k  0k  0k   - 18446744073709551611   0
   0   -

interestingly:

>>> hex(18446744073709551611)

'0xfffffffffffffffbL'

>>> hex(18446744073709550831)

'0xfffffffffffffcefL'

(i.e. 2**64 - 5 and 2**64 - 785: small negative counts stored as unsigned 64-bit values)


Right now lfs quota reports the same large number of files for group c51:

18446744073709551611

and [almost] no space used.


# lfs quota  -g  c51 /mnt/lfsa

Disk quotas for group c51 (gid 9886):

 Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace

  /mnt/lfsa   0  1048576 1048576   - 18446744073709551611*   1024   
 1024   -


I can mount an MDT snapshot and take a look, but I need to know what to look for.


This is Lustre 2.5.3 GA with a few patches, and zfs-0.6.4.1.


Alex.



From: Mohr Jr, Richard Frank (Rick Mohr) 
Sent: Thursday, February 16, 2017 3:02 PM
To: Alexander I Kulyavtsev
Cc: Lustre discussion
Subject: Re: [lustre-discuss] Odd quota behavior with Lustre/ZFS

Alex,

Were you ever able to get more details about this problem?  Thanks.

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu


> On Feb 9, 2017, at 10:27 PM, Alexander I Kulyavtsev  wrote:
>
> Yes, in lustre 2.5.3 after doing chgrp for large subtree. IIRC, for three 
> groups; counts were small different "negative" numbers, not 21.
> I can get more details tomorrow.
>
> Alex
>
>> On Feb 9, 2017, at 5:14 PM, Mohr Jr, Richard Frank (Rick Mohr) 
>>  wrote:
>>
>> Has anyone else encountered this “off by 21” problem before?  I didn’t see 
>> anything online, but perhaps I missed something.


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] set OSTs read only ?

2017-07-12 Thread Alexander I Kulyavtsev
You may find advice from Andreas on this list (also attached below). I did not try setting fail_loc myself.

In 2.9 there is the setting osp.*.max_create_count=0, described in LUDOC-305.
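For example, on the MDS (a sketch; the fsname and OST index are placeholders):

lctl set_param osp.lfs-OST0012-osc-MDT0000.max_create_count=0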

We used to set an OST degraded, as described in the Lustre manual.
It works most of the time, but at some point I saw Lustre errors in the logs for some operations. Sorry, I do not recall the details.
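That is done on the OSS, something like this (a sketch; the fsname and OST index are placeholders):

lctl set_param obdfilter.lfs-OST0012.degraded=1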

I am still not sure either of these approaches will work for you: setting an OST degraded, or setting fail_loc, makes some OSTs be selected instead of others.
You may want to verify that these settings trigger a clean error on the user side (instead of blocking) when all OSTs are degraded.

Another, simpler, approach would be to enable Lustre quotas and set the quota below the used space for all users (or groups).
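A sketch of that approach (the fsname, mount point, user name, and limit are placeholders):

lctl conf_param lfs.quota.ost=ug          # on the MGS: enforce user/group block quotas on OSTs
lfs setquota -u someuser -B 1024 /mnt/lfs # hard block limit in KB, set below current usage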

Alex.

From: "Dilger, Andreas" 
mailto:andreas.dil...@intel.com>>
Subject: Re: [lustre-discuss] lustre 2.5.3 ost not draining
Date: July 28, 2015 at 11:51:38 PM CDT
Cc: "lustre-discuss@lists.lustre.org" 
mailto:lustre-discuss@lists.lustre.org>>

Setting it degraded means the MDS will avoid allocations on that OST
unless there aren't enough OSTs to meet the request (e.g. stripe_count =
-1), so it should work.

That is actually a very interesting workaround for this problem, and it
will work for older versions of Lustre as well.  It doesn't disable the
OST completely, which is fine if you are doing space balancing (and may
even be desirable to allow apps that need more bandwidth for a widely
striped file), but it isn't good if you are trying to empty the OST
completely to remove it.

It looks like another approach would be to mark the OST as having no free
space using OBD_FAIL_OST_ENOINO (0x229) fault injection on that OST:

  lctl set_param fail_loc=0x229 fail_val=

This would cause the OST to return 0 free inodes from OST_STATFS for the
specified OST index, and the MDT would skip this OST completely.  To
disable all of the OSTs on an OSS use  = -1.  It isn't possible
to selectively disable a subset of OSTs using this method.  The
OBD_FAIL_OST_ENOINO fail_loc has been available since Lustre 2.2, which
covers all of the 2.4+ versions that are affected by this issue.

If this mechanism works for you (it should, as this fail_loc is used
during regular testing) I'd be obliged if someone could file an LUDOC bug
so the manual can be updated.

Cheers, Andreas


On Jul 12, 2017, at 4:20 PM, Riccardo Veraldi 
mailto:riccardo.vera...@cnaf.infn.it>> wrote:

Hello,

on one of my lustre FS I need to find a solution so that users can still
access data on the FS but cannot write new files on it.
I have hundreds of clients accessing the FS so remounting it ro is not
really easily feasible.
Is there an option on the OSS side to allow OSTs to be accessed just to
read data and not to store new data ?
tunefs.lustre could do that ?
thank you

Rick

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



Re: [lustre-discuss] Interesting disk usage of tar of many small files on zfs-based Lustre 2.10

2017-08-03 Thread Alexander I Kulyavtsev
Lustre IO size is 1 MB; you have a 4 MB ZFS recordsize.
Do you see the IO rate change when the tar record size is set to 4 MiB (tar -b 8192)?
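That is, something like (a sketch; the paths are placeholders):

tar -b 8192 -cf /lustre/out.tar smalldir/    # 8192 x 512-byte records = 4 MiB blocking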

How many data disks do you have in the raidz2?

ZFS may write a few extra empty blocks to help with fragmentation; IIRC this patch is on by default in ZFS 0.7 to improve IO rates for some disks:
https://github.com/zfsonlinux/zfs/pull/5931

If I understand it correctly, for very small files (untarred) there will be overhead to pad each file to the record size, plus extra padding to P+1 records (=P extra), plus the parity records (+P), plus the metadata for the Lustre OST object. For a raidz2 with P=2 that is a factor of 5x or more.

Alex.

On Aug 3, 2017, at 7:28 PM, Nathan R.M. Crawford 
mailto:nrcra...@uci.edu>> wrote:

Off-list, it was suggested that tar's default 10K blocking may be the cause. I 
increased it to 1MiB using "tar -b 2048 ...", which seems to result in the 
expected 9.3 GiB disk usage. It probably makes archives incompatible with very 
old versions of tar, but meh.

-Nate

On Thu, Aug 3, 2017 at 3:07 PM, Nathan R.M. Crawford 
mailto:nrcra...@uci.edu>> wrote:
  In testing how to cope with naive users generating millions of tiny files, I 
noticed some surprising (to me) behavior on a lustre 2.10/ZFS 0.7.0 system.

  The test directory (based on actual user data) contains about 4 million files 
(avg size 8.6K) in three subdirectories. Making tar files of each subdirectory 
gives the total nominal size of 34GB, and using "zfs list", the tar files took 
up 33GB on disk.

  The initially surprising part is that making copies of the tar files only 
adds 9GB to the disk usage. I suspect that the creation of the tar files is as 
a bunch of tiny appendings, and with a raidz2 on ashift=12 disks (4MB max 
recordsize), there is some overhead/wasted space on each mini-write. The copies 
of the tar files, however, could be made as a single write that avoided the 
overhead and probably allowed the lz4 compression to be more efficient.

  Are there any tricks or obscure tar options that make archiving millions of 
tiny files on a Lustre system avoid this? It is not a critical issue, as taking 
a minute to copy the tar files is simple enough.

-Nate

--

Dr. Nathan Crawford  
nathan.crawf...@uci.edu
Modeling Facility Director
Department of Chemistry
1102 Natural Sciences II Office: 2101 Natural Sciences II
University of California, Irvine  Phone: 949-824-4508
Irvine, CA 92697-2025, USA



--

Dr. Nathan Crawford  
nathan.crawf...@uci.edu
Modeling Facility Director
Department of Chemistry
1102 Natural Sciences II Office: 2101 Natural Sciences II
University of California, Irvine  Phone: 949-824-4508
Irvine, CA 92697-2025, USA

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



Re: [lustre-discuss] nodes crash during ior test

2017-08-07 Thread Alexander I Kulyavtsev
The Lustre wiki has sidebars on Testing and Monitoring; you could start a Benchmarking one.

There was a Benchmarking Working Group in OpenSFS:
wiki:  http://wiki.opensfs.org/Benchmarking_Working_Group
mail list: http://lists.opensfs.org/listinfo.cgi/openbenchmark-opensfs.org

It is actually a question to the list: what is the preferred location for a knowledge base on Lustre benchmarking, lustre.org or opensfs.org?
IMHO the KB belongs on lustre.org, and the BWG minutes (if the group re-engages) on opensfs.org.

Alex.


On Aug 7, 2017, at 7:56 AM, E.S. Rosenberg 
mailto:esr+lus...@mail.hebrew.edu>> wrote:

OT:
Can we create a wiki page or some other form of knowledge pooling on 
benchmarking lustre?

Right now I'm using slides from 2009 as my source which may not be ideal...

http://wiki.lustre.org/images/4/40/Wednesday_shpc-2009-benchmarking.pdf

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Recompiling client from the source doesnot contain lnetctl

2018-01-23 Thread Alexander I Kulyavtsev
Andreas,
It would be extremely helpful if "rpmbuild --rebuild" built a lustre[-client] rpm with the same content and functionality as the lustre[-client] rpm from the distro.
The client rpm is the one rebuilt most often, for different kinds of worker nodes, and I see several mails on this list reporting that lnetctl is not there.

The enable-dlc/disable-dlc flag is not described in the manual or on the Lustre wiki (http://wiki.lustre.org/Compiling_Lustre).

Arman,
for the record:
To get lnetctl into the client rpm on SLF 6.8 I needed to use these flags:
rpmbuild  --rebuild --without servers --with lnet-dlc --with 
lustre-utils ./lustre-2.10.2-1.src.rpm

This adds the files below to lustre-client-*.rpm, compared to a build without these flags:
> /etc/lnet.conf
> /usr/lib64/liblnetconfig.a
> /usr/lib64/liblnetconfig.so
> /usr/lib64/liblnetconfig.so.2
> /usr/lib64/liblnetconfig.so.2.0.0
> /usr/sbin/lnetctl

I did not try "--with lnet-dlc" alone, but using only --with lustre-utils was
not enough.
Probably the --enable-utils and --enable-dlc configure arguments can be used for
the same effect.
Using the lnet-dlc and lustre-utils flags is in addition to the requirement to have
libyaml-devel installed on the build host.
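
To recap, the full sequence that worked for me was roughly (paths are the rpmbuild
defaults on slf6; adjust as needed):

yum install libyaml-devel
rpmbuild --rebuild --without servers --with lnet-dlc --with lustre-utils ./lustre-2.10.2-1.src.rpm
# verify the binary actually made it into the client rpm
rpm -qpl ~/rpmbuild/RPMS/x86_64/lustre-client-*.rpm | grep lnetctl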

Alex.

> On Nov 30, 2017, at 2:22 PM, Dilger, Andreas  wrote:
> 
> You should also check the config.log to see if it properly detected libyaml 
> being installed and enabled “USE_DLC” for the build:
> 
> configure:35728: checking for yaml_parser_initialize in -lyaml
> configure:35791: result: yes
> configure:35801: checking whether to enable dlc
> configure:35815: result: yes
> 
> Cheers, Andreas
> 
> On Nov 29, 2017, at 05:28, Arman Khalatyan  wrote:
> 
>> even in the extracted source code the lnetctl does not compile.
>> running make in the utils folder it is producing wirecheck,lst and
>> routerstat, but not lnetctl.
>> After running "make lnetctl" in the utils folder
>> /tmp/lustre-2.10.2_RC1/lnet/utils
>> 
>> it produces the executable.
>> 
>> 
>> On Wed, Nov 29, 2017 at 11:52 AM, Arman Khalatyan  wrote:
>>> Hi Andreas,
>>> I just checked the yaml-devel it is installed:
>>> yum list installed | grep yaml
>>> libyaml.x86_64 0.1.4-11.el7_0  @base
>>> libyaml-devel.x86_64   0.1.4-11.el7_0  @base
>>> 
>>> and still no success:
>>> rpm -qpl rpmbuild/RPMS/x86_64/*.rpm| grep lnetctl
>>> /usr/share/man/man8/lnetctl.8.gz
>>> /usr/src/debug/lustre-2.10.2_RC1/lnet/include/lnet/lnetctl.h
>>> 
>>> are there any other dependencies ?
>>> 
>>> Thanks,
>>> Arman.
>>> 
>>> On Wed, Nov 29, 2017 at 6:46 AM, Dilger, Andreas
>>>  wrote:
 On Nov 28, 2017, at 07:58, Arman Khalatyan  wrote:
> 
> Hello,
> I would like to recompile the client from the rpm-source but looks
> like the packaging on the jenkins is wrong:
> 
> 1) wget 
> https://build.hpdd.intel.com/job/lustre-b2_10/arch=x86_64,build_type=client,distro=el7,ib_stack=inkernel/lastSuccessfulBuild/artifact/artifacts/SRPMS/lustre-2.10.2_RC1-1.src.rpm
> 2) rpmbuild --rebuild --without servers lustre-2.10.2_RC1-1.src.rpm
> after the successful build the rpms doesn't contain the lnetctl but
> the help only
> 3) cd /root/rpmbuild/RPMS/x86_64
> 4) rpm -qpl ./*.rpm| grep lnetctl
> /usr/share/man/man8/lnetctl.8.gz
> /usr/src/debug/lustre-2.10.2_RC1/lnet/include/lnet/lnetctl.h
> 
> The   lustre-client-2.10.2_RC1-1.el7.x86_64.rpm on the jenkins
> contains the lnetctl
> Maybe I should add more options to rebuild the client + lnetctl?
 
 You need to have libyaml-devel installed on your build node.
 
 Cheers, Andreas
 --
 Andreas Dilger
 Lustre Principal Architect
 Intel Corporation
 
 
 
 
 
 
 
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] ltop for lustre 2.10.2 ?

2018-02-01 Thread Alexander I Kulyavtsev
Is there version of ltop working with lustre 2.10.2 ?

lmtmetric does not work on mds; it is OK checking ost on oss.

# lmtmetric -m mdt
lmtmetric: error reading lustre MDS uuid from proc: Invalid argument
lmtmetric: mdt metric: Invalid argument

I suspect it is due to lmtmetric constantly reading  /proc/fs/lustre/version 
and it is now in /sys/...

> [lustre-discuss] missing lustre version in /proc/fs/lustre/version
> 
> Dilger, Andreas 
> Tue Jun 20 15:34:57 PDT 2017
> ... skip ...
> We've had to move a lot of Lustre parameters out of /proc/fs/lustre and into 
> /sys/fs/lustre
> for most parameter values, or /sys/kernel/debug/lustre (via debugfs) for 
> large statistics
> due to rules imposed by the upstream kernel developers.
> ...

I'm trying to get ltop working on SLF 7.4 with lustre 2.10.2. No need for lmt & 
DB.

Alex.
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] ltop for lustre 2.10.2 ?

2018-02-02 Thread Alexander I Kulyavtsev
Great, thanks!

Meanwhile the attached patch makes lmt mostly work for me, for now with LOCKS,
LGR, LCR missing.
fs/lustre/ldlm/namespaces is now in /sys too.

Part of the issue is that the _packed_lustre_version() error return of -1 is not
handled in the calling functions but is interpreted as a packed lustre version (a
version earlier than 0.0.0.0).
This causes the paths in /proc to be defined as for lustre 1.8.x, and this
definitely does not work for recent lustre.
Just setting the default lustre version to 2.10.0 (you can set it to 99.999.0.0)
makes ltop work.
I guess lustre versions 1.8 and 2.0 can be dropped, because anyone who needs
them can take an older distro of ltop.

The other issue:
lmtmetric successfully opens "/proc/fs/lustre/version" nine times per run (once
for every two other open() calls), as I can see on my older 2.5.3 system;
# strace -e open lmtmetric -m mdt 2>&1 | fgrep ' = 3'
18 calls total to open("/proc/fs/lustre/version", ...) out of 35 calls to 
open("/proc/fs/lustre/...)
Lustre version can be cached.
"/proc/fs/lustre/osd-zfs" is opened four times.

Best regards, Alex.




> On Feb 1, 2018, at 1:37 PM, Di Natale, Giuseppe  wrote:
>
> Hi Alex,
>
> We are aware of the problem. I'm going to try and develop a patch to handle 
> the move to /sys/ in the next week or so.
>
> Giuseppe
> From: lustre-discuss  on behalf of 
> Alexander I Kulyavtsev 
> Sent: Thursday, February 1, 2018 9:10:20 AM
> To: lustre-discuss@lists.lustre.org
> Subject: [lustre-discuss] ltop for lustre 2.10.2 ?
>
> Is there version of ltop working with lustre 2.10.2 ?
>
> lmtmetric does not work on mds; it is OK checking ost on oss.
>
> # lmtmetric -m mdt
> lmtmetric: error reading lustre MDS uuid from proc: Invalid argument
> lmtmetric: mdt metric: Invalid argument
>
> I suspect it is due to lmtmetric constantly reading  /proc/fs/lustre/version 
> and it is now in /sys/...
>
> > [lustre-discuss] missing lustre version in /proc/fs/lustre/version
> >
> > Dilger, Andreas
> > Tue Jun 20 15:34:57 PDT 2017
> > ... skip ...
> > We've had to move a lot of Lustre parameters out of /proc/fs/lustre and 
> > into /sys/fs/lustre
> > for most parameter values, or /sys/kernel/debug/lustre (via debugfs) for 
> > large statistics
> > due to rules imposed by the upstream kernel developers.
> > ...
>
> I'm trying to get ltop working on SLF 7.4 with lustre 2.10.2. No need for lmt 
> & DB.
>
> Alex.
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



libproc.lustre.c.patch
Description: libproc.lustre.c.patch
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] File locking errors.

2018-02-15 Thread Alexander I Kulyavtsev
Do you have the flock option in fstab for the lustre mount, or in the command you use to
mount lustre on the client?
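
If not, a minimal sketch of what that looks like (MGS nid, fsname and mount point are
placeholders for your setup):

# one-off mount with flock enabled
mount -t lustre -o flock 10.0.0.1@tcp:/lustre /mnt/lustre
# or the equivalent /etc/fstab line
10.0.0.1@tcp:/lustre  /mnt/lustre  lustre  flock,_netdev  0 0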

Search for flock on lustre wiki
http://wiki.lustre.org/Mounting_a_Lustre_File_System_on_Client_Nodes
or lustre manual
http://doc.lustre.org/lustre_manual.pdf

Here are links where to start learning about lustre:
* http://lustre.org/getting-started-with-lustre/
* http://wiki.lustre.org
* https://wiki.hpdd.intel.com
* jira.hpdd.intel.com
* http://opensfs.org/lustre/

Alex.

On Feb 15, 2018, at 11:02 AM, Prentice Bisbal <pbis...@pppl.gov> wrote:

Hi.

I'm an experienced HPC system admin, but I know almost nothing about Lustre
administration. The system admin who administered our small Lustre filesystem 
recently retired, and no one has filled that gap yet. A user recently reported 
they are now getting file-locking errors from a program they've run repeatedly 
on Lustre in the past. When the run the same program on an NFS filesystem, the 
error goes away. I've cut-and-pasted the error messages below.

Since I have no real experience as a Lustre admin, I turned to google, and it
looks like it might be that the file-locking daemon died (if Lustre has a 
separate file-lock daemon), or somehow file-locking was recently disabled. If 
that is possible, how do I check this, and restart or re-enable if necessary?  
I skimmed the user manual, and could not find anything on either of these 
issues.

Any and all help will be greatly appreciated.

Some of the error messages:

HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) MPI-process 9:
  #000: H5F.c line 579 in H5Fopen(): unable to open file
major: File accessibilty
minor: Unable to open file
  #001: H5Fint.c line 1168 in H5F_open(): unable to lock the file or initialize 
file structure
major: File accessibilty
minor: Unable to open file
  #002: H5FD.c line 1821 in H5FD_lock(): driver lock request failed
major: Virtual File Layer
minor: Can't update object
  #003: H5FDsec2.c line 939 in H5FD_sec2_lock(): unable to flock file, errno = 
38, error message = 'Function not implemented'
major: File accessibilty
minor: Bad file ID accessed
Error: couldn't open file HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) 
MPI-process 13:
  #000: H5F.c line 579 in H5Fopen(): unable to open file
major: File accessibilty
minor: Unable to open file
  #001: H5Fint.c line 1168 in H5F_open(): unable to lock the file or initialize 
file structure
major: File accessibilty
minor: Unable to open file
  #002: H5FD.c line 1821 in H5FD_lock(): driver lock request failed
major: Virtual File Layer
minor: Can't update object
  #003: H5FDsec2.c line 939 in H5FD_sec2_lock(): unable to flock file, errno = 
38, error message = 'Function not implemented'
major: File accessibilty
minor: Bad file ID accessed

--
Prentice

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Lustre LTS 2.10 on el6.x ... was Re: File locking errors.

2018-02-15 Thread Alexander I Kulyavtsev
Eli,
Since lustre version 2.10.1 the Intel build server has had server rpms for el6.9 with
in-kernel ofed (but not on the download server),
e.g. 2.10.3 GA:
https://build.hpdd.intel.com/job/lustre-b2_10/69/
It means lustre 2.10.x at least builds on el6.9, and I guess it shall be easier
with a patchless server (zfs or patchless ldiskfs OST).

It can be a simpler path for us too to upgrade the existing lustre 2.5 system on
el6.x, to minimize downtime and to have a rollback option: upgrade zfs and
lustre; keep the OS or do a minor OS upgrade.
On the new system we are about to go live with 2.10.3 on slf 7.4.

Peter, Andreas,
do you test the el6.x server to any extent? Is it worth trying it on a test
system?

Alex.


On Feb 15, 2018, at 4:17 PM, E.S. Rosenberg <esr+lus...@mail.hebrew.edu> wrote:

On Fri, Feb 16, 2018 at 12:00 AM, Colin Faber <cfa...@gmail.com> wrote:
If the mount on the users clients had the various options enabled, and those 
aren't present in fstab, you'd end up with such behavior. Also 2.8? Can you 
upgrade to 2.10 LTS??
Depending on when they installed their system that may not be such a 'small'
change. Our 2.8 is running on CentOS 6.8, so an upgrade to 2.10 requires us to
also upgrade the OS from 6.x to 7.x, and though I very much want to do that, it
is a more intensive process that so far I have not had the time for, and I can
imagine others have the same issue.
Regards,
Eli

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre client 2.10.3 build/install problem on Centos 6.7

2018-03-23 Thread Alexander I Kulyavtsev
There are detailed build instructions on lustre.org wiki :
http://wiki.lustre.org/Compiling_Lustre

I built and installed the lustre client 2.10.3, and it works on SLF 6.x as below:

> $ cat /etc/redhat-release 
> Scientific Linux Fermi release 6.9 (Ramsey)
> $ uname -r
> 2.6.32-696.1.1.el6.x86_64

> $ cat /etc/redhat-release 
> Scientific Linux Fermi release 6.8 (Ramsey)
> $ uname -r
> 2.6.32-642.15.1.el6.x86_64

Build script (in-kernel IB) to build lustre client to be installed on the same 
host:

> $ cat lcb.inkernel-ib.sh 
> 
> #!/bin/bash
> 
> DIR=../downloads/downloads.hpdd.intel.com
> SRPM=lustre-2.10.3-1.src.rpm
> 
> rpmbuild  --rebuild --without servers --with lnet-dlc --with lustre-utils 
> $DIR/$SRPM
> 

Essentially this is the command you use.
There are extra arguments if you compile for a different kernel, e.g. building on
a node running 3.10 for an older 2.6.32 kernel.

It looks like you are trying to install a lustre client module built against a newer
kernel.  Take a look at the build logs.

You have to install the kernel-devel rpm for your kernel on the build node.
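
A quick way to check what the build actually targeted (just a sanity-check sketch;
paths are the rpmbuild defaults):

# kernel you run vs. kernel headers available to the build
uname -r
rpm -q kernel-devel
# kernel the freshly built kmod requires
rpm -qpR ~/rpmbuild/RPMS/x86_64/kmod-lustre-client-*.rpm | grep -i kernel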

Alex.

> On Mar 23, 2018, at 1:30 AM, Alex Vodeyko  wrote:
> 
> Hi,
> 
> I'm trying to build/install Lustre 2.10.3 client on Centos 6.7.
> "rpmbuild  --rebuild --without servers lustre/lustre-2.10.3-1.src.rpm" goes 
> fine, but 
> "yum localupdate kmod-lustre-client-2.10.3-1.el6.x86_64.rpm 
> lustre-client-2.10.3-1.el6.x86_64.rpm" failed with:
> Error: Package: kmod-lustre-client-2.10.3-1.el6.x86_64 
> (/kmod-lustre-client-2.10.3-1.el6.x86_64)
>Requires: kernel >= 3.10.0-693
> 
> Building and installing on the same system with 2.6.32-696.10.2.el6.x86_64 
> kernel.
> 
> Same procedure worked fine on the same system with lustre 2.10.1 client.
> 
> Could you please help with it?
> 
> Thanks,
> Alex
> 
> 
> 
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] bad performance with Lustre/ZFS on NVMe SSD

2018-04-10 Thread Alexander I Kulyavtsev
Ricardo,
It can be helpful to see the output of these commands on the zfs pool host when you read
files through a lustre client, and directly through zfs:

# zpool iostat -lq -y zpool_name 1
# zpool iostat -w -y zpool_name 5
# zpool iostat -r -y zpool_name 5

-q queue statistics
-l Latency statistics

-r Request size histogram:
-w (undocumented) latency statistics

I did see different behavior of zfs reads on the zfs pool for the same dd/fio
command when reading a file from a lustre mount on a different host versus directly from
zfs on the OSS. I created a separate zfs dataset with similar zfs settings on the lustre
zpool.
Lustre IO is seen on the zfs pool as 128KB requests, while dd/fio directly on zfs issues
1MB requests. The dd/fio command used 1MB IO.

zptevlfs6 sync_readsync_writeasync_readasync_write  scrub   
req_size  indaggindaggindaggindaggindagg
--  -  -  -  -  -  -  -  -  -  -
512 0  0  0  0  0  0  0  0  0  0
1K  0  0  0  0  0  0  0  0  0  0
2K  0  0  0  0  0  0  0  0  0  0
4K  0  0  0  0  0  0  0  0  0  0
8K  0  0  0  0  0  0  0  0  0  0
16K 0  0  0  0  0  0  0  0  0  0
32K 0  0  0  0  0  0  0  0  0  0
64K 0  0  0  0  0  0  0  0  0  0
128K0  0  0  0  2.00K  0  0  0  0  
0 <
256K0  0  0  0  0  0  0  0  0  0
512K0  0  0  0  0  0  0  0  0  0
1M  0  0  0  0125  0  0  0  0  
0<
2M  0  0  0  0  0  0  0  0  0  0
4M  0  0  0  0  0  0  0  0  0  0
8M  0  0  0  0  0  0  0  0  0  0
16M 0  0  0  0  0  0  0  0  0  0

^C

Alex.


On 4/9/18, 6:15 PM, "lustre-discuss on behalf of Dilger, Andreas" 
 
wrote:

On Apr 6, 2018, at 23:04, Riccardo Veraldi  
wrote:
> 
> So I'm struggling since months with these low performances on Lsutre/ZFS.
> 
> Looking for hints.
> 
> 3 OSSes, RHEL 74  Lustre 2.10.3 and zfs 0.7.6
> 
> each OSS has one  OST raidz
> 
>   pool: drpffb-ost01
>  state: ONLINE
>   scan: none requested
>   trim: completed on Fri Apr  6 21:53:04 2018 (after 0h3m)
> config:
> 
> NAME  STATE READ WRITE CKSUM
> drpffb-ost01  ONLINE   0 0 0
>   raidz1-0ONLINE   0 0 0
> nvme0n1   ONLINE   0 0 0
> nvme1n1   ONLINE   0 0 0
> nvme2n1   ONLINE   0 0 0
> nvme3n1   ONLINE   0 0 0
> nvme4n1   ONLINE   0 0 0
> nvme5n1   ONLINE   0 0 0
> 
> while the raidz without Lustre perform well at 6GB/s (1GB/s per disk),
> with Lustre on top of it performances are really poor.
> most of all they are not stable at all and go up and down between
> 1.5GB/s and 6GB/s. I Tested with obfilter-survey
> LNET is ok and working at 6GB/s (using infiniband FDR)
> 
> What could be the cause of OST performance going up and down like a
> roller coaster ?

Riccardo,
to take a step back for a minute, have you tested all of the devices
individually, and also concurrently with some low-level tool like
sgpdd or vdbench?  After that is known to be working, have you tested
with obdfilter-survey locally on the OSS, then remotely on the client(s)
so that we can isolate where the bottleneck is being hit.

Cheers, Andreas


> for reference here are few considerations:
> 
> filesystem parameters:
> 
> zfs set mountpoint=none drpffb-ost01
> zfs set sync=disabled drpffb-ost01
> zfs set atime=off drpffb-ost01
> zfs set redundant_metadata=most drpffb-ost01
> zfs set xattr=sa drpffb-ost01
> zfs set recordsize=1M drpffb-ost01
> 
> NVMe SSD are  4KB/sector
> 
> ashift=12
> 
> 
> ZFS module parameters
> 
> options zfs zfs_prefetch_disable=1
> options zfs zfs_txg_history=120
> options zfs metaslab_debug_unload=1
> #
> options zfs zfs_vdev_scheduler=deadline
> options zfs zfs_vdev_async_write_active_min_dirty_percent=20
> #
> options zfs zfs_vdev_scrub_min_active=48
> options zfs zfs_vdev_scrub_max_active=128
   

Re: [lustre-discuss] luster 2.10.3 lnetctl configurations not persisting through reboot

2018-04-17 Thread Alexander I Kulyavtsev
File /etc/lnet.conf is described on lustre wiki:
   http://wiki.lustre.org/Dynamic_LNet_Configuration_and_lnetctl
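
In short, something like this should do it (a sketch, assuming your lustre rpms ship the
lnet systemd unit that imports /etc/lnet.conf at startup):

# save the running configuration you built with lnetctl
lnetctl export > /etc/lnet.conf
# re-apply it automatically on boot
systemctl enable lnet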

Alex.


On 4/17/18, 3:37 PM, "lustre-discuss on behalf of Kurt Strosahl" 
 wrote:

I configured an lnet router today with lustre 2.10.3 as the lustre
software.  I then configured the lnet router using the following lnetctl
commands:


lnetctl lnet configure
lnetctl net add --net o2ib0 --if ib1
lnetctl net add --net o2ib1 --if ib0
lnetctl set routing 1

When I rebooted the router the configuration didn't stick.  Is there a way 
to make this persist through a reboot?

I also noticed that when I do an export of the lnetctl configuration it
contains

- net type: o2ib1
  local NI(s):
- nid: @o2ib1
  status: up
  interfaces:
  0: ib0
  statistics:
  send_count: 2958318
  recv_count: 2948077
  drop_count: 0
  tunables:
  peer_timeout: 180
  peer_credits: 8
  peer_buffer_credits: 0
  credits: 256
  lnd tunables:
  peercredits_hiw: 4
  map_on_demand: 256
  concurrent_sends: 8
  fmr_pool_size: 512
  fmr_flush_trigger: 384
  fmr_cache: 1
  ntx: 512
  conns_per_peer: 1
  tcp bonding: 0
  dev cpt: 0
  CPT: "[0,1]"

Is this expected behavior?

w/r,
Kurt J. Strosahl
System Administrator: Lustre, HPC
Scientific Computing Group, Thomas Jefferson National Accelerator Facility
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre 2.11 lnet troubleshooting

2018-04-17 Thread Alexander I Kulyavtsev
To the original question: lnetctl on the router node shows ‘enable: 1’

# lnetctl routing show
routing:
- cpt[0]:
 …snip…
- enable: 1

Lustre 2.10.3-1.el6

Alex.

On 4/17/18, 7:05 PM, "lustre-discuss on behalf of Faaland, Olaf P." 
 wrote:

Update:

Joe pointed out "lnetctl set routing 1".  After invoking that on the router 
node, the compute node reports the route as up:

[root@ulna66:lustre-211]# lnetctl route show -v
route:
- net: o2ib100
  gateway: 192.168.128.4@o2ib33
  hop: -1
  priority: 0
  state: up

Does this replace the lnet module parameter "forwarding"?

Olaf P. Faaland
Livermore Computing



From: lustre-discuss  on behalf of 
Faaland, Olaf P. 
Sent: Tuesday, April 17, 2018 4:34:22 PM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] Lustre 2.11 lnet troubleshooting

Hi,

I've got a cluster running 2.11 with 2 routers and 68  compute nodes.  It's 
the first time I've used a post-multi-rail version of Lustre.

The problem I'm trying to troubleshoot is that my sample compute node 
(ulna66) seems to think the router I configured (ulna4) is down, and so an 
attempt to ping outside the cluster results in failure and "no route to XXX" on 
the console.  I can lctl ping the router from the compute node and vice-versa.  
 Forwarding is enabled on the router node via modprobe argument.

lnetctl route show reports that the route is down.  Where I'm stuck is 
figuring out what in userspace (e.g. lnetctl or lctl) can tell me why.

The compute node's lnet configuration is:

[root@ulna66:lustre-211]# cat /etc/lnet.conf
ip2nets:
  - net-spec: o2ib33
interfaces:
 0: hsi0
ip-range:
 0: 192.168.128.*
route:
- net: o2ib100
  gateway: 192.168.128.4@o2ib33

After I start lnet, systemctl reports success and the state is as follows:

[root@ulna66:lustre-211]# lnetctl net show
net:
- net type: lo
  local NI(s):
- nid: 0@lo
  status: up
- net type: o2ib33
  local NI(s):
- nid: 192.168.128.66@o2ib33
  status: up
  interfaces:
  0: hsi0

[root@ulna66:lustre-211]# lnetctl peer show --verbose
peer:
- primary nid: 192.168.128.4@o2ib33
  Multi-Rail: False
  peer ni:
- nid: 192.168.128.4@o2ib33
  state: up
  max_ni_tx_credits: 8
  available_tx_credits: 8
  min_tx_credits: 7
  tx_q_num_of_buf: 0
  available_rtr_credits: 8
  min_rtr_credits: 8
  refcount: 4
  statistics:
  send_count: 2
  recv_count: 2
  drop_count: 0

[root@ulna66:lustre-211]# lnetctl route show --verbose
route:
- net: o2ib100
  gateway: 192.168.128.4@o2ib33
  hop: -1
  priority: 0
  state: down

I can instrument the code, but I figure there must be someplace available 
to normal users to look, that I'm unaware of.

thanks,

Olaf P. Faaland
Livermore Computing
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] lustre project quotas

2018-05-10 Thread Alexander I Kulyavtsev
Do you use zfs or ldiskfs on the OSTs?
ZFS does not have project quota yet.
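
If it is ldiskfs, the rough sequence is something like the sketch below (fsname, device,
project id and limits are placeholders; it needs lustre 2.10+ and a recent e2fsprogs, so
please check the manual for the exact steps for your versions):

# enable the project feature on each existing ldiskfs target (target unmounted)
tune2fs -O project -Q prjquota /dev/mapper/mdt0
# turn on enforcement for user/group/project block quotas (on the MGS)
lctl conf_param lustre.quota.dt=ugp
# tag a directory tree with project id 1000 and make it inherited
lfs project -p 1000 -s /mnt/lustre/projects/foo
# set a limit and check usage
lfs setquota -p 1000 -B 10T /mnt/lustre
lfs quota -p 1000 /mnt/lustre

Alex.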

From: lustre-discuss  on behalf of 
Einar Næss Jensen 
Date: Thursday, May 10, 2018 at 7:47 AM
To: "lustre-discuss@lists.lustre.org" 
Subject: Re: [lustre-discuss] lustre project quotas


​Lustre server is 2.10.1

lustre client is 2.10.3


--
Einar Næss Jensen
NTNU HPC Section
Norwegian University of Science and Technoloy
Address: Høgskoleringen 7i
 N-7491 Trondheim, NORWAY
tlf: +47 90990249
email:   einar.nass.jen...@ntnu.no

From: lustre-discuss  on behalf of 
Einar Næss Jensen 
Sent: Thursday, May 10, 2018 2:45 PM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] lustre project quotas


Hello.



I have sucessfully installed lustre and it works well, but I'm having trouble 
figuring out how to enable / set project quotas.

How can I verify that project quotas are enabled, and how do I set up projects 
and assign directories and users to the projects?





Best Regards

Einar Næss Jensen


--
Einar Næss Jensen
NTNU HPC Section
Norwegian University of Science and Technoloy
Address: Høgskoleringen 7i
 N-7491 Trondheim, NORWAY
tlf: +47 90990249
email:   einar.nass.jen...@ntnu.no
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] LUG 2018

2018-06-20 Thread Alexander I Kulyavtsev
Slides at:
http://opensfs.org/lug-2018-agenda/
-A.

From: lustre-discuss  on behalf of 
"E.S. Rosenberg" 
Date: Wednesday, June 20, 2018 at 12:20 PM
To: Lustre discussion 
Subject: [lustre-discuss] LUG 2018

Hi all,
Are the talks online yet?
Thanks,
Eli


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] MOFED 4.4-1.0.0.0

2018-08-06 Thread Alexander I Kulyavtsev
Hi Megan,
The standard lustre build works with the in-kernel OFED. To use Mellanox OFED you have
to rebuild lustre:
http://wiki.lustre.org/Compiling_Lustre
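
Roughly, the rebuild against MOFED looks like this (a sketch; the --with-o2ib path is
where the Mellanox kernel sources usually land, adjust it to your MOFED install):

tar xzf lustre-2.10.4.tar.gz && cd lustre-2.10.4
./configure --disable-server --with-o2ib=/usr/src/ofa_kernel/default
make rpms
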
Apparently there is a version of lustre 2.10.4 built against Mellanox OFED in the whamcloud
download area:
https://downloads.whamcloud.com/public/lustre/lustre-2.10.4-ib/

What version did you try?

Best regards, Alex.

From: lustre-discuss  on behalf of 
"Ms. Megan Larko" 
Date: Saturday, August 4, 2018 at 10:26 AM
To: Lustre User Discussion Mailing List 
Subject: [lustre-discuss] MOFED 4.4-1.0.0.0

Hi,

I have found that Lustre-2.10.4 works only with CentOS linux kernel
3.10.0-693.x and newer.  I discovered that Mellanox MOFED 4.3(or 4?)-1.0.1.0
(I'm not somewhere I can verify the MOFED version number, but it is 4 and ending in
"1.0.1.0") will not work with CentOS linux kernel 3.10.0-8*.  So I have a
successful Lustre 2.10.4 with the "693" linux kernel series and MOFED 
4.3/4-1.0.1.0

So I will second the statement that the software version stack is indeed 
"particular".

Cheers,
megan
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] lustre vs. lustre-client

2018-08-10 Thread Alexander I Kulyavtsev
What about the lustre client in the upstream kernel?
I guess lustre-common and lustre-client shall be packaged in a way that these
rpms can be a drop-in replacement for the lustre client functionality in the upstream
kernel, like today we have lustre with in-kernel IB or custom IB.

Also there was a discussion to split off an lnet rpm.

“only a handful of modules would be different between the client and server”
Do these extra server modules bring extra dependencies like zfs or anything else?

Alex.


On 8/10/18, 5:36 PM, "Andreas Dilger"  wrote:

On Aug 9, 2018, at 18:51, Faaland, Olaf P.  wrote:
> 
> Hi,
> 
> What is the reason for naming the package "lustre" if it includes both 
client and server binaries, but "lustre-client" if it includes only the client?
> 
> = (from
> # Set the package name prefix
> %if %{undefined lustre_name}
>%if %{with servers}
>%global lustre_name lustre
>%else
>%global lustre_name lustre-client
>%endif
> %endif
> =
> 
> Are there sites that build both with and without servers, and need to 
keep track which is installed on a given machine?  The size of the RPMs isn't 
that different, so it's not obvious to me why one would do that.

The original reason for separate "lustre" and "lustre-client" packages was
that the "lustre-client" package was built against a patchless kernel, so
that it could be installed on unmodified client systems.  At the time, this
was a departure from the all-inclusive "lustre" package that was always
built against a patched kernel.

Until not so long ago, it wasn't possible to build a server against an
upatched kernel, but that has been working for a while now.  We do build
"patched" and "unpatched" server RPMs today, but haven't gotten around to
changing the packaging to match.

At this point, I think it makes sense to just move over to RPMs for patched
and unpatched kernels, and get rid of the "-client" package.  Alternately,
we could have "lustre-client", "lustre-server", and "lustre-common" RPMs,
but (IMHO) that just adds more confusion for the users, and doesn't really
reduce the package size significantly (only a handful of modules would be
different between the client and server).

Having a patched server kernel isn't needed for ZFS, and while it works for
ldiskfs as well, there are still a few kernel patches that improve ldiskfs
server performance/functionality that are not in RHEL7 (e.g. project quota,
the upcoming T10-PI interface changes) that make it desirable to keep both
options until those changes are in vendor kernels.

Cheers, Andreas
---
Andreas Dilger
Principal Lustre Architect
Whamcloud










___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [Lustre-discuss] GIT corrupted on lustre

2012-12-23 Thread Alexander I Kulyavtsev
Stackoverflow has a thread
http://stackoverflow.com/questions/4254389/git-corrupt-loose-object
with a reference to an article by Linus on how to recover:

http://git.kernel.org/?p=git/git.git;a=blob;f=Documentation/howto/recover-corrupted-blob-object.txt;h=323b513ed0e0ce8b749672f589a375073a050b97;hb=HEAD

Alex

On Dec 23, 2012, at 9:16 AM, "Eric Chamberland" <eric.chamberl...@giref.ulaval.ca> wrote:

Hi,

we are having many problems similiar to this post:

http://thread.gmane.org/gmane.comp.file-systems.lustre.user/12093

which got no reply...

Anyone have an idea on how to solve this?

Thanks,

Eric

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[lustre-discuss] lustre-monitoring email list

2018-08-23 Thread Alexander I Kulyavtsev
Hi Ken, all.

Ken, could you please create a “lustre-monitoring” email list at
http://lists.lustre.org ?

The purpose of the list is to discuss the development and share experiences of 
lustre monitoring solutions.

Specifically, I would like to bring up the discussion of experiences of using
the influxdata TICK stack (telegraf, influxdb) and grafana for lustre monitoring;
sharing of dashboard templates; and possible alternatives or complementary
solutions (Prometheus).

I feel lustre-discuss is more dedicated to core lustre, general lustre and
operational/troubleshooting issues. I would like to keep the discussion focused on
monitoring solution implementation in a separate list.

Best regards, Alex.

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] separate SSD only filesystem including HDD

2018-08-31 Thread Alexander I Kulyavtsev
Thanks, Andreas!
I’m looking at a similar configuration.

Performance-wise, is zfs or ldiskfs recommended on NVMe OSTs?
We are comfortable with zfs on our current HDD system; how much of a penalty will we pay
for ldiskfs on NVMe?
The zfs overhead can be different for high IOPS with NVMe; are there numbers?

Alex.

On 8/31/18, 3:20 AM, "lustre-discuss on behalf of Andreas Dilger" 
 
wrote:

Just to confirm, there is only a single NVMe device in each server node, or 
there is a single server with 24 NVMe devices in it?

Depending on what you want to use the NVMe storage for (e.g. very fast 
short-term scratch == burst buffer) it may be OK to just make a Lustre 
filesystem with each NVMe device a separate OST with no redundancy.  The 
failure rate for these devices is low, and adding redundancy will hurt 
performance.

Cheers, Andreas


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] slow write performance from client to server

2018-10-15 Thread Alexander I Kulyavtsev
You can do a quick check with a 2.10.5 client by mounting lustre on the MDS if you do
not have a free node to install the 2.10.5 client on.

Do you have lnet configured with IB or 10GbE? LNet defaults to tcp if not set.
Can it be that you are connected through a slow management network?
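
A quick way to check which network is actually used on the clients and servers (nids
ending in @tcp vs @o2ib tell you right away):

lctl list_nids
# or, with the 2.10 tools
lnetctl net show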

Alex.

On 10/15/18, 6:41 PM, "lustre-discuss on behalf of Riccardo Veraldi" 
 wrote:

Hello,

I have a new Lustre FS version 2.10.5. 18 OSTs 18TB each on 3 OSSes.

I noticed very slow performances couple of MB/sec when RHEL6 Lustre 
clients  2.8.0 are writing to the fielsystem.

Could it be a Lustre version problem server vs client ?

I have no errors either on server or client side  that can debug it 
further...

thanks

Rick


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org

http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Command line tool to monitor Lustre I/O ?

2018-12-20 Thread Alexander I Kulyavtsev
1) cerebro + ltop still work.


2) telegraf + influxdb (collector, time-series DB). Telegraf has input plugins
for lustre ("lustre2"), zfs, and many others. Grafana to plot live data from the DB.
Also, influxDB integrates with Prometheus.

Basically, each component can feed data to different output types through
plugins, or take data from multiple types of sources, so you can use different
combinations for your monitoring stack.


For the simplest setup you may take a look at whether telegraf from the influxdata stack
has a suitable output plugin (see influxdata on github).
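
For example, telegraf can generate a starting config with just the lustre and influxdb
plugins enabled (a sketch; check the plugin names against your telegraf version):

telegraf --input-filter lustre2:zfs --output-filter influxdb config > /etc/telegraf/telegraf.conf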


Alex.


From: lustre-discuss  on behalf of 
Laifer, Roland (SCC) 
Sent: Thursday, December 20, 2018 8:04:55 AM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] Command line tool to monitor Lustre I/O ?

Dear Lustre administrators,

what is a good command line tool to monitor current Lustre metadata and
throughput operations on the local client or server? Up to now we had
used collectl but this no longer works for Lustre 2.10.

Some background about collectl: The Lustre support of collectl was
removed many years ago but up to Lustre 2.7 it was still possible to
monitor metadata and throughput operations on clients. In addition,
there were plugins which also worked for the server side, see
http://wiki.lustre.org/Collectl
However, it seems that there was no update for these plugins to adapt
them for Lustre 2.10.

Regards,
  Roland
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Migrating files doesn't free space on the OST

2019-01-17 Thread Alexander I Kulyavtsev
- you can re-run the command to find files residing on that OST to see if the files are
new or old (see the sketch after this list).

- zfs may have snapshots if you ever took snapshots; they take space.

- removing data or snapshots has some lag before the blocks are released (tens of
minutes), but I guess that has completed by now.

- there can be orphan objects on the OST if you had crashes. On older lustre
versions, if the OST was emptied out, you can mount the underlying fs as ext4 or zfs,
set the mount to read-only and browse the OST objects to see if there are some
orphan objects left. On newer lustre releases you can probably run lfsck (the lustre
scanner).

- to find which hosts / jobs are currently writing to lustre you may enable lustre
jobstats, clear the counters and parse the stats files in /proc (also covered in the
sketch below). There was an xltop tool on github for older versions of lustre that had
not implemented jobstats, but it has not been updated for a while.

- depending on the lustre version you have, the implementation of lfs migrate is
different. The older version copied the file under another name to another OST, renamed
the files and removed the old file. If the migration is done on a file open for write by
an application, the data will not be released until the file is closed (and the data in
the new file are wrong). The recent implementation of migrate swaps the file objects
with the file layout lock taken. I cannot tell if it is safe during active writes.

- not releasing space can be a bug; did you check jira on whamcloud? What
version of lustre do you have? Is it ldiskfs or zfs based? Which zfs version?
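
For the first and the jobstats items, a rough sketch (fsname, OST index and mount point
are placeholders):

# list files with objects on the full OST (index 3 here), to see if new ones keep appearing
lfs find /mnt/lustre --ost 3 -type f | head

# on the OSS: check for snapshots holding space
zfs list -t snapshot

# jobstats: tag requests on the clients (or cluster-wide from the MGS with lctl set_param -P),
# then read/clear the per-job stats for that OST on the OSS
lctl set_param jobid_var=procname_uid
lctl set_param obdfilter.lustre-OST0003.job_stats=clear
lctl get_param obdfilter.lustre-OST0003.job_stats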


Alex.



From: lustre-discuss  on behalf of 
Jason Williams 
Sent: Wednesday, January 16, 2019 10:25 AM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] Migrating files doesn't free space on the OST


I am trying to migrate files I know are not in use off of the full OST that I 
have using lfs migrate.  I have verified up and down that the files I am moving 
are on that OST and that after the migrate lfs getstripe indeed shows they are 
no longer on that OST since it's disabled in the MDS.


The problem is, the used space on the OST is not going down.


I see one of at least two issues:

- the OST is just not freeing the space for some reason or another ( I don't 
know)

- Or someone is writing to existing files just as fast as I am clearing the 
data (possible, but kind of hard to find)


Is there possibly something else I am missing? Also, does anyone know a good 
way to see if some client is writing to that OST and determine who it is if 
it's more probable that that is what is going on?



--
Jason Williams
Assistant Director
Systems and Data Center Operations.
Maryland Advanced Research Computing Center (MARCC)
Johns Hopkins University
jas...@jhu.edu

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Stop writes for users

2019-05-14 Thread Alexander I Kulyavtsev
There was a feature request, and there are corresponding LU tickets:

LU-5703 - Lustre quiesce

LU-7236 - connections on demand


Alex.



From: lustre-discuss  on behalf of 
Robert Redl 
Sent: Tuesday, May 14, 2019 10:36 AM
To: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Stop writes for users


I don't know if this really works for this use case, but newer Lustre versions
have the possibility to create a write barrier, which is normally part of the 
snapshot process.

Have a look at lctl barrier_freeze.

On 5/14/19 5:25 PM, Mohr Jr, Richard Frank (Rick Mohr) wrote:

On May 13, 2019, at 6:51 PM, Fernando Pérez 
 wrote:

Is there a way to stop file writes for all users or for groups without using
quotas?

We have a lustre filesystem with corrupted quotas and I need to stop writes
for all users (or for some users).


There are ways to deactivate OSTs, but those are intended to stop creation of 
new file objects on those OSTs and don’t actually stop writes to existing 
files.  I don’t think that mounting OSTs read-only  (with “mount -t lustre -o 
ro …”) works because Lustre updates some info when it mounts the target (but 
this might be based on old info so I could be wrong).  You could remount all 
the clients read-only, but I don’t know if this is practical for you.

The only other option I can think of would be if there was a client-side 
parameter that could be set via “lctl conf_param” that might cause the clients 
to treat all the targets as read-only.  But if there is such a parameter, I am 
not familiar with it.

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


--

Dr. Robert Redl
Scientific Programmer, "Waves to Weather" (SFB/TRR165)
Meteorologisches Institut
Ludwig-Maximilians-Universität München

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre v2.12.3 Availability

2019-07-12 Thread Alexander I Kulyavtsev
Do you plan to have zfs 0.8.x in lustre 2.12.3 ? Or better to ask, do you test 
lustre 2.12.y with zfs 0.8.x ?

Alex.


From: lustre-discuss  on behalf of 
Andreas Dilger 
Sent: Friday, July 12, 2019 4:02:34 PM
To: Peter Jones
Cc: lustre-discuss@lists.lustre.org; Tauferner, Andrew T
Subject: Re: [lustre-discuss] Lustre v2.12.3 Availability

Also, the b2_12 branch is only receiving patches that have already been tested 
on master, so should be relatively stable.  Doing advanced testing of this 
branch before the 2.12.3 release is always welcome.

On Jul 12, 2019, at 14:59, Peter Jones  wrote:
>
> Our current thinking is that it will be sooner rather than later in the 
> quarter. You can follow the progress at 
> https://git.whamcloud.com/?p=fs/lustre-release.git;a=shortlog;h=refs/heads/b2_12
>   and we’ll announce on this list when it’s ready.
>
> On Friday, July 12, 2019 at 11:39 AM "Tauferner, Andrew T" 
>  wrote:
>>
>> What is the outlook for v2.12.3 availability?  The release roadmap shows 
>> something around Q3 ’19.  I’d like a more definitive target if possible.  
>> Thanks.

Cheers, Andreas
--
Andreas Dilger
Principal Lustre Architect
Whamcloud






___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] which zfs for lustre 2.12.3?

2019-09-27 Thread Alexander I Kulyavtsev
What are the plans for supporting zfs 0.8.x in lustre 2.12.x?
2.13 is a feature branch and 2.12.x is LTS.


IIRC there was a note about having a possibility to enable zfs 0.8 for tests in lustre.

How do I enable 0.8.2 (when it is released) in 2.12.3? I guess I will need
to rebuild lustre pointing to the proper zfs distro. Will it build? Do I need
anything else?

I would like to try zfs metadata on an SSD vdev on a lustre OST.
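
For reference, the kind of pool layout I have in mind with 0.8 (just a sketch; disk names
and the small-blocks cutoff are placeholders):

# data on raidz2 HDDs, metadata (and blocks <= 32K) on a mirrored SSD special vdev
zpool create ost0pool raidz2 sda sdb sdc sdd sde sdf \
  special mirror nvme0n1 nvme1n1
zfs set special_small_blocks=32K ost0pool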

Best regards, Alex.

From: lustre-discuss  on behalf of 
Degremont, Aurelien 
Sent: Friday, September 27, 2019 4:52 AM
To: Bernd Melchers ; 
lustre-discuss@lists.lustre.org 
Subject: Re: [lustre-discuss] which zfs for lustre 2.12.3?

Hi Bernd,

Lustre is trying to update to support by default ZFS 0.8.1. During the tests of 
Lustre 2.13 development branch, there was an increase of test failures. Default 
ZFS version was reverted to 0.7.13 until this is clear if the problem comes 
from that or something else. Based on that, the official ZFS version for 2.13 
is not yet decided and could or could not be 0.8.1.
If you can wait until this is sorted out, you will see what the conclusion is.

If you don't want to wait, 0.7.13 is probably the safest choice for now.
Lustre will move to 0.8 eventually anyway. This is just that this could be 
delayed a bit at short term.


Aurélien

On 27/09/2019 11:45, "lustre-discuss on behalf of Bernd Melchers"  wrote:

Hi all,
is there already a recommendation for the zfs-on-linux version for the
upcoming lustre 2.12.3? zfs 0.8.x includes the spl part of zfs 0.7.x
and I am not sure how to migrate from 0.7.x to 0.8.x  and if this is the
right choice...

With kind regards
Bernd Melchers

--
Archiv- und Backup-Service | fab-serv...@zedat.fu-berlin.de
Freie Universität Berlin   | Tel. +49-30-838-55905
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org

http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org