Re: [lustre-discuss] Not able to load lustre modules on Luster client

2018-06-28 Thread vaibhav pol
Hi,

I am using the precompiled RPMs from the site
(https://downloads.whamcloud.com/public/lustre/lustre-2.11.0/el7.4.1708/).
modprobe is not able to find the lnet module, so I tried to insert it using insmod.
The following is the error message.



cfs_array_alloc (err 0)
cfs_get_random_bytes (err 0)
cfs_expr_list_free_list (err 0)
libcfs_register_ioctl (err 0)
cfs_percpt_lock_create (err 0)
cfs_restore_sigs (err 0)
lbug_with_loc (err 0)
libcfs_log_goto (err 0)
libcfs_debug_msg (err 0)
cfs_cpt_table (err 0)
cfs_expr_list_print (err 0)
cfs_trace_copyout_string (err 0)
cfs_cpt_current (err 0)
__x86_indirect_thunk_rax (err 0)
cfs_percpt_free (err 0)
cfs_percpt_lock_free (err 0)
cfs_rand (err 0)
cfs_percpt_unlock (err 0)
cfs_percpt_alloc (err 0)
libcfs_log_return (err 0)
lnet_insert_debugfs (err 0)
cfs_percpt_number (err 0)
cfs_expr_list_match (err 0)
cfs_trimwhite (err 0)
cfs_array_free (err 0)
libcfs_kmemory (err 0)
cfs_trace_copyin_string (err 0)
libcfs_deregister_ioctl (err 0)
cfs_srand (err 0)
cfs_block_allsigs (err 0)
cfs_str2num_check (err 0)
ktime_get_real_seconds (err 0)
__x86_indirect_thunk_rcx (err 0)
cfs_cpt_spread_node (err 0)
cfs_expr_list_values_free (err 0)
libcfs_subsystem_debug (err 0)
cfs_expr_list_free (err 0)
cfs_percpt_lock (err 0)
cfs_gettok (err 0)
cfs_expr_list_parse (err 0)
cfs_cpt_of_node (err 0)
libcfs_debug (err 0)
cfs_cpt_weight (err 0)
lprocfs_call_handler (err 0)
cfs_cpt_distance (err 0)
ktime_get_seconds (err 0)
cfs_cpt_number (err 0)
cfs_expr_list_values (err 0)



Thanks and regards,
Vaibhav Pol
HPC I
Centre for Development of Advanced Computing
Ganeshkhind Road
Pune University Campus
PUNE-Maharashtra
Phone +91-20-25704183 ext: 183
Cell Phone : +919850466409


On June 29, 2018 at 9:36 AM Andreas Dilger  wrote:
> It would be useful to include the actual error messages, in particular which
> module symbols it is complaining about.
>
> Cheers, Andreas
>
> On Jun 28, 2018, at 22:01, vaibhav pol  wrote:
> >
> > Hi,
> > I have installed the Lustre client RPMs (version 2.11.0) on CentOS 7.4.
> > Whenever I try to insert the lnet module, it gives an unknown symbol message
> > and the module does not load. I tried to insert it forcefully, but that also
> > does not work.
>
---
[ C-DAC is on Social-Media too. Kindly follow us at:
Facebook: https://www.facebook.com/CDACINDIA & Twitter: @cdacindia ]

This e-mail is for the sole use of the intended recipient(s) and may
contain confidential and privileged information. If you are not the
intended recipient, please contact the sender by reply e-mail and destroy
all copies and the original message. Any unauthorized review, use,
disclosure, dissemination, forwarding, printing or copying of this email
is strictly prohibited and appropriate legal action will be taken.
---

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Not able to load lustre modules on Luster client

2018-06-28 Thread Andreas Dilger
It would be useful to include the actual error messages, in particular which 
module symbols it is complaining about. 

Cheers, Andreas

On Jun 28, 2018, at 22:01, vaibhav pol  wrote:
> 
> Hi,
> I have installed the Lustre client RPMs (version 2.11.0) on CentOS 7.4.
> 
> Whenever I try to insert the lnet module, it gives an unknown symbol message 
> and the module does not load. I tried to insert it forcefully, but that also 
> does not work.


[lustre-discuss] Not able to load lustre modules on Luster client

2018-06-28 Thread vaibhav pol
Hi,
I have installed the Lustre client RPMs (version 2.11.0) on CentOS 7.4.
Whenever I try to insert the lnet module, it gives an unknown symbol message and
the module does not load. I tried to insert it forcefully, but that also does not
work.






Thanks and regards,
Vaibhav



Re: [lustre-discuss] lctl ping node28@o2ib report Input/output error

2018-06-28 Thread yu sun
All of the servers and clients mentioned earlier use the netmask
255.255.255.224, and they can ping each other. For example:

root@ml-gpu-ser200.nmg01:~$ ping node28
PING node28 (10.82.143.202) 56(84) bytes of data.
64 bytes from node28 (10.82.143.202): icmp_seq=1 ttl=61 time=0.047 ms
64 bytes from node28 (10.82.143.202): icmp_seq=2 ttl=61 time=0.028 ms

--- node28 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.028/0.037/0.047/0.011 ms
root@ml-gpu-ser200.nmg01:~$ lctl ping node28@o2ib1
failed to ping 10.82.143.202@o2ib1: Input/output error
root@ml-gpu-ser200.nmg01:~$

We also have hundreds of GPU machines on different IP subnets. They are in
service, and it is difficult to change the network structure. Is there any
material or documentation that could guide me in solving this without changing
the network structure?

Thanks
Yu

On Fri, Jun 29, 2018 at 3:30 AM, Mohr Jr, Richard Frank (Rick Mohr) wrote:

>
> > On Jun 27, 2018, at 4:44 PM, Mohr Jr, Richard Frank (Rick Mohr) <rm...@utk.edu> wrote:
> >
> >
> >> On Jun 27, 2018, at 3:12 AM, yu sun  wrote:
> >>
> >> client:
> >> root@ml-gpu-ser200.nmg01:~$ mount -t lustre node28@o2ib1:node29@o2ib1:/project /mnt/lustre_data
> >> mount.lustre: mount node28@o2ib1:node29@o2ib1:/project at /mnt/lustre_data failed: Input/output error
> >> Is the MGS running?
> >> root@ml-gpu-ser200.nmg01:~$ lctl ping node28@o2ib1
> >> failed to ping 10.82.143.202@o2ib1: Input/output error
> >> root@ml-gpu-ser200.nmg01:~$
> >
> > In your previous email, you said that you could mount lustre on the
> > client ml-gpu-ser200.nmg01.  Was that not accurate, or did something change
> > in the meantime?
>
> (Note: Received out-of-band reply from Yu stating that there was a typo in
> the previous email, and that client ml-gpu-ser200.nmg01 could not mount
> lustre.  Continuing discussion here so others on list can follow/benefit.)
>
> Yu,
>
> For the IPoIB addresses used on your nodes, what are the subnets (and
> netmasks) that you are using?  It looks like servers use 10.82.143.X and
> clients use 10.82.141.X.  If you are using a 255.255.0.0 netmask, you
> should be fine.  But if you are using 255.255.255.0, then you will run into
> problems.  Lustre expects that all nodes on the same lnet network (o2ib1 in
> your case) will also be on the same IP subnet.
>
> Have you tried running a regular “ping ” command between
> clients and servers to make sure that part is working?
>
> --
> Rick Mohr
> Senior HPC System Administrator
> National Institute for Computational Sciences
> http://www.nics.tennessee.edu
>
>


Re: [lustre-discuss] multiple filesystem in MGS vs folder based ACL ? prons /cons

2018-06-28 Thread Cory Spitz
Oops, right!  Mark, thanks for pointing that out.  Peter, thanks for the update 
on ZFS 0.8.
-Cory

-- 

On 6/28/18, 5:20 PM, "Peter Jones"  wrote:

Correct. ZFS 0.8 will provide the necessary changes in the underlying ZFS.

On 2018-06-28, 3:10 PM, "lustre-discuss on behalf of Mark Hahn" 
 wrote:

> FYI, Project Quotas exist beginning with Lustre 2.10.0.

but not yet for ZFS configs, right?  sorry, I don't remember whether 
the OP mentioned which underfilesystem they were using...


Re: [lustre-discuss] SSK configuration

2018-06-28 Thread Andreas Dilger
On Jun 27, 2018, at 06:05, Mark Roper  wrote:
> 
> Hi Jeremy & All,
> I got a request to share the results of my SSK performance investigation with 
> this group from Mark Hahn, which I'm happy to do!  If you're not interested 
> in the impact on throughput for encryption of client-to-mds and client-to-oss 
> communication using the SSK feature, you can stop reading now.
> 
> The tldr is that enabling encryption reduced read throughput 81% and write 
> throughput 53%.  To me this was large but unsurprising given that the server 
> and client nodes are performing software encryption, but I wanted to know the 
> extent of the impact.
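
As a rough check, the 81% and 53% figures can be reproduced from the "Max Read" and "Max Write" values in the IOR output quoted further down in this message. This is a minimal sketch; the helper name `percent_drop` is made up for illustration and is not part of IOR or Lustre.

```python
def percent_drop(baseline: float, measured: float) -> float:
    """Relative throughput reduction, in percent, versus the baseline."""
    return (baseline - measured) / baseline * 100.0

# Max throughput in MiB/s, copied from the IOR summaries in this message.
plain_write, plain_read = 189.02, 508.37   # unencrypted filesystem
ssk_write, ssk_read = 89.25, 95.22         # SSK (skpi) enabled

print(f"write drop: {percent_drop(plain_write, ssk_write):.0f}%")
print(f"read drop:  {percent_drop(plain_read, ssk_read):.0f}%")
```

Both runs are single benchmark passes on cloud VMs, so the percentages are probably best read as indicative rather than precise.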

If crypto performance is critical for you, you might consider looking at the 
Intel QAT PCI adapter.  While I don't work at Intel anymore, I learned about 
this adapter and it can definitely improve performance for crypto and 
compression workloads, so long as you are limited by the crypto performance.  
It does not quite run fast enough to go full wire speed for IB/OPA networks, 
but at the speeds you are reporting it should be a benefit.

For some benchmark results see:

https://www.servethehome.com/intel-quickassist-technology-and-openssl-setup-insights-and-initial-benchmarks/

Cheers, Andreas

> Details are below!
> 
> Mark
> 
> I set up two Lustre file systems on the same virtual machine configuration in 
> AWS, one OSS VM and one MDS VM in each.  I enabled encryption of client and 
> server communication as follows:
> 
> sudo lctl conf_param scratch.srpc.flavor.default.cli2mdt=skpi
> sudo lctl conf_param scratch.srpc.flavor.default.cli2ost=skpi
> 
> I then ran a single IOR benchmark test aimed at evaluating system throughput. 
> I ran the benchmark on a cluster of 5 clients with the following command:
> srun --tasks-per-node=1 -N 5 ior -a POSIX -o /scratch/demo -z -w -r -F -B -b 1g -t 1m -i 2
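
To relate the command line to the numbers IOR reports: with `-F` (file per process), each of the 5 tasks writes its own 1 GiB block (`-b 1g`), which gives the 5 GiB aggregate size shown in the summaries. The sketch below, with illustrative helper names that are not part of IOR, also checks that aggregate size divided by reported bandwidth lands close to the reported total(s) values.

```python
MIB_PER_GIB = 1024

def aggregate_mib(tasks: int, block_gib: int) -> int:
    """Total data moved per pass when every task handles one block."""
    return tasks * block_gib * MIB_PER_GIB

def expected_seconds(total_mib: float, bw_mib_per_s: float) -> float:
    """Time implied by moving total_mib at the given bandwidth."""
    return total_mib / bw_mib_per_s

total = aggregate_mib(tasks=5, block_gib=1)

# (bandwidth in MiB/s, total(s)) pairs for iteration 0, copied from the
# IOR output quoted in this message.
reported = {
    "encrypted write": (89.25, 57.37),
    "encrypted read": (95.14, 53.81),
    "unencrypted write": (189.02, 27.09),
    "unencrypted read": (508.37, 10.07),
}

for name, (bw, secs) in reported.items():
    print(f"{name}: computed {expected_seconds(total, bw):.2f} s, reported {secs} s")
```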
> 
> The IOR results for the encrypted FS were:
> Summary:
> api                = POSIX
> test filename      = /lustre/demo
> access             = file-per-process
> ordering in a file = random offsets
> ordering inter file= no tasks offsets
> clients            = 5 (1 per node)
> repetitions        = 2
> xfersize           = 1 MiB
> blocksize          = 1 GiB
> aggregate filesize = 5 GiB
> 
> access   bw(MiB/s) block(KiB) xfer(KiB) open(s)  wr/rd(s) close(s) total(s) iter
> ------   --------- ---------- --------- -------- -------- -------- -------- ----
> write    89.25     1048576    1024.00   0.003589 57.36    1.22     57.37    0
> read     95.14     1048576    1024.00   0.001952 53.81    0.860698 53.81    0
> remove   -         -          -         -        -        -        0.002120 0
> write    88.16     1048576    1024.00   0.001738 58.08    1.23     58.08    1
> read     95.22     1048576    1024.00   0.001806 53.77    0.989825 53.77    1
> remove   -         -          -         -        -        -        0.001562 1
> 
> Max Write: 89.25 MiB/sec (93.59 MB/sec)
> Max Read:  95.22 MiB/sec (99.84 MB/sec)
> 
> Summary of all tests:
> Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Mean(s)  Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz     xsize   aggsize    API   RefNum
> write     89.25    88.16    88.70     0.55   57.72177 0     5      1   2    1   0     1        0         0    1      1073741824 1048576 5368709120 POSIX 0
> read      95.22    95.14    95.18     0.04   53.79227 0     5      1   2    1   0     1        0         0    1      1073741824 1048576 5368709120 POSIX 0
> 
> The IOR results of the unencrypted filesystem were:
> 
> Summary:
> api                = POSIX
> test filename      = /scratch/demo
> access             = file-per-process
> ordering in a file = random offsets
> ordering inter file= no tasks offsets
> clients            = 5 (1 per node)
> repetitions        = 2
> xfersize           = 1 MiB
> blocksize          = 1 GiB
> aggregate filesize = 5 GiB
> 
> access   bw(MiB/s) block(KiB) xfer(KiB) open(s)  wr/rd(s) close(s) total(s) iter
> ------   --------- ---------- --------- -------- -------- -------- -------- ----
> write    189.02    1048576    1024.00   0.002521 27.09    0.551086 27.09    0
> read     508.37    1048576    1024.00   0.001257 10.07    0.326688 10.07    0
> remove   -         -          -         -        -        -        0.002035 0
> write    187.13    1048576    1024.00   0.001748 27.36    0.928853 27.36    1
> read     502.72    1048576    1024.00   0.001494 10.18    0.356007 10.18    1
> remove   -         -          -         -        -        -        0.001705 1
> 
> Max Write: 189.02 MiB/sec (198.20 MB/sec)
> Max Read:  508.37 MiB/sec (533.07 MB/sec)
> 
> Summary of all tests:
> Operation   Max(MiB)   Min(MiB)  Mean(MiB) StdDevMean(s) Test# #Tasks 

Re: [lustre-discuss] multiple filesystem in MGS vs folder based ACL ? prons /cons

2018-06-28 Thread Peter Jones
Correct. ZFS 0.8 will provide the necessary changes in the underlying ZFS.

On 2018-06-28, 3:10 PM, "lustre-discuss on behalf of Mark Hahn" 
 wrote:

> FYI, Project Quotas exist beginning with Lustre 2.10.0.

but not yet for ZFS configs, right?  sorry, I don't remember whether 
the OP mentioned which underfilesystem they were using...


Re: [lustre-discuss] multiple filesystem in MGS vs folder based ACL ? prons /cons

2018-06-28 Thread Mark Hahn

> FYI, Project Quotas exist beginning with Lustre 2.10.0.


but not yet for ZFS configs, right?  sorry, I don't remember whether 
the OP mentioned which underfilesystem they were using...



Re: [lustre-discuss] multiple filesystem in MGS vs folder based ACL ? prons /cons

2018-06-28 Thread Cory Spitz
> Lustre's current/traditional owner-based quota accounting
> is a bit of a drag, but eventually there will be project quotas...

FYI, Project Quotas exist beginning with Lustre 2.10.0.

Info is in the Lustre Ops Manual at 
http://doc.lustre.org/lustre_manual.xhtml#idm140687075905776.

-Cory

-- 

On 6/28/18, 10:46 AM, "lustre-discuss on behalf of Mark Hahn" 
 wrote:

> We have different research groups. I am thinking of having one filesystem with
> project folders beneath it, controlled by ACLs.

well, the first approach should be to use the normal Unix mechanism: 
owners and groups.  ACLs are usually treated as a way to make exceptions,
since owner/group will capture most of the correct sharing relations.

after all, there's little harm in seeing lots of names in your /project 
mount.  unless someone botches the permissions, only the right people
can traverse the trees.

> Just curious: what are the pros/cons of having multiple filesystems vs. a single
> filesystem with folders?

scalability of management.  it's not obviously scalable to manage many
separate filesystems, but very easy to manage thousands of groups
on a single filesystem.

> any advice ?

unless you have to prevent users from even seeing the existence 
of other users, just use a single filesystem.

Lustre's current/traditional owner-based quota accounting 
is a bit of a drag, but eventually there will be project quotas...

regards, mark hahn


Re: [lustre-discuss] lctl ping node28@o2ib report Input/output error

2018-06-28 Thread Mohr Jr, Richard Frank (Rick Mohr)

> On Jun 27, 2018, at 4:44 PM, Mohr Jr, Richard Frank (Rick Mohr) 
>  wrote:
> 
> 
>> On Jun 27, 2018, at 3:12 AM, yu sun  wrote:
>> 
>> client:
>> root@ml-gpu-ser200.nmg01:~$ mount -t lustre 
>> node28@o2ib1:node29@o2ib1:/project /mnt/lustre_data
>> mount.lustre: mount node28@o2ib1:node29@o2ib1:/project at /mnt/lustre_data 
>> failed: Input/output error
>> Is the MGS running?
>> root@ml-gpu-ser200.nmg01:~$ lctl ping node28@o2ib1
>> failed to ping 10.82.143.202@o2ib1: Input/output error
>> root@ml-gpu-ser200.nmg01:~$
> 
> In your previous email, you said that you could mount lustre on the client 
> ml-gpu-ser200.nmg01.  Was that not accurate, or did something change in the 
> meantime?

(Note: Received out-of-band reply from Yu stating that there was a typo in the 
previous email, and that client ml-gpu-ser200.nmg01 could not mount lustre.  
Continuing discussion here so others on list can follow/benefit.)

Yu,

For the IPoIB addresses used on your nodes, what are the subnets (and netmasks) 
that you are using?  It looks like servers use 10.82.143.X and clients use 
10.82.141.X.  If you are using a 255.255.0.0 netmask, you should be fine.  But 
if you are using 255.255.255.0, then you will run into problems.  Lustre 
expects that all nodes on the same lnet network (o2ib1 in your case) will also 
be on the same IP subnet.
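
The subnet condition above can be checked offline with Python's standard `ipaddress` module. This is only a sketch: the server address comes from the `lctl ping` output in this thread, while the client address `10.82.141.10` is a hypothetical stand-in, since only `10.82.141.X` is known. The third netmask is the 255.255.255.224 value Yu reports in the follow-up message.

```python
import ipaddress

def same_subnet(addr_a: str, addr_b: str, netmask: str) -> bool:
    """Return True if both IPv4 addresses fall in the same subnet."""
    net_a = ipaddress.ip_interface(f"{addr_a}/{netmask}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{netmask}").network
    return net_a == net_b

SERVER = "10.82.143.202"  # node28, from the lctl ping output
CLIENT = "10.82.141.10"   # hypothetical; only 10.82.141.X is known

for mask in ("255.255.0.0", "255.255.255.0", "255.255.255.224"):
    verdict = "same subnet" if same_subnet(SERVER, CLIENT, mask) else "different subnets"
    print(f"{mask}: {verdict}")
```

Only the 255.255.0.0 mask puts the two addresses in one subnet, which matches the "should be fine" case described above.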

Have you tried running a regular “ping ” command between clients 
and servers to make sure that part is working?

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu



Re: [lustre-discuss] what is fsname used for ? and how to get role based security ?

2018-06-28 Thread Andreas Dilger
For #3 you should look at "nodemap" and "subdirectory mount" in the
manual. I agree that simple user permissions should be the starting
point, but if you need more complete isolation (eg. if users are in charge of
VM images), then the following presentation will be useful:

http://wiki.lustre.org/images/5/5c/LUG2018-Multitenancy-Buisson.pdf

Cheers, Andreas

On Jun 27, 2018, at 15:48, Zeeshan Ali Shah <javacli...@gmail.com> wrote:

Dear All,
When creating the MDT, it asks for the --fsname flag. The docs say it is the 
name of the filesystem that the MDT is part of. That is fine, but when I mount 
/lustre/fsname on a client, it mounts the complete Lustre filesystem.

1) Can an MDS/MDT serve more than one fsname?

2) What is the best practice for maintaining different projects/users, 
including home folders?

3) Besides plain Linux UIDs and GIDs, is there a better way of doing Authn/Authz 
in Lustre? For example, I want user X to be able to mount only 
/lustre/fsname/homeX and /lustre/fsname/projectX, and never /lustre/fsname 
itself.


any advice ?


/Zeeshan


Re: [lustre-discuss] New accounts in Jira?

2018-06-28 Thread Cory Spitz

All,

Be advised that if you had a previous working account with Intel HPDD, 
there is no need to sign up for a new one.  You can use the exact same 
credentials you used with http://jira.hpdd.intel.com.

-Cory


--


From: lustre-discuss  on behalf of 
"Moreno Diego (ID SIS)" 
Date: Thursday, June 28, 2018 at 1:00 AM
To: "lustre-discuss@lists.lustre.org" 
Subject: [lustre-discuss] New accounts in Jira?

Hello,

It doesn’t seem possible to create a new account on 
https://jira.whamcloud.com/ unless I’m missing something obvious…

On the login screen it says “Not a member? To request an account, please 
contact your JIRA administrators.” Unfortunately, that link leads to a dead end: 
https://jira.whamcloud.com/secure/ContactAdministrators!default.jspa

Regards,

---
Diego Moreno
HPC - Scientific IT Services
ETH Zurich



Re: [lustre-discuss] lctl ping node28@o2ib report Input/output error

2018-06-28 Thread Patrick Farrell

It seems expensive (straight mirroring rather than parity) and it’s 
asynchronous from Lustre, so if you’re really just syncing the block devices, 
that can’t guarantee safety on failure.  If I understand what you’re doing, 
when a failure occurs, drbd may be in the middle of syncing the block device.  
That would likely lead to losing data you had already written and possibly to 
corrupting the on disk file system in the mirror.  (Specifically, you’d end up 
copying part of something important before the failure occurred)


From: yu sun 
Sent: Wednesday, June 27, 2018 11:26:52 PM
To: Patrick Farrell
Cc: adil...@whamcloud.com; lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] lctl ping node28@o2ib report Input/output error

Yes, drbd will mirror the content of block devices between hosts, synchronously 
or asynchronously. This will give us data redundancy between hosts.
Perhaps we should use ZFS + drbd for the MDT and OSTs?

Thanks
Yu

On Wed, Jun 27, 2018 at 9:28 PM, Patrick Farrell <p...@cray.com> wrote:

I’m a little puzzled - it can switch, but isn’t the data on the failed disk 
lost...?  That’s why Andreas is suggesting RAID.  Or is drbd doing syncing of 
the disk?  That seems like a really expensive way to get redundancy, since it 
would have to be full online mirroring with all the costs in hardware and 
resource usage that implies...?

ZFS is not a requirement, it generally performs a bit worse than ldiskfs but 
makes it up with impressive features to improve data integrity and related 
things.  Since it sounds like that’s not a huge concern for you, I would stick 
with ldiskfs.  It will likely be a little faster and is easier to set up.


From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of yu sun <sunyu1...@gmail.com>
Sent: Wednesday, June 27, 2018 8:21:43 AM
To: adil...@whamcloud.com
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] lctl ping node28@o2ib report Input/output error

Yes, you are right. Thanks for your great suggestions.

We currently use GlusterFS to store training data for ML, and we have begun 
investigating Lustre as a replacement for GlusterFS for better performance.

First, yes, we do want maximum performance. Do you mean that we should use ZFS, 
for example, rather than putting each OST/MDT on a separate partition, for 
better performance?

Second, we do not use any underlying RAID devices, and we do configure each 
OST on a separate disk. Since Lustre does not provide disk data redundancy, we 
use drbd + pacemaker + corosync for data redundancy and HA; you can see that we 
configured --servicenode when running mkfs.lustre. I do not know how reliable 
this solution is. It seems OK in our current tests: when one disk fails, 
pacemaker can switch automatically to the other OST on the other machine.

We also want to use ZFS, and I have tested ZFS with mirroring. However, if the 
physical machine goes down, the data on that machine will be lost, so we decided 
to use the solution listed above.

We are testing now, and any suggestions are appreciated.
Thanks, Andreas.

Yours,
Yu



On Wed, Jun 27, 2018 at 7:07 PM, Andreas Dilger <adil...@whamcloud.com> wrote:
On Jun 27, 2018, at 09:12, yu sun <sunyu1...@gmail.com> wrote:
>
> client:
> root@ml-gpu-ser200.nmg01:~$ mount -t lustre 
> node28@o2ib1:node29@o2ib1:/project /mnt/lustre_data
> mount.lustre: mount node28@o2ib1:node29@o2ib1:/project at /mnt/lustre_data 
> failed: Input/output error
> Is the MGS running?
> root@ml-gpu-ser200.nmg01:~$ lctl ping node28@o2ib1
> failed to ping 10.82.143.202@o2ib1: Input/output error
> root@ml-gpu-ser200.nmg01:~$
>
>
> mgs and mds:
> mkfs.lustre --mgs --reformat --servicenode=node28@o2ib1 
> --servicenode=node29@o2ib1 /dev/sdb1
> mkfs.lustre --fsname=project --mdt --index=0 --mgsnode=node28@o2ib1 
> --mgsnode=node29@o2ib1 --servicenode node28@o2ib1 --servicenode node29@o2ib1 
> --reformat --backfstype=ldiskfs /dev/sdc1

Separate from the LNet issues, it is probably worthwhile to point out some issues
with your configuration.  You shouldn't use partitions on the OST and MDT devices
if you want to get maximum performance.  That can offset all of the filesystem IO
from the RAID/sector alignment and hurt performance.

Secondly, it isn't clear if you are using underlying RAID devices, or if you are
configuring each OST on a separate disk?  It looks like the latter - that you are
making each disk a separate OST.  That isn't a good idea for Lustre, since it does
not (yet) have any redundancy at higher layers, and any disk failure would result
in data loss.  You currently need to have RAID-5/6 or ZFS for each OST/MDT, unless
this is a really "scratch" filesystem where you don't care if the data is lost and
reformatting the filesystem is OK (i.e. low cost is the primary goal, which is fine
also, but not very common).

We are working at Lustre-level data 

Re: [lustre-discuss] New accounts in Jira?

2018-06-28 Thread Peter Jones
Diego

Free sign up got restricted some months ago after a series of spam attacks on 
the wiki (which shares login credentials with JIRA). I’ll see about getting the 
message updated – in the meantime I will reach out to you privately to get this 
sorted out.

Peter

From: lustre-discuss  on behalf of 
"Moreno Diego (ID SIS)" 
Date: Wednesday, June 27, 2018 at 11:00 PM
To: "lustre-discuss@lists.lustre.org" 
Subject: [lustre-discuss] New accounts in Jira?

Hello,

It doesn’t seem possible to create a new account on 
https://jira.whamcloud.com/ unless I’m missing something obvious…

On the login screen it says “Not a member? To request an account, please 
contact your JIRA administrators.” Unfortunately, that link leads to a dead end: 
https://jira.whamcloud.com/secure/ContactAdministrators!default.jspa

Regards,

---
Diego Moreno
HPC - Scientific IT Services
ETH Zurich



[lustre-discuss] multiple filesystem in MGS vs folder based ACL ? prons /cons

2018-06-28 Thread Zeeshan Ali Shah
We have different research groups. I am thinking of having one filesystem with
project folders beneath it, controlled by ACLs.

Just curious: what are the pros/cons of having multiple filesystems vs. a single
filesystem with folders?

any advice ?

/Zeeshan


Re: [lustre-discuss] what is fsname used for ? and how to get role based security ?

2018-06-28 Thread Zeeshan Ali Shah
Found the answers:
http://doc.lustre.org/lustre_manual.xhtml#dbdoclet.50438194_88063
http://doc.lustre.org/lustre_manual.xhtml#managingsecurity

Thanks for the manual :)


/Zee

On Thu, Jun 28, 2018 at 12:48 AM Zeeshan Ali Shah 
wrote:

> Dear All,
> When creating the MDT, it asks for the --fsname flag. The docs say it is the
> name of the filesystem that the MDT is part of. That is fine, but when I
> mount /lustre/fsname on a client, it mounts the complete Lustre filesystem.
>
> 1) Can an MDS/MDT serve more than one fsname?
>
> 2) What is the best practice for maintaining different projects/users,
> including home folders?
>
> 3) Besides plain Linux UIDs and GIDs, is there a better way of doing
> Authn/Authz in Lustre? For example, I want user X to be able to mount only
> /lustre/fsname/homeX and /lustre/fsname/projectX, and never /lustre/fsname
> itself.
>
>
> any advice ?
>
>
> /Zeeshan
>


[lustre-discuss] New accounts in Jira?

2018-06-28 Thread Moreno Diego (ID SIS)
Hello,

It doesn’t seem possible to create a new account on 
https://jira.whamcloud.com/ unless I’m missing something obvious…

On the login screen it says “Not a member? To request an account, please 
contact your JIRA administrators.” Unfortunately, that link leads to a dead end: 
https://jira.whamcloud.com/secure/ContactAdministrators!default.jspa

Regards,

---
Diego Moreno
HPC - Scientific IT Services
ETH Zurich
