I have an issue with lhsm_archive, where several FIDs produce these error
messages during archiving:
2019/06/13 16:27:31 [32655/21] Policy | Missing attribute 'fullpath' for evaluating boolean expression on [0x200128c87:0x1db1d:0x0]
2019/06/13 16:27:31 [32655/21] Policy | [0x200128c87:0x1db1d:0x0]: attribute is missing for checking ignore_fileclass rule
2019/06/13 16:27:31 [32655/21] lhsm_archive | Warning: cannot determine if entry is whitelisted: skipping it.
2019/06/13 16:27:31 [32655/21] Policy | [0x200128c87:0x1db1d:0x0]: attribute is missing for checking fileset 'scratch'
The reason for this error is that the entry exists in the database but has no
path. Because I have a fileclass whose definition is based on a tree, the check
fails:
FileClass scratch {
    definition {
        tree == "/lustre/scratch"
    }
}
The entry is in the database; the only thing missing is its NAMES entry. But
it must have existed at some stage, since the STRIPE_INFO and STRIPE_ITEMS
entries are there.
MariaDB [robinhood_lustre]> select * from ENTRIES where id='0x200128c87:0x1db1d:0x0';
+-------------------------+----------+---------+--------+--------+---------------+-------------+------------+---------------+------+------+-------+------------+---------+-------------+--------------+-------------+-------------+-------------+-------------+-------------+-------------+-----------+
| id                      | uid      | gid     | size   | blocks | creation_time | last_access | last_mod   | last_mdchange | type | mode | nlink | md_update  | invalid | fileclass   | class_update | lhsm_status | lhsm_archid | lhsm_norels | lhsm_noarch | lhsm_lstarc | lhsm_lstrst | lhsm_uuid |
+-------------------------+----------+---------+--------+--------+---------------+-------------+------------+---------------+------+------+-------+------------+---------+-------------+--------------+-------------+-------------+-------------+-------------+-------------+-------------+-----------+
| 0x200128c87:0x1db1d:0x0 | n9614532 | default | 815808 |   1600 |    1560317119 |  1560317125 | 1560317113 |    1560317119 | file |  484 |     0 | 1560323027 |       0 | +std_files+ |   1560407251 | modified    |           1 |           0 |           0 |           0 |           0 | NULL      |
+-------------------------+----------+---------+--------+--------+---------------+-------------+------------+---------------+------+------+-------+------------+---------+-------------+--------------+-------------+-------------+-------------+-------------+-------------+-------------+-----------+
1 row in set (0.00 sec)
MariaDB [robinhood_lustre]> select * from NAMES where id='0x200128c87:0x1db1d:0x0';
Empty set (0.00 sec)
MariaDB [robinhood_lustre]> select * from STRIPE_INFO where id='0x200128c87:0x1db1d:0x0';
+-------------------------+-----------+--------------+-------------+-----------+
| id                      | validator | stripe_count | stripe_size | pool_name |
+-------------------------+-----------+--------------+-------------+-----------+
| 0x200128c87:0x1db1d:0x0 |         0 |            1 |     1048576 |           |
+-------------------------+-----------+--------------+-------------+-----------+
1 row in set (0.00 sec)
MariaDB [robinhood_lustre]> select * from STRIPE_ITEMS where id='0x200128c87:0x1db1d:0x0';
+-------------------------+--------------+--------+----------------------+
| id                      | stripe_index | ostidx | details              |
+-------------------------+--------------+--------+----------------------+
| 0x200128c87:0x1db1d:0x0 |            0 |     12 | �                    |
+-------------------------+--------------+--------+----------------------+
1 row in set (0.00 sec)
That particular file no longer exists in Lustre:
[root@robinhood robinhood]# lfs fid2path /lustre 0x200128c87:0x1db1d:0x0
fid2path: error on FID 0x200128c87:0x1db1d:0x0: No such file or directory
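The same check can be repeated over a list of FIDs. This is a minimal sketch, with stale_fids as a made-up function name. It treats any non-zero exit status from lfs fid2path as "file gone", which is an assumption: a different failure (for example an unmounted filesystem) would be reported the same way.

```shell
# Hypothetical helper: reads FIDs on stdin (one per line) and prints those
# that `lfs fid2path` can no longer resolve under the given mount point.
# Assumption: any fid2path failure means the file no longer exists.
stale_fids() {
    mnt=$1
    while IFS= read -r fid; do
        # Skip blank lines.
        [ -n "$fid" ] || continue
        # Redirect stdin so lfs cannot consume the remaining FID list.
        if ! lfs fid2path "$mnt" "$fid" </dev/null >/dev/null 2>&1; then
            printf '%s\n' "$fid"
        fi
    done
}

# Possible usage (not run here):
#   echo '0x200128c87:0x1db1d:0x0' | stale_fids /lustre
```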
So something went wrong, and the entry was never deleted from the database.
This now happens fairly regularly. Restarting robinhood does seem to clear
these errors, but new ones accumulate afterwards.
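To see how many distinct entries are affected between restarts, the FIDs can be pulled out of the log. This is a rough sketch: missing_path_fids is a made-up name, and it assumes each log record sits on a single line in the actual log file (the records quoted in this mail may be wrapped by the mail client).

```shell
# Hypothetical helper: reads robinhood log text on stdin and prints the unique
# FIDs reported with "Missing attribute 'fullpath'".
# Assumption: each log record is on one line in the log file.
missing_path_fids() {
    grep "Missing attribute 'fullpath'" \
        | grep -oE '\[0x[0-9a-fA-F]+:0x[0-9a-fA-F]+:0x[0-9a-fA-F]+\]' \
        | tr -d '[]' \
        | sort -u
}

# Possible usage (the log path is a placeholder):
#   missing_path_fids < /path/to/robinhood.log | wc -l
```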
I am running robinhood 3.1.5 on RHEL 7.6 and Lustre 2.10.5.
My assumption at this stage is that I am hitting some timing issue, where a
file gets deleted while it is being processed via the changelog, resulting in
an incomplete database entry.
I have not seen this in the past, and I have not changed anything for a while,
so it is odd that it has appeared now.
Has anyone seen this before?
Any theory on how this could happen?
Thanks,
Gerald
Gerald Hofer
Technical Consultant
High Performance Computing
HPE Pointnext South Pacific Delivery
+61 418 888 567 Mobile
Brisbane/Queensland
hpe.com/pointnext