I have an issue with the lhsm_archive policy, where several FIDs produce these error
messages during the archive run:

2019/06/13 16:27:31 [32655/21] Policy | Missing attribute 'fullpath' for evaluating boolean expression on [0x200128c87:0x1db1d:0x0]
2019/06/13 16:27:31 [32655/21] Policy | [0x200128c87:0x1db1d:0x0]: attribute is missing for checking ignore_fileclass rule
2019/06/13 16:27:31 [32655/21] lhsm_archive | Warning: cannot determine if entry  is whitelisted: skipping it.
2019/06/13 16:27:31 [32655/21] Policy | [0x200128c87:0x1db1d:0x0]: attribute is missing for checking fileset 'scratch'
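To see how many entries are affected, the FIDs can be pulled out of the robinhood log. This is a minimal Python sketch, assuming one log record per line (the lines above were wrapped by the mail client) and FIDs printed in the bracketed form shown; `fids_missing_fullpath` is a hypothetical helper name, not part of robinhood.

```python
import re
from typing import Iterable, List

# A Lustre FID as printed in brackets in the log, e.g. [0x200128c87:0x1db1d:0x0]
FID_RE = re.compile(r"\[(0x[0-9a-fA-F]+:0x[0-9a-fA-F]+:0x[0-9a-fA-F]+)\]")
# Assumes the message text matches the log excerpt above.
MARKER = "Missing attribute 'fullpath'"


def fids_missing_fullpath(lines: Iterable[str]) -> List[str]:
    """Collect unique FIDs, in first-seen order, from log lines that
    report a missing 'fullpath' attribute (one record per line assumed)."""
    found = {}  # dict keeps insertion order, so duplicates are dropped in order
    for line in lines:
        if MARKER not in line:
            continue
        match = FID_RE.search(line)
        if match:
            found.setdefault(match.group(1), None)
    return list(found)
```

For example, `with open(logfile) as f: print("\n".join(fids_missing_fullpath(f)))`, where `logfile` is a placeholder for the path of your robinhood log.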

The reason for this error is that the entry exists in the database but has no
path. Because I have a fileclass that is based on a tree condition, the check
fails:

FileClass scratch {
        definition {
            tree == "/lustre/scratch"
        }
}

The entry is in the database; the only thing missing is its NAMES entry. But it
must have existed at some stage, since the STRIPE_INFO and STRIPE_ITEMS entries
are there.


MariaDB [robinhood_lustre]> select * from ENTRIES where id='0x200128c87:0x1db1d:0x0';
+-------------------------+----------+---------+--------+--------+---------------+-------------+------------+---------------+------+------+-------+------------+---------+-------------+--------------+-------------+-------------+-------------+-------------+-------------+-------------+-----------+
| id                      | uid      | gid     | size   | blocks | creation_time | last_access | last_mod   | last_mdchange | type | mode | nlink | md_update  | invalid | fileclass   | class_update | lhsm_status | lhsm_archid | lhsm_norels | lhsm_noarch | lhsm_lstarc | lhsm_lstrst | lhsm_uuid |
+-------------------------+----------+---------+--------+--------+---------------+-------------+------------+---------------+------+------+-------+------------+---------+-------------+--------------+-------------+-------------+-------------+-------------+-------------+-------------+-----------+
| 0x200128c87:0x1db1d:0x0 | n9614532 | default | 815808 |   1600 |    1560317119 |  1560317125 | 1560317113 |    1560317119 | file |  484 |     0 | 1560323027 |       0 | +std_files+ |   1560407251 | modified    |           1 |           0 |           0 |           0 |           0 | NULL      |
+-------------------------+----------+---------+--------+--------+---------------+-------------+------------+---------------+------+------+-------+------------+---------+-------------+--------------+-------------+-------------+-------------+-------------+-------------+-------------+-----------+
1 row in set (0.00 sec)

MariaDB [robinhood_lustre]> select * from NAMES where id='0x200128c87:0x1db1d:0x0';
Empty set (0.00 sec)

MariaDB [robinhood_lustre]> select * from STRIPE_INFO where id='0x200128c87:0x1db1d:0x0';
+-------------------------+-----------+--------------+-------------+-----------+
| id                      | validator | stripe_count | stripe_size | pool_name |
+-------------------------+-----------+--------------+-------------+-----------+
| 0x200128c87:0x1db1d:0x0 |         0 |            1 |     1048576 |           |
+-------------------------+-----------+--------------+-------------+-----------+
1 row in set (0.00 sec)

MariaDB [robinhood_lustre]> select * from STRIPE_ITEMS where id='0x200128c87:0x1db1d:0x0';
+-------------------------+--------------+--------+----------------------+
| id                      | stripe_index | ostidx | details              |
+-------------------------+--------------+--------+----------------------+
| 0x200128c87:0x1db1d:0x0 |            0 |     12 |     �              |
+-------------------------+--------------+--------+----------------------+
1 row in set (0.00 sec)
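To list every entry in this state at once, rather than querying one FID at a time, an anti-join of ENTRIES against NAMES should work. The sketch below demonstrates the query with Python's sqlite3 on a throwaway in-memory database; the two tables are minimal stand-ins holding only the columns the query touches (plus one hypothetical `name` column), not the real robinhood schema, and the second FID is made up. The SELECT itself is plain SQL and should run unchanged in the MariaDB client against robinhood_lustre, but treat that as an assumption.

```python
import sqlite3
from typing import List

# Anti-join: ENTRIES rows that have no matching row in NAMES (i.e. no path).
ORPHAN_QUERY = """
SELECT e.id
FROM ENTRIES e
LEFT JOIN NAMES n ON n.id = e.id
WHERE n.id IS NULL
ORDER BY e.id
"""


def find_entries_without_names(conn: sqlite3.Connection) -> List[str]:
    """Return the ids of entries that have no NAMES row."""
    return [row[0] for row in conn.execute(ORPHAN_QUERY)]


def demo() -> List[str]:
    # Stand-in schema: only what this sketch needs, not robinhood's real tables.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ENTRIES (id TEXT PRIMARY KEY, type TEXT)")
    conn.execute("CREATE TABLE NAMES (id TEXT, name TEXT)")
    conn.executemany("INSERT INTO ENTRIES VALUES (?, ?)", [
        ("0x200128c87:0x1db1d:0x0", "file"),  # no NAMES row, as in the output above
        ("0x200000007:0x1:0x0", "dir"),       # hypothetical healthy entry
    ])
    conn.execute("INSERT INTO NAMES VALUES (?, ?)", ("0x200000007:0x1:0x0", "scratch"))
    result = find_entries_without_names(conn)
    conn.close()
    return result
```

Calling `demo()` returns only the FID that lacks a NAMES row.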


That particular file no longer exists in Lustre:
[root@robinhood robinhood]# lfs fid2path /lustre 0x200128c87:0x1db1d:0x0
fid2path: error on FID 0x200128c87:0x1db1d:0x0: No such file or directory

So something went wrong, and the entry was never removed from the database.
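To re-check a batch of such FIDs in the same way, the `lfs fid2path` call above can be wrapped in a small loop. This is a Python sketch under two assumptions: `lfs fid2path` exits with a non-zero status when a FID cannot be resolved, and any non-zero status is counted as stale (so a permission or MDT error would be counted too). `stale_fids` and `run_quiet` are hypothetical helper names; the command runner is a parameter so the logic can be exercised without a Lustre mount.

```python
import subprocess
from typing import Callable, Iterable, List, Sequence

# A runner takes an argv list and returns the process exit status.
Runner = Callable[[Sequence[str]], int]


def run_quiet(argv: Sequence[str]) -> int:
    """Run a command with its output discarded and return its exit status."""
    return subprocess.run(list(argv), stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode


def stale_fids(mount: str, fids: Iterable[str], runner: Runner = run_quiet) -> List[str]:
    """Return the FIDs that `lfs fid2path` cannot resolve under `mount`.

    Assumes lfs exits non-zero on failure, as in the
    'No such file or directory' case shown above.
    """
    return [fid for fid in fids
            if runner(["lfs", "fid2path", mount, fid]) != 0]
```

For example, `stale_fids("/lustre", ["0x200128c87:0x1db1d:0x0"])` run as root on the robinhood host. If `lfs` is not on the PATH, `subprocess.run` raises `FileNotFoundError` rather than returning a status.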

This now happens fairly regularly, and these errors keep accumulating.
Restarting robinhood seems to clear them, but new ones accumulate afterwards.

I am running robinhood 3.1.5 on RHEL 7.6 and Lustre 2.10.5.

My assumption at this stage is that I am hitting a timing issue, where a file
gets deleted while its changelog records are still being processed, resulting
in an incomplete database entry.
I have not seen this in the past, and I have not changed anything for a while,
so it is odd that it has started appearing.

Has anyone seen this before?
Any theories on how this could happen?


Thanks,
Gerald




Gerald Hofer
Technical Consultant
High Performance Computing
HPE Pointnext South Pacific Delivery

+61 418 888 567  Mobile

Brisbane/Queensland
hpe.com/pointnext



_______________________________________________
robinhood-support mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/robinhood-support
