Patrick,
once upon a time there was "file join" functionality in Lustre that was ancient 
and complex, and was finally removed in 2009.  There are still a few remnants 
of this like "MDS_OPEN_JOIN_FILE" and "LOV_MAGIC_JOIN_V1" defined, but unused.  
 That functionality long predated composite file layouts (PFL, FLR), and used 
an external llog file *per file* to declare a series of other files that 
described the layout.  It was extremely fragile and complex and thankfully 
never got into widespread usage.

I think that with the advent of composite file layouts it should be _possible_ 
to implement this kind of functionality purely with layout changes, similar to 
"lfs migrate" doing a layout swap, or "lfs mirror extend" merging the layout of 
a victim file into another file to create a mirror.

My expectation is that "join" of two files would be handled at the file EOF and 
*not* at the layout boundary.  Based on the original description from Sven, I'd 
think that small gaps in the file (e.g. 4KB for page alignment, 64KB for 
minimum layout alignment, or 1MB for stripe alignment) would be OK, but tens or 
hundreds of MB holes would be inefficient for processing.

My guess, based on similar requests I've seen previously, and Sven's email 
address, is that this relates to merging video streams from different files 
into a single file?

Sven,
while I think it is possible to implement this in Lustre, I'd have to ask what 
requirements are driving your request?  Is this just something you want to 
test, or is there some real-world usage demand for this (e.g. specific 
application workload, usage in some popular library, etc)?

It seems possible to do this with layout manipulation similar to "lfs mirror 
extend -f" (i.e. a kind of "super file append" mechanism) but would be 
similarly destructive to the "victim" files appended to the original one, and 
would definitely not be something that could be done while the "original" file 
was actively in use.  Essentially, instead of "lfs mirror extend" just 
appending the victim layout to the existing file, it would need to also modify 
the original layout to truncate the layout at EOF, then offset the extent 
ranges in the victim layout by the current file size (rounded up to at least 
64KB multiples, but preferably 1MB multiples to maintain RAID alignment).
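
As a rough illustration of the extent arithmetic described above, the 
truncate-and-shift step might look like the Python sketch below.  This is not 
Lustre code: `round_up`, `join_layouts` and the (start, end) tuple used to 
represent a layout component are hypothetical names chosen for this sketch 
only, and the real work would have to happen in the layout handling itself.

```python
from typing import List, Optional, Tuple

# One layout component as (start, end) byte offsets; end=None means "to EOF".
# This representation is an assumption for illustration, not a Lustre structure.
Extent = Tuple[int, Optional[int]]

KIB = 1024
MIB = 1024 * KIB


def round_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return -(-value // alignment) * alignment


def join_layouts(original: List[Extent], original_size: int,
                 victim: List[Extent], alignment: int = MIB) -> List[Extent]:
    """Hypothetical helper: compute the extents of the joined layout.

    The original layout is cut off at the aligned EOF, and every victim
    extent is shifted by that same aligned offset.
    """
    offset = round_up(original_size, alignment)
    joined: List[Extent] = []
    for start, end in original:
        if start >= offset:
            continue  # component lies entirely past the aligned EOF: drop it
        clipped_end = offset if end is None else min(end, offset)
        joined.append((start, clipped_end))
    for start, end in victim:
        joined.append((start + offset, None if end is None else end + offset))
    return joined
```

For example, joining a victim onto an original file of 10 MiB + 1 byte with 
1MB alignment would place the victim's first extent at 11 MiB, leaving a hole 
of just under 1 MiB.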

Is this something that you would be willing to work on with guidance for the 
implementation details, or a feature request that you hope someone else will 
implement?

Cheers, Andreas

On Mar 29, 2023, at 07:41, Patrick Farrell via lustre-discuss 
<lustre-discuss@lists.lustre.org> wrote:

Sven,

The "combining layouts without any data movement" part isn't currently 
possible.  It's probably possible in theory, but it's never been implemented.  
(I'm curious what your use case is?)

Even allowing for data movement, there's no tool to do this for you.  Depending 
what you mean by combining, it's possible to do this with Linux tools (see the 
end of my note), but you're going to have data copying.

It's a bit of an odd requirement, with some inherent questions.  For example, 
the last component of a file layout is usually defined to go to infinity, 
because otherwise you will get IO errors when you 'run off the end', i.e. go 
past the defined layout.

That poses obvious questions when combining files.

If you're looking to combine files with layouts that do not go to infinity, 
then it's at least straightforward to see how you'd concatenate them.  But 
presumably the data in each file doesn't go to the very end of the layout?  So 
do you want the empty parts of the layout included?

Say file 1 is 10 MiB in size but the layout goes to 20 MiB (again, layouts 
normally should go to infinity) and file 2 is also 10 MiB in size but the 
layout goes to, say, 15 MiB.  Should the result look like this?

Layout: 1 1 1 1 1 1 1 ... 20 MiB 2 2 2 2 2 2 .... 35 MiB

With data from 0-10 MiB and 20 - 30 MiB.

That's something you'd have to write a tool for, so it could write the data at 
your specified offset for putting in the second file (and third, etc...).  You 
could also do something like:

    lfs setstripe [your layout] combined_file
    cat file_1 > combined_file
    truncate -s 20M combined_file    # the end of the file 1 layout
    cat file_2 >> combined_file      # append, so file 1's data is kept
    # etc.

So, you definitely can't avoid data copying here.  But that's how you could do 
it with simple Linux tools (which you could probably have drawn up yourself :)).
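
The "write a tool" option mentioned above could be sketched roughly as follows 
in Python.  This is only an illustration under stated assumptions: 
`concat_at_offsets` is a hypothetical helper, the region lengths stand in for 
the layout sizes of the source files, and the destination is assumed to be new 
or empty (for example, created beforehand with the desired layout).  It still 
copies all of the data.

```python
import os
import shutil
from typing import Iterable, List, Tuple


def concat_at_offsets(parts: Iterable[Tuple[str, int]],
                      dest: str) -> List[Tuple[int, int]]:
    """Copy each (source_path, region_length) into dest, one region after another.

    Each source starts at the beginning of its own region, so any unused tail
    of a region is left as a hole.  Returns (offset, bytes_copied) per source.
    """
    placed = []
    offset = 0
    # Open without truncating if dest already exists, so a file created in
    # advance is reused rather than replaced; otherwise create it.
    mode = "r+b" if os.path.exists(dest) else "w+b"
    with open(dest, mode) as out:
        for path, region in parts:
            size = os.path.getsize(path)
            if size > region:
                raise ValueError(f"{path} is larger than its {region}-byte region")
            out.seek(offset)  # seeking past EOF and writing leaves a hole
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
            placed.append((offset, size))
            offset += region
    return placed
```

With the sizes from the example above, calling it with regions of 20 MiB and 
15 MiB for two 10 MiB files would put data at 0-10 MiB and 20-30 MiB.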

-Patrick

________________________________
From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of 
Sven Willner <sven.will...@mpimet.mpg.de>
Sent: Wednesday, March 29, 2023 7:58 AM
To: lustre-discuss@lists.lustre.org <lustre-discuss@lists.lustre.org>
Subject: [lustre-discuss] Joining files

Dear all,

I am looking for a way to join/merge/concatenate several files into one, whose 
layout is just the concatenation of the layouts of the respective files - 
ideally without any copying/moving on the data side (even if this would result 
in "holes" in the joined file).

I would very much appreciate any hints to tools or ideas of how to achieve such 
a join. As I understand it, there used to be a `join` command for `lfs`, which 
is now deprecated (however, I am not sure whether a use case like mine was its 
purpose, or why it was deprecated).

Thanks a lot!
Best regards,
Sven

--
Dr. Sven Willner
Scientific Computing Lab (SCLab)
Max Planck Institute for Meteorology
Bundesstraße 53, D-20146 Hamburg, Germany
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

--
Andreas Dilger
Lustre Principal Architect
Whamcloud