There are several assumptions in your statement that are not accurate:

1. "different clients are *already* writing to a single networked file system"

This is not true. We have a final process, called our Merge, that takes the individual pieces and recombines them into a new file. It is not very efficient. If I have 1000 files that are each 1MB in size and I need a single 1GB file, I have to read in all 1000 files, write their contents out to the one new file, and then delete the 1000 old files. Now assume that this needs to be done 100 times to generate 100 separate 1GB files. This carries a huge processing-time overhead, not to mention the impact on the file server while our cluster is still doing other operations for the remaining nodes. In the near future, it will not be uncommon for us to end up with TBs worth of data spread over the numerous data files.
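For illustration only, the Merge step described above might look roughly like the following Python sketch. The function name `merge_parts` and the chunk size are assumptions made for the example; nothing here is taken from the actual Merge process.

```python
import os
import shutil


def merge_parts(part_paths, target_path, chunk_size=1024 * 1024):
    """Concatenate the part files, in the order given, into target_path.

    Every byte is read once and written once. The parts are deleted only
    after the merged file has been written completely.
    Returns the number of bytes written to target_path.
    """
    with open(target_path, "wb") as out:
        for part in part_paths:
            with open(part, "rb") as src:
                # Copy in fixed-size chunks so a part never has to fit in memory.
                shutil.copyfileobj(src, out, chunk_size)
        total = out.tell()

    # Remove the originals once the merged copy exists.
    for part in part_paths:
        os.remove(part)
    return total
```

With the figures above, producing 100 merged files of 1GB each means roughly 100GB read and another 100GB written through the file server, followed by 100,000 deletes.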
2. "Since they are contiguous blocks in a known order"

This is not true. File Part1 could be 5K and file Part2 could be 1MB. However, I can guarantee that Part2 belongs directly after Part1. The actual data inside them can differ based on numerous factors during analysis.

Lustre is one example that can do this. Our internal version of OpenVMS running on both Alpha and Itanium can also do this. We are looking for something that can be run on more common O/Ss like Windows and Linux. I will look into the links you provided.



Mark Brackett <[EMAIL PROTECTED]>
Sent by: "Discussion of advanced .NET topics." <ADVANCED-DOTNET@DISCUSS.DEVELOP.COM>
02/05/2008 04:07 PM
Please respond to
"Discussion of advanced .NET topics." <ADVANCED-DOTNET@DISCUSS.DEVELOP.COM>

To
ADVANCED-DOTNET@DISCUSS.DEVELOP.COM
cc

Subject
Re: Join/Merge multiple files together


I'm curious what other OS/file systems have this capability natively...it's an interesting, though edge-case, optimization. A quick Google only turns up Lustre
http://ieeexplore.ieee.org/iel5/4215348/4215349/04215390.pdf?isnumber=4215349&prod=CNF&arnumber=4215390&arSt=267&ared=274&arAuthor=Yu%2C+Weikuan%3B+Vetter%2C+Jeffrey%3B+Canon%2C+R.+Shane%3B+Jiang%2C+Song
as having "..an innovative file joining feature that joins files in place...".

That being said, implicit in your question is that the different clients are *already* writing to a single networked file system. Why not just have them all write to a single file to begin with? Since they are contiguous blocks in a known order, it'd seem you could get away with just having them start at different offsets. I believe that should all be doable with fairly simple managed or p/invoked code.

Alternatively, if you can hook into the OS on the reading box - you can fake it without requiring source changes to the reading app. If you have that possibility, I'd start by looking at (probably in this order):

1. WinFUSE http://www.suchwerk.net/sodcms_FUSE_for_WINDOWS.htm which purports to have a userland file system in managed code
2. Windows File System Filter Drivers http://www.microsoft.com/whdc/driver/filterdrv/default.mspx
3. Windows Installable File System Kit http://www.microsoft.com/whdc/devtools/ifskit/default.mspx

Your goal there would just be to intercept a request for "bigfile001.dat" and read "file001-1.dat", "file001-2.dat", etc.

--Mark Brackett

> -----Original Message-----
> From: Discussion of advanced .NET topics. [mailto:ADVANCED-[EMAIL PROTECTED] On Behalf Of Michael Sharpe
> Sent: Tuesday, February 05, 2008 12:24 PM
> To: ADVANCED-DOTNET@DISCUSS.DEVELOP.COM
> Subject: Re: [ADVANCED-DOTNET] Join/Merge multiple files together
>
> Sadly, no. We have no control over the consumer application of the
> required data file. It cannot accept the data in chunks or pieces. It
> cannot accept a data stream either so providing it data on the fly is
> out of the question. It can only accept it as a single data file.
> Otherwise we would have pushed for this a long time ago....
>
> It seems like this is quite a difficult problem to address. Maybe
> Windows is not the best platform to be running for our needs. We have
> seen other O/S that can do this very easily.
>
>
> John Brett <[EMAIL PROTECTED]>
> Sent by: "Discussion of advanced .NET topics."
> <ADVANCED-DOTNET@DISCUSS.DEVELOP.COM>
> 02/05/2008 11:02 AM
> Please respond to
> "Discussion of advanced .NET topics."
> <ADVANCED-DOTNET@DISCUSS.DEVELOP.COM>
>
> To
> ADVANCED-DOTNET@DISCUSS.DEVELOP.COM
> cc
>
> Subject
> Re: Join/Merge multiple files together
>
>
> > Is it at all possible to combine 2 (or more) data files together
> > without A) opening the data files to read and B) creating a "new"
> > file out of them?
>
> Can you change the problem to make it easier to solve?
> Depending upon what you need to do with the end-product, can you change
> the reading application to use an index or other mechanism to indicate
> the set of files to read? Can you create a file-reader shim that
> aggregates the files on the fly whilst reading? Can you change the
> application that generates these files to append to a single file?
> Just trying to understand why you have to have exactly one file to
> work with, since that seems to be causing you difficulties.
>
> John
>
> ===================================
> This list is hosted by DevelopMentor(r) http://www.develop.com
>
> View archives and manage your subscription(s) at
> http://discuss.develop.com
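
As a rough illustration of the "file-reader shim" idea raised in the thread, the Python sketch below presents an ordered set of part files as one sequential, read-only stream. The class name `ConcatenatedReader` is hypothetical, and the sketch assumes the reading code can be changed to open its input through the shim; it is an in-process example, not an OS-level filter driver, so it would not help a consumer that can only open a real single file.

```python
import io


class ConcatenatedReader(io.RawIOBase):
    """Read-only, sequential view over several part files as one stream."""

    def __init__(self, part_paths):
        super().__init__()
        self._paths = list(part_paths)  # parts, already in the correct order
        self._index = 0                 # index of the part being read
        self._current = None            # open handle for that part, if any

    def readable(self):
        return True

    def readinto(self, buffer):
        if len(buffer) == 0:
            return 0
        while self._index < len(self._paths):
            if self._current is None:
                self._current = open(self._paths[self._index], "rb")
            n = self._current.readinto(buffer)
            if n:
                return n
            # The current part is exhausted: close it and move to the next.
            self._current.close()
            self._current = None
            self._index += 1
        return 0  # every part has been consumed: end of stream

    def close(self):
        if self._current is not None:
            self._current.close()
            self._current = None
        super().close()
```

Because the parts are opened lazily and in order, their sizes do not need to be known in advance; only the ordering (Part2 directly after Part1) matters. Wrapping the object in `io.BufferedReader` would add buffering for many small reads.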