I can't really help with your question, but your description of your application reminds me of a software project I worked on several years ago. Do you happen to know SMART?
My reaction to that software project was to try to skip the merging part, see [1] and [2], but if I recall correctly, we ended up using memory-mapped files to merge the little files into one big file.

// Ryan

[1] http://tech.groups.yahoo.com/group/win_tech_off_topic/message/32444
[2] http://tech.groups.yahoo.com/group/win_tech_off_topic/message/32355

On Feb 6, 2008 12:04 AM, Michael Sharpe <[EMAIL PROTECTED]> wrote:
> There are several assumptions in your statement that are not accurate:
>
> 1. "different clients are *already* writing to a single networked file system"
> This is not true. We have a final process, called our Merge, that takes the
> individual pieces and recombines them into a new file. It is not very
> efficient: if I have 1000 files that are each 1 MB in size and I need a
> single 1 GB file, I have to read in the 1000 files, write their contents
> out to the one new file, and then delete the 1000 old files. Now assume
> this needs to be done 100 times to generate 100 separate 1 GB files. That
> is a huge processing-time overhead, not to mention the impact on the file
> server while our cluster is still doing other operations for the remaining
> nodes. In the near future it will not be uncommon for us to end up with
> terabytes of data across the numerous data files.
>
> 2. "Since they are contiguous blocks in a known order"
> This is not true. File Part1 could be 5 KB and file Part2 could be 1 MB.
> However, I can guarantee that Part2 belongs directly after Part1. The
> actual data inside them can differ based on numerous factors during
> analysis.
>
> Lustre is one example that can do this. Our internal version of OpenVMS
> running on both Alpha and Itanium can also do this. We are looking for
> something that can be run on more common operating systems like Windows
> and Linux. I will look into the links you provided.
>
> Mark Brackett <[EMAIL PROTECTED]>
> Sent by: "Discussion of advanced .NET topics."
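For what it's worth, the memory-mapped merge Ryan describes might look roughly like this (a hypothetical Python sketch, not the original code; function and file names are invented — the .NET-era original would have used Win32 file mappings):

```python
import mmap
import os

def merge_parts(part_paths, out_path):
    """Merge many small part files into one big file through a
    memory-mapped view of the preallocated output (hypothetical sketch)."""
    total = sum(os.path.getsize(p) for p in part_paths)
    # Preallocate the output at its final size so it can be mapped once.
    with open(out_path, "wb") as f:
        f.truncate(total)
    with open(out_path, "r+b") as f, mmap.mmap(f.fileno(), total) as view:
        offset = 0
        for p in part_paths:
            with open(p, "rb") as part:
                data = part.read()
            # Copy each part into its slot; parts are contiguous and ordered.
            view[offset:offset + len(data)] = data
            offset += len(data)
```

Note this still reads and rewrites every byte — the mapping avoids user-space buffering churn, but it does not remove the fundamental I/O cost Michael complains about below.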
> <ADVANCED-DOTNET@DISCUSS.DEVELOP.COM>
> 02/05/2008 04:07 PM
> Please respond to "Discussion of advanced .NET topics."
> <ADVANCED-DOTNET@DISCUSS.DEVELOP.COM>
>
> To: ADVANCED-DOTNET@DISCUSS.DEVELOP.COM
> Subject: Re: Join/Merge multiple files together
>
> I'm curious what other OS/file systems have this capability natively...
> it's an interesting, though edge-case, optimization. A quick Google only
> turns up Lustre
> http://ieeexplore.ieee.org/iel5/4215348/4215349/04215390.pdf?isnumber=4215349&prod=CNF&arnumber=4215390&arSt=267&ared=274&arAuthor=Yu%2C+Weikuan%3B+Vetter%2C+Jeffrey%3B+Canon%2C+R.+Shane%3B+Jiang%2C+Song
> as having "...an innovative file joining feature that joins files in
> place...".
>
> That being said, implicit in your question is that the different clients
> are *already* writing to a single networked file system. Why not just
> have them all write to a single file to begin with? Since they are
> contiguous blocks in a known order, it would seem you could get away with
> just having them start at different offsets. I believe that should all
> be doable with fairly simple managed or P/Invoked code.
>
> Alternatively, if you can hook into the OS on the reading box, you can
> fake it without requiring source changes to the reading app. If you have
> that possibility, I'd start by looking at (probably in this order):
> 1. WinFUSE (http://www.suchwerk.net/sodcms_FUSE_for_WINDOWS.htm), which
> purports to offer a userland file system in managed code
> 2. Windows File System Filter Drivers
> http://www.microsoft.com/whdc/driver/filterdrv/default.mspx
> 3. Windows Installable File System Kit
> http://www.microsoft.com/whdc/devtools/ifskit/default.mspx
>
> Your goal there would just be to intercept a request for
> "bigfile001.dat" and read "file001-1.dat", "file001-2.dat", etc.
>
> --Mark Brackett
>
> -----Original Message-----
> > From: Discussion of advanced .NET topics.
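Mark's "different offsets" suggestion can be sketched roughly as follows (a hypothetical illustration in Python rather than managed code; names and sizes are made up — in .NET the same idea would be a FileStream opened with sharing that permits concurrent writers, then Seek and Write):

```python
import os

def preallocate(path, total_size):
    """Create the single shared output file at its final size, so each
    producer can write its block without having to extend the file."""
    with open(path, "wb") as f:
        f.truncate(total_size)

def write_part(path, offset, data):
    """One producer writes its contiguous block at its known offset.
    "r+b" opens the existing file for in-place writes without truncating."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)
```

Because each writer touches a disjoint byte range, the writes need no coordination beyond knowing the offsets up front — which is exactly the "contiguous blocks in a known order" assumption Mark points out.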
> > [mailto:ADVANCED-[EMAIL PROTECTED] On Behalf Of Michael Sharpe
> > Sent: Tuesday, February 05, 2008 12:24 PM
> > To: ADVANCED-DOTNET@DISCUSS.DEVELOP.COM
> > Subject: Re: [ADVANCED-DOTNET] Join/Merge multiple files together
> >
> > Sadly, no. We have no control over the consumer application of the
> > required data file. It cannot accept the data in chunks or pieces, and
> > it cannot accept a data stream either, so providing it data on the fly
> > is out of the question. It can only accept the data as a single file.
> > Otherwise we would have pushed for this a long time ago.
> >
> > It seems like this is quite a difficult problem to address. Maybe
> > Windows is not the best platform for our needs. We have seen other
> > operating systems that can do this very easily.
> >
> > John Brett <[EMAIL PROTECTED]>
> > Sent by: "Discussion of advanced .NET topics."
> > <ADVANCED-DOTNET@DISCUSS.DEVELOP.COM>
> > 02/05/2008 11:02 AM
> > Please respond to "Discussion of advanced .NET topics."
> > <ADVANCED-DOTNET@DISCUSS.DEVELOP.COM>
> >
> > To: ADVANCED-DOTNET@DISCUSS.DEVELOP.COM
> > Subject: Re: Join/Merge multiple files together
> >
> > > Is it at all possible to combine 2 (or more) data files together
> > > without A) opening the data files to read and B) creating a "new"
> > > file out of them?
> >
> > Can you change the problem to make it easier to solve?
> > Depending upon what you need to do with the end product, can you change
> > the reading application to use an index or other mechanism to indicate
> > the set of files to read? Can you create a file-reader shim that
> > aggregates the files on the fly whilst reading? Can you change the
> > application that generates these files to append to a single file?
> > Just trying to understand why you have to have exactly one file to
> > work with, since that seems to be causing you difficulties.
> >
> > John

===================================
This list is hosted by DevelopMentor® http://www.develop.com

View archives and manage your subscription(s) at http://discuss.develop.com
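As a postscript, John's "file-reader shim that aggregates the files on the fly" could be sketched like this (a hypothetical Python illustration; the class and its behavior are assumptions, not code from the thread — it presents the part files as one logical stream without ever materializing the merged file):

```python
import os

class ConcatReader:
    """Read-only view over several part files, exposed as one logical
    byte stream in the given order (hypothetical shim sketch)."""

    def __init__(self, paths):
        self._paths = list(paths)
        self._sizes = [os.path.getsize(p) for p in self._paths]
        self._pos = 0  # logical position in the concatenated stream

    def read(self, n=-1):
        """Read up to n bytes from the logical stream (-1 = to the end)."""
        total = sum(self._sizes)
        if n < 0:
            n = total - self._pos
        out = bytearray()
        while n > 0 and self._pos < total:
            # Locate which part file the current logical position falls in.
            idx, off = 0, self._pos
            while off >= self._sizes[idx]:
                off -= self._sizes[idx]
                idx += 1
            with open(self._paths[idx], "rb") as f:
                f.seek(off)
                chunk = f.read(min(n, self._sizes[idx] - off))
            out += chunk
            self._pos += len(chunk)
            n -= len(chunk)
        return bytes(out)
```

This only helps if the reading application can be pointed at the shim, which Michael says it cannot — hence Mark's suggestion of faking the big file at the file-system level instead.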