I cannot really help with your question.

But your description of your application reminds me of a software
project I was working on several years ago. Do you happen to know
SMART?

My reaction to that software project was to try to skip the merging
part entirely, see [1] and [2], but if I recall correctly, we ended
up using memory-mapped files to merge the little files into one big
file.
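For what it's worth, the merge step itself can also be done as a plain streaming copy rather than via memory mapping. A minimal sketch in Python (the function name and the caller-supplied ordering are illustrative, not what we actually shipped):

```python
import shutil

def merge_files(part_paths, out_path, bufsize=1024 * 1024):
    """Concatenate the given part files, in order, into one output file.

    Streams each part through a fixed-size buffer, so memory use stays
    flat no matter how large the parts are. The caller supplies the
    correct ordering of part_paths.
    """
    with open(out_path, "wb") as out:
        for path in part_paths:
            with open(path, "rb") as part:
                shutil.copyfileobj(part, out, bufsize)
```

This still pays the full read-everything/write-everything cost discussed below; it just bounds memory while doing so.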

// Ryan

[1] http://tech.groups.yahoo.com/group/win_tech_off_topic/message/32444
[2] http://tech.groups.yahoo.com/group/win_tech_off_topic/message/32355

On Feb 6, 2008 12:04 AM, Michael Sharpe <[EMAIL PROTECTED]> wrote:
> There are several assumptions in your statement that are not accurate:
>
> 1.  different clients are *already* writing to a single networked file
> system
> This is not true.  We have a final process called our Merge that takes the
> individual pieces and recombines them into a new file.  Not very
> efficient.  If I have 1000 files that are each 1MB in size and I need a
> single 1GB file, I need to read in 1000 files and write out the contents
> of the 1000 files to the 1 new file.  Then I have to delete the 1000 old
> files.  Now, assume that this needs to be done 100 times to generate 100
> separate 1GB files.  This has a huge processing time overhead, not to
> mention impact on the file server while our cluster is still doing other
> operations for the remaining nodes.  In the near future, it will not be
> uncommon for us to end up with terabytes of data across these numerous
> data files.
>
> 2.  Since they are contiguous blocks in a known order
> This is not true.  File Part1 could be 5K and file Part2 could be 1MB.
> However, I can guarantee that Part2 belongs directly after Part1.  The
> actual data inside them could differ based on numerous factors during
> analysis.
>
>
> Lustre is one example that can do this.  Our internal version of OpenVMS
> running on both Alpha and Itanium can also do this.  We are looking for
> something that can be run on more common operating systems like Windows
> and Linux.  I will look into the links you provided.
>
>
> Mark Brackett <[EMAIL PROTECTED]>
> Sent by: "Discussion of advanced .NET topics."
> <ADVANCED-DOTNET@DISCUSS.DEVELOP.COM>
> 02/05/2008 04:07 PM
> Please respond to
> "Discussion of advanced .NET topics."
> <ADVANCED-DOTNET@DISCUSS.DEVELOP.COM>
>
>
> To
> ADVANCED-DOTNET@DISCUSS.DEVELOP.COM
> cc
>
> Subject
> Re: Join/Merge multiple files together
>
>
> I'm curious what other OS/file systems have this capability
> natively...it's an interesting, though edge-case, optimization. A quick
> Google only turns up Lustre
> http://ieeexplore.ieee.org/iel5/4215348/4215349/04215390.pdf?isnumber=4215349&prod=CNF&arnumber=4215390&arSt=267&ared=274&arAuthor=Yu%2C+Weikuan%3B+Vetter%2C+Jeffrey%3B+Canon%2C+R.+Shane%3B+Jiang%2C+Song
> as having "..an innovative file joining feature that joins files in
> place...".
>
> That being said, implicit in your question is that the different clients
> are *already* writing to a single networked file system. Why not just
> have them all write to a single file to begin with? Since they are
> contiguous blocks in a known order, it'd seem you could get away with
> just having them start at different offsets. I believe that should all
> be doable with fairly simple managed or p/invoked code.
>
> Alternatively, if you can hook into the OS on the reading box - you can
> fake it without requiring source changes to the reading app. If you have
> that possibility, I'd start by looking at (probably in this order):
> 1. WinFUSE http://www.suchwerk.net/sodcms_FUSE_for_WINDOWS.htm which
> purports to have a userland file system in managed code
> 2. Windows File System Filter Drivers
> http://www.microsoft.com/whdc/driver/filterdrv/default.mspx
> 3. Windows Installable File System Kit
> http://www.microsoft.com/whdc/devtools/ifskit/default.mspx
>
> Your goal there would just be to intercept a request for
> "bigfile001.dat" and read "file001-1.dat", "file001-2.dat", etc.
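The core of what such an intercept would do -- present an ordered set of part files as one sequential stream -- can be sketched in user space (Python; the file names are illustrative, and of course this only helps a consumer that can take a file-like object, which a kernel-level filter driver would not require):

```python
class ConcatReader:
    """Read-only view of an ordered list of part files as one stream.

    Opens parts lazily, one at a time, and advances to the next part
    when the current one is exhausted -- the read-side analogue of
    joining the files without ever materializing the big file.
    """
    def __init__(self, paths):
        self._paths = list(paths)
        self._index = 0
        self._current = open(self._paths[0], "rb") if self._paths else None

    def read(self, n=-1):
        chunks = []
        while self._current is not None and n != 0:
            data = self._current.read(n if n >= 0 else -1)
            if data:
                chunks.append(data)
                if n > 0:
                    n -= len(data)
            else:
                # Current part exhausted; move on to the next one.
                self._current.close()
                self._index += 1
                if self._index < len(self._paths):
                    self._current = open(self._paths[self._index], "rb")
                else:
                    self._current = None
        return b"".join(chunks)
```

A filter driver or FUSE file system would do essentially this behind an ordinary file handle, which is what makes it transparent to an unmodifiable consumer.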
>
>
> --Mark Brackett
>
> > -----Original Message-----
> > From: Discussion of advanced .NET topics. [mailto:ADVANCED-
> > [EMAIL PROTECTED] On Behalf Of Michael Sharpe
> > Sent: Tuesday, February 05, 2008 12:24 PM
> > To: ADVANCED-DOTNET@DISCUSS.DEVELOP.COM
> > Subject: Re: [ADVANCED-DOTNET] Join/Merge multiple files together
> >
> > Sadly, no.  We have no control over the consumer application of the
> > required data file.  It cannot accept the data in chunks or pieces.  It
> > cannot accept a data stream either, so providing it data on the fly is
> > out of the question.  It can only accept it as a single data file.
> > Otherwise we would have pushed for this a long time ago....
> >
> > It seems like this is quite a difficult problem to address.  Maybe
> > Windows is not the best platform for our needs.  We have seen other
> > operating systems that can do this very easily.
> >
> > John Brett <[EMAIL PROTECTED]>
> > Sent by: "Discussion of advanced .NET topics."
> > <ADVANCED-DOTNET@DISCUSS.DEVELOP.COM>
> > 02/05/2008 11:02 AM
> > Please respond to
> > "Discussion of advanced .NET topics."
> > <ADVANCED-DOTNET@DISCUSS.DEVELOP.COM>
> >
> >
> > To
> > ADVANCED-DOTNET@DISCUSS.DEVELOP.COM
> > cc
> >
> > Subject
> > Re: Join/Merge multiple files together
> >
> > > Is it at all possible to combine 2 (or more) data files together
> > > without A) opening the data files to read and B) creating a "new"
> > > file out of them?
> >
> > Can you change the problem to make it easier to solve?
> > Depending upon what you need to do with the end-product, can you
> > change the reading application to use an index or other mechanism to
> > indicate the set of files to read? Can you create a file-reader shim
> > that aggregates the files on the fly whilst reading? Can you change
> > the application that generates these files to append to a single file?
> > Just trying to understand why you have to have exactly one file to
> > work with, since that seems to be causing you difficulties.
> >
> > John
> > ===================================
> > This list is hosted by DevelopMentor(r)  http://www.develop.com
> >
> > View archives and manage your subscription(s) at
> > http://discuss.develop.com
