There are several assumptions in your statement that are not accurate:

1.  "different clients are *already* writing to a single networked file
system"
This is not true.  We have a final process, which we call our Merge, that
takes the individual pieces and recombines them into a new file.  This is
not very efficient.  If I have 1000 files that are each 1MB in size and I
need a single 1GB file, I need to read in 1000 files and write out their
contents to the 1 new file.  Then I have to delete the 1000 old files.
Now, assume that this needs to be done 100 times to generate 100 separate
1GB files.  This has a huge processing time overhead, not to mention the
impact on the file server while our cluster is still doing other
operations for the remaining nodes.  In the near future, it will not be
uncommon for us to end up with TBs worth of data spread over numerous
data files.

2.  "Since they are contiguous blocks in a known order"
This is not true.  File Part1 could be 5K and file Part2 could be 1MB.
However, I can guarantee that Part2 belongs directly after Part1.  The
actual data inside them could differ based on numerous factors during
analysis.
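
To make the cost in point 1 concrete, here is a minimal Python sketch of
that kind of Merge step.  It is an illustration only, not the actual
implementation: the function name merge_parts, the Part<N>.dat file
names, and the 1MB copy chunk are all assumptions.

```python
import os
import shutil
import tempfile


def merge_parts(part_paths, out_path, chunk_size=1024 * 1024):
    """Concatenate the part files, in the given order, into out_path.

    Every byte is read once and written once, and the parts are deleted
    afterwards, which is exactly the overhead described in point 1.
    """
    with open(out_path, "wb") as out:
        for path in part_paths:
            with open(path, "rb") as part:
                shutil.copyfileobj(part, out, chunk_size)
    for path in part_paths:
        os.remove(path)
    return os.path.getsize(out_path)


# Demo with two parts of different sizes (5K and 1MB, as in point 2).
workdir = tempfile.mkdtemp()
sizes = [5 * 1024, 1024 * 1024]
parts = []
for i, size in enumerate(sizes, start=1):
    p = os.path.join(workdir, f"Part{i}.dat")  # hypothetical naming
    with open(p, "wb") as f:
        f.write(bytes([i]) * size)
    parts.append(p)

merged = os.path.join(workdir, "merged.dat")
merged_size = merge_parts(parts, merged)
```

For 100 output files of 1GB each, this pattern reads roughly 100GB and
writes roughly 100GB through the file server, on top of 100,000 deletes.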


Lustre is one example that can do this.  Our internal version of OpenVMS
running on both Alpha and Itanium can also do this.  We are looking for
something that can be run on a more common O/S like Windows or Linux.  I
will look into the links you provided.







Mark Brackett <[EMAIL PROTECTED]> wrote on 02/05/2008 04:07 PM:
Sent by: "Discussion of advanced .NET topics."
<ADVANCED-DOTNET@DISCUSS.DEVELOP.COM>
To: ADVANCED-DOTNET@DISCUSS.DEVELOP.COM
Subject: Re: Join/Merge multiple files together

I'm curious what other OS/file systems have this capability
natively...it's an interesting, though edge-case, optimization. A quick
Google only turns up Lustre
http://ieeexplore.ieee.org/iel5/4215348/4215349/04215390.pdf?isnumber=4215349&prod=CNF&arnumber=4215390&arSt=267&ared=274&arAuthor=Yu%2C+Weikuan%3B+Vetter%2C+Jeffrey%3B+Canon%2C+R.+Shane%3B+Jiang%2C+Song
as having "..an innovative file joining feature that joins files in
place...".

That being said, implicit in your question is that the different clients
are *already* writing to a single networked file system. Why not just
have them all write to a single file to begin with? Since they are
contiguous blocks in a known order, it'd seem you could get away with
just having them start at different offsets. I believe that should all
be doable with fairly simple managed or p/invoked code.
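
A minimal sketch of the offset idea, in Python rather than managed code
for brevity.  It assumes the size of every block is known before any
writer starts, which is the key precondition for this approach; the
helper name write_at_offset, the block sizes, and the payloads are all
hypothetical.

```python
import os
import tempfile


def write_at_offset(path, offset, data):
    """Write data into an existing file at a byte offset, leaving the rest alone."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


# Hypothetical block sizes, assumed known up front.
block_sizes = [5, 3, 4]
# Each block starts where the previous ones end.
offsets = [sum(block_sizes[:i]) for i in range(len(block_sizes))]

path = os.path.join(tempfile.mkdtemp(), "bigfile001.dat")
with open(path, "wb") as f:
    f.truncate(sum(block_sizes))  # preallocate the final length

# Writers can finish in any order; each one touches only its own range.
payloads = [b"AAAAA", b"BBB", b"CCCC"]
for i in (2, 0, 1):
    write_at_offset(path, offsets[i], payloads[i])

with open(path, "rb") as f:
    combined = f.read()
```

If the block sizes are not known until each writer finishes, the offsets
cannot be computed ahead of time and this scheme does not apply as is.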

Alternatively, if you can hook into the OS on the reading box - you can
fake it without requiring source changes to the reading app. If you have
that possibility, I'd start by looking at (probably in this order):
1. WinFUSE http://www.suchwerk.net/sodcms_FUSE_for_WINDOWS.htm which
purports to have a userland file system in managed code
2. Windows File System Filter Drivers
http://www.microsoft.com/whdc/driver/filterdrv/default.mspx
3. Windows Installable File System Kit
http://www.microsoft.com/whdc/devtools/ifskit/default.mspx

Your goal there would just be to intercept a request for
"bigfile001.dat" and read "file001-1.dat", "file001-2.dat", etc.
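
The read-side logic such a hook would need is small.  Below is a rough
in-process Python sketch of it; it is not a file system driver, and the
class name ConcatReader is an assumption for illustration.  A real
filter or userland file system would have to expose the same behaviour
behind the single "bigfile001.dat" name.

```python
import os
import tempfile


class ConcatReader:
    """Read-only view that presents several part files as one byte stream."""

    def __init__(self, part_paths):
        self._paths = list(part_paths)
        self._index = 0        # next part to open
        self._current = None   # currently open part, if any

    def read(self, size):
        """Return up to size bytes, moving to the next part when one runs out."""
        chunks = []
        remaining = size
        while remaining > 0:
            if self._current is None:
                if self._index >= len(self._paths):
                    break  # no parts left
                self._current = open(self._paths[self._index], "rb")
                self._index += 1
            data = self._current.read(remaining)
            if data:
                chunks.append(data)
                remaining -= len(data)
            else:
                # Current part is exhausted; close it and try the next one.
                self._current.close()
                self._current = None
        return b"".join(chunks)

    def close(self):
        if self._current is not None:
            self._current.close()
            self._current = None


# Demo: two parts of different lengths read back as one stream.
workdir = tempfile.mkdtemp()
part_paths = []
for name, content in [("file001-1.dat", b"hello "), ("file001-2.dat", b"world")]:
    p = os.path.join(workdir, name)
    with open(p, "wb") as f:
        f.write(content)
    part_paths.append(p)

reader = ConcatReader(part_paths)
first = reader.read(8)     # spans the boundary between the two parts
rest = reader.read(100)    # whatever is left
at_end = reader.read(1)    # empty once every part is consumed
reader.close()
```

Because the parts are only stitched together at read time, nothing is
copied or deleted, and parts of different sizes need no special handling.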


--Mark Brackett

> -----Original Message-----
> From: Discussion of advanced .NET topics. [mailto:ADVANCED-
> [EMAIL PROTECTED] On Behalf Of Michael Sharpe
> Sent: Tuesday, February 05, 2008 12:24 PM
> To: ADVANCED-DOTNET@DISCUSS.DEVELOP.COM
> Subject: Re: [ADVANCED-DOTNET] Join/Merge multiple files together
>
> Sadly, no.  We have no control over the consumer application of the
> required data file.  It cannot accept the data in chunks or pieces.  It
> cannot accept a data stream either, so providing it data on the fly is
> out of the question.  It can only accept it as a single data file.
> Otherwise we would have pushed for this a long time ago....
>
> It seems like this is quite a difficult problem to address.  Maybe
> Windows is not the best platform to be running for our needs.  We have
> seen other O/S that can do this very easily.
>
> John Brett <[EMAIL PROTECTED]> wrote on 02/05/2008 11:02 AM:
> Sent by: "Discussion of advanced .NET topics."
> <ADVANCED-DOTNET@DISCUSS.DEVELOP.COM>
> To: ADVANCED-DOTNET@DISCUSS.DEVELOP.COM
> Subject: Re: Join/Merge multiple files together
>
> > Is it at all possible to combine 2 (or more) data files together
> > without A) opening the data files to read and B) creating a "new"
> > file out of them?
>
> Can you change the problem to make it easier to solve?
> Depending upon what you need to do with the end-product, can you
> change the reading application to use an index or other mechanism to
> indicate the set of files to read? Can you create a file-reader shim
> that aggregates the files on the fly whilst reading? Can you change the
> application that generates these files to append to a single file?
> Just trying to understand why you have to have exactly one file to
> work with, since that seems to be causing you difficulties.
>
> John
>
> ===================================
> This list is hosted by DevelopMentor(r)  http://www.develop.com
>
> View archives and manage your subscription(s) at
> http://discuss.develop.com