I appreciate the performance concern and will certainly take it into
consideration.  The problem I currently face is that we have a
computational grid system.  This system is responsible for performance
analytics and for returning results.  Part of the process is also to
generate export files that other systems can recognize.  Since it is a
distributed (parallel) system, I cannot have a single data file generated
during the process.  We currently wait for all processes to finish and
then do a "merge" operation that puts everything back together and
presents it as if it had not been run distributed.  This merge can often
take just as long as the analytical processing, since it is not uncommon
for us to generate 100GB worth of data files.  The export files need to
be in a specific format, but I can build that format on a per-piece basis
and only need the pieces to be linked together.  I don't want the
additional disk overhead of performing the merge operation.  The data
files are already exactly how we want them; they are just in pieces.  I
just need to do File1 + File2 + File3 + File4 etc. and it is all done.
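
For illustration only, here is a minimal sketch of that File1 + File2 +
File3 + File4 step, written in Python because the thread leaves the
language open.  It assumes the pieces are already in their final format
and only need to be placed end to end.  It is not the zero-copy join asked
about: the later pieces are still read and rewritten, and only the bytes
already in File1 are left untouched.  The names append_pieces and
CHUNK_SIZE are hypothetical.

```python
import shutil
from pathlib import Path

# Hypothetical buffer size for each read/write; tune it for the disk in use.
CHUNK_SIZE = 16 * 1024 * 1024


def append_pieces(first_piece, other_pieces):
    """Append each remaining piece, in order, onto the end of first_piece.

    The bytes already in first_piece are never read or rewritten; only the
    later pieces are copied.  Returns the number of bytes appended.
    """
    appended = 0
    # "ab" opens the file for binary writing positioned at its current end.
    with open(first_piece, "ab") as dest:
        for piece in other_pieces:
            with open(piece, "rb") as src:
                # Stream the piece across in CHUNK_SIZE blocks so that a
                # multi-gigabyte file is never held in memory at once.
                shutil.copyfileobj(src, dest, CHUNK_SIZE)
            appended += Path(piece).stat().st_size
    return appended
```

Called as append_pieces("File1", ["File2", "File3", "File4"]), this leaves
File1 holding all four pieces in order.  The source pieces are not
deleted, so the extra disk space is only reclaimed once they are removed.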

I am also not concerned with disk read performance, since the export file
is only going to be accessed a handful of times at most.  A performance
decrease at read time is far less critical than the increased time at
merge/recombine.  I was hoping for a way to do this without having to
really tap into the APIs and disk subsystem.  It can get very scary in
there, and any mistake can corrupt data on the drive.





Peter Vertes <[EMAIL PROTECTED]>
Sent by: "Discussion of advanced .NET topics." <ADVANCED-DOTNET@DISCUSS.DEVELOP.COM>
02/05/2008 11:00 AM
Please respond to: "Discussion of advanced .NET topics." <ADVANCED-DOTNET@DISCUSS.DEVELOP.COM>
To: ADVANCED-DOTNET@DISCUSS.DEVELOP.COM
cc:
Subject: Re: Join/Merge multiple files together

Take a look at the FileStream class.  It lets you append to a file.  I
can see that appending to a file is a speedy process in the short term
(even though it causes fragmentation), but think about the long run:
when you try to read the data back, these files will be heavily
fragmented and all the speed you've gained will be lost.

-Peter


On 2/5/08, Michael Sharpe <[EMAIL PROTECTED]> wrote:
> Is it at all possible to combine 2 (or more) data files together without
> A) opening the data files to read and B) creating a "new" file out of
> them?
>
> For example, let's say I have one file that is 20MB and a second file
> that is 50MB.  What I want to have happen is for the second data file to
> just be appended to the first but without Windows or the file system
> having to read any data.  Basically I just want the FAT to take the 50MB
> file and attach it to the end of the 20MB file as a file fragment.  I
> suppose this is more of a file join than an append.
>
> This is how it seems to work now:
> File 1 + File 2 =
>
> 1) Create File 3
> 2) Read contents of File 1 into File 3
> 3) Read contents of File 2 into File 3
>
>
> This is how I want it to work:
> File 1 + File 2 =
>
> 1)  File 1 is left alone except for the EOF position.
> 2)  EOF position is removed from File 1 and is now an address pointer to
> the start of File 2
> 3)  File 2 has any header information stripped from it
> 4)  EOF becomes the end of File 2
> 5)  File 1 is now a fragmented file since neither file is moved yet it
> contains the contents of File 1 and File 2
>
>
>
>
> I know that there are some file systems available on other platforms
> that can do this joining of files without the overhead of having to
> build a file and read data into it.  Is this at all possible in Windows?
> I don't care if it uses .NET, C++, C, etc.
> And I am aware that file fragmentation is not always a good thing, but in
> this case I am more than willing to accept file fragmentation for the
> speed, especially when this needs to be done over many groups of files.
> If this needs to be done on, let's say, 1,000 files that are 120GB total
> in size, I don't want the file system to rebuild 120GB of data that I
> already have but that is just in different pieces.  I want to just stick
> the pieces together.
>
> Thanks
> Mike
>
> ===================================
> This list is hosted by DevelopMentor(R)  http://www.develop.com
>
> View archives and manage your subscription(s) at
> http://discuss.develop.com
>
