Hello Alex,

I have now had a chance to look at your changes.  Please see my comments 
below.

On Monday 08 March 2010 11:11:19 Alex Ehrlich wrote:
> Hello,
>
> I am in the way of implementing EFS (Encrypted File System) support on
> Windows platform.

Yes, such a feature would be nice.

>
> The current state of code is "proof of concept". It allows to back up
> encrypted files and restore them to NTFS *if* information about
> encryption is hardcoded (filenames hardcoded), so the very basic flow
> kind of works. The base source code is 5.0.0 (neither the latest 5.0.1
> nor development) as I wanted something "stable enough" as a baseline to
> add my two cents to and I started before 5.0.1; I believe it is easy to
> merge the changes into the latest versions when they are completed.
> I attach the changed files for your consideration in full, not in diff
> format (as I found no suitable diff tool right now), they are packed by
> 7-zip (www.7-zip.org) as sourceforge lists block zip attachments iirc.

This is OK as a proof of concept, but to be able to work effectively with us, 
you will really need to learn some of the elementary functions of git, which 
has a suitable diff tool built in.

You can read about git in the developer's guide that is on www.bacula.org -> 
Documentation.

>
> Currently, I've added
>    bool use_efs_api;
> to win32-struct BFILE in the same way as "bool use_backup_api" is
> defined. It is not very nice (maybe a switch {normal, backupapi, efsapi}
> should be used instead) but at least it works -- with minimal changes to
> the rest of code. Please forgive me the fact that my code changes are
> pretty ugly -- I am not a C/C++ developer.

Yes, adding such a flag is OK.
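
For reference, the alternative you mention (a single mode selector instead
of two independent booleans) could look roughly like the sketch below.  The
names bfile_mode, BFILE_sketch and select_mode are hypothetical and are not
the real Bacula BFILE definitions; only use_backup_api and use_efs_api come
from your description.  The precedence rule is an assumption.

```cpp
#include <cassert>

// Hypothetical sketch: one mode selector instead of two independent bools.
// These names are illustrative only and are not Bacula's real BFILE layout.
enum class bfile_mode {
   normal,      // plain read()/write() style I/O
   backup_api,  // Win32 BackupRead()/BackupWrite()
   efs_api      // raw EFS API, e.g. ReadEncryptedFileRaw()
};

struct BFILE_sketch {
   bfile_mode mode = bfile_mode::normal;
};

// Map the two flags discussed in this thread onto a single mode.
// Assumption: EFS takes precedence when both flags happen to be set.
inline bfile_mode select_mode(bool use_backup_api, bool use_efs_api)
{
   if (use_efs_api) {
      return bfile_mode::efs_api;
   }
   return use_backup_api ? bfile_mode::backup_api : bfile_mode::normal;
}
```

With a single selector, a file cannot end up marked as both "backup API"
and "EFS API" at the same time, which two booleans allow.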

>
> Ramifications made *currently* for EFS-encrypted files are:
> * no compression: compressing encrypted data is of no good usually,
> however, for small files compressing the EFS header could result into
> reasonable compression rate)
> * no encryption (by means of bacula): the file contents are already
> encrypted, but the file header (EFS stream) still needs encryption
> * no sparse support: while sparse files can be EFS-encrypted, sparse
> files could still be handled as "regular files filled by zeros"
> * no portable: see "other missing parts" below
> All these have been made for the sake of code simplicity only and I
> would like to get rid of them, see #3 below.

Your proof of concept more or less tacks the code onto the existing 
program, so anything that will be used for a release needs to be better 
integrated.  All the Windows-dependent code that you wrote needs to be moved 
into the appropriate places in findlib or compat, where it can become part of 
the current Windows code.

>
> The data stream type for EFS-encrypted files is still STREAM_WIN32_DATA.
> Is it OK or shall a new stream data type be added?

Yes, we will have to add a new data stream for it.

>
> There are 3 main outstanding issues where I hope to get advice from
> architects.
>
> (1) How to get "encrypted" attribute of the file being backed up from
> "stat" functions (in src/win32/compat/compat.cpp) up to bopen/backup
> level (FF_PKT->BFILE) -- as those stat() functions are the right places
> to read it? Currently src/findlib/find_one.c calls those stat functions,
> but the only output data is "structure stat" that is "very much minimal
> unix info". Can I just add more attributes to this structure? (I got
> access violation when I tried it in an "easy way"; there are lots of
> defines in the headers about the stat structure). Or, alternatively, can
> I use some other attribute (like sb->st_rdev that is currently used to
> keep reparse/mount point flag)? Or what would be the "right" way? Again,
> in the current "proof of concept" this is hardcoded based on sample
> filenames in backup.c/restore.c.

It is not so easy to add information to the stat packet, particularly because 
it is used to build the meta data sent to the director.  That said, it is 
possible, and with a little thought we can work something out.  I would have to 
understand the details a bit -- i.e. whether we are talking about a single 
bit, or we are talking about adding a larger data item.

>
> (2) How to store (on backup) and read (on restore) the *new*
> "efs-encrypted" attribute of a file? This attribute does not fit into
> extended attributes imo as it shall be available "very early on
> backup/restore" -- at the moment of bopen(), since the way of opening
> file depends on this attribute. So how to add a new "basic-level"
> attribute?

This is done by having different streams.
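
As a rough illustration of the idea: on restore the stream id arrives before
any file data, so it can decide how the file is opened.  Everything in the
sketch below is hypothetical; the *_SKETCH names and their numeric values are
placeholders, not Bacula's real stream numbers, and no EFS stream exists in
Bacula today.

```cpp
#include <cassert>

// Placeholder stream ids: the numeric values are made up for this sketch
// and STREAM_WIN32_EFS_DATA_SKETCH does not exist in Bacula.
enum sketch_stream {
   STREAM_FILE_DATA_SKETCH      = 1,
   STREAM_WIN32_DATA_SKETCH     = 2,
   STREAM_WIN32_EFS_DATA_SKETCH = 3
};

// Hypothetical ways of opening the target file on restore.
enum sketch_open_mode {
   OPEN_NORMAL,
   OPEN_BACKUP_API,
   OPEN_EFS_API,
   OPEN_UNKNOWN
};

// The stream id is known before the first data block is written, so it
// can select the open mode without any extra per-file attribute.
inline sketch_open_mode open_mode_for_stream(int stream)
{
   switch (stream) {
   case STREAM_FILE_DATA_SKETCH:
      return OPEN_NORMAL;
   case STREAM_WIN32_DATA_SKETCH:
      return OPEN_BACKUP_API;
   case STREAM_WIN32_EFS_DATA_SKETCH:
      return OPEN_EFS_API;
   default:
      return OPEN_UNKNOWN;   // stream this sketch does not know about
   }
}
```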

>
> (3) The contents of the main data sending loop in src/filed/backup.c
> send_data()
>     while ((sd->msglen=(uint32_t)bread(&ff_pkt->bfd, rbuf, rsize)) > 0) {
>       ...
>       }
> should be rewritten into a separate function imo that gets a block of
> data and processes it.

This is *exactly* what the current code does, so I don't immediately see a 
need to change it.

> In this case callback-based processing EFS data 
> could be easily fit into this "process data block" function logic: EFS
> callback to be an alternative to this "while bread" loop. Any opinion?

Hopefully, all the callback complication can be handled inside bread(), which 
is responsible for reading a block.   If it really requires a callback 
routine at the high level, then we can work on modifying the current logic.
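
One possible way to hide a push-style API (such as Win32
ReadEncryptedFileRaw(), which hands data to an export callback) behind a
pull-style read call is to run the exporter in a helper thread and hand over
one block at a time.  The sketch below is portable C++ and only an
assumption about how this could be done; pull_reader, exporter and export_cb
are hypothetical names, it is not how bread() is implemented, and the real
EFS callback returns a status code, which is simplified to void here.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstring>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Callback that a push-style exporter invokes once per data block.
using export_cb = std::function<void(const char *data, size_t len)>;
// A push-style exporter: calls the callback for each block, then returns.
using exporter = std::function<void(const export_cb &)>;

class pull_reader {
public:
   explicit pull_reader(exporter ex) : m_worker([this, ex] { run(ex); }) {}

   ~pull_reader() {
      {
         std::lock_guard<std::mutex> lk(m_mtx);
         m_abandoned = true;          // tell the callback to stop waiting
      }
      m_cv.notify_all();
      m_worker.join();
   }

   // bread()-like call: copies up to len bytes, returns 0 at end of data.
   size_t read(char *buf, size_t len) {
      std::unique_lock<std::mutex> lk(m_mtx);
      m_cv.wait(lk, [this] { return !m_pending.empty() || m_done; });
      if (m_pending.empty()) {
         return 0;                    // exporter has finished
      }
      size_t n = std::min(len, m_pending.size());
      std::memcpy(buf, m_pending.data(), n);
      m_pending.erase(m_pending.begin(),
                      m_pending.begin() + static_cast<std::ptrdiff_t>(n));
      lk.unlock();
      m_cv.notify_all();              // let the exporter continue
      return n;
   }

private:
   void run(const exporter &ex) {
      ex([this](const char *data, size_t len) {
         std::unique_lock<std::mutex> lk(m_mtx);
         // Hold at most one block at a time so memory use stays bounded.
         m_cv.wait(lk, [this] { return m_pending.empty() || m_abandoned; });
         if (m_abandoned) {
            return;                   // reader is gone: drop the data
         }
         m_pending.assign(data, data + len);
         lk.unlock();
         m_cv.notify_all();
      });
      {
         std::lock_guard<std::mutex> lk(m_mtx);
         m_done = true;
      }
      m_cv.notify_all();
   }

   std::mutex m_mtx;
   std::condition_variable m_cv;
   std::vector<char> m_pending;       // block waiting to be read
   bool m_done = false;               // exporter returned
   bool m_abandoned = false;          // reader destroyed early
   std::thread m_worker;              // declared last: starts after the rest
};
```

The caller then loops on read() exactly as it would loop on any other block
read, and never sees the callback.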

> So far I've made a "cut-down" copy-paste of the while loop contents to
> be called by EFS processing -- only sending data to SD, skipping all
> kinds of compression/encryption/etc.

Yes, that is fine for a proof of concept, but as you know, we need to change 
it for a real release.

>
> Other missing parts:
> * a known bug in src/filed/restore.c line 1175: only tiny files can be
> restored
> * only files are efs-encrypted/decrypted right now, not directories
> (contents of an "efs-encrypted" directory are visible without decryption
> in windows anyway, "encrypted directory" means just "encrypt any file
> created in this directory", not "encrypt directory listing")
> * check if target restore machine supports EFS (W2K or later)
> * check if target restore volume on win32 supports EFS (ie volume
> information states it supports encryption -- not available for FAT32 etc)
> * portable restore (when the target platform/filesystem does not support
> EFS) -- I believe it is not needed actually, as (in contrast to
> backupread/backupwrite format parsing) there is not much to do with raw
> EFS data; however, if anybody can provide a good reason for it this can
> be implemented
> * error handling should be reviewed and improved
> * set_portable_backup logic should be reviewed to work with efs

OK

>
> Additional questions:
> * do I miss something in the overall picture (standalone utilities,
> verify, some other parts of bacula broken due to this way of changes)?

I don't know if you missed something, but for the code to go into Bacula there 
will need to be some major changes -- putting the code in the right place 
(not in backup.c or restore.c), integrating it with the current flow of 
code, ...

> * does anybody know about other "things" in/for Windows using the same
> FILE_ATTRIBUTE_ENCRYPTED for a purpose other than EFS (maybe, BitLocker,
> 3rd party encryption tools)?

No, I try to know the least possible about Windows.

> * looking at the backup vs restore code, it *seems* that backup reads
> and sends file data in chunks, while restore reads the whole file data
> from SD and then processes it as a single piece; have I understood the
> logic correctly? if yes then has anybody ever restored a 4Gb file
> successfully? a 40Gb file?

Backup reads blocks and writes blocks to the storage daemon.  The storage 
daemon returns exactly the same blocks.  Bacula can handle an arbitrary 
amount of data; there is no 4GB or 40GB limit.  It can handle at least 2^64 
bytes in a single Bacula Volume, and probably an unlimited amount of data for 
a single file that it is backing up or restoring.
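
To illustrate why block-wise transfer has no file size limit, here is a
minimal sketch of a block copy loop.  It uses standard C++ streams in place
of the file and the storage daemon connection; copy_in_blocks and copy_demo
are hypothetical names and this is not the actual Bacula code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Copy a stream block by block.  Memory use is bounded by block_size,
// not by the size of the data.  Returns the number of bytes transferred.
inline uint64_t copy_in_blocks(std::istream &in, std::ostream &out,
                               size_t block_size)
{
   std::vector<char> buf(block_size);
   uint64_t total = 0;
   while (in) {
      in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
      std::streamsize got = in.gcount();
      if (got <= 0) {
         break;                        // nothing more to read
      }
      out.write(buf.data(), got);      // "send" exactly what was read
      total += static_cast<uint64_t>(got);
   }
   return total;
}

// Usage example: copy an in-memory string in 4-byte blocks.
inline std::string copy_demo(const std::string &data)
{
   std::istringstream in(data);
   std::ostringstream out;
   copy_in_blocks(in, out, 4);
   return out.str();
}
```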

>
> Any other advices/hints/ideas are appreciated, especially about the
> overall architecture (using STREAM_WIN32_DATA, working with attributes
> and so on).

Several conceptual questions: 1. Does your current code save unencrypted data 
or encrypted data?  2. How are the keys handled?

One project that I have wanted to do for quite some time is to rewrite the 
basic logic of backup and restore to make them a set of "modules" or 
processes that are stacked in a certain order depending on user defined 
options.  I.e. a read subroutine, a compress subroutine, an encrypt 
subroutine, ...  That way, instead of the current straight inline code, it 
would all be broken into a bunch of processing routines.  The exact ones that 
would be used would be pushed on a stack, then the stack executed.  This 
would vastly simplify adding new "modules" (processing subroutines).
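
A very small sketch of what I have in mind is below.  None of this exists in
Bacula; module_fn, module_stack and build_stack are hypothetical names, the
toy_rle and toy_xor routines are trivial stand-ins for real compression and
encryption, and the stages here simply run in the order they were pushed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// A processing module takes one block of data and returns the result.
using module_fn = std::function<std::string(const std::string &)>;

struct module_stack {
   std::vector<module_fn> stages;

   void push(module_fn fn) { stages.push_back(std::move(fn)); }

   // Run every pushed module over the block, in push order.
   std::string run(std::string block) const {
      for (const auto &stage : stages) {
         block = stage(block);
      }
      return block;
   }
};

// Toy stand-in for compression: run-length encoding ("aaab" -> "3a1b").
inline std::string toy_rle(const std::string &in)
{
   std::string out;
   for (size_t i = 0; i < in.size();) {
      size_t j = i;
      while (j < in.size() && in[j] == in[i]) {
         ++j;
      }
      out += std::to_string(j - i);
      out += in[i];
      i = j;
   }
   return out;
}

// Toy stand-in for encryption: XOR with a fixed byte (NOT real crypto).
inline std::string toy_xor(const std::string &in)
{
   std::string out = in;
   for (char &c : out) {
      c = static_cast<char>(c ^ 0x5A);
   }
   return out;
}

// Build the stack from user options; adding a new module is one more push.
inline module_stack build_stack(bool compress, bool encrypt)
{
   module_stack s;
   if (compress) {
      s.push(toy_rle);
   }
   if (encrypt) {
      s.push(toy_xor);
   }
   return s;
}
```

In such a design an EFS reader would be just another module that replaces
the plain read step, and options like compression could be left off the
stack for encrypted files.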

If the above were implemented, it would probably simplify the work you want to 
do, but doing the above is a big project, and even going from your proof 
of concept to production Bacula code is going to be a very big and 
non-trivial task.

If you want to continue with this work, I suggest the following steps:

1. Fill out and send in the FLA (see www.bacula.org -> FSFE License)
2. Read the first part of the developer's guide (about the FLA, our programming 
standards, and git usage).
3. Start working with a little git database that you create to learn git.
4. Clone the current Bacula git repository.
5. Find the right place to add your Windows dependent code (some should go 
into compat.cpp and some in bfile.c, and the rest elsewhere).  
6. Get the basic subroutines into the source code without actually 
implementing the new feature.
7. Send us a format-patch of what you propose.  Send it uncompressed as an 
attachment.
8. Discuss with us how to integrate the full logic of detecting, reading, and 
backing up encrypted files.

Thanks for your interest in Bacula.  I encourage you to continue with this 
project.

Best regards,

Kern

>
> Regards,
>
> Alex Ehrlich



_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel
