Nice stuff. Thanks.

----- Original Message -----
From: [EMAIL PROTECTED]
To: "MS-Exchange Admin Issues" <exchangelist@lyris.sunbelt-software.com>
Sent: Thursday, November 20, 2008 3:32:57 PM GMT -05:00 US/Canada Eastern
Subject: RE: Email Archival 101: a General View
It was great. I appreciate you sharing it.

From: "Bingham, Kevin" <[EMAIL PROTECTED]>
Sent: Thursday, November 20, 2008 12:01 PM
To: "MS-Exchange Admin Issues" <exchangelist@lyris.sunbelt-software.com>
Subject: RE: Email Archival 101: a General View

Well, as I said, some of it is hacked together rather hastily while I still have this account, so I expect some minor discrepancies. So, a few notes in response:

Event sinks ~= transport/routing agents, for this purpose. I used the sink terminology because more people are still familiar with it, and when we did our review of products in 2005/2006, there were no archiving vendors that had E2K7 routing agents. Go figure.

"need manageable ... content ... isn't accessed very often." Precisely; that's one set of questions involved in the Content Management category. When you start doing these sorts of things and don't involve legal personnel (if you have any), it will probably come back to you for reworking eventually. Involve potential stakeholders at the start when possible. If said stakeholders don't exist, no involvement. Even if they exist but you don't think they have any involvement or needs in your current project to offload old data from the Exchange server, you should strongly consider touching base with them when doing this sort of work. The designs are certainly easier to get right the first time than to retrofit when whole new categories of requirements pop up next year.

I agree with the list of features to look for, in a general sense. I'm much more inclined to encourage a company to figure out what its needs are, though, rather than assume the same laundry list applies to everyone. Granted, knowing what is on the possible laundry list is helpful in understanding what our own list might look like.

From: William Lefkovics [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 20, 2008 12:02 PM
To: MS-Exchange Admin Issues
Subject: RE: Email Archival 101: a General View

Lots of good information in there.
I certainly don't agree with everything. Event sinks? In Exchange 2007, you would write an archiving transport/routing agent. Small companies often need archiving but do not have a legal department or binding regulatory needs. They need a manageable Exchange server so they are not backing up, every day, content that isn't accessed very often. That's the primary reason I hear for archiving.

From an InformationWeek article by Andrew Conry-Murray in June 2008:

What to look for in an E-mail archiving solution:
1) Compression
2) Full Content Index
3) Keyword Search
4) Litigation hold (prevent deletion)
5) Metadata Index
6) Retention Deletion Policy enforcement
7) Single Instancing

Other preferred features:
1) Additional Search
2) API/Connector to other systems, especially legal apps
3) Discovery
4) SharePoint integration
5) Support for an extensive list of attachment types

Probably the most valuable thing you said, for me, is the last paragraph: test your potential solution. MAPI-based and journaling (ew!) archivers should be able to be tested without affecting real live data.

From: Bingham, Kevin [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 20, 2008 7:59 AM
To: MS-Exchange Admin Issues
Subject: Email Archival 101: a General View

I promised a while back to do a generic write-up on selecting an Email Archival Solution; figured I'd better finish this set of scribbles before I shuffle off from the company next week. If anyone wants to throw some of this up on a blog somewhere, feel free.

Since I'm finishing this up in a rush, there are undoubtedly considerations I've forgotten to include here, and I only tried to include considerations relevant to the majority of companies, but this should be a good start for any company considering archival. This is written from the perspective of an Exchange administrator; Exchange as your core email solution is assumed, but most of the generalities here could apply to any email solution.
This information is neither definitive nor unbiased; it only represents the empirical findings of a couple of administrators.

Email Archival has been a hot topic in the industry for some time now, with no real consensus on best practices or best-in-breed products. Different parts of the industry drive this division in views by having different requirements. In general, archiving is a system of pulling email out of its native storage system and placing it somewhere else; beyond that, the specifics vary.

So, when considering Email Archival, the first thing you need to do is define what it means to your company. Why do you want to do archiving? From there, you should be able to work into the second big question: what features do you need this tool suite to have?

There are four primary reasons to want to do archiving: mailbox size management, legal regulation, litigation response/rules of procedure, and content management. Deciding on your primary driver first is key to understanding your path forward, and how closely entangled your Archival implementation needs to be with your Legal department. (Hint: the answer is almost always VERY.)

Mailbox size management: are you nuts? You want to take all that email data out of a system that is designed to manage email data and stick it somewhere else, increasing the complexity of the whole system and the number of steps your users need to take to actually do anything? Generally speaking, the tools within Exchange are sufficient for simple mailbox size management. If you need additional space, it is almost always cheaper to simply expand your Exchange databases/storage groups/servers than to implement a whole new system on the side. With Exchange 2007, you don't even need to sustain the same level of disk I/O as with previous versions, so larger, cheaper disks are an option natively within the email system, with no third-party archiving solution required.

Legal regulation: an easy call, relatively speaking.
The requirements of the system should be laid out and decided for you by the regulation itself. You still need to discuss with your Legal department what additional aspects need to be considered.

Litigation response: the most involved scenario for legal requirements gathering. Every industry, every business, will have a slightly different focus. Heavy involvement with the legal department will be required. You need to be prepared to tell them what they have forgotten to consider, or assumed you knew, or you will find the requirements changing drastically after implementation.

Content Management: it's litigation response, plus. Plus everything. This is generally for large organizations trying to get a handle on what data they have, where it is, and how they want it managed. Like litigation response, this generally starts with some very vague ideas about the requirements and a lack of understanding of just how involved the decision sets can (and need to) be.

Usually, when first approached about retention periods for email, Legal Departments will state that you need to keep everything forever, or delete everything after 30 days. In a few cases, one of these responses might be appropriate, but generally, both are useless. In the former case, you wind up with so much garbage in the archive that it is impossible to find anything useful (do you really need to keep the note from your wife from 8 years ago, asking you to pick up a gallon of milk on the way home?), while in the latter, there is nothing useful left, and the users are upset because they can't reference older items, either.

So, retention periods probably need to be more selective. You need to determine how you want that selectivity to occur, though. Only certain users (i.e., executives, lawyers, and so on)? Only certain folders in a mailbox? Only certain content? How that selectivity needs to occur will be a driving factor in product selection. Do you need to guarantee every item is captured?
Or can you put some responsibility on the user to classify what must be archived?

There are three basic methods by which data might be moved into the archive (MAPI, event sink, or journaling); most vendors offer a choice between two of these.

MAPI will use a standard MAPI logon to the mailbox being archived, typically from a separate application server. It might be a continuous logon or a scheduled one; it will have all the overhead of a MAPI connection, plus whatever code the vendor uses to pick out items for archiving, plus overhead to remove items (if applicable), plus overhead to insert stubs (if applicable). Suffice it to say, this might be significant in some circumstances.

An event sink runs on the Exchange server as an extra step during message processing. It is more efficient than MAPI and guaranteed to review every message (MAPI isn't), but it can increase the load significantly and possibly cause delays in mail flow.

Journaling is a built-in Exchange method of copying all mail sent or received by the mailboxes in a given mailbox store (database) to a separate journal mailbox. This can be combined with a MAPI or event sink application, which then runs only against the journal mailbox instead of every mailbox. Journaling may meet some organizations' archival needs by itself, without a third-party vendor addition. It is disk and processor intensive.

Retrieval methods also vary greatly from vendor to vendor. Many vendors offer multiple methods of data retrieval; which will work in your environment?

Mailbox retrieval is generally accomplished by leaving a "stub" item in the user's mailbox. When the stub is opened, the message is retrieved and presented to the user. The method of retrieval, however, can also vary greatly. Perhaps the stub is a custom form that needs to be installed in your Organizational Forms Library, which makes a call to a web server when opened, which in turn retrieves the data from the archive repository and presents it to the user in the custom form.
Perhaps opening the stub posts a request into an application mailbox that a service continually monitors; the service processes the request and posts the retrieved item into the user's mailbox, where it then has to be opened. Perhaps opening the item launches an Outlook add-in that fetches the item from the archive itself. There are many ways to implement stub retrieval, all of which have different implications for supportability, load balancing, and fault tolerance.

A fat client is simply an installed application on the desktop that allows users to access, search, and sometimes manage the archive directly. A web interface should be similar to a fat client, but hosted as a web page somewhere, with the application doing the work on the server.

Security is a strong concern in some places, not so much in others. How does the solution prevent users from retrieving each other's data? Is there a way to allow a user to access someone else's data intentionally? Does the archive maintain its own security model, or is it integrated with Active Directory or another security provider? If it is integrated, does it synchronize and maintain its own copy, or does it make security calls against that directory directly? How is the integrity of the archive guaranteed (i.e., are users allowed to delete things from it or not)?

Integration with other data sources can be a concern for Content Management implementations, but might be for other reasons as well, and it never hurts to consider the future (will you ever need Content Management?). A Content Management initiative will often include other data sources as well, such as file servers, SharePoint, or other databases, either from the start or as soon as you turn your back a month after implementation. If so, does the solution have an integrated answer for all platforms?
You may sacrifice some best-in-breed features by going with a single vendor for all sources, but you will probably gain cost savings and a single method of retrieval/search/whatever for all data, which is usually one of the main points of a Content Management initiative.

Topology considerations will be insignificant for small companies, but of the utmost importance for geographically dispersed ones. Where is data stored: a single point or multiple locations? Does the application run in multiple places, or just one? How does the storage function work over the WAN? How does retrieval work over the WAN? If there are multiple repositories, how do they communicate with each other, and how do referrals to other repositories occur, if at all?

Every policy/feature consideration probably has a technical one to go with it, which the archive vendors probably won't volunteer. For instance, leaving stub items in the mailbox is a great usability feature, but one of the tradeoffs is possible performance: it's not the size of your database that primarily drives performance in Exchange, but rather the number of items, and leaving stubs does nothing to reduce the number of items; in fact, it will swiftly increase it over time.

Offline access is completely unimportant for some companies, but considered essential at others. Does the solution have any sort of offline cache for traveling users? If so, how does the cache operate: how is it populated, synchronized, encrypted? Is there a size cap? Does its existence on a laptop conflict with any of the drivers the Legal Department is pushing in order to run the project in the first place? For instance, if the vendor is just using its own PST to provide an offline archive, you can run into a 2GB size limit (with older ANSI-format PSTs) on a file that is weakly encrypted at best, and if a primary driver is to remove PSTs from your environment, this may not be a viable offline solution for you.

Finally, Pilot The Solution.
Do NOT pick a vendor just from discussions, data, and presentations. Get Your Hands Dirty. My old company issued RFPs to eight companies and brought three in for testing. Some things came out in testing that, though probably just fine for other companies' needs, would have left us very unhappy if we'd simply gone with the vendor who seemed to fit best based on the RFPs.

This e-mail is intended for the use of the addressee(s) only and may contain privileged, confidential, or proprietary information that is exempt from disclosure under law. If you have received this message in error, please inform us promptly by reply e-mail, then delete the e-mail and destroy any printed copy. Thank you.

Verbing weirds language.