Re: Blog entry about creating certificates for SSL
Thanks so much for the feedback, Christopher. I finally updated the draft with your feedback and published the final result. https://blogs.apache.org/accumulo/entry/generating_keystores_for_configuring_accumulo

On 8/14/14, 3:50 PM, Christopher wrote:

Got a couple of corrections/notes/suggestions (apologies for this possibly being longer than the actual blog post):

== HEADER

Suggestion: Might be helpful to open with an explanation of the scope: that this document is intended to help users manually generate certificates when their organization doesn't already provide a (presumably more convenient) mechanism to do so.

"Accumulo opted to not bundle any tools to automatically generate certificates for deployment."

Note/Suggestion: Technically, I argued that we not bundle a custom tool to generate "secure" certificates. It was (and still is) an option to bundle a contrib script that calls an existing standard tool that already has all the features one needs. To improve this line, I'd suggest changing it from "Accumulo opted to not..." to "Accumulo expects the user to provide secure certificates, which can be generated by standard tools, like openssl, keytool, easy-rsa, and others." This wording encapsulates the decision with an explanation, and has the added benefit of conveying the user's responsibilities, which is the focus of the blog (it helps establish the thesis of the blog).

== Section: Generate a Certificate Authority

"Only certificates which are signed by the same authority can create a secure connection."

Correction: This line is incorrect (or at best, vague). Any certificate signed by an authority in the CA keystore can create a secure connection. This is true on both ends: any certificate the client uses must be signed by a CA trusted by the server, and any certificate the server uses must be signed by a CA trusted by the client (the client and server configure their keystores independently).
Adding this explanation, and mapping it to the "truststore" (certificate store containing trusted CA public keys) and "keystore" (certificate store containing the private key) configuration elements, would greatly benefit the reader. If you explain that, then you could clarify that it is simpler to use the same CA to sign the certificate for each host, so the client only has to trust a single CA (which is what I think you were trying to say to begin with).

Suggestion: For the code block, you should have comments for each line to explain what they do, e.g. "generate private key for the CA", "generate a corresponding public key and create a self-signed certificate", "convert the public key to a binary format that keytool understands", etc.

Suggestion: Add -des3 to the genrsa line, to encourage the good practice of encrypting private keys with a passphrase. (Do this in the next section, also, but make sure to note that the host keys should not have the same passphrase as the CA, because the CA should be treated with extra special care, since it has the authority to issue trusted certificates for everybody else.)

Suggestion: Drop -nodes in the "openssl req" command, because you're not outputting a private key and it's confusing. Besides, you almost never want that, even on commands that do output private keys, if you're concerned about protecting those keys.

Note: If you don't need the base64-encoded ASCII version (PEM), you can use "-outform der" on the req command to output directly to the binary format, because it's simpler. But it's probably best to keep it in, because you do need the PEM format in the next section (because, stupidly, the pkcs12 command doesn't understand DER), so you might as well be consistent and leave it in two steps.

Suggestion: Use the same alias in the JKS as the CN from the certificate.
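To make the suggestions above concrete, here is a minimal sketch of the CA-generation steps with the proposed changes applied (-des3 on genrsa, no -nodes on req, and the CN reused as the truststore alias). The file names, the AccumuloRootCA CN, and the pass:... passphrases are placeholders for illustration, not values from the blog post.

```shell
# Generate a private key for the CA; -des3 encrypts it under a passphrase.
openssl genrsa -des3 -passout pass:capass -out root.key 2048

# Generate the corresponding public key and create a self-signed certificate
# (note: no -nodes, since this command does not output a private key).
openssl req -new -x509 -key root.key -passin pass:capass -days 365 \
    -subj "/CN=AccumuloRootCA" -out root.pem

# Convert the public key to a binary format (DER) that keytool understands.
openssl x509 -outform der -in root.pem -out root.der

# Import the CA certificate into a Java truststore, reusing the CN as the
# alias. keytool ships with the JDK; skipped here if it is not on the PATH.
if command -v keytool >/dev/null 2>&1; then
    keytool -import -noprompt -alias AccumuloRootCA -file root.der \
        -keystore truststore.jks -storepass truststorepass
fi
```

The truststore.jks from the last step is the file the truststore configuration items would point at; the encrypted root.key is the CA's private key and should be guarded accordingly.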
It'll help users distinguish between keystores, independent of file names (applies to the next section, also).

== Section: Generate a certificate/keystore per host

"By issuing individual certificates to each entity, it gives proper control to revoke/reissue certificates to clients as necessary, without widespread interruption."

Note: This is true, and certainly the reason you want separate certificates, but you should mention that there is currently no feature in Accumulo to check a revocation list (we certainly don't expose a configuration item for it, I don't know of any built-in JSSE system property for it, and I don't think Thrift has any mechanism built in either... but it should, and we should expose it when it does).

Suggestion: Be consistent with the filename extension. No need to use .crt here when you used .pem/.der earlier. Just stick to one or the other. Certificate formats are confusing enough.

== Section: Configure Accumulo Servers

Suggestion: You should mention chmod and file permissions to ensure only the services that need access to the files can read/alter them. This applies to the CA key also.

== Section: Configure Accumulo Clients

Suggestion: s/simple properties file/simple Java
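Applied to the per-host section, the same style of sketch might look like the following. Again, the tserver1 host name and all passphrases are illustrative; the host key deliberately gets a different passphrase from the CA's, the .pem extension is kept for consistency, and the final chmod covers the file-permission suggestion. The CA key/certificate are regenerated inline here only so the sketch stands alone; in practice you would reuse the ones created earlier.

```shell
# Setup: a CA as in the previous section (recreated so this sketch is
# self-contained; normally you would reuse the existing root.key/root.pem).
openssl genrsa -des3 -passout pass:capass -out root.key 2048
openssl req -new -x509 -key root.key -passin pass:capass -days 365 \
    -subj "/CN=AccumuloRootCA" -out root.pem

# Generate an encrypted private key for the host, with its own passphrase
# (deliberately different from the CA's passphrase).
openssl genrsa -des3 -passout pass:hostpass -out tserver1.key 2048

# Create a certificate signing request for the host.
openssl req -new -key tserver1.key -passin pass:hostpass \
    -subj "/CN=tserver1" -out tserver1.csr

# Sign the request with the CA, producing the host certificate (PEM, to stay
# consistent with the extensions used in the CA section).
openssl x509 -req -in tserver1.csr -CA root.pem -CAkey root.key \
    -passin pass:capass -CAcreateserial -days 365 -out tserver1.pem

# Bundle the host key and certificate into a PKCS12 file, then convert it to
# a JKS keystore, reusing the CN as the alias (keytool ships with the JDK).
openssl pkcs12 -export -in tserver1.pem -inkey tserver1.key \
    -passin pass:hostpass -name tserver1 -passout pass:hostpass \
    -out tserver1.p12
if command -v keytool >/dev/null 2>&1; then
    keytool -importkeystore -noprompt -srckeystore tserver1.p12 \
        -srcstoretype pkcs12 -srcstorepass hostpass \
        -destkeystore tserver1.jks -deststorepass hostpass
fi

# Restrict permissions so only the service account can read the key material.
chmod 600 root.key tserver1.key tserver1.p12
test -f tserver1.jks && chmod 600 tserver1.jks
```

Repeating the host steps once per host, each with its own key and CN, gives the revoke/reissue flexibility the post describes while every host chains back to the single trusted CA.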
Re: Tablet server thrift issue
As an update, I raised the tablet server memory and I have not seen this error thrown since. I'd like to say raising the memory, alone, was the solution, but it appears that I may also be having some performance issues with the switches connecting the racks together. I'll update more as I dive in further.

On Fri, Aug 22, 2014 at 11:41 PM, Corey Nolet wrote:
> Josh,
>
> Your advice is definitely useful- I also thought about catching the
> exception and retrying with a fresh batch writer, but the fact that the
> batch writer failure doesn't go away without being re-instantiated is
> really only a nuisance. The TabletServerBatchWriter could be designed much
> better, I agree, but that is not the root of the problem.
>
> The Thrift exception that is causing the issue is what I'd like to get to
> the bottom of. It's throwing the following:
>
> *TApplicationException: applyUpdates failed: out of sequence response*
>
> I've never seen this exception before in regular use of the client API-
> but I also just updated to 1.6.0. Google isn't showing anything useful for
> how exactly this exception could come about, other than using a bad
> threading model- and I don't see any drastic changes or other user
> complaints on the mailing list that would validate that line of thought.
> Quite frankly, I'm stumped. This could be a Thrift exception related to a
> Thrift bug or something bad on my system and have nothing to do with
> Accumulo.
>
> Chris Tubbs mentioned to me earlier that he recalled Keith and Eric had
> seen the exception before and may remember what it was/how they fixed it.
>
> On Fri, Aug 22, 2014 at 10:58 PM, Josh Elser wrote:
>> Don't mean to tell you that I don't think there might be a bug/otherwise,
>> that's pretty much just the limit of what I know about the server-side
>> sessions :)
>>
>> If you have concrete "this worked in 1.4.4" and "this happens instead
>> with 1.6.0", that'd make a great ticket :D
>>
>> The BatchWriter failure case is pretty rough, actually. Eric has made
>> some changes to help already (in 1.6.1, I think), but it needs an overhaul
>> that I haven't been able to make time to fix properly, either. IIRC, the
>> only guarantee you have is that all mutations added before the last flush()
>> happened are durable on the server. Anything else is a guess. I don't know
>> the specifics, but that should be enough to work with (and saving off
>> mutations shouldn't be too costly since they're stored serialized).
>>
>> On 8/22/14, 5:44 PM, Corey Nolet wrote:
>>> Thanks Josh,
>>>
>>> I understand about the session ID completely, but the problem I have is
>>> that the exact same client code worked, line for line, just fine in 1.4.4
>>> and it's acting up in 1.6.0. I also seem to remember the BatchWriter
>>> automatically creating a new session when one expired, without an
>>> exception causing it to fail on the client.
>>>
>>> I know we've made changes since 1.4.4, but I'd like to troubleshoot the
>>> actual issue of the BatchWriter failing due to the thrift exception
>>> rather than just catching the exception and trying mutations again. The
>>> other issue is that I've already submitted a bunch of mutations to the
>>> batch writer from different threads. Does that mean I need to be storing
>>> them off twice? (Once in the BatchWriter's cache and once in my own.)
>>>
>>> The BatchWriter in my ingester is constantly sending data, and the tablet
>>> servers have been given more than enough memory to be able to keep up.
>>> There's no swap being used and the network isn't experiencing any errors.
>>>
>>> On Fri, Aug 22, 2014 at 4:54 PM, Josh Elser wrote:
>>>> If you get an error from a BatchWriter, you pretty much have to throw
>>>> away that instance of the BatchWriter and make a new one. See
>>>> ACCUMULO-2990. If you want, you should be able to catch/recover from
>>>> this without having to restart the ingester. If the session ID is
>>>> invalid, my guess is that it hasn't been used recently and the tserver
>>>> cleaned it up. The exception logic isn't the greatest (as it is just
>>>> presented to you as an RTE).
>>>>
>>>> https://issues.apache.org/jira/browse/ACCUMULO-2990
>>>>
>>>> On 8/22/14, 4:35 PM, Corey Nolet wrote:
>>>>> Eric & Keith,
>>>>>
>>>>> Chris mentioned to me that you guys have seen this issue before. Any
>>>>> ideas from anyone else are much appreciated as well.
>>>>>
>>>>> I recently updated a project's dependencies to Accumulo 1.6.0 built
>>>>> with Hadoop 2.3.0. I've got CDH 5.0.2 deployed. The project has an
>>>>> ingest component which is running all the time with a batch writer
>>>>> using many threads to push mutations into Accumulo.
>>>>>
>>>>> The issue I'm having is a show stopper. At different intervals of time,
>>>>> sometimes an hour, sometimes 30 minutes, I'm getting
>>>>> MutationsRejectedExceptions (server errors) from the
>>>>> TabletServerBatchWriter. Once they start, I need to restart the
Re: Review Request 22658: ACCUMULO-2889: Batch metadata updates for new WALs
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22658/
-----------------------------------------------------------

(Updated Sept. 1, 2014, 11:28 a.m.)

Review request for accumulo.

Changes
-------

Incorporating feedback from kturner. Removed the use of ThreadLocal; batching of updates happens in TabletServerLogger instead of in the writer class.

Bugs: ACCUMULO-2889
    https://issues.apache.org/jira/browse/ACCUMULO-2889

Repository: accumulo

Description
-------

Added additional methods to the Writer class to handle batching.

Potential risks are that we're now holding onto the locks for a bit longer than we used to. All tablets present in a batch will have their logLocks held until the batch is complete.

Diffs (updated)
-----

  core/src/main/java/org/apache/accumulo/core/client/impl/Writer.java d6762e7
  server/base/src/main/java/org/apache/accumulo/server/util/InternalBatchWriter.java PRE-CREATION
  server/base/src/main/java/org/apache/accumulo/server/util/MetadataTableUtil.java 463ca57
  server/tserver/src/main/java/org/apache/accumulo/tserver/Tablet.java f9fdacb
  server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServer.java 97f2090
  server/tserver/src/main/java/org/apache/accumulo/tserver/log/TabletServerLogger.java d25ee75
  test/src/test/java/org/apache/accumulo/server/util/InternalBatchWriterIT.java PRE-CREATION
  test/src/test/java/org/apache/accumulo/test/BatchMetadataUpdatesIT.java PRE-CREATION

Diff: https://reviews.apache.org/r/22658/diff/

Testing
-------

Added a new IT. We insert a few entries to tablets and ensure that the relevant entries appear in the WAL and Metadata Table. One test case that isn't included yet is verifying that root + metadata table entries are entered correctly.

Thanks,

Jonathan Park