Re: Blog entry about creating certificates for SSL

2014-09-01 Thread Josh Elser

Thanks so much for the feedback, Christopher.

I finally updated the draft with your feedback and published the final 
result.


https://blogs.apache.org/accumulo/entry/generating_keystores_for_configuring_accumulo

On 8/14/14, 3:50 PM, Christopher wrote:

Got a couple of corrections/notes/suggestions (apologies for this possibly
being longer than the actual blog post):

== HEADER

Suggestion: Might be helpful to open with an explanation of the scope. That
this document is intended to help users manually generate certificates when
their organization doesn't already provide a (presumably more convenient)
mechanism to do so.

"Accumulo opted to not bundle any tools to automatically generate
certificates for deployment."

Note/Suggestion: Technically, I argued that we not bundle a custom tool to
generate "secure" certificates. It was (and still is) an option to bundle
a contrib script to call an existing standard tool that already has all
the features one needs. To improve this line, I'd suggest changing it from
"Accumulo opted to not..." to "Accumulo expects the user to provide secure
certificates, which can be generated by standard tools, like openssl,
keytool, easy-rsa, and others." This wording encapsulates the decision with
an explanation, and has the added benefit of conveying the user's
responsibilities, which is the focus of the blog (helps establish the
thesis of the blog).

== Section: Generate a Certificate Authority

"Only certificates which are signed by the same authority can create a
secure connection."

Correction: This line is incorrect (or at best, vague). Any certificate
signed by an authority in the CA keystore can create a secure connection.
This is true on both ends: any certificate the certificate the client uses
must be signed by a CA trusted by the server, and any certificate the
server uses must be signed by a CA trusted by the client (the client and
server configure their keystores independently). Adding this explanation,
and mapping it to the "truststore" (certificate store containing trusted CA
public keys) and "keystore" (certificate store containing private key)
configuration elements would greatly benefit the reader. If you explain
that, then you could clarify that it is simpler to use the same CA to sign
each certificate for each host, so the client only has to trust a single CA
(which is what I think you were trying to say to begin with).

Suggestion: For the code block, you should have comments for each line to
explain what they do, eg. "generate private key for the CA", "generate a
corresponding public key and create a self-signed certificate", "convert
the public key to a binary format that keytool understands", etc.

Suggestion: Add -des3 to the genrsa line, to encourage the good practice of
encrypting private keys with a passphrase. (do this in the next section,
also, but make sure to note that they should not have the same passphrase
as the CA, because the CA should be treated with extra special care, since
it has the authority to issue trusted certificates for everybody else)

Suggestion: Drop -nodes in the "openssl req" command, because you're not
outputting a private key and it's confusing. Besides, you almost never want
that, even on commands that do output private keys, if you're concerned
about protecting those keys.

Note: If you don't need the base64-encoded ASCII version (PEM), you can use
"-outform der" on the req command to output the binary format directly, which
is simpler. But it's probably best to keep the two steps, because you do need
the PEM format in the next section (stupidly, the pkcs12 command doesn't
understand DER), so you might as well be consistent and leave it as two steps.

Suggestion: use the same alias in the JKS as the CN from the certificate.
It'll help users distinguish between keystores, independent of file names
(applies to the next section, also)
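Pulling those suggestions together, the whole CA-generation sequence might look
something like the following sketch. File names, the alias, and the subject CN
are purely illustrative, and the passphrases are inlined with -passin/-passout
only to keep the example non-interactive; in practice you would drop those
flags and type the passphrases when prompted:

```shell
# 1. generate an encrypted (-des3) private key for the CA
openssl genrsa -des3 -passout pass:capass -out root.key 2048

# 2. generate a corresponding public key and create a self-signed certificate
#    (no -nodes: we are not outputting a private key here)
openssl req -new -x509 -key root.key -passin pass:capass \
    -subj "/CN=example-root-ca" -days 365 -out root.pem

# 3. convert the public certificate to a binary (DER) format keytool understands
openssl x509 -in root.pem -outform der -out root.der

# 4. import the CA certificate into the Java truststore, reusing the CN as the alias
keytool -importcert -noprompt -alias example-root-ca \
    -storepass truststorepass -keystore truststore.jks -file root.der
```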

== Section: Generate a certificate/keystore per host

"By issuing individual certificates to each entity, it gives proper control
to revoke/reissue certificates to clients as necessary, without widespread
interruption."

Note: This is true, and certainly the reason you want separate
certificates, but you should mention that there is currently no feature in
Accumulo to check a revocation list (we certainly don't expose a
configuration item for it, I don't know of any built-in JSSE system
property for it, and I don't think Thrift has any mechanism built-in
either... but it should, and we should expose it when it does).

Suggestion: Be consistent with the filename extension. No need to do .crt
here when you used .pem/.der earlier. Just stick to one or the other.
Certificate formats are confusing enough.
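A self-contained sketch of the per-host steps, with these suggestions applied,
could look like this. All file names, the host CN, and the inlined passphrases
are illustrative (inlined only so the example is non-interactive); note the
host key gets a different passphrase than the CA key:

```shell
# setup for this self-contained example: a CA key and certificate,
# as produced in the previous section
openssl genrsa -des3 -passout pass:capass -out root.key 2048
openssl req -new -x509 -key root.key -passin pass:capass \
    -subj "/CN=example-root-ca" -days 365 -out root.pem

# 1. generate an encrypted private key for the host, with its own passphrase
openssl genrsa -des3 -passout pass:hostpass -out tserver1.key 2048

# 2. create a certificate signing request whose CN identifies the host
openssl req -new -key tserver1.key -passin pass:hostpass \
    -subj "/CN=tserver1.example.com" -out tserver1.csr

# 3. sign the request with the CA, producing the host's certificate (PEM)
openssl x509 -req -in tserver1.csr -CA root.pem -CAkey root.key \
    -passin pass:capass -CAcreateserial -days 365 -out tserver1.pem

# 4. bundle key + certificate into PKCS12, then import into a JKS keystore,
#    keeping the host CN as the alias
openssl pkcs12 -export -in tserver1.pem -inkey tserver1.key \
    -passin pass:hostpass -passout pass:p12pass \
    -name tserver1.example.com -out tserver1.p12
keytool -importkeystore -noprompt -srckeystore tserver1.p12 \
    -srcstoretype pkcs12 -srcstorepass p12pass \
    -destkeystore tserver1.jks -deststorepass keystorepass
```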

== Section: Configure Accumulo Servers

Suggestion: You should mention chmod and file permissions to ensure only
the services that need access to the files can read/alter them. This
applies to the CA key also.
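For example (the file names here are just stand-ins for the real keystore and
CA key files):

```shell
# stand-in files for a CA private key and a server keystore
touch example-ca.key example-keystore.jks

# only the owning service account should be able to read or alter key material
chmod 600 example-ca.key example-keystore.jks

# permissions should now show as -rw-------
ls -l example-ca.key example-keystore.jks
```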

== Section: Configure Accumulo Clients

Suggestion: s/simple properties file/simple Java
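For reference, a client configuration file along those lines might look
roughly like the following; the property names reflect my recollection of the
1.6 SSL support and the paths/passwords are placeholders, so double-check
against the documentation:

```properties
instance.rpc.ssl.enabled=true
rpc.javax.net.ssl.trustStore=/path/to/truststore.jks
rpc.javax.net.ssl.trustStorePassword=truststorepass
rpc.javax.net.ssl.keyStore=/path/to/client-keystore.jks
rpc.javax.net.ssl.keyStorePassword=keystorepass
```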

Re: Tablet server thrift issue

2014-09-01 Thread Corey Nolet
As an update,

I raised the tablet server memory and I have not seen this error thrown
since. I'd like to say raising the memory, alone, was the solution but it
appears that I also may be having some performance issues with the switches
connecting the racks together. I'll update more as I dive in further.
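For anyone following along, the recreate-and-replay pattern discussed in the
quoted thread below can be sketched roughly as follows. The Accumulo types are
replaced by minimal stand-ins so the sketch is self-contained (in the real API
you'd use Connector.createBatchWriter and catch MutationsRejectedException);
the key point is that only mutations flushed successfully are durable, so the
caller buffers unflushed mutations itself and replays them into a fresh writer:

```java
import java.util.ArrayList;
import java.util.List;

public class RetryingIngest {

    // Minimal stand-ins for Accumulo's Mutation, MutationsRejectedException,
    // and BatchWriter (hypothetical, for illustration only).
    static class Mutation { final String row; Mutation(String row) { this.row = row; } }
    static class RejectedException extends Exception {}
    interface BatchWriter {
        void addMutation(Mutation m) throws RejectedException;
        void flush() throws RejectedException;
        void close() throws RejectedException;
    }

    // Toy writer that rejects its first flush() to exercise the retry path.
    // (static flag so the failure persists across writer instances)
    static class FlakyWriter implements BatchWriter {
        static boolean failedOnce = false;
        final List<Mutation> delivered;
        final List<Mutation> pending = new ArrayList<>();
        FlakyWriter(List<Mutation> delivered) { this.delivered = delivered; }
        public void addMutation(Mutation m) { pending.add(m); }
        public void flush() throws RejectedException {
            if (!failedOnce) {
                failedOnce = true;
                pending.clear();              // simulate lost, unflushed mutations
                throw new RejectedException();
            }
            delivered.addAll(pending);
            pending.clear();
        }
        public void close() throws RejectedException { flush(); }
    }

    // Buffer mutations ourselves; on failure, discard the writer and replay
    // the buffered mutations into a brand-new one.
    static void ingest(List<Mutation> batch, List<Mutation> delivered) {
        List<Mutation> unflushed = new ArrayList<>(batch);
        while (!unflushed.isEmpty()) {
            BatchWriter writer = new FlakyWriter(delivered); // "make a new one"
            try {
                for (Mutation m : unflushed) writer.addMutation(m);
                writer.flush();
                unflushed.clear();   // durable now; safe to drop our copy
                writer.close();
            } catch (RejectedException e) {
                // writer is now unusable: fall through and retry with a fresh one
            }
        }
    }

    public static void main(String[] args) {
        List<Mutation> delivered = new ArrayList<>();
        List<Mutation> batch = new ArrayList<>();
        for (int i = 0; i < 5; i++) batch.add(new Mutation("row" + i));
        ingest(batch, delivered);
        System.out.println(delivered.size()); // prints 5: all mutations durable after one retry
    }
}
```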


On Fri, Aug 22, 2014 at 11:41 PM, Corey Nolet  wrote:

> Josh,
>
> Your advice is definitely useful- I also thought about catching the
> exception and retrying with a fresh batch writer but the fact that the
> batch writer failure doesn't go away without being re-instantiated is
> really only a nuisance. The TabletServerBatchWriter could be designed much
> better, I agree, but that is not the root of the problem.
>
> The Thrift exception that is causing the issue is what I'd like to get to
> the bottom of. It's throwing the following:
>
> *TApplicationException: applyUpdates failed: out of sequence response *
>
> I've never seen this exception before in regular use of the client API-
> but I also just updated to 1.6.0. Google isn't showing anything useful for
> how exactly this exception could come about other than using a bad
> threading model- and I don't see any drastic changes or other user
> complaints on the mailing list that would validate that line of thought.
> Quite frankly, I'm stumped. This could be a Thrift exception related to a
> Thrift bug or something bad on my system and have nothing to do with
> Accumulo.
>
> Chris Tubbs mentioned to me earlier that he recalled Keith and Eric had
> seen the exception before and may remember what it was/how they fixed it.
>
>
> On Fri, Aug 22, 2014 at 10:58 PM, Josh Elser  wrote:
>
>> Don't mean to tell you that I don't think there might be a bug/otherwise,
>> that's pretty much just the limit of what I know about the server-side
>> sessions :)
>>
>> If you have concrete "this worked in 1.4.4" and "this happens instead
>> with 1.6.0", that'd make a great ticket :D
>>
>> The BatchWriter failure case is pretty rough, actually. Eric has made
>> some changes to help already (in 1.6.1, I think), but it needs an overhaul
>> that I haven't been able to make time to fix properly, either. IIRC, the
>> only guarantee you have is that all mutations added before the last flush()
>> happened are durable on the server. Anything else is a guess. I don't know
>> the specifics, but that should be enough to work with (and saving off
>> mutations shouldn't be too costly since they're stored serialized).
>>
>>
>> On 8/22/14, 5:44 PM, Corey Nolet wrote:
>>
>>> Thanks Josh,
>>>
>>> I understand about the session ID completely but the problem I have is
>>> that
>>> the exact same client code worked, line for line, just fine in 1.4.4 and
>>> it's acting up in 1.6.0. I also seem to remember the BatchWriter
>>> automatically creating a new session when one expired without an
>>> exception
>>> causing it to fail on the client.
>>>
>>> I know we've made changes since 1.4.4 but I'd like to troubleshoot the
>>> actual issue of the BatchWriter failing due to the thrift exception
>>> rather
>>> than just catching the exception and trying mutations again. The other
>>> issue is that I've already submitted a bunch of mutations to the batch
>>> writer from different threads. Does that mean I need to be storing them
>>> off
>>> twice? (once in the BatchWriter's cache and once in my own)
>>>
>>> The BatchWriter in my ingester is constantly sending data and the tablet
>>> servers have been given more than enough memory to be able to keep up.
>>> There's no swap being used and the network isn't experiencing any errors.
>>>
>>>
>>> On Fri, Aug 22, 2014 at 4:54 PM, Josh Elser 
>>> wrote:
>>>
>>>> If you get an error from a BatchWriter, you pretty much have to throw
>>>> away that instance of the BatchWriter and make a new one. See
>>>> ACCUMULO-2990. If you want, you should be able to catch/recover from
>>>> this without having to restart the ingester.
>>>>
>>>> If the session ID is invalid, my guess is that it hasn't been used
>>>> recently and the tserver cleaned it up. The exception logic isn't the
>>>> greatest (as it just is presented to you as a RTE).
>>>>
>>>> https://issues.apache.org/jira/browse/ACCUMULO-2990
>>>>
>>>>
>>>> On 8/22/14, 4:35 PM, Corey Nolet wrote:
>>>>
>>>>> Eric & Keith, Chris mentioned to me that you guys have seen this issue
>>>>> before. Any ideas from anyone else are much appreciated as well.
>>>>>
>>>>> I recently updated a project's dependencies to Accumulo 1.6.0 built
>>>>> with Hadoop 2.3.0. I've got CDH 5.0.2 deployed. The project has an
>>>>> ingest component which is running all the time with a batch writer
>>>>> using many threads to push mutations into Accumulo.
>>>>>
>>>>> The issue I'm having is a show stopper. At different intervals of
>>>>> time, sometimes an hour, sometimes 30 minutes, I'm getting
>>>>> MutationsRejectedExceptions (server errors) from the
>>>>> TabletServerBatchWriter. Once they start, I need to restart the
Re: Review Request 22658: ACCUMULO-2889: Batch metadata updates for new WALs

2014-09-01 Thread Jonathan Park

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22658/
---

(Updated Sept. 1, 2014, 11:28 a.m.)


Review request for accumulo.


Changes
---

Incorporating feedback from kturner. Removed the use of ThreadLocal and 
batching of updates happens in TabletServerLogger instead of in the writer 
class.


Bugs: ACCUMULO-2889
https://issues.apache.org/jira/browse/ACCUMULO-2889


Repository: accumulo


Description
---

Added additional methods to the Writer class to handle batching. 

Potential risks are that we're now holding onto the locks for a bit longer than 
we used to. All tablets present in a batch will have their logLocks held 
until the batch is complete. 


Diffs (updated)
-

  core/src/main/java/org/apache/accumulo/core/client/impl/Writer.java d6762e7 
  server/base/src/main/java/org/apache/accumulo/server/util/InternalBatchWriter.java PRE-CREATION 
  server/base/src/main/java/org/apache/accumulo/server/util/MetadataTableUtil.java 463ca57 
  server/tserver/src/main/java/org/apache/accumulo/tserver/Tablet.java f9fdacb 
  server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServer.java 97f2090 
  server/tserver/src/main/java/org/apache/accumulo/tserver/log/TabletServerLogger.java d25ee75 
  test/src/test/java/org/apache/accumulo/server/util/InternalBatchWriterIT.java PRE-CREATION 
  test/src/test/java/org/apache/accumulo/test/BatchMetadataUpdatesIT.java PRE-CREATION 

Diff: https://reviews.apache.org/r/22658/diff/


Testing
---

Added a new IT.

We insert a few entries into tablets and ensure that the relevant entries appear 
in the WAL and Metadata Table.
One test case that isn't included yet is verifying that root + metadata table 
entries are entered correctly.


Thanks,

Jonathan Park