Re: compression of keys for a sequential scan over an inverted index

2015-10-26 Thread Josh Elser

Definitely. This is pretty cool. Thanks for sharing.

I'll have to try to make some time to poke around with this :)

William Slacum wrote:

Thanks, Jonathan! I've wondered about specific numbers on this topic
when dealing with geohashes, so this is a very useful tool.

On Sun, Oct 25, 2015 at 11:22 AM, Jonathan Wonders wrote:

I have been able to put some more thought into this over the weekend
and make initial observations on tables I currently have populated.
Looking at the rfile-info for a few different tables, I noticed that
one which has particularly small lexicographical deltas between keys
costs an average of ~2.5 bits per key to store on disk.  All of the
data is stored in the row component of the key and a full row is
typically about 36 bytes.  I wrote a little utility to recreate
ScanResult objects for batches of sequential key-value pairs returned
from a scanner and then used the TCompactProtocol to write the
ScanResult to a byte array.  Each key-value pair costs roughly 48
bytes, which makes sense given that every row is different and there
will be some space required for the timestamps, visibilities, and
other bookkeeping info.
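
A rough sketch of that measurement (not the exact utility described above; it
assumes the Thrift-generated ScanResult class from the 1.x client internals, a
non-empty batch, and ignores error handling) looks something like this:

import org.apache.accumulo.core.data.thrift.ScanResult;
import org.apache.thrift.TException;
import org.apache.thrift.TSerializer;
import org.apache.thrift.protocol.TCompactProtocol;

// Hypothetical helper: serialize a ScanResult with TCompactProtocol and
// report the average on-the-wire cost per key-value pair in the batch.
public class ScanResultSizer {
  public static double bytesPerKeyValue(ScanResult result) throws TException {
    TSerializer serializer = new TSerializer(new TCompactProtocol.Factory());
    byte[] wire = serializer.serialize(result);
    return (double) wire.length / result.getResultsSize();
  }
}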

Another table I looked at has larger lexicographical deltas between
keys and costs roughly 5 bytes per key to store on disk.  This table
is a reverse index with very large rows, each column within a row
identifies data that resides in another table.  Each column is
roughly 12 bytes uncompressed.  When encoded in a ScanResult, each
key-value pair costs roughly 25 bytes, which makes sense since the
row cost should be negligible for large batch sizes and the overhead
from timestamp, visibility, and other bookkeeping info is roughly
the same as in the other table.

Since compression depends heavily on both table design and the
actual data, it seemed the next logical step would be to create a
tool that the community could use to easily measure the compression
ratio for ScanResult objects. So, I threw together a shell extension
to wrap the utility that I previously described.  It measures the
compression ratio for the default strategy, Key.compress, as well as a
few other simple strategies that seemed reasonable to test. The
usage is almost the same as the scan command; it just prints out
compression statistics rather than the data.

It lives at https://github.com/jwonders/accumulo-experiments with
branches for Accumulo 1.6.x, 1.7.x, and 1.8.x.

Any feedback is welcome.  I hope others find this useful for
understanding this particular aspect of scan performance.

V/R
Jonathan


On Thu, Oct 22, 2015 at 4:37 PM, Jonathan Wonders wrote:

Josh,

Thanks for the information.  I did read through the discussion
about compression of visibility expressions and columns within
RFiles a while back which got me thinking about some of this. It
makes sense that gzip or lzo/snappy compression would have a
very noticeable impact when there are columns or visibility
expressions that are not compressed with RLE even if neighboring
rows have very small lexicographical deltas.

I will put some thought into designing an experiment to evaluate
whether or not there is any benefit to applying RLE during
key-value transport from server to client.  Even if it proves to
be situationally beneficial, I think it could be implemented as
a common iterator similar to the WholeRowIterator.

Given the current compression strategy, I would expect better
server-client transport compression retrieving a single row with
many columns

: []

compared to many lexicographically close rows.

: []

with the understanding that very large rows can lead to poor
load balancing.

V/R
Jonathan

On Thu, Oct 22, 2015 at 11:54 AM, Josh Elser wrote:

Jonathan Wonders wrote:

I have been digging into some details of Accumulo to
model the disk and
network costs associated with various types of scan
patterns and I have
a few questions regarding compression.

Assuming an inverted index table with rows following the
pattern of



and a scan that specifies an exact key and value so as
to constrain the
range, it seems that the dominant factor in network
utilization would
be sending key-value pairs from the tablet server to the
client and a
secondary factor would be transmitting data from

Re: Is there a sensible way to do this? Sequential Batch Scanner

2015-10-28 Thread Josh Elser

Rob Povey wrote:

However I’m pretty reticent right now to add any more iterators to our
project, they’ve been a test nightmare for us internally.


Off-topic, I'd like to hear more about what is painful. Do you have the 
time to fork off a thread and let us know how it hurts?


Re: Transaction type query in accumulo

2015-10-30 Thread Josh Elser

Hi Shweta,

shweta.agrawal wrote:

Hi,

Is transaction type facility available in Accumulo?
I have read about transaction in accumulo which says " Accumulo
guarantees these ACID properties for a single mutation (a set of changes
for a single row) but does not provide support for atomic updates across
multiple rows"


This might be easier to reason about if you consider the Java API.

When you make a Mutation, all updates in that mutation will be applied 
or rejected.


Mutation m = new Mutation("row".getBytes());
m.put("cf", "cq1", "value1");
m.put("cf", "cq2", "value2");
batchwriter.addMutation(m);
batchwriter.close();

In this case, Accumulo will either have 2 K/V pairs in "row" ("cf:cq1" 
=> "value1", "cf:cq2" => "value2") or no K/V pairs in "row".




In my case:
If one thread is updating the fields of a document then this document
should be locked so that other thread cannot modify that document.

I am trying to achieve this with a query through a conditional mutation. I
am checking whether the particular entry exists or not and then updating. But
the problem is I am doing this through 150 threads. If one thread finds
and is updating a particular entry, then other threads should not get it.

So is this the case in conditional write?

We are achieving the same thing through MongoDB with its find-and-modify feature.

If one thread gets a particular document to update from a conditional write,
then other threads should not get that particular document.


I'm not 100% sure how best to go about this. Maybe you could use a 
special column in the row to do the exclusion?


write cf:lock 
_update columns_
delete cf:lock 
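
A minimal sketch of that pattern using a ConditionalMutation (hypothetical
table and column names; a Condition with no value only passes when the column
is absent, so only one writer wins the lock):

import org.apache.accumulo.core.client.ConditionalWriter;
import org.apache.accumulo.core.client.ConditionalWriter.Status;
import org.apache.accumulo.core.client.ConditionalWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.data.Condition;
import org.apache.accumulo.core.data.ConditionalMutation;
import org.apache.accumulo.core.data.Value;

// Hypothetical helper: try to claim a per-document "lock" column. Only one of
// many competing threads can win, because the condition requires cf:lock to
// be absent. Table and column names here are made up for illustration.
public class DocumentLock {
  public static boolean tryLock(Connector connector, String row) throws Exception {
    ConditionalWriter writer = connector.createConditionalWriter("docs", new ConditionalWriterConfig());
    try {
      ConditionalMutation cm = new ConditionalMutation(row);
      cm.addCondition(new Condition("cf", "lock"));          // passes only if cf:lock is absent
      cm.put("cf", "lock", new Value("owner-1".getBytes())); // claim the lock
      Status status = writer.write(cm).getStatus();
      // the winner (ACCEPTED) updates the document, then deletes cf:lock afterwards
      return status == Status.ACCEPTED;
    } finally {
      writer.close();
    }
  }
}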

Keith probably has a better suggestion :)


Please provide your inputs

Thanks
Shweta


Re: Quick question re UnknownHostException

2015-11-13 Thread Josh Elser

:) never think to expect a DNS issue until you have a DNS issue

Josef Roehrl - PHEMI wrote:

Hi Everyone,

Turns out that it was a DNS server issue exactly.  Had to get this
confirmed by the Data Centre, though.

Thanks!

On Fri, Nov 13, 2015 at 12:25 PM, Josef Roehrl - PHEMI wrote:

Hi All,

3 times in the past few weeks (twice on 1 system, once on another),
the master gets UnknownHostException (s), one by one, for each of
the tablet servers.  Then, it wants to stop them. Eventually, all
the tablet servers quit.

It goes like this for all the tablet servers:

12 08:14:01,0498 tserver:6 20 ERROR
error sending update to tserver3:9997: org.apache.thrift.transport.TTransportException: java.net.UnknownHostException

12 09:01:53,0352 master:12 ERROR
org.apache.thrift.transport.TTransportException: java.net.UnknownHostException

12 16:35:50,0672 master:110 ERROR
unable to get tablet server status tserver3:9997[250e6cd2c500012] org.apache.thrift.transport.TTransportException: java.net.UnknownHostException



I've redacted the real host names, of course.

This could be a DNS problem, though the system was running fine for
days before this happened (same scenario on the 2 systems with
really quite different DNS servers).

If any one has a hint or seen something like this, I would
appreciate any pointers.

I have looked at the JIRA issues regarding DNS outages, but nothing
seems to fit this pattern.

Thanks

--


Josef Roehrl
Senior Software Developer
*PHEMI Systems*
180-887 Great Northern Way
Vancouver, BC V5T 4T5
604-336-1119













Re: Accumulo thrift proxy: Log files?

2015-11-29 Thread Josh Elser
The Accumulo proxy acts more like a client application than a server 
process. As such, it uses the "client" logging configuration 
(log4j.properties) which writes to stderr and stdout instead of writing 
to log files.


Srikanth Viswanathan wrote:

Hi all,

Does anyone know where the accumulo thrift proxy puts its log files? I
started it using $ACCUMULO_HOME/bin/accumulo proxy -p config.properties
I looked in $ACCUMULO_HOME/logs but they aren't there. Am I looking in
the right place?

Thanks,
Srikanth


Re: maximum number of connectors

2015-11-30 Thread Josh Elser
Connector is tied to a specific user, so you're tied to a user for a 
given instance.


I'm not aware of any testing in that direction (lots of active
connectors). Connectors aren't particularly heavy; you could keep a
cache of recently used instances and recreate them when they are
evicted from the cache due to inactivity.
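
For example, a per-user cache along these lines (a Guava-based sketch; the
credential lookup is a made-up placeholder) would keep the number of live
Connectors bounded:

import java.util.concurrent.TimeUnit;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Instance;
import org.apache.accumulo.core.client.security.tokens.AuthenticationToken;

// Sketch: cache one Connector per user, dropping idle ones after 30 minutes.
// lookupTokenFor(...) is a hypothetical hook for however you store credentials.
public class ConnectorCache {
  private final LoadingCache<String, Connector> cache;

  public ConnectorCache(final Instance instance) {
    this.cache = CacheBuilder.newBuilder()
        .maximumSize(10_000)
        .expireAfterAccess(30, TimeUnit.MINUTES)
        .build(new CacheLoader<String, Connector>() {
          @Override
          public Connector load(String user) throws Exception {
            AuthenticationToken token = lookupTokenFor(user); // hypothetical
            return instance.getConnector(user, token);
          }
        });
  }

  public Connector get(String user) throws Exception {
    return cache.get(user);
  }

  private AuthenticationToken lookupTokenFor(String user) {
    throw new UnsupportedOperationException("supply your own credential store");
  }
}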


The only fundamental limitation on concurrent Connector instances that I
can think of is at the RPC level. Eventually, the RPCs that the
Connector makes to Accumulo servers correlate to server-side
resources, which are finite. If you have some reasonable hardware, I
don't think this is a real concern.


Would be curious to hear back how this works.

mohit.kaushik wrote:

I am creating a connector per user as every user has a different
authorization set. I want to know: is there any limit on creating
Accumulo connectors, and what is the maximum number of connectors that
Accumulo can handle? For example, if my application will have 3M users,
is it correct to create 3M connections for them, or is there a way to
share connections between users who have different authorizations?

Thanks
Mohit Kaushik



Re: Trigger for Accumulo table

2015-12-02 Thread Josh Elser

Hi Thai,

There is no out-of-the-box feature provided with Accumulo that does what 
you're asking for. Accumulo doesn't provide any functionality to push 
notifications to other systems. You could potentially maintain other 
tables/columns in which you maintain the last time a row was updated, 
but the onus is on your "other services" to read the table to find out 
when a change occurred (which is probably not scalable at "real time").


There are other systems you could likely leverage to solve this, 
depending on the durability and scalability that your application needs.


For a system "close" to Accumulo, you could take a look at Fluo [1] 
which is an implementation of Google's "Percolator" system. This is a 
system based on throughput rather than low-latency, so it may not be a 
good fit for your needs. There are probably other systems in the Apache
ecosystem (Kafka, Storm, Flink or Spark Streaming maybe?) that may be
helpful to your problem. I'm not enough of an expert on these to make a
recommendation (nor do I think I understand your entire architecture well enough).


Thai Ngo wrote:

Hi list,

I have a use-case where existing rows in a table will be updated by an
internal service. Data in a row of this table is composed of 2 parts:
the 1st part is immutable and the 2nd one will be updated (filled in) a
little later.

Currently, I need to know when and which rows will be updated
in the table so that other services can wisely start consuming the
data. This matters even more when I need to consume the data in near
realtime. So developing a notification function, or more simply a trigger,
is what I really want to do now.

I am curious to know if someone has done similar job or there are
features or APIs or best practices available for Accumulo so far. I'm
thinking of letting the internal service which updates the data notify
us whenever it updates the data.

What do you think?

Thanks,
Thai


Re: Trigger for Accumulo table

2015-12-02 Thread Josh Elser

oops :)

[1] http://fluo.io/

Josh Elser wrote:

Hi Thai,

There is no out-of-the-box feature provided with Accumulo that does what
you're asking for. Accumulo doesn't provide any functionality to push
notifications to other systems. You could potentially maintain other
tables/columns in which you maintain the last time a row was updated,
but the onus is on your "other services" to read the table to find out
when a change occurred (which is probably not scalable at "real time").

There are other systems you could likely leverage to solve this,
depending on the durability and scalability that your application needs.

For a system "close" to Accumulo, you could take a look at Fluo [1]
which is an implementation of Google's "Percolator" system. This is a
system based on throughput rather than low-latency, so it may not be a
good fit for your needs. There are probably other systems in the Apache
ecosystem (Kafka, Storm, Flink or Spark Streaming maybe?) that are be
helpful to your problem. I'm not an expert on these to recommend on (nor
do I think I understand your entire architecture well enough).

Thai Ngo wrote:

Hi list,

I have a use-case when existing rows in a table will be updated by an
internal service. Data in a row of this table is composed of 2 parts:
1st part - immutable and the 2nd one - will be updated (filled in) a
little later.

Currently, I have a need of knowing when and which rows will be updated
in the table so that other services will be wisely start consuming the
data. It will make more sense when I need to consume the data in near
realtime. So developing a notification function or simpler - a trigger
is what I really want to do now.

I am curious to know if someone has done similar job or there are
features or APIs or best practices available for Accumulo so far. I'm
thinking of letting the internal service which updates the data notify
us whenever it updates the data.

What do you think?

Thanks,
Thai


Re: Can't connect to Accumulo

2015-12-03 Thread Josh Elser
Could be that the Accumulo services are only listening on localhost and 
not the "external" interface for your VM. To get a connector, that's a 
call to a TabletServer, which runs on 9997 by default (and you have open).


Do a `netstat -nape | fgrep 9997 | fgrep LISTEN` in your VM and see what 
interface the server is bound to. I'd venture a guess that you just need 
to put the FQDN for your VM in $ACCUMULO_CONF_DIR/slaves (and masters, 
monitor, gc, tracers, for completeness) instead of localhost.


Mike Thomsen wrote:

I have Accumulo running in a VM. This Groovy script will connect just
fine from within the VM, but outside of the VM it hangs at the first
println statement.

String instance = "test"
String zkServers = "localhost:2181"
String principal = "root";
AuthenticationToken authToken = new PasswordToken("testing1234");

ZooKeeperInstance inst = new ZooKeeperInstance(instance, zkServers);
println "Attempting connection"
Connector conn = inst.getConnector(principal, authToken);
println "Connected!"

This is the listing of ports I have opened up in Vagrant:

config.vm.network "forwarded_port", guest: 2122, host: 2122
   config.vm.network "forwarded_port", guest: 2181, host: 2181
   config.vm.network "forwarded_port", guest: 2888, host: 2888
   config.vm.network "forwarded_port", guest: 3888, host: 3888
   config.vm.network "forwarded_port", guest: 4445, host: 4445
   config.vm.network "forwarded_port", guest: 4560, host: 4560
   config.vm.network "forwarded_port", guest: 6379, host: 6379
   config.vm.network "forwarded_port", guest: 8020, host: 8020
   config.vm.network "forwarded_port", guest: 8030, host: 8030
   config.vm.network "forwarded_port", guest: 8031, host: 8031
   config.vm.network "forwarded_port", guest: 8032, host: 8032
   config.vm.network "forwarded_port", guest: 8033, host: 8033
   config.vm.network "forwarded_port", guest: 8040, host: 8040
   config.vm.network "forwarded_port", guest: 8042, host: 8042
   config.vm.network "forwarded_port", guest: 8081, host: 8081
   config.vm.network "forwarded_port", guest: 8082, host: 8082
   config.vm.network "forwarded_port", guest: 8088, host: 8088
   config.vm.network "forwarded_port", guest: 9000, host: 9000
   config.vm.network "forwarded_port", guest: 9092, host: 9092
   config.vm.network "forwarded_port", guest: 9200, host: 9200
   config.vm.network "forwarded_port", guest: 9300, host: 9300
   config.vm.network "forwarded_port", guest: 9997, host: 9997
   config.vm.network "forwarded_port", guest: , host: 
   #config.vm.network "forwarded_port", guest: 10001, host: 10001
   config.vm.network "forwarded_port", guest: 10002, host: 10002
   config.vm.network "forwarded_port", guest: 11224, host: 11224
   config.vm.network "forwarded_port", guest: 12234, host: 12234
   config.vm.network "forwarded_port", guest: 19888, host: 19888
   config.vm.network "forwarded_port", guest: 42424, host: 42424
   config.vm.network "forwarded_port", guest: 49707, host: 49707
   config.vm.network "forwarded_port", guest: 50010, host: 50010
   config.vm.network "forwarded_port", guest: 50020, host: 50020
   config.vm.network "forwarded_port", guest: 50070, host: 50070
   config.vm.network "forwarded_port", guest: 50075, host: 50075
   config.vm.network "forwarded_port", guest: 50090, host: 50090
   config.vm.network "forwarded_port", guest: 50091, host: 50091
   config.vm.network "forwarded_port", guest: 50095, host: 50095

Any ideas why it is not letting me connect? It just hangs and never even
seems to time out.

Thanks,

Mike


Re: Can't connect to Accumulo

2015-12-04 Thread Josh Elser
Each line in the Accumulo "hosts" files (masters, slaves, etc) denotes a 
host which the process should be run on, FYI.


What does netstat show for ports  and 9997? Those are the two ports 
that your client should ever need to talk to for Accumulo, IIRC.


Mike Thomsen wrote:

I stopped all of the services, removed localhost and even reinitialized
the node. When I brought it back up, that Groovy script hangs at the
line right after it says it's attempting to get a connection. Even
Ubuntu's firewall is turned off.

On Fri, Dec 4, 2015 at 10:50 AM, Adam Fuchs wrote:

Mike,

I suspect if you get rid of the "localhost" line and restart
Accumulo then you will get services listening on the non-loopback
IPs. Right now you have some of your processes accessible outside
your VM and others only accessible from inside, and you probably
have two tablet servers when you should only have one.

Cheers,
Adam



On Fri, Dec 4, 2015 at 9:50 AM, Mike Thomsen wrote:

I tried adding some read/write examples and ran into a problem.
It would hang at the first scan or write operation I tried. I
checked the master port () and it was only listening on
127.0.0.1. netstat had two entries
for 9997. This is what conf/masters has for my VM:

# limitations under the License.

localhost
vagrant-ubuntu-vivid-64

It's the same with all of the other files (slaves, gc, etc.)

Any ideas?

Thanks,

Mike

On Thu, Dec 3, 2015 at 3:54 PM, Mike Thomsen wrote:

Thanks! That was all that I needed to do.

On Thu, Dec 3, 2015 at 3:33 PM, Josh Elser wrote:

Could be that the Accumulo services are only listening
on localhost and not the "external" interface for your
VM. To get a connector, that's a call to a TabletServer
which run on 9997 by default (and you have open).

Do a `netstat -nape | fgrep 9997 | fgrep LISTEN` in your
VM and see what interface the server is bound to. I'd
venture a guess that you just need to put the FQDN for
your VM in $ACCUMULO_CONF_DIR/slaves (and masters,
monitor, gc, tracers, for completeness) instead of
localhost.


Mike Thomsen wrote:

I have Accumulo running in a VM. This Groovy script
will connect just
fine from within the VM, but outside of the VM it
hangs at the first
println statement.

String instance = "test"
String zkServers = "localhost:2181"
String principal = "root";
AuthenticationToken authToken = new
PasswordToken("testing1234");

ZooKeeperInstance inst = new
ZooKeeperInstance(instance, zkServers);
println "Attempting connection"
Connector conn = inst.getConnector(principal,
authToken);
println "Connected!"

This is the listing of ports I have opened up in
Vagrant:

config.vm.network "forwarded_port", guest: 2122,
host: 2122
config.vm.network "forwarded_port", guest: 2181,
host: 2181
config.vm.network "forwarded_port", guest: 2888,
host: 2888
config.vm.network "forwarded_port", guest: 3888,
host: 3888
config.vm.network "forwarded_port", guest: 4445,
host: 4445
config.vm.network "forwarded_port", guest: 4560,
host: 4560
config.vm.network "forwarded_port", guest: 6379,
host: 6379
config.vm.network "forwarded_port", guest: 8020,
host: 8020
config.vm.network "forwarded_port", guest: 8030,
host: 8030
config.vm.network "forwarded_port", guest: 8031,
host: 8031
config.vm.network "forwarded_port", guest: 8032,
host: 8032
config.vm.network "forwarded_port", guest: 8033,
host: 8033

Re: Can't connect to Accumulo

2015-12-04 Thread Josh Elser

Interesting. What version of Accumulo are you using?

Also, can you jstack your client application? Maybe we can get a hint 
where it's stuck. You could also try increasing the Log4j level in your 
client application for the 'org.apache.accumulo.core' package to DEBUG 
or TRACE.
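
If the client has no log4j configuration on its classpath at all, a minimal,
illustrative way to do that programmatically is:

import org.apache.log4j.BasicConfigurator;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

// Sketch: call this before building the ZooKeeperInstance/Connector so client
// logging goes to the console and the Accumulo client packages log at TRACE.
public class ClientDebugLogging {
  public static void enable() {
    BasicConfigurator.configure(); // console appender, if nothing else is configured
    Logger.getLogger("org.apache.accumulo.core").setLevel(Level.TRACE);
  }
}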


Even better, if this is something you can share (making assumptions 
since it's Vagrant-based), feel free to. I'll try to run your example 
and poke around myself.


Mike Thomsen wrote:

This is the output from netstat:

vagrant@vagrant-ubuntu-vivid-64:/opt/accumulo$ netstat -nape | fgrep
 | fgrep LISTEN
(Not all processes could be identified, non-owned process info
  will not be shown, you would have to be root to see it all.)
tcp0  0 10.0.2.15:
0.0.0.0:*   LISTEN  1000   35450   3809/java
vagrant@vagrant-ubuntu-vivid-64:/opt/accumulo$ netstat -nape | fgrep
9997 | fgrep LISTEN
(Not all processes could be identified, non-owned process info
  will not be shown, you would have to be root to see it all.)
tcp0  0 10.0.2.15:9997
0.0.0.0:*   LISTEN  1000   35962   3655/java

On Fri, Dec 4, 2015 at 12:35 PM, Josh Elser wrote:

Each line in the Accumulo "hosts" files (masters, slaves, etc)
denote a host which the process should be run on, FYI.

What does netstat show for ports  and 9997? Those are the two
ports that your client should ever need to talk to for Accumulo, IIRC.

Mike Thomsen wrote:

I stopped all of the services, removed localhost and even
reinitialized
the node. When I brought it back up, that Groovy script hangs at the
line right after it says it's attempting to get a connection. Even
Ubuntu's firewall is turned off.

On Fri, Dec 4, 2015 at 10:50 AM, Adam Fuchs wrote:

 Mike,

 I suspect if you get rid of the "localhost" line and restart
 Accumulo then you will get services listening on the
non-loopback
 IPs. Right now you have some of your processes accessible
outside
 your VM and others only accessible from inside, and you
probably
 have two tablet servers when you should only have one.

 Cheers,
 Adam



 On Fri, Dec 4, 2015 at 9:50 AM, Mike Thomsen wrote:

 I tried adding some read/write examples and ran into a
problem.
 It would hang at the first scan or write operation I
tried. I
 checked the master port () and it was only listening on
127.0.0.1. netstat had two entries
 for 9997. This is what conf/masters has for my VM:

 # limitations under the License.

 localhost
 vagrant-ubuntu-vivid-64

 It's the same with all of the other files (slaves, gc,
etc.)

 Any ideas?

 Thanks,

 Mike

 On Thu, Dec 3, 2015 at 3:54 PM, Mike Thomsen wrote:

 Thanks! That was all that I needed to do.

 On Thu, Dec 3, 2015 at 3:33 PM, Josh Elser wrote:

 Could be that the Accumulo services are only
listening
 on localhost and not the "external" interface
for your
 VM. To get a connector, that's a call to a
TabletServer
 which run on 9997 by default (and you have open).

 Do a `netstat -nape | fgrep 9997 | fgrep
LISTEN` in your
 VM and see what interface the server is bound
to. I'd
 venture a guess that you just need to put the
FQDN for
 your VM in $ACCUMULO_CONF_DIR/slaves (and masters,
 monitor, gc, tracers, for completeness) instead of
 localhost.


 Mike Thomsen wrote:

 I have Accumulo running in a VM. This
Groovy script
 will connect just
 fine fro

Re: Can't connect to Accumulo

2015-12-07 Thread Josh Elser

Mike sent me a tarball of his Vagrant VM.

Following my own advice (via the --debug option on the shell):

2015-12-07 23:35:50,969 [rpc.ThriftUtil] TRACE: Opening normal transport
2015-12-07 23:35:50,969 [rpc.ThriftUtil] WARN : Failed to open transport 
to vagrant-ubuntu-vivid-64:9997
2015-12-07 23:35:50,969 [impl.ThriftTransportPool] DEBUG: Failed to 
connect to vagrant-ubuntu-vivid-64:9997 (12)
org.apache.thrift.transport.TTransportException: 
java.net.UnknownHostException
	at 
org.apache.accumulo.core.rpc.ThriftUtil.createClientTransport(ThriftUtil.java:313)
	at 
org.apache.accumulo.core.client.impl.ThriftTransportPool.createNewTransport(ThriftTransportPool.java:478)
	at 
org.apache.accumulo.core.client.impl.ThriftTransportPool.getAnyTransport(ThriftTransportPool.java:466)
	at 
org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:141)
	at 
org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:117)
	at 
org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:113)
	at 
org.apache.accumulo.core.client.impl.ServerClient.executeRaw(ServerClient.java:95)
	at 
org.apache.accumulo.core.client.impl.ServerClient.execute(ServerClient.java:61)
	at 
org.apache.accumulo.core.client.impl.ConnectorImpl.<init>(ConnectorImpl.java:67)
	at 
org.apache.accumulo.core.client.ZooKeeperInstance.getConnector(ZooKeeperInstance.java:248)

at org.apache.accumulo.shell.Shell.config(Shell.java:362)
at org.apache.accumulo.shell.Shell.execute(Shell.java:571)
at org.apache.accumulo.start.Main$1.run(Main.java:93)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException
at sun.nio.ch.Net.translateException(Net.java:181)
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:139)
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:82)
	at 
org.apache.accumulo.core.rpc.TTimeoutTransport.create(TTimeoutTransport.java:55)
	at 
org.apache.accumulo.core.rpc.TTimeoutTransport.create(TTimeoutTransport.java:48)
	at 
org.apache.accumulo.core.rpc.ThriftUtil.createClientTransport(ThriftUtil.java:310)

... 13 more
2015-12-07 23:35:50,969 [impl.ServerClient] DEBUG: ClientService request 
failed null, retrying ...
org.apache.thrift.transport.TTransportException: Failed to connect to a 
server
	at 
org.apache.accumulo.core.client.impl.ThriftTransportPool.getAnyTransport(ThriftTransportPool.java:474)
	at 
org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:141)
	at 
org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:117)
	at 
org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:113)
	at 
org.apache.accumulo.core.client.impl.ServerClient.executeRaw(ServerClient.java:95)
	at 
org.apache.accumulo.core.client.impl.ServerClient.execute(ServerClient.java:61)
	at 
org.apache.accumulo.core.client.impl.ConnectorImpl.<init>(ConnectorImpl.java:67)
	at 
org.apache.accumulo.core.client.ZooKeeperInstance.getConnector(ZooKeeperInstance.java:248)

at org.apache.accumulo.shell.Shell.config(Shell.java:362)
at org.apache.accumulo.shell.Shell.execute(Shell.java:571)
at org.apache.accumulo.start.Main$1.run(Main.java:93)
at java.lang.Thread.run(Thread.java:745)

Accumulo is using the FQDN of the VM. Adding the proper entries to 
/etc/hosts on my local machine let me open the Accumulo shell locally 
(not in the VM).


Josh Elser wrote:

Interesting. What version of Accumulo are you using?

Also, can you jstack your client application, maybe we can get a hint
where it's stuck. You could also try increase the Log4j level in your
client application for the 'org.apache.accumulo.core' package to DEBUG
or TRACE.

Even better, if this is something you can share (making assumptions
since it's Vagrant-based), feel free to. I'll try to run your example
and poke around myself.

Mike Thomsen wrote:

This is the output from netstat:

vagrant@vagrant-ubuntu-vivid-64:/opt/accumulo$ netstat -nape | fgrep
 | fgrep LISTEN
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 10.0.2.15: <http://10.0.2.15:>
0.0.0.0:* LISTEN 1000 35450 3809/java
vagrant@vagrant-ubuntu-vivid-64:/opt/accumulo$ netstat -nape | fgrep
9997 | fgrep LISTEN
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 10.0.2.15:9997 <http://10.0.2.15:9997>
0.0.0.0:* LISTEN 1000 35962 3655/java

On Fri, Dec 4, 2015 at 12:35 PM, Josh Elser wrote:

Each line in the Accumulo "hosts" files (masters, slaves, etc)
denote a host which the process should be run on, FYI.

What does netstat show for ports  and 9997? Those are the two
ports that your client should ever need to talk to for Accumulo, IIRC

Re: Can't connect to Accumulo

2015-12-08 Thread Josh Elser
Oh, well then. I didn't try running that groovy script. I can do that 
tonight :)


Mike Thomsen wrote:

The odd part is that I can do that too, but I can't connect via the
Groovy script that is in /vagrant_data (accumulo.groovy; Groovy
distribution in /vagrant_data/groovy) from outside the VM. Inside the
VM, it works just fine.

On Mon, Dec 7, 2015 at 11:40 PM, Josh Elser wrote:

Mike sent me a tarball of his Vagrant VM.

Following my own advice (via the --debug option on the shell):

2015-12-07 23:35:50,969 [rpc.ThriftUtil] TRACE: Opening normal transport
2015-12-07 23:35:50,969 [rpc.ThriftUtil] WARN : Failed to open
transport to vagrant-ubuntu-vivid-64:9997
2015-12-07 23:35:50,969 [impl.ThriftTransportPool] DEBUG: Failed to
connect to vagrant-ubuntu-vivid-64:9997 (12)
org.apache.thrift.transport.TTransportException:
java.net.UnknownHostException
 at

org.apache.accumulo.core.rpc.ThriftUtil.createClientTransport(ThriftUtil.java:313)
 at

org.apache.accumulo.core.client.impl.ThriftTransportPool.createNewTransport(ThriftTransportPool.java:478)
 at

org.apache.accumulo.core.client.impl.ThriftTransportPool.getAnyTransport(ThriftTransportPool.java:466)
 at

org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:141)
 at

org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:117)
 at

org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:113)
 at

org.apache.accumulo.core.client.impl.ServerClient.executeRaw(ServerClient.java:95)
 at

org.apache.accumulo.core.client.impl.ServerClient.execute(ServerClient.java:61)
 at

org.apache.accumulo.core.client.impl.ConnectorImpl.(ConnectorImpl.java:67)
 at

org.apache.accumulo.core.client.ZooKeeperInstance.getConnector(ZooKeeperInstance.java:248)
 at org.apache.accumulo.shell.Shell.config(Shell.java:362)
 at org.apache.accumulo.shell.Shell.execute(Shell.java:571)
 at org.apache.accumulo.start.Main$1.run(Main.java:93)
 at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException
 at sun.nio.ch.Net.translateException(Net.java:181)
 at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:139)
 at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:82)
 at

org.apache.accumulo.core.rpc.TTimeoutTransport.create(TTimeoutTransport.java:55)
 at

org.apache.accumulo.core.rpc.TTimeoutTransport.create(TTimeoutTransport.java:48)
 at

org.apache.accumulo.core.rpc.ThriftUtil.createClientTransport(ThriftUtil.java:310)
 ... 13 more
2015-12-07 23:35:50,969 [impl.ServerClient] DEBUG: ClientService
request failed null, retrying ...
org.apache.thrift.transport.TTransportException: Failed to connect
to a server
 at

org.apache.accumulo.core.client.impl.ThriftTransportPool.getAnyTransport(ThriftTransportPool.java:474)
 at

org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:141)
 at

org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:117)
 at

org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:113)
 at

org.apache.accumulo.core.client.impl.ServerClient.executeRaw(ServerClient.java:95)
 at

org.apache.accumulo.core.client.impl.ServerClient.execute(ServerClient.java:61)
 at

org.apache.accumulo.core.client.impl.ConnectorImpl.(ConnectorImpl.java:67)
 at

org.apache.accumulo.core.client.ZooKeeperInstance.getConnector(ZooKeeperInstance.java:248)
 at org.apache.accumulo.shell.Shell.config(Shell.java:362)
 at org.apache.accumulo.shell.Shell.execute(Shell.java:571)
 at org.apache.accumulo.start.Main$1.run(Main.java:93)
 at java.lang.Thread.run(Thread.java:745)

Accumulo is using the FQDN of the VM. Adding in the proper entries
to /etc/hosts on my local machine let me open the Accumulo shell
locally (not in the VM.


Josh Elser wrote:

Interesting. What version of Accumulo are you using?

Also, can you jstack your client application, maybe we can get a
hint
where it's stuck. You could also try increase the Log4j level in
your
client application for the 'org.apache.accumulo.core' package to
DEBUG
or TRACE.

Even better, if this is something you can share (making assumptions
since it's Vagrant-based), feel free to. I'll try to run your
example
and poke around myself.

 

Re: Trigger for Accumulo table

2015-12-08 Thread Josh Elser

Christopher wrote:

Look at org.apache.accumulo.core.constraints.Constraint for a
description and
org.apache.accumulo.core.constraints.DefaultKeySizeConstraint as an example.

In short, Mutations which are live-ingested into a tablet server are
validated against constraints you specify on the table. That means that
all Mutations written to a table go through this bit of user-provided
code at least once. You could use that fact to your advantage. However,
this would be highly experimental and might have some caveats to consider.


Yeah, big point here. You'd not be using a feature as it was designed. 
It might work, but you'd probably be on your own if the internals change.


I'd hate to see you go this route and have it break in a future release 
unexpectedly.



You can configure a constraint on a table with
connector.tableOperations().addConstraint(...)
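
For reference, a bare-bones sketch of such a constraint might look like the
following. The notification call is a made-up placeholder, and, per the caveats
above, this is not what constraints were designed for:

import java.util.List;
import org.apache.accumulo.core.constraints.Constraint;
import org.apache.accumulo.core.data.Mutation;

// Sketch: a constraint that never rejects anything, used only for its side
// effect of telling some external system which row was just written.
public class RowUpdateNotifierConstraint implements Constraint {

  @Override
  public String getViolationDescription(short violationCode) {
    return "this constraint never reports violations";
  }

  @Override
  public List<Short> check(Environment env, Mutation mutation) {
    // notifyExternalSystem(...) is hypothetical -- keep whatever you do here
    // fast and non-blocking, since this runs on the live ingest path.
    notifyExternalSystem(mutation.getRow());
    return null; // null (or an empty list) means no violations
  }

  private void notifyExternalSystem(byte[] row) {
    // e.g. enqueue the row id onto a message queue
  }
}

It would then be attached with
connector.tableOperations().addConstraint(tableName, RowUpdateNotifierConstraint.class.getName()),
as noted above.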

On Sun, Dec 6, 2015 at 10:49 PM Thai Ngo wrote:

Christopher,

This is interesting! Could you please give me more details about this?

Thanks,
Thai

On Thu, Dec 3, 2015 at 12:17 PM, Christopher wrote:

You could also implement a constraint to notify an external
system when a row is updated.


On Wed, Dec 2, 2015, 22:54 Josh Elser wrote:

oops :)

[1] http://fluo.io/

Josh Elser wrote:
 > Hi Thai,
 >
 > There is no out-of-the-box feature provided with Accumulo
that does what
 > you're asking for. Accumulo doesn't provide any
functionality to push
 > notifications to other systems. You could potentially
maintain other
 > tables/columns in which you maintain the last time a row
was updated,
 > but the onus is on your "other services" to read the
table to find out
 > when a change occurred (which is probably not scalable at
"real time").
 >
 > There are other systems you could likely leverage to
solve this,
 > depending on the durability and scalability that your
application needs.
 >
 > For a system "close" to Accumulo, you could take a look
at Fluo [1]
 > which is an implementation of Google's "Percolator"
system. This is a
 > system based on throughput rather than low-latency, so it
may not be a
 > good fit for your needs. There are probably other systems
in the Apache
 > ecosystem (Kafka, Storm, Flink or Spark Streaming maybe?)
that are be
 > helpful to your problem. I'm not an expert on these to
recommend on (nor
 > do I think I understand your entire architecture well
enough).
 >
 > Thai Ngo wrote:
 >> Hi list,
 >>
 >> I have a use-case when existing rows in a table will be
updated by an
 >> internal service. Data in a row of this table is
composed of 2 parts:
 >> 1st part - immutable and the 2nd one - will be updated
(filled in) a
 >> little later.
 >>
 >> Currently, I have a need of knowing when and which rows
will be updated
 >> in the table so that other services will be wisely start
consuming the
 >> data. It will make more sense when I need to consume the
data in near
 >> realtime. So developing a notification function or
simpler - a trigger
 >> is what I really want to do now.
 >>
 >> I am curious to know if someone has done similar job or
there are
 >> features or APIs or best practices available for
Accumulo so far. I'm
 >> thinking of letting the internal service which updates
the data notify
 >> us whenever it updates the data.
 >>
 >> What do you think?
 >>
 >> Thanks,
 >> Thai




Re: Can't connect to Accumulo

2015-12-09 Thread Josh Elser

Seems like it's something on your end, because it worked fine for me.

I adapted 
http://stackoverflow.com/questions/11504197/groovy-configuring-logging-properties-depending-on-environment 
and it connected to Accumulo just fine.


https://paste.apache.org/egVb is the outline of the modifications I 
made. Maybe the extra debug will help you figure out why it isn't 
working for you.


Mike Thomsen wrote:

FWIW, I tried this VM as well and it failed. I forwarded the accumulo
ports with Vagrant and still nothing so it might be our corporate
environment.

https://github.com/MammothData/accumulo-vagrant

On Tue, Dec 8, 2015 at 3:16 PM, Josh Elser wrote:

Oh, well then. I didn't try running that groovy script. I can do
that tonight :)

Mike Thomsen wrote:

The odd part is that I can do that too, but I can't connect via the
Groovy script that is in /vagrant_data (accumulo.groovy; Groovy
distribution in /vagrant_data/groovy) from outside the VM.
Inside the
VM, it works just fine.

On Mon, Dec 7, 2015 at 11:40 PM, Josh Elser wrote:

 Mike sent me a tarball of his Vagrant VM.

 Following my own advice (via the --debug option on the shell):

 2015-12-07 23:35:50,969 [rpc.ThriftUtil] TRACE: Opening
normal transport
 2015-12-07 23:35:50,969 [rpc.ThriftUtil] WARN : Failed to open
 transport to vagrant-ubuntu-vivid-64:9997
 2015-12-07 23:35:50,969 [impl.ThriftTransportPool] DEBUG:
Failed to
 connect to vagrant-ubuntu-vivid-64:9997 (12)
 org.apache.thrift.transport.TTransportException:
 java.net.UnknownHostException
  at


org.apache.accumulo.core.rpc.ThriftUtil.createClientTransport(ThriftUtil.java:313)
  at


org.apache.accumulo.core.client.impl.ThriftTransportPool.createNewTransport(ThriftTransportPool.java:478)
  at


org.apache.accumulo.core.client.impl.ThriftTransportPool.getAnyTransport(ThriftTransportPool.java:466)
  at


org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:141)
  at


org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:117)
  at


org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:113)
  at


org.apache.accumulo.core.client.impl.ServerClient.executeRaw(ServerClient.java:95)
  at


org.apache.accumulo.core.client.impl.ServerClient.execute(ServerClient.java:61)
  at


org.apache.accumulo.core.client.impl.ConnectorImpl.(ConnectorImpl.java:67)
  at


org.apache.accumulo.core.client.ZooKeeperInstance.getConnector(ZooKeeperInstance.java:248)
  at
org.apache.accumulo.shell.Shell.config(Shell.java:362)
  at
org.apache.accumulo.shell.Shell.execute(Shell.java:571)
  at org.apache.accumulo.start.Main$1.run(Main.java:93)
  at java.lang.Thread.run(Thread.java:745)
 Caused by: java.net.UnknownHostException
  at sun.nio.ch.Net <http://sun.nio.ch.Net>
<http://sun.nio.ch.Net>.translateException(Net.java:181)

  at
sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:139)
  at
sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:82)
  at


org.apache.accumulo.core.rpc.TTimeoutTransport.create(TTimeoutTransport.java:55)
  at


org.apache.accumulo.core.rpc.TTimeoutTransport.create(TTimeoutTransport.java:48)
  at


org.apache.accumulo.core.rpc.ThriftUtil.createClientTransport(ThriftUtil.java:310)
  ... 13 more
 2015-12-07 23:35:50,969 [impl.ServerClient] DEBUG:
ClientService
 request failed null, retrying ...
 org.apache.thrift.transport.TTransportException: Failed to
connect
 to a server
  at


org.apache.accumulo.core.client.impl.ThriftTransportPool.getAnyTransport(ThriftTransportPool.java:474)
  at


org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:141)
  at


org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:117)
  at


org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:113)
  at


org.apach

Re: Multiple tablet servers on the same host

2015-12-10 Thread Josh Elser
Ahh, there is another port to change for TabletServers. I forgot about 
this when I read Billie's initial reply.


You can set replication.receipt.service.port=0 (if you don't care what 
port it uses) or set a unique port in conf2. This port is for a service 
running on each TabletServer which is part of the data-center 
replication feature (new in 1.7).


Sven Hodapp wrote:

Hi Billie,

it seems that in Accumulo 1.7 there is no ACCUMULO_PID_DIR anymore; it is 
generated in the start-server.sh script with ps.

So I've tried this:

$ export ACCUMULO_CONF_DIR="${ACCUMULO_CONF_DIR:$ACCUMULO_HOME/conf2}"
$ bin/accumulo tserver --address serverIP

This will start the tserver on the port from conf2 (currently 29997), but it tries to 
open another port which is in use:

Unable to start TServer
org.apache.thrift.transport.TTransportException: Could not create 
ServerSocket on address serverIP:10002.

I think it is in use by the other tablet server...

Regards,
Sven



Re: Multiple tablet servers on the same host

2015-12-10 Thread Josh Elser

Sven Hodapp wrote:

I've read in the Accumulo Book (p. 496) that it should be possible to start 
multiple tablet servers on a (fat) machine to scale (also) vertically. Sadly it's 
not described how to do it. I also didn't find anything about this issue in the 
official documentation.


I've just created https://issues.apache.org/jira/browse/ACCUMULO-4072 
for us to get some documentation into the Accumulo User Manual on the 
matter. Thanks for letting us know we were missing this.


Re: Multiple tablet servers on the same host

2015-12-11 Thread Josh Elser

Great! Glad to hear you got it working.

For reference, you could also set tserver.port.client=0 and have 
consistent accumulo-site.xml files. Accumulo doesn't need static ports for 
the TabletServer thrift server.


Thanks for sharing instructions. I'm sure this will be useful.

Sven Hodapp wrote:

Hi Josh,
HI Billie,

now it works, thank you both!

So in short:

1. Copy ACCUMULO_HOME to another distinct directory
2. export ACCUMULO_HOME=/path/to/new/directory
3. Edit conf/accumulo-site.xml in the new directory

   
   <property>
     <name>tserver.port.client</name>
     <value>29997</value>
   </property>

   <property>
     <name>replication.receipt.service.port</name>
     <value>0</value>
   </property>

4. bin/tup.sh to start the tablet servers

Maybe you can recycle this for the user manual.

Regards,
Sven



Re: Table Statistics

2015-12-15 Thread Josh Elser
No, we don't have any means to obtain table statistics through our 
public API (the collection of code which we guarantee stability on).


We have https://issues.apache.org/jira/browse/ACCUMULO-3206 open for 
future work, but no one has started working on it.


Dylan's comment from that mail thread you linked to is likely to still 
be functional (or very close to it) in 1.7.


peter.mar...@baesystems.com wrote:

Hi,

I was wondering if there is any “recognized” way to obtain table statistics.

Ideally, given a Key range I would like to know the number of distinct
rowids, entries and amount of data (in bytes) in that key range.

I assume that Accumulo holds at least some of this information
internally, partly because I can see some of this

through the monitor, and partly because it must know something about the
quantity of data held in order to be able

to implement the table threshold.

In my case the tables are very static and so the “estimates” that the
monitor has are likely to be sufficiently accurate for my purposes.

I have found this link

http://apache-accumulo.1065345.n5.nabble.com/Determining-tablets-assigned-to-table-splits-and-the-number-of-rows-in-each-tablet-td11546.html

which describes a process (which I haven’t tried yet) to get the number
of entries in a range.

Which would probably be sufficient for me and would certainly be a good
start.

However it seems to be using internal data structures and non-published
APIs, which is less than ideal.

And it seems to be written against Accumulo version 1.6.

I’m using Accumulo 1.7. Is there anything better than I can do or is it
recommended that this is the way to go?

Regards,

Z



Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-12-17 Thread Josh Elser

mohit.kaushik wrote:

On 12/16/2015 09:07 PM, Eric Newton wrote:

I was making the huge assumption that your client runs with the
accumulo scripts and it is not one of the accumulo known start points:
in this case, it is given the JVM parameters of ACCUMULO_OTHER_OPTS.

Perhaps I am asking something very obvious, but I did not find any
guidelines for running clients with the accumulo scripts, and I am still not clear
how ACCUMULO_OTHER_OPTS, which is set on the server, affects clients.


Yeah, I wanted to make this point too. ACCUMULO_OTHER_OPTS is only 
relevant if you're using the `accumulo` command to run your code. I 
don't think I would call it "common" for client applications to be 
runnable by the `accumulo` script -- most times your code is completely 
disjoint from the servers.


If you're launching it yourself, you would need to set the proper JVM 
heap/GC properties on your own: e.g. increase -Xmx, turn on 
-XX:+UseConcMarkSweepGC. These two should be enough for basic tweaking :)


Re: Mutation Rejected exception with server Error 1

2015-12-23 Thread Josh Elser

Eric Newton wrote:


Failure to talk to zookeeper is *really* unexpected.

Have you noticed your nodes using any significant swap?


Emphasis on this. Failing to connect to ZooKeeper for 60s (2*30) is a 
very long time (although, I think I have seen JVM GC pauses longer before).


A couple of generic ZooKeeper questions:

1. Can you share your zoo.cfg?

2. Make sure that ZooKeeper has a "dedicated" drive for its dataDir. 
HDFS DataNodes using the same drive as ZooKeeper for its transaction log 
can cause ZooKeeper to be starved for I/O throughput. A normal 
"spinning" disk is also reportedly better for ZK than SSDs (last I read).


3. Check OS/host level metrics on these ZooKeeper hosts during the times 
you see these failures.


4. Consider moving your ZooKeeper hosts to "less busy" nodes if you can. 
You can consider adding more ZooKeeper hosts to the quorum, but keep in 
mind that this will increase the minimum latency for ZooKeeper 
operations (as more nodes, n/2 + 1, need to acknowledge updates).


Re: Accumulo Shell logging WARN : Found no client.conf in default paths. need suggestion.

2015-12-24 Thread Josh Elser

Make sure you're using the right configuration files (ACCUMULO_HOME and/or 
ACCUMULO_CONF_DIR environment variables).

If the monitor says that there are tabletservers running, your client is likely 
trying to connect to an instance which isn't running (they're not looking at 
the same place).

A missing client.conf is not going to cause these issues for an out-of-the-box 
configuration (although it's curious that the example configurations didn't 
contain one).

Ziaur Rahman wrote:

Hi,
Accumulo dev, my accumulo-1.7.0 starts well with hadoop-2.7.1 and 
zookeeper-3.4.6 in 1GB conf native-standalone/standalone mode. But 
when I go to start the shell it shows an error like:


"zia@zia-Aspire-V5-473:~$ 
/home/zia/Installs/dbe/accumulo-1.7.0/bin/accumulo shell -u root -p 
asdfghjk


2015-12-24 15:17:05,544 [client.ClientConfiguration] WARN : Found no 
client.conf in default paths. Using default client configuration values.


2015-12-24 15:17:05,549 [client.ClientConfiguration] WARN : Found no 
client.conf in default paths. Using default client configuration values.


2015-12-24 15:17:06,476 [client.ClientConfiguration] WARN : Found no 
client.conf in default paths. Using default client configuration values.


2015-12-24 15:17:06,720 [trace.DistributedTrace] INFO : SpanReceiver 
org.apache.accumulo.tracer.ZooTraceClient was loaded successfully.


2015-12-24 15:17:06,772 [impl.ServerClient] WARN : There are no tablet 
servers: check that zookeeper and accumulo are running."



The Accumulo monitor shows 1 tabletserver, 3 tables and 6 localhost 
clients.


What can I do now? Would anybody shed light on the problem?


Re: Map Lexicoder

2015-12-28 Thread Josh Elser
Looks like you would have to implement some kind of ComparableMap to be 
able to use the PairLexicoder (see that the parameterization requires 
both types in the Pair to implement Comparable). The Pair lexicoder 
requires these Comparable types to align itself with the original goal 
of the Lexicoders: provide byte-array serialization for types whose sort 
order matches the original object's ordering.


Typically, when we have key to value style data we want to put in 
Accumulo, it makes sense to leverage the Column Qualifier and the Value, 
instead of serializing everything into one Accumulo Value. Iterators 
make it easy to do server-side predicates and transformations. My hunch 
is that this is another reason why you don't already see a MapLexicoder 
provided.


One technical difficulty you might run into implementing a generalized 
MapLexicoder is how you delimit the key and value in one pair and how 
you delimit many pairs from each other. Commonly, the "null" byte (\x00) 
is used as a separator since it doesn't often appear in user-data. I'm 
not sure if some of the other Lexicoders already use this in their 
serialization (e.g. the ListLexicoder might, I haven't looked at the 
code). Nesting Lexicoders generically might be tricky (although not 
impossible) -- thought it was worth mentioning to make sure you thought 
about it.


Adam J. Shook wrote:

Hello all,

Any suggestions for using a Map Lexicoder (or implementing one)?  I am
currently using a new ListLexicoder(new PairLexicoder(some lexicoder,
some lexicoder)), which is working for single maps.  However, when one of
the lexicoders in the Pair is itself a Map (and therefore another
ListLexicoder(PairLexicoder)), an exception is being thrown because
ArrayList is not Comparable.

Regards,
--Adam


Re: Map Lexicoder

2015-12-28 Thread Josh Elser

Gotcha, thanks for the background.

I think as long as you can preserve the same level of compatibility with 
the other lexicoders, this would be a nice addition. If it's an itch you 
want to scratch, others probably will want to do the same too :)


Keith probably knows the most about what currently works off the top of 
his head (since he wrote the Lexicoders, IIRC), but I imagine he's 
taking some time off work and isn't watching the mailing list closely.


If you get stuck with how to implement this, let me know and I can try 
to poke around at the implementation too.


Adam J. Shook wrote:

Hi Josh,

Thanks for the advice.  I'm with you on using the CQ and Value instead
of putting the whole map into a Value, but what I am working on is using
the relational model of mapping data to Accumulo, which expects the value
of the cell to be in the Value.  Certainly some optimization
opportunities by using the 'better' ways for storing data in Accumulo,
but I'd like to get this working before diving into that rabbit hole.

A brief look at the ListLexicoder encodes each element of the list using
a sub-lexicoder and escapes each element (0x00 -> 0x01 0x01 and 0x01 ->
0x01 0x02).  The voodoo here escapes me a little (pun!), but it seems to
be enough to enable multi-dimensional arrays encoded by nesting
ListLexicoders (up to 4D, haven't tried a fifth dimension).  I would
expect something similar could be done using a Map.  Would a
MapLexicoder be something worth contributing to the project?  I'd be
happy to give it a stab.

--Adam

On Mon, Dec 28, 2015 at 12:21 PM, Josh Elser wrote:

Looks like you would have to implement some kind of ComparableMap to
be able to use the PairLexicoder (see that the parameterization
requires both types in the Pair to implement Comparable). The Pair
lexicoder requires these Comparable types to align itself with the
original goal of the Lexicoders: provide byte-array serialization
for types whose sort order matches the original object's ordering.

Typically, when we have key to value style data we want to put in
Accumulo, it makes sense to leverage the Column Qualifier and the
Value, instead of serializing everything into one Accumulo Value.
Iterators make it easy to do server-side predicates and
transformations. My hunch is that this is another reason why you
don't already see a MapLexicoder provided.

One technical difficulty you might run into implementing a
generalized MapLexicoder is how you delimit the key and value in one
pair and how you delimit many pairs from each other. Commonly, the
"null" byte (\x00) is used as a separator since it doesn't often
appear in user-data. I'm not sure if some of the other Lexicoders
already use this in their serialization (e.g. the ListLexicoder
might, I haven't looked at the code). Nesting Lexicoders generically
might be tricky (although not impossible) -- thought it was worth
mentioning to make sure you thought about it.


Adam J. Shook wrote:

Hello all,

Any suggestions for using a Map Lexicoder (or implementing
one)?  I am
currently using a new ListLexicoder(new PairLexicoder(some
lexicoder,
some lexicoder), which is working for single maps.  However,
when one of
the lexicoders in the Pair is itself a Map (and therefore another
ListLexicoder(PairLexicoder)), an exception is being thrown because
ArrayList is not Comparable.

Regards,
--Adam




Re: Map Lexicoder

2015-12-30 Thread Josh Elser
Yeah, I could see Key ordering or Entry ordering both make sense 
depending on the context :)
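
For what it's worth, a sketch of the entry-ordered (a1b2c3) variant that just
reuses the existing List and Pair lexicoders could look like this (hypothetical
class name, untested):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

import org.apache.accumulo.core.client.lexicoder.Lexicoder;
import org.apache.accumulo.core.client.lexicoder.ListLexicoder;
import org.apache.accumulo.core.client.lexicoder.PairLexicoder;
import org.apache.accumulo.core.util.ComparablePair;

// Sketch: flatten the map into a list of (key, value) pairs in key order, so
// equal maps always serialize identically, and delegate the escaping and
// delimiting to the existing List/Pair lexicoders.
public class SortedMapLexicoder<K extends Comparable<K>, V extends Comparable<V>>
    implements Lexicoder<SortedMap<K, V>> {

  private final ListLexicoder<ComparablePair<K, V>> delegate;

  public SortedMapLexicoder(Lexicoder<K> keyLex, Lexicoder<V> valueLex) {
    this.delegate = new ListLexicoder<>(new PairLexicoder<>(keyLex, valueLex));
  }

  @Override
  public byte[] encode(SortedMap<K, V> map) {
    List<ComparablePair<K, V>> entries = new ArrayList<>();
    for (Map.Entry<K, V> e : map.entrySet()) {
      entries.add(new ComparablePair<>(e.getKey(), e.getValue()));
    }
    return delegate.encode(entries);
  }

  @Override
  public SortedMap<K, V> decode(byte[] b) {
    SortedMap<K, V> map = new TreeMap<>();
    for (ComparablePair<K, V> p : delegate.decode(b)) {
      map.put(p.getFirst(), p.getSecond());
    }
    return map;
  }
}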


Adam J. Shook wrote:

I personally think all maps with the same keys should be sorted
together, but I think it'd be best to support and document both ways and
leave that up to the user.  I'm sure either way could be argued, and
this is certainly an edge case for lexicoders.

On Tue, Dec 29, 2015 at 7:16 PM, Keith Turner wrote:

Do you want all maps with the same keys to sort together?  If so
doing abc123 would make that happen.

The map data below

   { a : 1 , b : 2, c : 3 }
   { a : 1 , x : 8, y : 9 }
   { a : 4 , b : 5, c : 6 }

Would sort like the following if encoding key values pairs

   a1b2c3
   a1x8y9
   a4b5c6

If encoding all key and then all values, it would sort as follows

   abc123
   abc456
   axy189



On Tue, Dec 29, 2015 at 4:53 PM, Adam J. Shook wrote:

Agreed, I came to the same conclusion while implementing.  The
final result that I have is a SortedMapLexicoder to avoid any
comparisons going haywire.  Additionally, would it be best to
encode the map as an array of keys followed by an array of
values, or encode all key value pairs back-to-back:

{ a : 1 , b : 2, c : 3 } encoded as

a1b2c3
-or-
abc123

Feels like I should be encoding a list of keys, then the list of
values, and then concatenating these two encoded byte arrays.  I
think the end solution will be to support both?  I'm having a
hard time reconciling which method is better, if any.  Hard to
find some good examples of people who are sorting a list of maps.

On Tue, Dec 29, 2015 at 2:47 PM, Keith Turner mailto:ke...@deenlo.com>> wrote:



On Mon, Dec 28, 2015 at 11:47 AM, Adam J. Shook
mailto:adamjsh...@gmail.com>> wrote:

Hello all,

Any suggestions for using a Map Lexicoder (or
implementing one)?  I am currently using a new
ListLexicoder(new PairLexicoder(some lexicoder, some
lexicoder), which is working for single maps.  However,
when one of the lexicoders in the Pair is itself a Map
(and therefore another ListLexicoder(PairLexicoder)), an
exception is being thrown because ArrayList is not
Comparable.



Since maps do not have a well defined order of keys and
values, comparison is tricky.  The purpose of Lexicoders is
to encode things in such a way that lexicographical
comparison of the serialized data is possible.  With a
HashMap, if I add the same data in the same order to two
different HashMap instances, it's possible that when
iterating over those maps I could see the data in different
orders.  This could lead to two maps constructed in the
same way at different times (like different JVMs with
different implementations of HashMap) generating different
serialized data that compare as unequal.  Ideally comparison of the
two would yield equality.

Something like LinkedHashMap does not have this problem for
the same insertion order.  If you want things to be
comparable regardless of insertion order (which I think is
more intuitive), then SortedMap seems like it would be a
good candidate.  So maybe a SortedMapLexicoder would be a
better thing to offer?
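
A tiny example of why SortedMap helps here -- insertion order no longer matters:

    import java.util.SortedMap;
    import java.util.TreeMap;

    // Two TreeMaps built with different insertion orders iterate identically,
    // so a SortedMapLexicoder would produce the same bytes for both.
    public class SortedMapOrderSketch {
        public static void main(String[] args) {
            SortedMap<String, Integer> m1 = new TreeMap<String, Integer>();
            m1.put("b", 2); m1.put("a", 1); m1.put("c", 3);

            SortedMap<String, Integer> m2 = new TreeMap<String, Integer>();
            m2.put("c", 3); m2.put("a", 1); m2.put("b", 2);

            System.out.println(m1);   // {a=1, b=2, c=3}
            System.out.println(m2);   // {a=1, b=2, c=3}
        }
    }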


Regards,
--Adam







Re: Accumulo monitor not coming up

2016-01-09 Thread Josh Elser
Assuming this is the case, it may be interesting to you to know that we
changed the default port from 50095 to 9995 in the upcoming Accumulo 1.8.
On Jan 8, 2016 6:45 AM, "Eric Newton"  wrote:

> It's possible some other process grabbed port 50095, and that is
> preventing the monitor from starting. In future releases, the default port
> for the monitor will be in a lower range to avoid this very problem.
>
> Try running "netstat -nape | grep 50095".
>
> This should show you information about network sockets and connections to
> port 50095. The very end of the line will provide process information (pid).
>
> If you provide your master ".debug" logs, it might help to explain the
> shutdown issues you are experiencing.
>
> -Eric
>
>
> On Fri, Jan 8, 2016 at 5:12 AM, mohit.kaushik 
> wrote:
>
>>
>> The Accumulo cluster is working fine (all other processes running), but
>> yesterday I restarted one node on which a hadoop datanode, zookeeper node,
>> tserver, and monitor were running. I started all the processes again but
>> the Monitor did not start. I stopped the cluster and restarted, but the monitor still
>> did not appear.
>>
>> I have JMX configured for the monitor process. VisualVM shows everything
>> as if the monitor is running. When I start it again from the master server with
>> the command "start-server.sh ip monitor" it says the monitor is already
>> running with a pid.  I can check that the process is running with "ps -ef |
>> grep pid", but since nothing is listening on port 50095 I cannot access the monitor
>> in a browser either. I have checked that the firewalls are stopped.
>>
>> And it is not only on a specific server: if I start the monitor on other servers the
>> behaviour is exactly the same, whether JMX is configured or not. Logs have no
>> errors. I have also checked that there is no memory issue.
>> One more strange behaviour I noticed: if I stop the cluster it does not stop
>> cleanly and I have to press ctrl+c, and when I start the cluster again the
>> login prompt hangs forever and I have to open another window and ssh
>> to the master again to work on it.
>>
>> Please suggest, I am using Apache Accumulo 1.7.0.
>>
>> Thanks
>> Mohit Kaushik
>>
>>
>


Re: Three day Fluo Common Crawl test

2016-01-12 Thread Josh Elser

Great job!

Keith Turner wrote:

We just completed a three day test of Fluo using Common Crawl data that
went pretty well.

http://fluo.io/webindex-long-run/




Re: Accumulo monitor not coming up

2016-01-13 Thread Josh Elser

mohit.kaushik wrote:

But found it running on one of the server after a day which is not the
configured server for monitor process


That's... strange. We don't automatically start any processes, so I'd 
recommend you investigate what commands you're running :)


The Monitor, like the Master, is designed to operate with only one 
instance running. If you have multiple Monitors running, you might, 
for example, see the "Recent Logs" split across multiple instances. As 
such, we use ZooKeeper as a barrier to prevent multiple Monitor servers 
from running. This lets us have some "hot-standby" instances for the 
cluster which enable you to avoid any downtime (e.g. if an operations 
team is relying on the monitor to be running)


As such, it's possible that you'll see a monitor process running, but 
the web server inside the process is not. You can check ZooKeeper 
(zkCli.sh) for the active Monitor:


`get /accumulo/<instance id>/monitor/http_addr`

> Please also tell me when you are planning to release Accumulo 1.7.1, as I 
am also facing some issues which I found are fixed there. When I checked, it 
was expected to release on 15-Dec-2015 but was perhaps delayed for the last 
remaining bug fix, I suppose.


1.7.1 is on its way out. We don't have firm dates for when software will be 
released, so the best I can give is "soon".


After we get the open issues on JIRA cleaned up, we'll have to run it 
through some testing. Hopefully we can get it out the door while it's 
still January.


Re: Accumulo monitor not coming up

2016-01-14 Thread Josh Elser

I apologize in advance for that ugly command :)

It really should be as simple as `./stop-server.sh <hostname> <service>`.

There's a new start-daemon.sh script in 1.8. Maybe we can make a better 
invocation around stopping a process in 1.8 too.


mohit.kaushik wrote:

I was using "./stop-server.sh $hostname monitor" to stop the monitor
process.
But I should have used
"./stop-server.sh $hostname $ACCUMULO_HOME/.*/accumulo-start.*.jar
monitor TERM"
which I found in the stop-all.sh script.


Re: [ANNOUNCE] Fluo 1.0.0-beta-2 is released

2016-01-19 Thread Josh Elser

+1

William Slacum wrote:

Cool beans, Keith!

On Tue, Jan 19, 2016 at 11:30 AM, Keith Turner mailto:ke...@deenlo.com>> wrote:

The Fluo project is happy to announce a 1.0.0-beta-2[1] release
which is the
third release of Fluo and likely the final release before 1.0.0. Many
improvements in this release were driven by the creation of two new Fluo
related projects:

   * Fluo recipes[2] is a collection of common development patterns
designed to
 make Fluo application development easier. Creating Fluo recipes
required
 new Fluo functionality and updates to the Fluo API. The first
release of
 Fluo recipes has been made and is available in Maven Central.

   * WebIndex[3] is an example Fluo application that indexes links
to web pages
 in multiple ways. Webindex enabled the testing of Fluo on real
data at
 scale.  It also inspired improvements to Fluo to allow it to
work better
 with Apache Spark.

Fluo is now at a point where its two cluster test suites,
Webindex[3] and
Stress[4], are running well for long periods on Amazon EC2. We
invite early
adopters to try out the beta-2 release and help flush out problems
before
1.0.0.

[1]: http://fluo.io/1.0.0-beta-2-release/
[2]: https://github.com/fluo-io/fluo-recipes
[3]: https://github.com/fluo-io/webindex
[4]: https://github.com/fluo-io/fluo-stress




Re: Unable to configure kerberos impersonation with Ambari

2016-01-25 Thread Josh Elser
Caught up with Russ in IRC -- without trying to reproduce it by hand, 
this strongly looks like an issue in Ambari (preventing these characters).


FYI, Billie.

Russ Weeks wrote:

Hi, folks,

I'm following the instructions in the 1.7 user manual to set up Kerberos
impersonation. I'm using Ambari 2.1.2.1 to specify the configuration in
accumulo-site.xml.

If I understand the manual correctly, I need to define a property key of
the form: instance.rpc.sasl.impersonation.username/server@REALM.users.
The value should be a CSV list of kerberos principals.

The problem is that Ambari will reject a property key with the
characters "@" or "/". Is Ambari being overly restrictive or have I
misunderstood the user manual? Is there any workaround other than
manually editing accumulo-site.xml (which I think will be overwritten as
soon as Ambari discovers that it's out-of-sync)?

Thanks,
-Russ


Re: Kerberos and the Accumulo Proxy

2016-01-25 Thread Josh Elser

Hi Tristen,

I'm glad you found that issue. As much as I griped about it, I had 
completely forgotten about the issue. Sadly, it looks like that fix is 
already contained in 2.3.2.0[1] Your client code does look correct as 
well (compared to the tests for this[2]). There are some good Kerberos 
tests in the codebase if you know to look for 'em.


The error message is a bit confusing since it's a client (your python) 
talking to a client (the proxy) talking to the servers (Accumulo 
services). I think this error message is saying that the Proxy is 
failing to authenticate to Accumulo. This is about to get lengthy.


A little bit of background on how this is supposed to work which hinges 
on this fact: A Kerberos client's "secret" information is not made 
available to the server. This means that your client keytab is not made 
available to the Proxy server. The implications this creates for the use 
case you outline is that the Proxy server *cannot* make a connection to 
Accumulo on your behalf because Kerberos knows that the Proxy isn't 
actually you! (this is actually cool if you think about it).


This is where the Proxy impersonation configuration that Russ recently 
asked about comes into play. The common approach (e.g. Hive, HBase) is 
to configure Accumulo to "trust" the Proxy (identified by the Kerberos 
principal) to "act" as a different user. In other words, impersonation 
allows a client to authenticate to Accumulo as a different user than the 
client's Kerberos credentials says it is.


So, client <-> Proxy works like a normal connection. However, proxy <-> 
{tserver,master} has the connection set up as the Proxy's Kerberos 
identity, but Accumulo *allows* the Proxy to actually say that it is 
your client.


As such, your code is 100% correct, but it isn't going to work in the 
way you're trying to use it. If you want a centralized Proxy server 
instance, you'll need to set up impersonation. The other option (which, 
I think, is better since we don't have a good multi-tenancy story 
for the Proxy) is to run a Proxy server instance with your client 
Kerberos credentials instead of the "accumulo" principal. This gets 
around the problem because both the client and the Proxy act as "client" 
and you don't get the mismatch.
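
For comparison, a plain (non-proxy) Java client authenticating with its own 
Kerberos credentials looks roughly like the sketch below. The instance and 
zookeeper values are taken from the properties above (shortened), and it 
assumes a kinit'd ticket or keytab login already exists for the current user:

    import org.apache.accumulo.core.client.ClientConfiguration;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.Instance;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.security.tokens.KerberosToken;

    // Sketch of a direct Kerberos client. The Proxy does the same thing with
    // *its own* credentials, which is why impersonation is needed before it
    // can act on behalf of your principal.
    public class DirectKerberosClient {
        public static void main(String[] args) throws Exception {
            ClientConfiguration conf = ClientConfiguration.loadDefault()
                .withInstance("agile_accumulo")
                .withZkHosts("mas1.example.com:2181")
                .withSasl(true);                        // enable SASL (Kerberos) RPCs
            Instance inst = new ZooKeeperInstance(conf);
            KerberosToken token = new KerberosToken();  // uses the current Kerberos login
            Connector conn = inst.getConnector(token.getPrincipal(), token);
            System.out.println(conn.tableOperations().list());
        }
    }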


This got really long, and I'm sorry for that :). Let me know, I'd be 
happy to put some of this into the user manual if it's lacking.


- Josh

[1] 
https://github.com/hortonworks/accumulo-release/blob/HDP-2.3.2.0-tag/proxy/src/main/java/org/apache/accumulo/proxy/Proxy.java#L245-L248
[2] 
https://github.com/hortonworks/accumulo-release/blob/HDP-2.3.2.0-tag/proxy/src/main/java/org/apache/accumulo/proxy/TestProxyClient.java


Tristen Georgiou wrote:

I'm using Ambari 2.1.2.1, HDP 2.3.2 (so Accumulo 1.7.0) and I'm trying
to get a Kerberized Accumulo proxy up and running; I can successfully
start the proxy, but I am having trouble connecting with it.

Here is my Accumulo proxy properties file (I've censored my actual FQDN's):

useMockInstance=false
useMiniAccumulo=false
protocolFactory=org.apache.thrift.protocol.TCompactProtocol$Factory
tokenClass=org.apache.accumulo.core.client.security.tokens.KerberosToken
port=42425
maxFrameSize=16M
thriftServerType=sasl
kerberosPrincipal=accumulo/mas3.example@example.com

kerberosKeytab=/etc/security/keytabs/accumulo.service.keytab
instance=agile_accumulo
zookeepers=mas1.example.com:2181
,mas2.example.com:2181
,mas3.example.com:2181


The proxy starts up fine, and then via Python I am doing the following:

transport =
TTransport.TSaslClientTransport(TSocket.TSocket('mas3.example.com
', 42425), 'mas3.example.com
', 'accumulo', QOP='auth')
protocol = TCompactProtocol.TCompactProtocol(transport)
client = AccumuloProxy.Client(protocol)
transport.open()
login = client.login('cent...@example.com ', {})

Where I've created the principal cent...@example.com
 and have run kinit on the server where I am
trying to connect to the proxy from (not from mas3)

The proxy log responds with this:

2016-01-25 21:42:01,294 [proxy.ProxyServer] ERROR: Failed to login
org.apache.accumulo.core.client.AccumuloSecurityException: Error
BAD_CREDENTIALS for user Principal in credentials object should match
kerberos principal. Expected 'accumulo/mas3.example@example.com' but
was 'cent...@example.com ' - Username or
Password is Invalid
at
org.apache.accumulo.core.client.impl.ServerClient.execute(ServerClient.java:63)
at
org.apache.accumulo.core.client.impl.ConnectorImpl.(ConnectorImpl.java:67)
at
org.apache.accumulo.core.client.ZooKeeperInstance.getConnector(ZooKeeperInstance.java:248)
at org.apache.accumulo.proxy.ProxyServer.getConnector(ProxyServer.java:232)
at org.apache.accumulo.proxy.ProxyServer.logi

Re: Accumulo and Kerberos

2016-01-26 Thread Josh Elser

Hi Roman,

Accumulo services (TabletServer, Master, etc) all use a keytab to 
automatically obtain a ticket from the KDC when they start up. You do 
not need to do anything with kinit when starting Accumulo.


One worry is ACCUMULO-4069[1] with all presently released versions (most 
notably 1.7.0 which you are using). This is a bug in which services did 
not automatically renew their ticket. We're working on a 1.7.1, but it's 
not out yet.


As for debugging your issue, take a look at the Kerberos section on 
debugging in the user manual [2]. Take a very close look at the 
principal the service is using to obtain the ticket and what the 
principal is for your keytab. A good sanity check is to make sure you 
can `kinit` in the shell using the keytab and the correct principal 
(rule out the keytab being incorrect).


If you still get stuck, collect the output specifying 
-Dsun.security.krb5.debug=true in accumulo-env.sh (per the instructions) 
and try enabling log4j DEBUG on 
org.apache.hadoop.security.UserGroupInformation.


- Josh

[1] https://issues.apache.org/jira/browse/ACCUMULO-4069
[2] http://accumulo.apache.org/1.7/accumulo_user_manual.html#_debugging

roman.drap...@baesystems.com wrote:

Hi there,

Trying to setup Accumulo 1.7 on Kerberized cluster. Only interested in
master/tablets to be kerberized (not end-users). Configured everything
as per manual:

1)Created principals

2)Generated glob keytab

3)Modified accumulo-site.xml providing general.kerberos.keytab and
general.kerberos.principal

If I start as accumulo user I get: Caused by: GSSException: No valid
credentials provided (Mechanism level: Failed to find any Kerberos tgt)

However, if I give explicitly a token with kinit and keytab generated
above in the shell – it works as expected. To my understanding Accumulo
has to obtain tickets automatically? Or the idea is to write a cron job
and apply kinit to every tablet server per day?

Regards,

Roman



Re: Accumulo and Kerberos

2016-01-26 Thread Josh Elser
I would strongly recommend that you do not use the HDFS classloader. It 
is known to be very broken in what you download as 1.7.0. There are a 
number of JIRA issues about this which stem from a lack of a released 
commons-vfs2-2.1.


That being said, I have not done anything with running Accumulo out of 
HDFS with Kerberos enabled. AFAIK, you're in untraveled waters.


re: the renewal bug: When the ticket expires, the Accumulo service will 
die. Your options are to deploy a watchdog process that would restart 
the service, download the fix from the JIRA case and rebuild Accumulo 
yourself, or build 1.7.1-SNAPSHOT from our codebase. I would recommend 
using 1.7.1-SNAPSHOT as it should be the least painful (1.7.1-SNAPSHOT 
now is likely to not change significantly from what is ultimately 
released as 1.7.1)


roman.drap...@baesystems.com wrote:

Hi Josh,

Yes, will do. Just in the meantime - I can see a different issue on slave 
nodes. If I try to start in isolation (bin/start-here.sh) with or without doing 
kinit I always see the error below.

2016-01-26 18:31:13,873 [start.Main] ERROR: Problem initializing the class 
loader
java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.accumulo.start.Main.getClassLoader(Main.java:68)
 at org.apache.accumulo.start.Main.main(Main.java:52)
Caused by: org.apache.commons.vfs2.FileSystemException: Could not determine the type of file 
"hdfs:///platform/lib/.*.jar".
 at 
org.apache.commons.vfs2.provider.AbstractFileObject.attach(AbstractFileObject.java:1522)
 at 
org.apache.commons.vfs2.provider.AbstractFileObject.getType(AbstractFileObject.java:489)
 at 
org.apache.accumulo.start.classloader.vfs.AccumuloVFSClassLoader.resolve(AccumuloVFSClassLoader.java:143)
 at 
org.apache.accumulo.start.classloader.vfs.AccumuloVFSClassLoader.resolve(AccumuloVFSClassLoader.java:121)
 at 
org.apache.accumulo.start.classloader.vfs.AccumuloVFSClassLoader.getClassLoader(AccumuloVFSClassLoader.java:211)
 ... 6 more
Caused by: org.apache.hadoop.security.AccessControlException: SIMPLE 
authentication is not enabled.  Available:[TOKEN, KERBEROS]

I guess it might be different from what I observe on the master node. If I don't 
get a ticket explicitly, I get the error mentioned in the previous email. However 
if I do (and it does not matter which user I have a ticket for now - whether it's 
accumulo, hdfs or hive) - it works. So I started to think, maybe the problem is 
related to some action (for example the vfs one above) that tries to access 
HDFS before doing proper authentication with Kerberos? Any ideas?

Also, if we go live with 1.7.0 - what approach would you recommend for renewing 
tickets? Does it require stopping and starting the cluster?

Regards,
Roman



-Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: 26 January 2016 18:10
To: user@accumulo.apache.org
Subject: Re: Accumulo and Kerberos

Hi Roman,

Accumulo services (TabletServer, Master, etc) all use a keytab to automatically 
obtain a ticket from the KDC when they start up. You do not need to do anything 
with kinit when starting Accumulo.

One worry is ACCUMULO-4069[1] with all presently released versions (most 
notably 1.7.0 which you are using). This is a bug in which services did not 
automatically renew their ticket. We're working on a 1.7.1, but it's not out 
yet.

As for debugging your issue, take a look at the Kerberos section on debugging 
in the user manual [2]. Take a very close look at the principal the service is 
using to obtain the ticket and what the principal is for your keytab. A good 
sanity check is to make sure you can `kinit` in the shell using the keytab and 
the correct principal (rule out the keytab being incorrect).

If you still get stuck, collect the output specifying 
-Dsun.security.krb5.debug=true in accumulo-env.sh (per the instructions) and 
try enabling log4j DEBUG on org.apache.hadoop.security.UserGroupInformation.

- Josh

[1] https://issues.apache.org/jira/browse/ACCUMULO-4069
[2] http://accumulo.apache.org/1.7/accumulo_user_manual.html#_debugging

roman.drap...@baesystems.com wrote:

Hi there,

Trying to setup Accumulo 1.7 on Kerberized cluster. Only interested in
master/tablets to be kerberized (not end-users). Configured everything
as per manual:

1)Created principals

2)Generated glob keytab

3)Modified accumulo-site.xml providing general.kerberos.keytab and
general.kerberos.principal

If I start as accumulo user I get: Caused by: GSSException: No valid
credentials provided (Mechanism level: Failed to find any Kerberos
tgt)

However, if I giv

Re: Accumulo and Kerberos

2016-01-26 Thread Josh Elser
The normal classloader (on the local filesystem) which is configured out 
of the box.


roman.drap...@baesystems.com wrote:

Hi Josh,

I can confirm that issue on the master is related to VFS classloader!  
Commented out classloader and now it works without kinit. So it seems it tries 
loading classes before Kerberos authentication happened. What classloader 
should I use instead?

Regards,
Roman

-Original Message-
From: roman.drap...@baesystems.com [mailto:roman.drap...@baesystems.com]
Sent: 26 January 2016 19:43
To: user@accumulo.apache.org
Subject: RE: Accumulo and Kerberos

Hi Josh,

Two quick questions.

1) What should I use instead of HDFS classloader? All examples seem to be from 
hdfs.
2) When is the 1.7.1 release scheduled for (approx.)?

Regards,
Roman

-Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: 26 January 2016 19:01
To: user@accumulo.apache.org
Subject: Re: Accumulo and Kerberos

I would strongly recommend that you do not use the HDFS classloader. It is 
known to be very broken in what you download as 1.7.0. There are a number of 
JIRA issues about this which stem from a lack of a released commons-vfs2-2.1.

That being said, I have not done anything with running Accumulo out of HDFS 
with Kerberos enabled. AFAIK, you're in untraveled waters.

re: the renewal bug: When the ticket expires, the Accumulo service will die. 
Your options are to deploy a watchdog process that would restart the service, 
download the fix from the JIRA case and rebuild Accumulo yourself, or build 
1.7.1-SNAPSHOT from our codebase. I would recommend using 1.7.1-SNAPSHOT as it 
should be the least painful (1.7.1-SNAPSHOT now is likely to not change 
significantly from what is ultimately released as 1.7.1)

roman.drap...@baesystems.com wrote:

Hi Josh,

Yes, will do. Just in the meantime - I can see a different issue on slave 
nodes. If I try to start in isolation (bin/start-here.sh) with or without doing 
kinit I always see the error below.

2016-01-26 18:31:13,873 [start.Main] ERROR: Problem initializing the
class loader java.lang.reflect.InvocationTargetException
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.accumulo.start.Main.getClassLoader(Main.java:68)
  at org.apache.accumulo.start.Main.main(Main.java:52)
Caused by: org.apache.commons.vfs2.FileSystemException: Could not determine the type of file 
"hdfs:///platform/lib/.*.jar".
  at 
org.apache.commons.vfs2.provider.AbstractFileObject.attach(AbstractFileObject.java:1522)
  at 
org.apache.commons.vfs2.provider.AbstractFileObject.getType(AbstractFileObject.java:489)
  at 
org.apache.accumulo.start.classloader.vfs.AccumuloVFSClassLoader.resolve(AccumuloVFSClassLoader.java:143)
  at 
org.apache.accumulo.start.classloader.vfs.AccumuloVFSClassLoader.resolve(AccumuloVFSClassLoader.java:121)
  at 
org.apache.accumulo.start.classloader.vfs.AccumuloVFSClassLoader.getClassLoader(AccumuloVFSClassLoader.java:211)
  ... 6 more
Caused by: org.apache.hadoop.security.AccessControlException: SIMPLE
authentication is not enabled.  Available:[TOKEN, KERBEROS]

I guess it might be different to what I observe on the master node. If I don't 
get ticket explicitly, I get the error mentioned in the previous email. However 
if do (and it does not matter for what user I have a ticket now - whether it's 
accumulo, hdfs or hive) - it works. So I started to think, maybe the problem 
related to some action (for example to vfs as per above) that tries to access 
HDFS before doing a proper authentication with Kerberos? Any ideas?

Also, if we go live with 1.7.0 - what approach would you recommend for renewing 
tickets? Does it require stopping and starting the cluster?

Regards,
Roman



-Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: 26 January 2016 18:10
To: user@accumulo.apache.org
Subject: Re: Accumulo and Kerberos

Hi Roman,

Accumulo services (TabletServer, Master, etc) all use a keytab to automatically 
obtain a ticket from the KDC when they start up. You do not need to do anything 
with kinit when starting Accumulo.

One worry is ACCUMULO-4069[1] with all presently released versions (most 
notably 1.7.0 which you are using). This is a bug in which services did not 
automatically renew their ticket. We're working on a 1.7.1, but it's not out 
yet.

As for debugging your issue, take a look at the Kerberos section on debugging 
in the user manual [2]. Take a very close look at the principal the service is 
using to obtain the ticket and what the principal is for your keytab. A good 
sanity check is to make sure you can `kinit` in 

Re: Accumulo and Kerberos

2016-01-26 Thread Josh Elser
Ok, let me repeat: running a `kinit` in your local shell has *no 
bearing* on what Accumulo is doing. This is fundamentally not how it 
works. There are libraries in the JDK which perform the login with the 
KDC using the keytab you provide in accumulo-site.xml. Accumulo is not 
using the ticket cache which your `kinit` creates.




You should see a message in the log stating that the Kerberos login 
happened (or didn't). The server should exit if it fails to log in (but 
I don't know if I've actively tested that). Do you see this message? 
Does it say you successfully logged in (and the principal you logged in as)?


roman.drap...@baesystems.com wrote:

Ok, there is some progress. So these issues were definitely related to VFS 
classloader - now works both on the client and master - so I guess a bug is 
found.

And it looks like there is a very similar issue related to instance_id

On the slaves (does not matter whether I do kinit hdfs or not) I always receive 
when I start the node:

 2016-01-26 20:36:41,744 [tserver.TabletServer] ERROR: Uncaught 
exception in TabletServer.main, exiting
 java.lang.RuntimeException: Can't tell if Accumulo is initialized; 
can't read instance id at 
hdfs://cr-platform-qa23-01.cyberreveal.local:8020/accumulo/instance_id

On the master I can see the same issue when I do bin/stop-all.sh without kinit 
hdfs and it disappears if I have a hdfs ticket.

I tried both: hadoop fs -chown -R accumulo:hdfs /accumulo and hadoop fs -chown 
-R accumulo:accumulo /accumulo - same behavior

Any thoughts please?




-Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: 26 January 2016 20:08
To: user@accumulo.apache.org
Subject: Re: Accumulo and Kerberos

The normal classloader (on the local filesystem) which is configured out of the 
box.

roman.drap...@baesystems.com wrote:

Hi Josh,

I can confirm that issue on the master is related to VFS classloader!  
Commented out classloader and now it works without kinit. So it seems it tries 
loading classes before Kerberos authentication happened. What classloader 
should I use instead?

Regards,
Roman

-Original Message-
From: roman.drap...@baesystems.com
[mailto:roman.drap...@baesystems.com]
Sent: 26 January 2016 19:43
To: user@accumulo.apache.org
Subject: RE: Accumulo and Kerberos

Hi Josh,

Two quick questions.

1) What should I use instead of HDFS classloader? All examples seem to be from 
hdfs.
2) Whan 1.7.1 release is scheduled for (approx.) ?

Regards,
Roman

-Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: 26 January 2016 19:01
To: user@accumulo.apache.org
Subject: Re: Accumulo and Kerberos

I would strongly recommend that you do not use the HDFS classloader. It is 
known to be very broken in what you download as 1.7.0. There are a number of 
JIRA issues about this which stem from a lack of a released commons-vfs2-2.1.

That being said, I have not done anything with running Accumulo out of HDFS 
with Kerberos enabled. AFAIK, you're in untraveled waters.

re: the renewal bug: When the ticket expires, the Accumulo service
will die. Your options are to deploy a watchdog process that would
restart the service, download the fix from the JIRA case and rebuild
Accumulo yourself, or build 1.7.1-SNAPSHOT from our codebase. I would
recommend using 1.7.1-SNAPSHOT as it should be the least painful
(1.7.1-SNAPSHOT now is likely to not change significantly from what is
ultimately released as 1.7.1)

roman.drap...@baesystems.com wrote:

Hi Josh,

Yes, will do. Just in the meantime - I can see a different issue on slave 
nodes. If I try to start in isolation (bin/start-here.sh) with or without doing 
kinit I always see the error below.

2016-01-26 18:31:13,873 [start.Main] ERROR: Problem initializing the
class loader java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at org.apache.accumulo.start.Main.getClassLoader(Main.java:68)
   at org.apache.accumulo.start.Main.main(Main.java:52)
Caused by: org.apache.commons.vfs2.FileSystemException: Could not determine the type of file 
"hdfs:///platform/lib/.*.jar".
   at 
org.apache.commons.vfs2.provider.AbstractFileObject.attach(AbstractFileObject.java:1522)
   at 
org.apache.commons.vfs2.provider.AbstractFileObject.getType(AbstractFileObject.java:489)
   at 
org.apache.accumulo.start.classloader.vfs.AccumuloVFSClassLoader.resolve(AccumuloVFSClassLoader.java:143)
   at 
org.apache.accumulo.start.classloader.vfs.AccumuloVFSClassLoader.resolve(Accumu

Re: Accumulo and Kerberos

2016-01-26 Thread Josh Elser
Your confusion is stemming from what stop-all.sh is actually doing 
(although, I still have no idea how stopping processes has any bearing 
on how they start :smile:). Notably, this script will invoke `accumulo 
admin stopAll` to trigger a graceful shutdown before stopping the 
services hard (`kill`).


So, as you would run these scripts as the 'accumulo' user without 
Kerberos, you should also be logged in as the 'accumulo' Kerberos user 
when starting them. This might be missing from the docs. Please do 
suggest where some documentation should be added to cover this.


If it doesn't go without saying, this is a separate issue from your 
services not logging in correctly. Can you share logs? Try enabling 
-Dsun.security.krb5.debug=true in the appropriate environment variable 
(for the service you want to turn it on for) in accumulo-env.sh and then 
start the services again (hopefully, sharing that too if the problem 
isn't obvious).


roman.drap...@baesystems.com wrote:

I want to believe in this, but what I see contradicts this statement..

I do bin/stop-all.sh on master.

If I have a ticket cache for the hdfs user, I don't see any errors.
If I don't have a ticket cache for the hdfs user, I see these errors.

I can see that all slaves and master successfully logged in as accumulo user.

However, the slaves are failing straight away due to the error I posted in the 
previous email. I also see this error when I stop the master and I don't have a 
ticket cache for the hdfs user, however I don't see it if I have a ticket cache (as 
per above)... It's kind of a reflection of the previous problem with vfs.




-Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: 26 January 2016 21:08
To: user@accumulo.apache.org
Subject: Re: Accumulo and Kerberos

Ok, let me repeat: running a `kinit` in your local shell has *no
bearing* on what Accumulo is doing. This is fundamentally not how it works. 
There are libraries in the JDK which perform the login with the KDC using the 
keytab you provide in accumulo-site.xml. Accumulo is not using the ticket cache 
which your `kinit` creates.



You should see a message in the log stating that the Kerberos login happened 
(or didn't). The server should exit if it fails to log in (but I don't know if 
I've actively tested that). Do you see this message?
Does it say you successfully logged in (and the principal you logged in as)?

roman.drap...@baesystems.com wrote:

Ok, there is some progress. So these issues were definitely related to VFS 
classloader - now works both on the client and master - so I guess a bug is 
found.

And it looks like there is a very similar issue related to instance_id

On the slaves (does not matter whether I do kinit hdfs or not) I always receive 
when I start the node:

  2016-01-26 20:36:41,744 [tserver.TabletServer] ERROR: Uncaught 
exception in TabletServer.main, exiting
  java.lang.RuntimeException: Can't tell if Accumulo is
initialized; can't read instance id at
hdfs://cr-platform-qa23-01.cyberreveal.local:8020/accumulo/instance_id

On the master I can see the same issue when I do bin/stop-all.sh without kinit 
hdfs and it disappears if I have a hdfs ticket.

I tried both: hadoop fs -chown -R accumulo:hdfs /accumulo and hadoop
fs -chown -R accumulo:accumulo /accumulo - same behavior

Any thoughts please?




-Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: 26 January 2016 20:08
To: user@accumulo.apache.org
Subject: Re: Accumulo and Kerberos

The normal classloader (on the local filesystem) which is configured out of the 
box.

roman.drap...@baesystems.com wrote:

Hi Josh,

I can confirm that issue on the master is related to VFS classloader!  
Commented out classloader and now it works without kinit. So it seems it tries 
loading classes before Kerberos authentication happened. What classloader 
should I use instead?

Regards,
Roman

-Original Message-
From: roman.drap...@baesystems.com
[mailto:roman.drap...@baesystems.com]
Sent: 26 January 2016 19:43
To: user@accumulo.apache.org
Subject: RE: Accumulo and Kerberos

Hi Josh,

Two quick questions.

1) What should I use instead of HDFS classloader? All examples seem to be from 
hdfs.
2) Whan 1.7.1 release is scheduled for (approx.) ?

Regards,
Roman

-Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: 26 January 2016 19:01
To: user@accumulo.apache.org
Subject: Re: Accumulo and Kerberos

I would strongly recommend that you do not use the HDFS classloader. It is 
known to be very broken in what you download as 1.7.0. There are a number of 
JIRA issues about this which stem from a lack of a released commons-vfs2-2.1.

That being said, I have not done anything with running Accumulo out of HDFS 
with Kerberos enabled. AFAIK, you're in untraveled waters.

re: the renewal bug: When the ticket expires, the Accumulo service
will die. 

Re: Accumulo and Kerberos

2016-01-26 Thread Josh Elser
at com.sun.proxy.$Proxy22.getListing(Unknown Source)
 at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:554)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
 at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
 at com.sun.proxy.$Proxy23.getListing(Unknown Source)
 at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1963)
 ... 16 more
2016-01-26 22:29:20,053 [tserver.TabletServer] ERROR: Uncaught exception in 
TabletServer.main, exiting
java.lang.RuntimeException: Can't tell if Accumulo is initialized; can't read 
instance id at hdfs://:8020/accumulo/instance_id
 at 
org.apache.accumulo.core.zookeeper.ZooUtil.getInstanceIDFromHdfs(ZooUtil.java:76)
 at 
org.apache.accumulo.core.zookeeper.ZooUtil.getInstanceIDFromHdfs(ZooUtil.java:51)
 at 
org.apache.accumulo.server.client.HdfsZooInstance._getInstanceID(HdfsZooInstance.java:137)
 at 
org.apache.accumulo.server.client.HdfsZooInstance.getInstanceID(HdfsZooInstance.java:121)
 at 
org.apache.accumulo.server.conf.ServerConfigurationFactory.(ServerConfigurationFactory.java:113)
 at 
org.apache.accumulo.tserver.TabletServer.main(TabletServer.java:2952)
 at 
org.apache.accumulo.tserver.TServerExecutable.execute(TServerExecutable.java:33)
 at org.apache.accumulo.start.Main$1.run(Main.java:93)
 at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.security.AccessControlException: SIMPLE 
authentication is not enabled.  Available:[TOKEN, KERBEROS]
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method)
 at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
 at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
 at 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
 at 
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
 at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1965)
 at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1946)
 at 
org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:693)
 at 
org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:105)
 at 
org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:755)
 at 
org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:751)
 at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 at 
org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:751)
 at 
org.apache.accumulo.core.zookeeper.ZooUtil.getInstanceIDFromHdfs(ZooUtil.java:59)
 ... 8 more
Caused by: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
 SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
 at org.apache.hadoop.ipc.Client.call(Client.java:1468)
 at org.apache.hadoop.ipc.Client.call(Client.java:1399)
 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
 at com.sun.proxy.$Proxy22.getListing(Unknown Source)
 at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:554)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
 at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
 at com.sun.proxy.$Proxy23.getListing(Unknown Source)
 at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1963)
 ... 16 more

-Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: 26 January 2016 21:39
To: user@accumulo.apache.org
Subject: Re: Accumulo and Kerberos

Your confusion is stemming from what stop-all

Re: Accumulo and Kerberos

2016-01-27 Thread Josh Elser

I don't understand what you mean by "Classpath did not help".

The way this is designed to work is that the HADOOP_CONF_DIR you set in 
accumulo-env.sh gets added to the classpath. This should prevent the 
need for you to copy/link any Hadoop configuration files into the 
Accumulo installation (making upgrades less error prone).


Glad to hear you got it working.

roman.drap...@baesystems.com wrote:

Hi Josh,

Thanks a lot for your guess. Classpath did not help, however symlinks from 
Hadoop conf directory to Accumulo conf directory worked perfectly.

Regards,
Roman

-Original Message-----
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: 26 January 2016 23:26
To: user@accumulo.apache.org
Subject: Re: Accumulo and Kerberos

Ok, so we can confirm that the TabletServer logged in. This is what your .out 
file is telling us.

This is looking like you don't have your software configured correctly.
As this error message is trying to tell you, HDFS supports three types of authentication tokens: "simple", "token", and 
"kerberos". "Simple" is what is used by default (without Kerberos). "Kerberos" refers to clients with a 
Kerberos ticket. Ignore "token" for now as it's irrelevant.

We can tell from the stack trace that the TabletServer made an RPC to a datanode. For 
some reason, this RPC was requesting "simple"
authentication and not "kerberos". The datanode is telling you that it's not allowed to accept your 
"simple" token and that you need to use "kerberos" (or "token").

If I had to venture a guess, it would be that you have Accumulo configured to 
use the wrong Hadoop configuration files, notably core-site.xml and 
hdfs-site.xml.

Try the `accumulo classpath` command and verify that the Hadoop 
configuration files included there are the correct ones (the ones that are 
configured for Kerberos).

- Josh

roman.drap...@baesystems.com wrote:

Hi Josh,

Tried on the tserver - does not really give more information. I am trying just 
bin/start-here.sh on the slave (master is successfully running).

This is what I can see in ".out" log

2016-01-26 22:29:18,233 [security.SecurityUtil] INFO : Attempting to
login with keytab as accumulo/@
2016-01-26 22:29:18,371 [util.NativeCodeLoader] WARN : Unable to load
native-hadoop library for your platform... using builtin-java classes
where applicable
2016-01-26 22:29:18,685 [security.SecurityUtil] INFO : Succesfully
logged in as user accumulo/@

This is ".log" log - it looks like something with Zookeper Utils..

2016-01-26 22:29:19,963 [fs.VolumeManagerImpl] WARN :
dfs.datanode.synconclose set to false in hdfs-site.xml: data loss is
possible on hard system reset or power loss
2016-01-26 22:29:19,966 [conf.Property] DEBUG: Loaded class :
org.apache.accumulo.server.fs.PerTableVolumeChooser
2016-01-26 22:29:20,050 [zookeeper.ZooUtil] ERROR: Problem reading
instance id out of hdfs at hdfs://:8020/accumulo/instance_id
org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not 
enabled.  Available:[TOKEN, KERBEROS]
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method)
  at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
  at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
  at 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
  at 
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
  at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1965)
  at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1946)
  at 
org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:693)
  at 
org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:105)
  at 
org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:755)
  at 
org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:751)
  at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
  at 
org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:751)
  at 
org.apache.accumulo.core.zookeeper.ZooUtil.getInstanceIDFromHdfs(ZooUtil.java:59)
  at 
org.apache.accumulo.core.zookeeper.ZooUtil.getInstanceIDFromHdfs(ZooUtil.java:51)
  at 
org.apache.accumulo.server.client.HdfsZooInstance._getInstanceID(HdfsZooInstance.java:137)
  at 
org.apache.accumulo.server.client.HdfsZooInstance.getInstanceID(HdfsZooInstance.java:121)
  at 
org.apache.accumulo.server.conf.S

Re: Problem configuring Ganglia(jmxtrans) with Accumulo 1.7.0

2016-01-27 Thread Josh Elser

Mohit,

I would strongly recommend you take a look at the Hadoop Metrics2 
support that was added in 1.7.0 to push metrics to Ganglia instead of 
using JMXTrans.


This was one of the big reasons we chose to adopt the new metrics 
system. Using this integration will prevent you from having to 
run/monitor jmxtrans (which is always nice in a production environment). 
There should be a commented out example of how to configure Ganglia in 
the provided hadoop-metrics2-accumulo.properties file in the example 
configurations.


http://accumulo.apache.org/1.7/accumulo_user_manual.html#_metrics2_configuration

- Josh

mohit.kaushik wrote:

I am trying to set up Ganglia for monitoring an Accumulo 1.7.0 cluster
having 5 nodes. I also configured Ganglia for an Accumulo 1.6.1 standalone
cluster, which is working fine and shows all the required metrics
related to Accumulo.

After scanning jmxtrans.log, I found that it's not executing the query
properly. It starts the job and finishes it in the next line:
[27 Jan 2016 17:38:15] [ServerScheduler_Worker-1] 3960536 DEBUG
(com.googlecode.jmxtrans.jobs.ServerJob:31) - + Started server job:
Server [host=localhost, port=9006,
url=service:jmx:rmi:///jndi/rmi://localhost:9006/jmxrmi,
cronExpression=null, numQueryThreads=null]
[27 Jan 2016 17:38:15] [ServerScheduler_Worker-1] 3960542 DEBUG
(com.googlecode.jmxtrans.jobs.ServerJob:50) - + Finished server job:
Server [host=localhost, port=9006,
url=service:jmx:rmi:///jndi/rmi://localhost:9006/jmxrmi,
cronExpression=null, numQueryThreads=null]

But this is not the case with the Accumulo 1.6.1 cluster: it executes the query
and sends the Ganglia metrics, which I can see on the Ganglia web UI...

[27 Jan 2016 17:39:25] [ServerScheduler_Worker-9] 360736 DEBUG
(com.googlecode.jmxtrans.jobs.ServerJob:31) - + Started server job:
Server [host=mohit, port=9006,
url=service:jmx:rmi:///jndi/rmi://mohit:9006/jmxrmi,
cronExpression=null, numQueryThreads=null]
[27 Jan 2016 17:39:25] [ServerScheduler_Worker-9] 360742 DEBUG
(com.googlecode.jmxtrans.util.JmxUtils:195) - Executing queryName:
accumulo.server.metrics:instance=tserver,name=TabletServerMBean,service=TServerInfo
from query: Query
[obj=accumulo.server.metrics:service=TServerInfo,name=TabletServerMBean,instance=tserver,
resultAlias=Accumulo, attr=[Entries, EntriesInMemory, Ingest,
MajorCompactions, MajorCompactionsQueued, MinorCompactions,
MinorCompactionsQueued, OnlineCount, OpeningCount, Queries,
UnopenedCount, TotalMinorCompactions, HoldTime, AverageFilesPerTablet,
Name]]
[27 Jan 2016 17:39:25] [ServerScheduler_Worker-9] 360746 DEBUG
(com.googlecode.jmxtrans.model.output.GangliaWriter:141) - Sending
Ganglia metric Accumulo.Entries=14
[27 Jan 2016 17:39:25] [ServerScheduler_Worker-9] 360746 DEBUG
(com.googlecode.jmxtrans.model.output.GangliaWriter:141) - Sending
Ganglia metric Accumulo.EntriesInMemory=0
[27 Jan 2016 17:39:25] [ServerScheduler_Worker-9] 360747 DEBUG
(com.googlecode.jmxtrans.model.output.GangliaWriter:141) - Sending
Ganglia metric Accumulo.Ingest=0
[27 Jan 2016 17:39:25] [ServerScheduler_Worker-9] 360748 DEBUG
(com.googlecode.jmxtrans.model.output.GangliaWriter:141) - Sending
Ganglia metric Accumulo.MajorCompactions=0
[27 Jan 2016 17:39:25] [ServerScheduler_Worker-9] 360748 DEBUG
(com.googlecode.jmxtrans.model.output.GangliaWriter:141) - Sending
Ganglia metric Accumulo.MajorCompactionsQueued=0
[27 Jan 2016 17:39:25] [ServerScheduler_Worker-9] 360749 DEBUG
(com.googlecode.jmxtrans.model.output.GangliaWriter:141) - Sending
Ganglia metric Accumulo.MinorCompactions=0
[27 Jan 2016 17:39:25] [ServerScheduler_Worker-9] 360749 DEBUG
(com.googlecode.jmxtrans.model.output.GangliaWriter:141) - Sending
Ganglia metric Accumulo.MinorCompactionsQueued=0
[27 Jan 2016 17:39:25] [ServerScheduler_Worker-9] 360750 DEBUG
(com.googlecode.jmxtrans.model.output.GangliaWriter:141) - Sending
Ganglia metric Accumulo.OnlineCount=3
[27 Jan 2016 17:39:25] [ServerScheduler_Worker-9] 360750 DEBUG
(com.googlecode.jmxtrans.model.output.GangliaWriter:141) - Sending
Ganglia metric Accumulo.OpeningCount=0
[27 Jan 2016 17:39:25] [ServerScheduler_Worker-9] 360751 DEBUG
(com.googlecode.jmxtrans.model.output.GangliaWriter:141) - Sending
Ganglia metric Accumulo.Queries=995
[27 Jan 2016 17:39:25] [ServerScheduler_Worker-9] 360751 DEBUG
(com.googlecode.jmxtrans.model.output.GangliaWriter:141) - Sending
Ganglia metric Accumulo.UnopenedCount=0
[27 Jan 2016 17:39:25] [ServerScheduler_Worker-9] 360752 DEBUG
(com.googlecode.jmxtrans.model.output.GangliaWriter:141) - Sending
Ganglia metric Accumulo.TotalMinorCompactions=33
[27 Jan 2016 17:39:25] [ServerScheduler_Worker-9] 360752 DEBUG
(com.googlecode.jmxtrans.model.output.GangliaWriter:141) - Sending
Ganglia metric Accumulo.HoldTime=0.0
[27 Jan 2016 17:39:25] [ServerScheduler_Worker-9] 360752 DEBUG
(com.googlecode.jmxtrans.model.output.GangliaWriter:141) - Sending
Ganglia metric Accumulo.AverageFilesPerTablet=0.
[27 Jan 2016 17:39:25] [ServerScheduler_Worker-9] 360753 DEBU

Re: Accumulo and Kerberos

2016-01-27 Thread Josh Elser
The directory being on the classpath is all that you need. Directories 
will be made available, you shouldn't be listing the files explicitly.


The example accumulo-site.xml files do this for you:

  
  <property>
    <name>general.classpaths</name>
    <value>
      ...elided...
      $HADOOP_CONF_DIR,
      ...elided...
    </value>
    <description>Classpaths that accumulo checks for updates and class
      files.</description>
  </property>

This is all you should need to get the necessary Hadoop configuration 
files made available to the Accumulo services.


roman.drap...@baesystems.com wrote:

Well I used general.classpaths and absolute paths to individual files - "accumulo classpath" picked 
them up. When I tried just specifying a folder "/etc/hadoop/conf/" or 
"/etc/hadoop/conf/*.xml" - for some reason it did not see the files. Nothing from the above helped 
me. Then I found a script by Squirrel for MapR where they copied files to accumulo conf folder - tried 
symlinks and it magically started to work.

-----Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: 27 January 2016 16:17
To: user@accumulo.apache.org
Subject: Re: Accumulo and Kerberos

I don't understand what you mean by "Classpath did not help".

The way this is designed to work is that the HADOOP_CONF_DIR you set in 
accumulo-env.sh gets added to the classpath. This should prevent the need for 
you to copy/link any Hadoop configuration files into the Accumulo installation 
(making upgrades less error prone).

Glad to hear you got it working.

roman.drap...@baesystems.com wrote:

Hi Josh,

Thanks a lot for your guess. Classpath did not help, however symlinks from 
Hadoop conf directory to Accumulo conf directory worked perfectly.

Regards,
Roman

-Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: 26 January 2016 23:26
To: user@accumulo.apache.org
Subject: Re: Accumulo and Kerberos

Ok, so we can confirm that the TabletServer logged in. This is what your .out 
file is telling us.

This is looking like you don't have your software configured correctly.
As this error message is trying to tell you, HDFS supports three types of authentication tokens: "simple", "token", and 
"kerberos". "Simple" is what is used by default (without Kerberos). "Kerberos" refers to clients with a 
Kerberos ticket. Ignore "token" for now as it's irrelevant.

We can tell from the stack trace that the TabletServer made an RPC to a datanode. For 
some reason, this RPC was requesting "simple"
authentication and not "kerberos". The datanode is telling you that it's not allowed to accept your 
"simple" token and that you need to use "kerberos" (or "token").

If I had to venture a guess, it would be that you have Accumulo configured to 
use the wrong Hadoop configuration files, notably core-site.xml and 
hdfs-site.xml.

Try the command `accumulo classpath` command and verify that the Hadoop 
configuration files included there are the correct ones (the ones that are 
configured for Kerberos).

- Josh

roman.drap...@baesystems.com wrote:

Hi Josh,

Tried on the tserver - does not really give more information. I am trying just 
bin/start-here.sh on the slave (master is successfully running).

This is what I can see in ".out" log

2016-01-26 22:29:18,233 [security.SecurityUtil] INFO : Attempting to
login with keytab as accumulo/@
2016-01-26 22:29:18,371 [util.NativeCodeLoader] WARN : Unable to load
native-hadoop library for your platform... using builtin-java classes
where applicable
2016-01-26 22:29:18,685 [security.SecurityUtil] INFO : Succesfully
logged in as user accumulo/@

This is ".log" log - it looks like something with Zookeper Utils..

2016-01-26 22:29:19,963 [fs.VolumeManagerImpl] WARN :
dfs.datanode.synconclose set to false in hdfs-site.xml: data loss is
possible on hard system reset or power loss
2016-01-26 22:29:19,966 [conf.Property] DEBUG: Loaded class :
org.apache.accumulo.server.fs.PerTableVolumeChooser
2016-01-26 22:29:20,050 [zookeeper.ZooUtil] ERROR: Problem reading
instance id out of hdfs at hdfs://:8020/accumulo/instance_id
org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not 
enabled.  Available:[TOKEN, KERBEROS]
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method)
   at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
   at 
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
   at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.jav

Re: Accumulo and Kerberos

2016-01-27 Thread Josh Elser
Ok -- just wanted to make sure it was clear that you were doing more 
than was intended. If that works for you, super.


There is no firm date on 1.7.1. I know it's being worked by a couple of 
us, but, as we're all volunteers, it'll happen when it happens :)


roman.drap...@baesystems.com wrote:

I honestly tried this - as this was the first thing that I found in accumulo 
examples; I don't know - maybe the problem with this approach is still on our 
side and requires additional investigation, however a state of ecstasy [that I 
received when everything started to work] tells me not to chase extra honours 
in solving brainteasers [with your help of course] :-)

By the way, what do you think is the approximate expected date for 1.7.1 GA 
release (for Kerberos tickets renewal)?


-Original Message-----
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: 27 January 2016 21:25
To: user@accumulo.apache.org
Subject: Re: Accumulo and Kerberos

The directory being on the classpath is all that you need. Directories will be 
made available, you shouldn't be listing the files explicitly.

The example accumulo-site.xml files do this for you:


  <property>
    <name>general.classpaths</name>
    <value>
      ...elided...
      $HADOOP_CONF_DIR,
      ...elided...
    </value>
    <description>Classpaths that accumulo checks for updates and class
      files.</description>
  </property>

This is all you should need to get the necessary Hadoop configuration files 
made available to the Accumulo services.

roman.drap...@baesystems.com wrote:

Well I used general.classpaths and absolute paths to individual files - "accumulo classpath" picked 
them up. When I tried just specifying a folder "/etc/hadoop/conf/" or 
"/etc/hadoop/conf/*.xml" - for some reason it did not see the files. Nothing from the above helped 
me. Then I found a script by Squirrel for MapR where they copied files to accumulo conf folder - tried 
symlinks and it magically started to work.

-Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: 27 January 2016 16:17
To: user@accumulo.apache.org
Subject: Re: Accumulo and Kerberos

I don't understand what you mean by "Classpath did not help".

The way this is designed to work is that the HADOOP_CONF_DIR you set in 
accumulo-env.sh gets added to the classpath. This should prevent the need for 
you to copy/link any Hadoop configuration files into the Accumulo installation 
(making upgrades less error prone).

Glad to hear you got it working.

roman.drap...@baesystems.com wrote:

Hi Josh,

Thanks a lot for your guess. Classpath did not help, however symlinks from 
Hadoop conf directory to Accumulo conf directory worked perfectly.

Regards,
Roman

-----Original Message-----
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: 26 January 2016 23:26
To: user@accumulo.apache.org
Subject: Re: Accumulo and Kerberos

Ok, so we can confirm that the TabletServer logged in. This is what your .out 
file is telling us.

This is looking like you don't have your software configured correctly.
As this error message is trying to tell you, HDFS supports three types of authentication tokens: "simple", "token", and 
"kerberos". "Simple" is what is used by default (without Kerberos). "Kerberos" refers to clients with a 
Kerberos ticket. Ignore "token" for now as it's irrelevant.

We can tell from the stack trace that the TabletServer made an RPC to a datanode. For 
some reason, this RPC was requesting "simple"
authentication and not "kerberos". The datanode is telling you that it's not allowed to accept your 
"simple" token and that you need to use "kerberos" (or "token").

If I had to venture a guess, it would be that you have Accumulo configured to 
use the wrong Hadoop configuration files, notably core-site.xml and 
hdfs-site.xml.

Try the `accumulo classpath` command and verify that the Hadoop 
configuration files included there are the correct ones (the ones that are 
configured for Kerberos).

- Josh

roman.drap...@baesystems.com wrote:

Hi Josh,

Tried on the tserver - does not really give more information. I am trying just 
bin/start-here.sh on the slave (master is successfully running).

This is what I can see in ".out" log

2016-01-26 22:29:18,233 [security.SecurityUtil] INFO : Attempting to
login with keytab as accumulo/@
2016-01-26 22:29:18,371 [util.NativeCodeLoader] WARN : Unable to
load native-hadoop library for your platform... using builtin-java
classes where applicable
2016-01-26 22:29:18,685 [security.SecurityUtil] INFO : Succesfully
logged in as user accumulo/@

This is ".log" log - it looks like something with Zookeper Utils..

2016-01-26 22:29:19,963 [fs.VolumeManagerImpl] WARN :
dfs.datanode.synconclose set to false in hdfs-site.xml: data loss is
possible on hard system reset or power loss
2016-01-26 22:29:19,966 [c

Re: How to choose BinId for Document partitioned index

2016-02-06 Thread Josh Elser
You can get *really* fancy if you have lots of ingesters and lots of 
servers: include some attribute in the data you're hashing to control 
how many servers a given client will need to write to for some batch of 
documents. This is probably overkill for most setups though.


Guava provides a decent murmur3 implementation which will be much faster 
than your run-of-the-mill MD5 for generating the hash (which you'll mod 
by the max number of bins).
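
For illustration, that hash-and-mod step could look something like the sketch below, using Guava's murmur3; the bin count, zero-padding, and row layout are assumptions for the example, not anything prescribed in this thread.

    import java.nio.charset.StandardCharsets;
    import com.google.common.hash.Hashing;

    // Hash the document's unique id with murmur3 and mod by the number of bins.
    static String binFor(String docId, int numBins) {
      int hash = Hashing.murmur3_32().hashString(docId, StandardCharsets.UTF_8).asInt();
      int bin = Math.floorMod(hash, numBins);   // keep the result non-negative
      return String.format("%03d", bin);        // zero-pad so bins sort lexicographically
    }

    // A document-partitioned row might then be built as, e.g.,
    //   binFor(docId, 128) + "_" + docId
    // or with a time prefix, as discussed later in this thread:
    //   "20160206_" + binFor(docId, 128)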


William Slacum wrote:

Often it'll be a hash of the document mod the number of bins you're
using. The hash should be "good" in the sense that it uniquely
identifies the document. It can be as simple as some unique field in the
document or just a hash (like murmur) of the whole document.

On Saturday, February 6, 2016, Jamie Johnson <jej2...@gmail.com> wrote:

Just found this excellent write up that explains a bit.

https://www.slideshare.net/mobile/acordova00/text-indexing-in-accumulo

On Feb 6, 2016 8:52 AM, "Jamie Johnson" > wrote:

Reading the examples for table design I've come across a
question associated with the document partitioned index,
specifically what is typically chosen as the BinId or maybe more
appropriately what factors should influence what is chosen as
the BinId and what impact do they have?



Re: How to choose BinId for Document partitioned index

2016-02-07 Thread Josh Elser

Yes! Very astute, Jamie :)

For the wikisearch schemas, the general idea is that the inverted index 
tables can prune your row space for some terms. This way, you can know 
the exact rows you have to search in the sharded table to get good 
parallelism without a full-table scan.
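
As a rough sketch of that two-step lookup (the table names, the term, and the assumption that the index stores the shard row id in the column qualifier are all illustrative, loosely following the wikisearch layout):

    import java.util.HashSet;
    import java.util.Map.Entry;
    import java.util.Set;
    import org.apache.accumulo.core.client.BatchScanner;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.Scanner;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Range;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.security.Authorizations;

    void lookup(Connector conn, Authorizations auths) throws Exception {
      // 1) Scan the inverted index for a term to learn which shard rows contain it.
      Scanner index = conn.createScanner("termIndex", auths);
      index.setRange(Range.exact("accumulo"));
      Set<Range> shardRows = new HashSet<>();
      for (Entry<Key,Value> e : index) {
        shardRows.add(Range.exact(e.getKey().getColumnQualifier().toString()));
      }

      // 2) Pull the matching documents from only those shards, in parallel.
      BatchScanner docs = conn.createBatchScanner("shardTable", auths, 8);
      docs.setRanges(shardRows);
      for (Entry<Key,Value> e : docs) {
        // process matching entries
      }
      docs.close();
    }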


Jamie Johnson wrote:

Thanks guys.  I was also looking at some of the examples and saw the
event store, I like the idea of including time as a prefix to the
binning to limit the number of servers that need to be hit for time
bound queries.  Without something like this queries end up having to hit
all tablets right?  It's not always a full table scan since the
iterators can bail on a row part way through but still needs to hit
every row to some extent right?

I also was looking at the wiki example but wasn't able to find a good
description of how all the tables are used, does anything more exist?

On Feb 6, 2016 2:20 PM, "Josh Elser" mailto:josh.el...@gmail.com>> wrote:

You can get *really* fancy if you have lots of ingesters and lots of
servers, include some attribute in the data you're hashing to
control how many servers a given client will need to write to for
some batch of documents. This is probably overkill for most setups
though.

Guava provides a decent murmur3 implementation which will be much
faster than your run-of-the-mill MD5 for generating the hash (which
you'll mod by the max number of bins).

William Slacum wrote:

Often it'll be a hash of the document mod the number of bins you're
using. The hash should be "good" in the sense that it uniquely
identifies the document. It can be as simple as some unique
field in the
document or just a hash (like murmur) of the whole document.

On Saturday, February 6, 2016, Jamie Johnson <jej2...@gmail.com> wrote:

 Just found this excellent write up that explains a bit.

https://www.slideshare.net/mobile/acordova00/text-indexing-in-accumulo

 On Feb 6, 2016 8:52 AM, "Jamie Johnson" <jej2...@gmail.com> wrote:

 Reading the examples for table design I've come across a
 question associated with the document partitioned index,
 specifically what is typically chosen as the BinId or
maybe more
 appropriately what factors should influence what is
chosen as
 the BinId and what impact do they have?



Re: Bug in either InMemoryMap or NativeMap

2016-02-19 Thread Josh Elser

Dan, you're capable of opening an issue yourself, btw :)

https://issues.apache.org/jira/secure/CreateIssue!default.jspa

It's nice to have the history that you did this great foot-work with 
your name as the reporter.


Dan Blum wrote:

Yes, please open an issue for this.

In the meantime, as a workaround is it safe to assign an arbitrary
increasing timestamp when calling Mutation.put()? That seems the
simplest way to get the ColumnUpdates to be treated properly.
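
A minimal sketch of the workaround being described, assuming explicit, strictly increasing timestamps per put (the row and column names are placeholders); whether relying on this long-term is advisable is exactly the open question here:

    import java.nio.charset.StandardCharsets;
    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Value;

    // Give each update to the same CF/CQ/CV its own timestamp so the entries stay
    // distinct in the in-memory map and a Combiner sees every one of them.
    Mutation buildMutation(String row, long[] counts) {
      Mutation m = new Mutation(row);
      long ts = 0;
      for (long count : counts) {
        m.put("counts", "total", ts++,
            new Value(Long.toString(count).getBytes(StandardCharsets.UTF_8)));
      }
      return m;
    }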

*From:*Keith Turner [mailto:ke...@deenlo.com]
*Sent:* Friday, February 19, 2016 5:11 PM
*To:* user@accumulo.apache.org
*Cc:* Jonathan Lasko; Maxwell Jordan; kstud...@bbn.com
*Subject:* Re: Bug in either InMemoryMap or NativeMap

On Fri, Feb 19, 2016 at 3:34 PM, Dan Blum <db...@bbn.com> wrote:

(Resend: I forgot to actually subscribe before sending originally.)

I noticed a difference in behavior between our cluster and our tests running
on MiniCluster: when multiple put() calls are made to a Mutation with the
same CF, CQ, and CV and no explicit timestamp, on a live cluster only the
last one is written, whereas in Mini all of them are.

Of course in most cases it wouldn't matter but if there is a Combiner set on
the column (which is the case I am dealing with) then it does.

I believe the difference in behavior is due to code in NativeMap._mutate and
InMemoryMap.DefaultMap.mutate. In the former if there are multiple
ColumnUpdates in a Mutation they all get written with the same mutationCount
value; I haven't looked at the C++ map code but I assume that this means
that entries with the same CF/CQ/CV/timestamp will overwrite each other. In
contrast, in DefaultMap multiple ColumnUpdates are stored with an
incrementing kvCount, so the keys will necessarily be distinct.

You made this issue easy to track down.

This seems like a bug w/ the native map. The code allocates a unique int
for each key/value in the mutation.


https://github.com/apache/accumulo/blob/rel/1.6.5/server/tserver/src/main/java/org/apache/accumulo/tserver/InMemoryMap.java#L476

It seems like the native map code should increment like the DefaultMap
code does. Specifically it seems like the following code should
increment mutationCount (coordinating with the code that calls it)

https://github.com/apache/accumulo/blob/rel/1.6.5/server/tserver/src/main/java/org/apache/accumulo/tserver/NativeMap.java#L532

Would you like to open an issue in Jira?


My main question is: which of these is the intended behavior? We'll
obviously need to change our code to work with NativeMap's current
implementation regardless (since we don't want to use the Java maps on a
live cluster), but it would be useful to know if that change is
temporary or
permanent.

My secondary question is whether there is any trick to getting
native maps
to work in MiniCluster, which would be very helpful for our testing. I
changed the configuration XML we use and I can see that it picks up the
change - server.Accumulo logs "tserver.memory.maps.native.enabled =
true,"
but NativeMap never logs that it tries to load the library so the
setting
seems to be dropped somewhere.



Re: Unable to get Mini to use native maps - 1.6.2

2016-02-23 Thread Josh Elser

Hi Dan,

I'm seeing in our internal integration tests that we have some 
configuration happening which (at least, intends to) configure the 
native maps for the minicluster.


If you're not familiar, the MiniAccumuloConfig and MiniAccumuloCluster 
classes are thin wrappers around MiniAccumuloConfigImpl and 
MiniAccumuloClusterImpl. There is a setNativeLibPaths method on 
MiniAccumuloConfigImpl which you can use to provide the path to the 
native library shared object (.so). You will probably have to switch 
from MiniAccumuloConfig/MiniAccumuloCluster to 
MiniAccumuloConfigImpl/MiniAccumuloClusterImpl to use the "hidden" methods.
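
A rough sketch of that wiring, assuming the impl classes are used directly (the directory, root password, and .so path are placeholders, and setSiteConfig is just one way to pass the tserver property):

    import java.io.File;
    import java.util.Collections;
    import org.apache.accumulo.minicluster.impl.MiniAccumuloClusterImpl;
    import org.apache.accumulo.minicluster.impl.MiniAccumuloConfigImpl;

    MiniAccumuloClusterImpl startWithNativeMaps(File dir) throws Exception {
      MiniAccumuloConfigImpl cfg = new MiniAccumuloConfigImpl(dir, "secret");
      // Point the cluster at the native map shared library and turn native maps on.
      cfg.setNativeLibPaths("/path/to/accumulo/lib/native");   // placeholder path to the .so
      cfg.setSiteConfig(Collections.singletonMap("tserver.memory.maps.native.enabled", "true"));
      MiniAccumuloClusterImpl cluster = new MiniAccumuloClusterImpl(cfg);
      cluster.start();
      return cluster;
    }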


You could also look at MiniClusterHarness.java in >=1.7 if you want a 
concrete example of how we initialize things for our tests.


- Josh

Dan Blum wrote:

In order to test to make sure we don't have more code that needs a
workaround for https://issues.apache.org/jira/browse/ACCUMULO-4148 I am
trying again to enable the native maps for Mini, which we use for testing.

I set tserver.memory.maps.native.enabled to true in the site XML, and this
is getting picked up since I see this in the Mini logs:

[server.Accumulo] INFO : tserver.memory.maps.native.enabled = true

However, NativeMap should log something when it tries to load the library,
whether it succeeds or fails, but it logs nothing. The obvious conclusion is
that something about how MiniAccumuloCluster starts means that this setting
is ignored or overridden, but I am not finding it. (I see the mergeProp call
in MiniAccumuloConfigImpl.initialize which will set TSERV_NATIVEMAP_ENABLED
to false, but that should only set it if it's not already in the properties,
which it should be, and as far as I can tell the log message above is issued
after this.)



Re: Unable to get Mini to use native maps - 1.6.2

2016-02-23 Thread Josh Elser
Well, I'm near positive that 1.6.2 had native maps working, so there 
must be something unexpected happening :). MAC should be very close to 
what a real standalone instance is doing -- if you have the ability to 
share some end-to-end project with where you are seeing this, that'd be 
extremely helpful (e.g. a Maven project that we can just run would be 
superb).


Dan Blum wrote:

I'll take a look but I don't think the path is the problem - NativeMap
should try to load the library regardless of whether this path is set and
will log if it can't find it. This isn't happening.

-----Original Message-----
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: Tuesday, February 23, 2016 12:27 PM
To: user@accumulo.apache.org
Subject: Re: Unable to get Mini to use native maps - 1.6.2

Hi Dan,

I'm seeing in our internal integration tests that we have some
configuration happening which (at least, intends to) configure the
native maps for the minicluster.

If you're not familiar, the MiniAccumuloConfig and MiniAccumuloCluster
classes are thin wrappers around MiniAccumuloConfigImpl and
MiniAccumuloClusterImpl. There is a setNativeLibPaths method on
MiniAccumuloConfigImpl which you can use to provide the path to the
native library shared object (.so). You will probably have to switch
from MiniAccumuloConfig/MiniAccumuloCluster to
MiniAccumuloConfigImpl/MiniAccumuloClusterImpl to use the "hidden" methods.

You could also look at MiniClusterHarness.java in>=1.7 if you want a
concrete example of how we initialize things for our tests.

- Josh

Dan Blum wrote:

In order to test to make sure we don't have more code that needs a
workaround for https://issues.apache.org/jira/browse/ACCUMULO-4148 I am
trying again to enable the native maps for Mini, which we use for testing.

I set tserver.memory.maps.native.enabled to true in the site XML, and this
is getting picked up since I see this in the Mini logs:

[server.Accumulo] INFO : tserver.memory.maps.native.enabled = true

However, NativeMap should log something when it tries to load the library,
whether it succeeds or fails, but it logs nothing. The obvious conclusion

is

that something about how MiniAccumuloCluster starts means that this

setting

is ignored or overridden, but I am not finding it. (I see the mergeProp

call

in MiniAccumuloConfigImpl.initialize which will set

TSERV_NATIVEMAP_ENABLED

to false, but that should only set it if it's not already in the

properties,

which it should be, and as far as I can tell the log message above is

issued

after this.)





Re: Unable to get Mini to use native maps - 1.6.2

2016-02-23 Thread Josh Elser
MiniAccumuloCluster spawns its own processes, though. Calling 
NativeMap.isLoaded() in your test JVM isn't proving anything.


That's why you need to call these methods on MAC, you would need to 
check the TabletServer*.log file(s), and make sure that its 
configuration is set up properly to find the .so.


Does that make sense? Did I misinterpret you?

Dan Blum wrote:

I'll see what I can do, but there's no simple way to pull out something
small we can share (and it would have to be a gradle project).

I confirmed that the path is not the immediate issue by adding an explicit
call to NativeMap.isLoaded() at the start of my test - that produces logging
from NativeMap saying it can't find the library, which is what I expect.
Without this call NativeMap still logs nothing so the setting that should
cause it to be referenced is getting overridden somewhere. Calling
InstanceOperations.getSiteConfiguration and getSystemConfiguration shows
that the native maps are enabled, however.

-----Original Message-----
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: Tuesday, February 23, 2016 12:56 PM
To: user@accumulo.apache.org
Subject: Re: Unable to get Mini to use native maps - 1.6.2

Well, I'm near positive that 1.6.2 had native maps working, so there
must be something unexpected happening :). MAC should be very close to
what a real standalone instance is doing -- if you have the ability to
share some end-to-end project with where you are seeing this, that'd be
extremely helpful (e.g. a Maven project that we can just run would be
superb).

Dan Blum wrote:

I'll take a look but I don't think the path is the problem - NativeMap
should try to load the library regardless of whether this path is set and
will log if it can't find it. This isn't happening.

-----Original Message-----
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: Tuesday, February 23, 2016 12:27 PM
To: user@accumulo.apache.org
Subject: Re: Unable to get Mini to use native maps - 1.6.2

Hi Dan,

I'm seeing in our internal integration tests that we have some
configuration happening which (at least, intends to) configure the
native maps for the minicluster.

If you're not familiar, the MiniAccumuloConfig and MiniAccumuloCluster
classes are thin wrappers around MiniAccumuloConfigImpl and
MiniAccumuloClusterImpl. There is a setNativeLibPaths method on
MiniAccumuloConfigImpl which you can use to provide the path to the
native library shared object (.so). You will probably have to switch
from MiniAccumuloConfig/MiniAccumuloCluster to
MiniAccumuloConfigImpl/MiniAccumuloClusterImpl to use the "hidden"

methods.

You could also look at MiniClusterHarness.java in>=1.7 if you want a
concrete example of how we initialize things for our tests.

- Josh

Dan Blum wrote:

In order to test to make sure we don't have more code that needs a
workaround for https://issues.apache.org/jira/browse/ACCUMULO-4148 I am
trying again to enable the native maps for Mini, which we use for

testing.

I set tserver.memory.maps.native.enabled to true in the site XML, and

this

is getting picked up since I see this in the Mini logs:

[server.Accumulo] INFO : tserver.memory.maps.native.enabled = true

However, NativeMap should log something when it tries to load the

library,

whether it succeeds or fails, but it logs nothing. The obvious conclusion

is

that something about how MiniAccumuloCluster starts means that this

setting

is ignored or overridden, but I am not finding it. (I see the mergeProp

call

in MiniAccumuloConfigImpl.initialize which will set

TSERV_NATIVEMAP_ENABLED

to false, but that should only set it if it's not already in the

properties,

which it should be, and as far as I can tell the log message above is

issued

after this.)





Re: Unable to get Mini to use native maps - 1.6.2

2016-02-23 Thread Josh Elser
It would be really helpful if you could write up a minimal Groovy 
project that we can run that is doing exactly what you're trying.


I'm personally not sure what else to tell you: there are specific 
methods exposed which set up the native maps for MAC and these are 
exercised by our internal tests. If there's something we can pull 
down/run, it'll be much easier for us to provide more help/recommendations.


Dan Blum wrote:

I understand, but I should be seeing the same configuration as the MAC
process - or if not, why would it be different?

The MAC logs have mostly been what I have been looking at. As noted, they
have no logging from NativeMap at all, which means it isn't even trying to load
the library. Granted I have to make sure it can find the library, but that
doesn't help if it never looks for it at all.

-----Original Message-----
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: Tuesday, February 23, 2016 2:03 PM
To: user@accumulo.apache.org
Subject: Re: Unable to get Mini to use native maps - 1.6.2

MiniAccumuloCluster spawns its own processes, though. Calling
NativeMap.isLoaded() in your test JVM isn't proving anything.

That's why you need to call these methods on MAC, you would need to
check the TabletServer*.log file(s), and make sure that its
configuration is set up properly to find the .so.

Does that make sense? Did I misinterpret you?

Dan Blum wrote:

I'll see what I can do, but there's no simple way to pull out something
small we can share (and it would have to be a gradle project).

I confirmed that the path is not the immediate issue by adding an explicit
call to NativeMap.isLoaded() at the start of my test - that produces

logging

from NativeMap saying it can't find the library, which is what I expect.
Without this call NativeMap still logs nothing so the setting that should
cause it to be referenced is getting overridden somewhere. Calling
InstanceOperations.getSiteConfiguration and getSystemConfiguration shows
that the native maps are enabled, however.

-----Original Message-----
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: Tuesday, February 23, 2016 12:56 PM
To: user@accumulo.apache.org
Subject: Re: Unable to get Mini to use native maps - 1.6.2

Well, I'm near positive that 1.6.2 had native maps working, so there
must be something unexpected happening :). MAC should be very close to
what a real standalone instance is doing -- if you have the ability to
share some end-to-end project with where you are seeing this, that'd be
extremely helpful (e.g. a Maven project that we can just run would be
superb).

Dan Blum wrote:

I'll take a look but I don't think the path is the problem - NativeMap
should try to load the library regardless of whether this path is set and
will log if it can't find it. This isn't happening.

-----Original Message-----
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: Tuesday, February 23, 2016 12:27 PM
To: user@accumulo.apache.org
Subject: Re: Unable to get Mini to use native maps - 1.6.2

Hi Dan,

I'm seeing in our internal integration tests that we have some
configuration happening which (at least, intends to) configure the
native maps for the minicluster.

If you're not familiar, the MiniAccumuloConfig and MiniAccumuloCluster
classes are thin wrappers around MiniAccumuloConfigImpl and
MiniAccumuloClusterImpl. There is a setNativeLibPaths method on
MiniAccumuloConfigImpl which you can use to provide the path to the
native library shared object (.so). You will probably have to switch
from MiniAccumuloConfig/MiniAccumuloCluster to
MiniAccumuloConfigImpl/MiniAccumuloClusterImpl to use the "hidden"

methods.

You could also look at MiniClusterHarness.java in>=1.7 if you want a
concrete example of how we initialize things for our tests.

- Josh

Dan Blum wrote:

In order to test to make sure we don't have more code that needs a
workaround for https://issues.apache.org/jira/browse/ACCUMULO-4148 I am
trying again to enable the native maps for Mini, which we use for

testing.

I set tserver.memory.maps.native.enabled to true in the site XML, and

this

is getting picked up since I see this in the Mini logs:

[server.Accumulo] INFO : tserver.memory.maps.native.enabled = true

However, NativeMap should log something when it tries to load the

library,

whether it succeeds or fails, but it logs nothing. The obvious

conclusion

is

that something about how MiniAccumuloCluster starts means that this

setting

is ignored or overridden, but I am not finding it. (I see the mergeProp

call

in MiniAccumuloConfigImpl.initialize which will set

TSERV_NATIVEMAP_ENABLED

to false, but that should only set it if it's not already in the

properties,

which it should be, and as far as I can tell the log message above is

issued

after this.)





Re: IOException in internalRead! & transient exception communicating with ZooKeeper

2016-02-24 Thread Josh Elser
ZooKeeper is a funny system. This kind of ConnectionLossException is a 
normal "state" that a ZooKeeper client can enter. We handle this 
condition in Accumulo, retrying the operation (in this case, a 
`create()`), after the client can reconnect to the ZooKeeper servers in 
the background.
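
Conceptually, the handling is a retry loop along the lines of the sketch below (the path, retry count, and backoff are illustrative, not Accumulo's actual code):

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    // Retry a ZooKeeper read when the connection is momentarily lost; the client
    // library reconnects in the background, so a later attempt usually succeeds.
    static Stat existsWithRetry(ZooKeeper zk, String path) throws Exception {
      for (int attempt = 1; ; attempt++) {
        try {
          return zk.exists(path, false);   // null just means the node is absent
        } catch (KeeperException.ConnectionLossException e) {
          if (attempt >= 5) throw e;       // give up after a few tries
          Thread.sleep(1000L * attempt);   // back off while the client reconnects
        }
      }
    }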


ConnectionLossExceptions can be indicative of over-saturation of your 
nodes. A ZooKeeper client might lose its connection because it is 
starved for CPU time. It can also indicate that the ZooKeeper servers 
might be starved for resources.


* Check the ZooKeeper server logs for any errors about dropped 
connections (maxClientCnxns)
* Make sure your servers running Accumulo are not running at 100% total 
CPU usage and that there is free memory (no swapping).


ACCUMULO-3336 is about a different ZooKeeper error condition called a 
"session loss". This is when the entire ZooKeeper session needs to be 
torn down and recreated. This only happens after prolonged pauses in the 
client JVM or the ZooKeeper servers actively drop your connections due 
to the internal configuration (maxClientCnxns). The stacktrace you 
copied is not a session loss error.


Are you saying that when a ZooKeeper server dies, you cannot use 
Accumulo? How many are you running?


mohit.kaushik wrote:

Sent so early...

Another exception I am getting frequently with zookeeper which is a
bigger problem.
ACCUMULO-3336  says
it is unresolved yet

Saw (possibly) transient exception communicating with ZooKeeper
org.apache.zookeeper.KeeperException$ConnectionLossException: 
KeeperErrorCode = ConnectionLoss for 
/accumulo/f8708e0d-9238-41f5-b948-8f435fd01207/gc/lock
at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
at 
org.apache.accumulo.fate.zookeeper.ZooReader.getStatus(ZooReader.java:132)
at 
org.apache.accumulo.fate.zookeeper.ZooLock.process(ZooLock.java:383)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
at 
org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)

And the worst case is that whenever a ZooKeeper node goes down, the cluster becomes
unreachable for the time being; until it restarts, the ingest process halts.

What do you suggest? I need to resolve these problems. I do not want
the ingest process to ever stop.

Thanks
Mohit kaushik


On 02/22/2016 12:06 PM, mohit.kaushik wrote:

I am facing the below given exception continuously, the count keeps on 
increasing every sec(current value around 3000 on a server) I can see the 
exception for all 3 tablet servers.

ACCUMULO-2420 says that 
this exception comes when a client closes a connection before a scan completes. But the 
connection is not closed: every thread uses a common connection object to ingest and 
query, so what could cause this exception?

java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at 
org.apache.thrift.transport.TNonblockingSocket.read(TNonblockingSocket.java:141)
at 
org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.internalRead(AbstractNonblockingServer.java:537)
at 
org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.read(AbstractNonblockingServer.java:338)
at 
org.apache.thrift.server.AbstractNonblockingServer$AbstractSelectThread.handleRead(AbstractNonblockingServer.java:203)
at 
org.apache.accumulo.server.rpc.CustomNonBlockingServer$SelectAcceptThread.select(CustomNonBlockingServer.java:228)
at 
org.apache.accumulo.server.rpc.CustomNonBlockingServer$SelectAcceptThread.run(CustomNonBlockingServer.java:184)

Regards
Mohit kaushik



Re: Custom formatter

2016-03-03 Thread Josh Elser
Hi Sravan,

I don't think Formatters have the ability to automatically obtain
table configuration at runtime. The lifecycle for Formatters is a bit
different than Iterators. Formatters are also only run client-side --
you could try to do something that reads from Java system properties
and pass in those options via the Accumulo shell.
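
A hedged sketch of that system-property idea is below; the property name and the truncation logic are made up for the example, and the initialize signature shown matches the 1.6/1.7 Formatter interface:

    import java.util.Iterator;
    import java.util.Map.Entry;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.util.format.Formatter;

    public class SysPropFormatter implements Formatter {
      private Iterator<Entry<Key,Value>> iter;
      private int maxValueLen;

      @Override
      public void initialize(Iterable<Entry<Key,Value>> scanner, boolean printTimestamps) {
        iter = scanner.iterator();
        // Read the option from the shell's JVM, e.g. started with -Dmy.fmt.maxlen=32.
        maxValueLen = Integer.getInteger("my.fmt.maxlen", 64);
      }

      @Override
      public boolean hasNext() { return iter.hasNext(); }

      @Override
      public String next() {
        Entry<Key,Value> e = iter.next();
        String v = e.getValue().toString();
        if (v.length() > maxValueLen) v = v.substring(0, maxValueLen) + "...";
        return e.getKey() + " -> " + v;   // printTimestamps is ignored in this sketch
      }

      @Override
      public void remove() { throw new UnsupportedOperationException(); }
    }

In the shell it would then be set with something like `formatter -t TABLE_A -f com.example.SysPropFormatter`.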

On Thu, Mar 3, 2016 at 12:30 PM, Sravankumar Reddy Javaji (BLOOMBERG/
731 LEX)  wrote:
> Hello,
>
> Is it possible to provide options to custom Formatter in accumulo?
>
> I tried using OptionDescriber but it seems that OptionDescriber gets invoked
> only during setiterator command.
>
> Or at least is there any way to get current table properties (on which table
> the custom formatter is set). I mean if the formatter was set on TABLE_A,
> then it has to load all the table properties into code during
> initialization. So, that I can set the required properties to table using
> "config" and will load this properties to custom formatter.
>
> -
> Regards,
> Sravan


Fwd: 1.6 Javadoc missing classes

2016-03-04 Thread Josh Elser
Good catch, Dan. Thanks for letting us know. Moving this one over to the 
dev list to discuss further.


Christopher, looks like it might also be good to include iterator 
javadocs despite not being in public API (interfaces, and o.a.a.c.i.user?).


 Original Message 
Subject: 1.6 Javadoc missing classes
Date: Fri, 4 Mar 2016 15:59:26 -0500
From: Dan Blum 
Reply-To: user@accumulo.apache.org
To: 

A lot of classes seem to have gone missing from
http://accumulo.apache.org/1.6/apidocs/ - SortedKeyValueIterator would be an
obvious example.



Re: 1.6 Javadoc missing classes

2016-03-04 Thread Josh Elser

Maybe the distributed tracing APIs?

Christopher wrote:

Sure, we can include that. Are there any other classes which would be
good to have javadocs for which aren't public API?

On Fri, Mar 4, 2016 at 4:03 PM Josh Elser <josh.el...@gmail.com> wrote:

Good catch, Dan. Thanks for letting us know. Moving this one over to the
dev list to discuss further.

Christopher, looks like it might also be good to include iterator
javadocs despite not being in public API (interfaces, and
o.a.a.c.i.user?).

 Original Message 
Subject: 1.6 Javadoc missing classes
Date: Fri, 4 Mar 2016 15:59:26 -0500
From: Dan Blum <db...@bbn.com>
Reply-To: user@accumulo.apache.org
To: <user@accumulo.apache.org>

A lot of classes seem to have gone missing from
http://accumulo.apache.org/1.6/apidocs/ - SortedKeyValueIterator
would be an
obvious example.



Re: 1.6 Javadoc missing classes

2016-03-04 Thread Josh Elser
Oh, right. I forgot about the htrace shift. Don't we still have some 
public-facing wrappers? I think we should have Javadocs published for 
whatever the distributed tracing example we have published.


http://accumulo.apache.org/1.7/accumulo_user_manual.html#_instrumenting_a_client

Christopher wrote:

The tracing APIs vary from version to version significantly. That puts a
lot of extra effort on the person updating the included packages. How
important are those now we're transitioning to use an external dependency?

On Fri, Mar 4, 2016 at 5:17 PM Josh Elser <josh.el...@gmail.com> wrote:

Maybe the distributed tracing APIs?

Christopher wrote:
 > Sure, we can include that. Are there any other classes which would be
 > good to have javadocs for which aren't public API?
 >
 > On Fri, Mar 4, 2016 at 4:03 PM Josh Elser <josh.el...@gmail.com> wrote:
 >
 > Good catch, Dan. Thanks for letting us know. Moving this one
over to the
 > dev list to discuss further.
 >
 > Christopher, looks like it might also be good to include iterator
 > javadocs despite not being in public API (interfaces, and
 > o.a.a.c.i.user?).
 >
 >  Original Message 
 > Subject: 1.6 Javadoc missing classes
 > Date: Fri, 4 Mar 2016 15:59:26 -0500
 > From: Dan Blum <db...@bbn.com>
 > Reply-To: user@accumulo.apache.org
 > To: <user@accumulo.apache.org>
 >
 > A lot of classes seem to have gone missing from
 > http://accumulo.apache.org/1.6/apidocs/ - SortedKeyValueIterator
 > would be an
 > obvious example.
 >



Re: IOException in internalRead! & transient exception communicating with ZooKeeper

2016-03-11 Thread Josh Elser

Are you using a Scanner and then a BatchWriter for that existence check?
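
For concreteness, the read-before-write pattern being asked about usually looks something like the sketch below (table, row, and column names are placeholders; real code would reuse one long-lived BatchWriter rather than creating one per insert):

    import org.apache.accumulo.core.client.BatchWriter;
    import org.apache.accumulo.core.client.BatchWriterConfig;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.Scanner;
    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Range;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.security.Authorizations;

    // Scan for the document id, and only insert if nothing is found.
    void insertIfAbsent(Connector conn, String table, String docId, byte[] doc) throws Exception {
      Scanner s = conn.createScanner(table, Authorizations.EMPTY);
      s.setRange(Range.exact(docId));
      if (!s.iterator().hasNext()) {
        BatchWriter bw = conn.createBatchWriter(table, new BatchWriterConfig());
        Mutation m = new Mutation(docId);
        m.put("content", "", new Value(doc));
        bw.addMutation(m);
        bw.close();
      }
    }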

mohit.kaushik wrote:

I have upgraded to Accumulo 1.7.1 but the problem doesn't go away
completely. Now, strangely, I am getting the same error on a single server,
not all of them. Is it because of the lookup that the application always does to
check the existence of a document before inserting one?

recent logs
Keith, if you say so, I will create a Jira issue for this if required.

Thanks


On 02/27/2016 04:03 AM, Keith Turner wrote:



On Fri, Feb 26, 2016 at 7:33 AM, mohit.kaushik <mohit.kaus...@orkash.com> wrote:

Thanks Keith, But My Accumulo clients are using same connection
object. And the count for these WARN increase every second . Can
Monitor cause these exceptions?


I don't think so, but not 100% sure.  I think the Master process
usually talks to the tservers to gather info and then the monitor
talks to the tserver.

I am wondering if there is any reason that this message should be
logged at WARN.  Seems like a routine event, should we open an issue
to look into logging this at a lower level?



On 02/24/2016 08:18 PM, Keith Turner wrote:

You can probably ignore those.  I think its caused by an Accumulo
client closing its connection.

On Wed, Feb 24, 2016 at 6:35 AM, mohit.kaushik <mohit.kaus...@orkash.com> wrote:

here is screenshot, should I ignore these warnings?


internal read exception


On 02/22/2016 12:23 PM, mohit.kaushik wrote:

Sent so early...

Another exception I am getting frequently with zookeeper
which is a bigger problem.
ACCUMULO-3336
 says
it is unresolved yet
Saw (possibly) transient exception communicating with ZooKeeper
org.apache.zookeeper.KeeperException$ConnectionLossException: 
KeeperErrorCode = ConnectionLoss for 
/accumulo/f8708e0d-9238-41f5-b948-8f435fd01207/gc/lock
at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at 
org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
at 
org.apache.accumulo.fate.zookeeper.ZooReader.getStatus(ZooReader.java:132)
at 
org.apache.accumulo.fate.zookeeper.ZooLock.process(ZooLock.java:383)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
at 
org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
And the worst case is that whenever a ZooKeeper node goes down, the cluster
becomes unreachable for the time being; until it restarts, the
ingest process halts.

What do you suggest? I need to resolve these problems. I do
not want the ingest process to ever stop.

Thanks
Mohit kaushik


On 02/22/2016 12:06 PM, mohit.kaushik wrote:

I am facing the below given exception continuously, the count keeps on 
increasing every sec(current value around 3000 on a server) I can see the 
exception for all 3 tablet servers.

ACCUMULO-2420
says that this exception comes when a client closes a connection before a scan 
completes. But the connection is not closed: every thread uses a common connection 
object to ingest and query, so what could cause this exception?

java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at 
sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at 
sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at 
sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at 
org.apache.thrift.transport.TNonblockingSocket.read(TNonblockingSocket.java:141)
at 
org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.internalRead(AbstractNonblockingServer.java:537)
at 
org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.read(AbstractNonblockingServer.java:338)
at 
org.apache.thrift.server.AbstractNonblockingServer$AbstractSelectThread.handleRead(AbstractNonblockingServer.java:203)
at 
org.apache.accumulo.server.rpc.CustomNonBlockingServer$SelectAcceptThread.select(CustomNonBlockingServer.java:228)
at 
org.apache.accumulo.server.rpc.CustomNonBlockingServer$SelectAcceptThread.run(CustomNonBlockingServer.java:184)

Regards
Mohit kaushik





Re: IOException in internalRead! & transient exception communicating with ZooKeeper

2016-03-12 Thread Josh Elser



Keith Turner wrote:

On Fri, Feb 26, 2016 at 7:33 AM, mohit.kaushik <mohit.kaus...@orkash.com> wrote:

Thanks Keith, But My Accumulo clients are using same connection
object. And the count for these WARN increase every second . Can
Monitor cause these exceptions?


I don't think so, but not 100% sure.  I think the Master process usually
talks to the tservers to gather info and then the monitor talks to the
tserver.

I am wondering if there is any reason that this message should be logged
at WARN.  Seems like a routine event, should we open an issue to look
into logging this at a lower level?


Yeah, this is my hunch too. I know we can be a little lax in letting 
connections close via their configured timeout. It seems to me like this 
is just the server saying "oh, this got closed when I was expecting more 
data from the client!". Would need to be certain, like you say, though.


Re: Accumulo 1.7.1 on Docker

2016-03-14 Thread Josh Elser

Hi Sven,

I can't seem to find the source for your Docker image (nor am I smart 
enough to figure out how to extract it). Any chance you can point us to 
that?


That said, I did run your image, and it looks like you didn't actually 
start Hadoop, ZooKeeper, or Accumulo.


Our documentation treats Hadoop and ZooKeeper primarily as 
prerequisites, so check their docs for good instructions.


For Accumulo -

Configuration: 
http://accumulo.apache.org/1.7/accumulo_user_manual.html#_installation
Initialization: 
http://accumulo.apache.org/1.7/accumulo_user_manual.html#_initialization
Starting: 
http://accumulo.apache.org/1.7/accumulo_user_manual.html#_starting_accumulo


Sven Hodapp wrote:

Dear reader,

I'd like to create a docker image with accumulo 1.7.1.
I've install it from scratch with hadoop-2.7.2, zookeeper-3.4.8 and 
accumulo-1.7.1.
I'm going though the installation and every time I'll end up like that:

root@deebd8e29683:/# accumulo shell -u root
Password: **
2016-03-13 13:18:31,627 [trace.DistributedTrace] INFO : SpanReceiver 
org.apache.accumulo.tracer.ZooTraceClient was loaded successfully.
2016-03-13 13:20:31,955 [impl.ServerClient] WARN : There are no tablet servers: 
check that zookeeper and accumulo are running.

Anybody got an idea whats wrong? I have no idea anymore...

I'll currently share this docker image on docker hub.
If you want to try it you can simply start:

docker run -it themerius/accumulo /bin/bash

All binaries and configs are in there. In ./init.sh are my setup steps.

Regards,
Sven


Re: Accumulo 1.7.1 on Docker

2016-03-14 Thread Josh Elser

Ok! Thanks, I'll look for that init.sh script.

We're not too... docker-savvy here yet :), but we can definitely help 
out if we can find an automation script. Let me get back to you.


Sven Hodapp wrote:

Hi Josh,

thanks for your answer.
Currently I haven't uploaded the Dockerfile... If you want I'll upload it!
(But currently it is only a Debian image with ssh, rsync, jdk7, the unzipped 
distributions of hadoop, zookeeper and accumulo, and the set of environment 
variables)

Currently, and for testing, I start all things manually.
The steps I take are documented in /init.sh within the docker container.
I've ensured that dfs and zookeeper are up before starting accumulo.

Regards,
Sven

- Ursprüngliche Mail -

Von: "Josh Elser"
An: "user"
Gesendet: Montag, 14. März 2016 15:41:16
Betreff: Re: Accumulo 1.7.1 on Docker



Hi Sven,

I can't seem to find the source for your Docker image (nor am smart
enough to figure out how to extract it). Any chance you can point us to
that?

That said, I did run your image, and it looks like you didn't actually
start Hadoop, ZooKeeper, or Accumulo.

Our documentation treats Hadoop and ZooKeeper primarily as
prerequisites, so check their docs for good instructions.

For Accumulo -

Configuration:
http://accumulo.apache.org/1.7/accumulo_user_manual.html#_installation
Initialization:
http://accumulo.apache.org/1.7/accumulo_user_manual.html#_initialization
Starting:
http://accumulo.apache.org/1.7/accumulo_user_manual.html#_starting_accumulo

Sven Hodapp wrote:

Dear reader,

I'd like to create a docker image with accumulo 1.7.1.
I've install it from scratch with hadoop-2.7.2, zookeeper-3.4.8 and
accumulo-1.7.1.
I'm going though the installation and every time I'll end up like that:

root@deebd8e29683:/# accumulo shell -u root
Password: **
2016-03-13 13:18:31,627 [trace.DistributedTrace] INFO : SpanReceiver
org.apache.accumulo.tracer.ZooTraceClient was loaded successfully.
2016-03-13 13:20:31,955 [impl.ServerClient] WARN : There are no tablet servers:
check that zookeeper and accumulo are running.

Anybody got an idea whats wrong? I have no idea anymore...

I'll currently share this docker image on docker hub.
If you want to try it you can simply start:

 docker run -it themerius/accumulo /bin/bash

All binaries and configs are in there. In ./init.sh are my setup steps.

Regards,
Sven


Re: Accumulo 1.7.1 on Docker

2016-03-14 Thread Josh Elser

So, I got it running, and ran into the same problem you described.

The odd thing is that both the Accumulo Master and TabletServer seemed 
to be... stuck. They both claimed to be running, but they were both hung 
(unable to communicate). This ultimately caused the TabletServer to lose 
its ZooKeeper lock (and why the Shell didn't think it was running).


I'm running into all sorts of docker issues ATM (trying to upgrade now, 
hoping it will help), but I'd recommend trying to increase the amount of 
heap for both the master and tserver from 128M to 256M in 
/accumulo-1.7.1/conf/accumulo-env.sh (the -Xmx value).


Josh Elser wrote:

Ok! Thanks, I'll look for that init.sh script.

We're not too... docker-savvy here yet :), but we can definitely help
out if we can find an automation script. Let me get back to you.

Sven Hodapp wrote:

Hi Josh,

thanks for your answer.
Currently I haven't uploaded the Dockerfile... If you want I'll upload
it!
(But currently it only is an debian with ssh, rsync, jdk7, the
unzipped distributions of hadoop, zookeeper and accumulo, and the set
of environment variables)

Currently, and for testing, I start all things manually.
The steps I do is documented in /init.sh within the docker container.
I've enshured that dfs and zookeeper are up before starting accumulo.

Regards,
Sven

----- Ursprüngliche Mail -

Von: "Josh Elser"
An: "user"
Gesendet: Montag, 14. März 2016 15:41:16
Betreff: Re: Accumulo 1.7.1 on Docker



Hi Sven,

I can't seem to find the source for your Docker image (nor am smart
enough to figure out how to extract it). Any chance you can point us to
that?

That said, I did run your image, and it looks like you didn't actually
start Hadoop, ZooKeeper, or Accumulo.

Our documentation treats Hadoop and ZooKeeper primarily as
prerequisites, so check their docs for good instructions.

For Accumulo -

Configuration:
http://accumulo.apache.org/1.7/accumulo_user_manual.html#_installation
Initialization:
http://accumulo.apache.org/1.7/accumulo_user_manual.html#_initialization
Starting:
http://accumulo.apache.org/1.7/accumulo_user_manual.html#_starting_accumulo


Sven Hodapp wrote:

Dear reader,

I'd like to create a docker image with accumulo 1.7.1.
I've install it from scratch with hadoop-2.7.2, zookeeper-3.4.8 and
accumulo-1.7.1.
I'm going though the installation and every time I'll end up like that:

root@deebd8e29683:/# accumulo shell -u root
Password: **
2016-03-13 13:18:31,627 [trace.DistributedTrace] INFO : SpanReceiver
org.apache.accumulo.tracer.ZooTraceClient was loaded successfully.
2016-03-13 13:20:31,955 [impl.ServerClient] WARN : There are no
tablet servers:
check that zookeeper and accumulo are running.

Anybody got an idea whats wrong? I have no idea anymore...

I'll currently share this docker image on docker hub.
If you want to try it you can simply start:

docker run -it themerius/accumulo /bin/bash

All binaries and configs are in there. In ./init.sh are my setup steps.

Regards,
Sven


Re: Accumulo 1.7.1 on Docker

2016-03-14 Thread Josh Elser
Well, I'm not sure wtf is going on, but I added some sleeps to your 
init.sh script (30seconds before starting ZooKeeper and before init'ing 
Accumulo) and now things worked just fine.


I'm not seeing any Accumulo problems here, but rather the same Docker 
absurdities I run into every time I try to use it :). I also just 
upgrade to Docker 1.10.3 if that helps.


Josh Elser wrote:

So, I got it running, and ran into the same problem you described.

The odd thing is that both the Accumulo Master and TabletServer seemed
to be... stuck. They both claimed to be running, but they were both hung
(unable to communicate). This ultimately caused the TabletServer to lose
its ZooKeeper lock (and why the Shell didn't think it was running).

I'm running into all sorts of docker issues ATM (trying to upgrade now,
hoping it will help), but I'd recommend trying to increase the amount of
heap for both the master and tserver from 128M to 256M in
/accumulo-1.7.1/conf/accumulo-env.sh (the -Xmx value).

Josh Elser wrote:

Ok! Thanks, I'll look for that init.sh script.

We're not too... docker-savvy here yet :), but we can definitely help
out if we can find an automation script. Let me get back to you.

Sven Hodapp wrote:

Hi Josh,

thanks for your answer.
Currently I haven't uploaded the Dockerfile... If you want I'll upload
it!
(But currently it only is an debian with ssh, rsync, jdk7, the
unzipped distributions of hadoop, zookeeper and accumulo, and the set
of environment variables)

Currently, and for testing, I start all things manually.
The steps I do is documented in /init.sh within the docker container.
I've enshured that dfs and zookeeper are up before starting accumulo.

Regards,
Sven

- Ursprüngliche Mail -

Von: "Josh Elser"
An: "user"
Gesendet: Montag, 14. März 2016 15:41:16
Betreff: Re: Accumulo 1.7.1 on Docker



Hi Sven,

I can't seem to find the source for your Docker image (nor am smart
enough to figure out how to extract it). Any chance you can point us to
that?

That said, I did run your image, and it looks like you didn't actually
start Hadoop, ZooKeeper, or Accumulo.

Our documentation treats Hadoop and ZooKeeper primarily as
prerequisites, so check their docs for good instructions.

For Accumulo -

Configuration:
http://accumulo.apache.org/1.7/accumulo_user_manual.html#_installation
Initialization:
http://accumulo.apache.org/1.7/accumulo_user_manual.html#_initialization

Starting:
http://accumulo.apache.org/1.7/accumulo_user_manual.html#_starting_accumulo



Sven Hodapp wrote:

Dear reader,

I'd like to create a docker image with accumulo 1.7.1.
I've install it from scratch with hadoop-2.7.2, zookeeper-3.4.8 and
accumulo-1.7.1.
I'm going though the installation and every time I'll end up like
that:

root@deebd8e29683:/# accumulo shell -u root
Password: **
2016-03-13 13:18:31,627 [trace.DistributedTrace] INFO : SpanReceiver
org.apache.accumulo.tracer.ZooTraceClient was loaded successfully.
2016-03-13 13:20:31,955 [impl.ServerClient] WARN : There are no
tablet servers:
check that zookeeper and accumulo are running.

Anybody got an idea whats wrong? I have no idea anymore...

I'll currently share this docker image on docker hub.
If you want to try it you can simply start:

docker run -it themerius/accumulo /bin/bash

All binaries and configs are in there. In ./init.sh are my setup
steps.

Regards,
Sven


Re: Accumulo 1.7.1 on Docker

2016-03-22 Thread Josh Elser
Every time I use Docker, I seem to run into odd situations like this. 
Very frustrating.


I would love to know what's happening and get this working reliably. 
Having an "Apache-owned" docker image for Accumulo would be nice (and 
would help us internally as well as externally, IMO).


Andrew Hulbert wrote:

+1 for Josh's suggestion.

Not sure if it's the same problem, but I too had to add some sleeps for
single node accumulo startup scripts directly before and after init'ing
accumulo. I dug in once and it seemed that the datanode needed more time
to let other processes know the files in HDFS existed at /accumulo.
Perhaps it was awaiting the exit from safe mode? Three seconds did it
for me:
https://github.com/jahhulbert-ccri/cloud-local/blob/master/bin/cloud-local.sh#L109



On 03/14/2016 01:49 PM, Josh Elser wrote:

Well, I'm not sure wtf is going on, but I added some sleeps to your
init.sh script (30seconds before starting ZooKeeper and before
init'ing Accumulo) and now things worked just fine.

I'm not seeing any Accumulo problems here, but rather the same Docker
absurdities I run into every time I try to use it :). I also just
upgrade to Docker 1.10.3 if that helps.

Josh Elser wrote:

So, I got it running, and ran into the same problem you described.

The odd thing is that both the Accumulo Master and TabletServer seemed
to be... stuck. They both claimed to be running, but they were both hung
(unable to communicate). This ultimately caused the TabletServer to lose
its ZooKeeper lock (and why the Shell didn't think it was running).

I'm running into all sorts of docker issues ATM (trying to upgrade now,
hoping it will help), but I'd recommend trying to increase the amount of
heap for both the master and tserver from 128M to 256M in
/accumulo-1.7.1/conf/accumulo-env.sh (the -Xmx value).

Josh Elser wrote:

Ok! Thanks, I'll look for that init.sh script.

We're not too... docker-savvy here yet :), but we can definitely help
out if we can find an automation script. Let me get back to you.

Sven Hodapp wrote:

Hi Josh,

thanks for your answer.
Currently I haven't uploaded the Dockerfile... If you want I'll upload
it!
(But currently it only is an debian with ssh, rsync, jdk7, the
unzipped distributions of hadoop, zookeeper and accumulo, and the set
of environment variables)

Currently, and for testing, I start all things manually.
The steps I do is documented in /init.sh within the docker container.
I've enshured that dfs and zookeeper are up before starting accumulo.

Regards,
Sven

- Ursprüngliche Mail -

Von: "Josh Elser"
An: "user"
Gesendet: Montag, 14. März 2016 15:41:16
Betreff: Re: Accumulo 1.7.1 on Docker



Hi Sven,

I can't seem to find the source for your Docker image (nor am smart
enough to figure out how to extract it). Any chance you can point
us to
that?

That said, I did run your image, and it looks like you didn't
actually
start Hadoop, ZooKeeper, or Accumulo.

Our documentation treats Hadoop and ZooKeeper primarily as
prerequisites, so check their docs for good instructions.

For Accumulo -

Configuration:
http://accumulo.apache.org/1.7/accumulo_user_manual.html#_installation

Initialization:
http://accumulo.apache.org/1.7/accumulo_user_manual.html#_initialization


Starting:
http://accumulo.apache.org/1.7/accumulo_user_manual.html#_starting_accumulo




Sven Hodapp wrote:

Dear reader,

I'd like to create a docker image with accumulo 1.7.1.
I've install it from scratch with hadoop-2.7.2, zookeeper-3.4.8 and
accumulo-1.7.1.
I'm going though the installation and every time I'll end up like
that:

root@deebd8e29683:/# accumulo shell -u root
Password: **
2016-03-13 13:18:31,627 [trace.DistributedTrace] INFO : SpanReceiver
org.apache.accumulo.tracer.ZooTraceClient was loaded successfully.
2016-03-13 13:20:31,955 [impl.ServerClient] WARN : There are no
tablet servers:
check that zookeeper and accumulo are running.

Anybody got an idea whats wrong? I have no idea anymore...

I'll currently share this docker image on docker hub.
If you want to try it you can simply start:

docker run -it themerius/accumulo /bin/bash

All binaries and configs are in there. In ./init.sh are my setup
steps.

Regards,
Sven




Re: Accumulo 1.7.1 on Docker

2016-03-22 Thread Josh Elser

Cool beans.

If you're interested, myself (as well as others, I'm sure) would love to 
work on this.


Andrew Hulbert wrote:

With the single-node/docker setup I had to add a "sleep 3" after "hdfs
dfsadmin -safemode" returns, too, before the "accumulo init"

I think having an Apache-owned docker for accumulo would rock! We have
made a few but having a canonical one would be great.

On 03/22/2016 02:47 PM, Michael Wall wrote:

Sven, did you ever put up a Dockerfile?  I've done something like this
waiting for HDFS to come out of safe mode.

echo "Starting Hadoop"
${HADOOP_HOME}/sbin/start-dfs.sh
echo "Waiting for hadoop to come out of safeMode"
local x=0
while [ $x -lt 5 ]; do # try 5 times, for slow computers
  ${HADOOP_HOME}/bin/hdfs dfsadmin -safemode wait && x=5
  x=$(( $x + 1 ))
done

On Tue, Mar 22, 2016 at 2:41 PM, Josh Elser <josh.el...@gmail.com> wrote:

Every time I use Docker, I seem to run into odd situations like
this. Very frustrating.

I would love to know what's happening and get this working
reliably. Having an "Apache-owned" docker image for Accumulo would
be nice (and would help us internally as well as externally, IMO).


Andrew Hulbert wrote:

+1 for Josh's suggestion.

Not sure if its the same problem, but I too had to add some
sleeps for
single node accumulo startup scripts directly before and after
init'ing
accumulo. I dug in once and it seemed that the datanode needed
more time
to let other processes know the files in HDFS existed at
/accumulo.
Perhaps it was awaiting the exit from safe mode? Three seconds
did it
for me:

https://github.com/jahhulbert-ccri/cloud-local/blob/master/bin/cloud-local.sh#L109



On 03/14/2016 01:49 PM, Josh Elser wrote:

Well, I'm not sure wtf is going on, but I added some
sleeps to your
init.sh script (30seconds before starting ZooKeeper and before
init'ing Accumulo) and now things worked just fine.

I'm not seeing any Accumulo problems here, but rather the
same Docker
absurdities I run into every time I try to use it :). I
also just
upgrade to Docker 1.10.3 if that helps.

Josh Elser wrote:

So, I got it running, and ran into the same problem
you described.

The odd thing is that both the Accumulo Master and
TabletServer seemed
to be... stuck. They both claimed to be running, but
they were both hung
(unable to communicate). This ultimately caused the
TabletServer to lose
its ZooKeeper lock (and why the Shell didn't think it
was running).

I'm running into all sorts of docker issues ATM
(trying to upgrade now,
hoping it will help), but I'd recommend trying to
increase the amount of
heap for both the master and tserver from 128M to 256M in
/accumulo-1.7.1/conf/accumulo-env.sh (the -Xmx value).

Josh Elser wrote:

Ok! Thanks, I'll look for that init.sh script.

We're not too... docker-savvy here yet :), but we
can definitely help
out if we can find an automation script. Let me
get back to you.

Sven Hodapp wrote:

Hi Josh,

thanks for your answer.
Currently I haven't uploaded the Dockerfile...
If you want I'll upload
it!
(But currently it only is an debian with ssh,
rsync, jdk7, the
unzipped distributions of hadoop, zookeeper
and accumulo, and the set
of environment variables)

Currently, and for testing, I start all things
manually.
The steps I do is documented in /init.sh
within the docker container.
I've enshured that dfs and zookeeper are up
before starting accumulo.

Regards,
Sven

- Ursprüngliche Mail -

Von: "Josh

Elser"<<mailto:josh.el...@gmail.com>josh.el...@gmail.com>
An: "user"mailto:user@accumulo.apache.org>>
Gesendet: Montag, 

Re: 1.6 Javadoc missing classes

2016-03-28 Thread Josh Elser
So this just bit me. I went looking for Iterators and was confused why 
they weren't there.


Christopher wrote:

Sure, we can include that. Are there any other classes which would be
good to have javadocs for which aren't public API?

On Fri, Mar 4, 2016 at 4:03 PM Josh Elser <josh.el...@gmail.com> wrote:

Good catch, Dan. Thanks for letting us know. Moving this one over to the
dev list to discuss further.

Christopher, looks like it might also be good to include iterator
javadocs despite not being in public API (interfaces, and
o.a.a.c.i.user?).

 Original Message 
Subject: 1.6 Javadoc missing classes
Date: Fri, 4 Mar 2016 15:59:26 -0500
From: Dan Blum <db...@bbn.com>
Reply-To: user@accumulo.apache.org
To: <user@accumulo.apache.org>

A lot of classes seem to have gone missing from
http://accumulo.apache.org/1.6/apidocs/ - SortedKeyValueIterator
would be an
obvious example.



Re: Fwd: why compaction failure on one table brings other tables offline, how to recover

2016-04-08 Thread Josh Elser



Billie Rinaldi wrote:

*From:* Jayesh Patel
*Sent:* Thursday, April 07, 2016 4:36 PM
*To:* 'user@accumulo.apache.org'
*Subject:* RE: why compaction failure on one table brings other tables
offline, how to recover


I have a 3 node Accumulo 1.7 cluster with a few small tables (few MB in
size at most).


I had one of those table fail minc because I had configured a
SummingCombiner with FIXEDLEN but had smaller values:

MinC failed (trying to convert to long, but byte array isn't long
enough, wanted 8 found 1) to create
hdfs://instance-accumulo:8020/accumulo/tables/1/default_tablet/F0002bcs.rf_tmp
retrying ...


I have learned since to set the ‘lossy’ parameter to true to avoid this.
*Why is the default value for it false* if it can cause catastrophic
failure that you’ll read about ahead.


I'm pretty sure I told you this on StackOverflow, but if you're not 
writing 8-byte long values, don't use FIXEDLEN. Use VARLEN instead.
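
For reference, here's a rough sketch of that configuration (hypothetical table and column family names; it assumes a Connector named connector is already in scope):

```
import java.util.Collections;
import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.iterators.LongCombiner;
import org.apache.accumulo.core.iterators.user.SummingCombiner;

// Attach a SummingCombiner that uses variable-length encoding, so values
// shorter than 8 bytes still decode instead of failing the minor compaction.
IteratorSetting setting = new IteratorSetting(10, "sum", SummingCombiner.class);
SummingCombiner.setEncodingType(setting, LongCombiner.Type.VARLEN);
SummingCombiner.setColumns(setting,
    Collections.singletonList(new IteratorSetting.Column("counts")));
connector.tableOperations().attachIterator("mytable", setting);
```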



However, this brought the tablets for other tables offline without
any apparent errors or warnings. *Can someone please explain why?*


Can you provide logs? We are not wizards :)


In order to recover from this, I did a ‘droptable’ from the shell on the
affected tables, but they all got stuck in the ‘DELETING’ state.  I was
able to finally delete them using zkcli ‘rmr’ command. *Is there a
better way?*


Again, not sure why they would have gotten stuck in the deleting phase 
without more logs/context (nor how far along in the deletion process 
they got). It's possible that there were still entries in the 
accumulo.metadata table.



I’m assuming there is a more proper way because when I created the
tables again (with the same name), they went back to having a single
offline tablet right away. *Is this because there are “traces” of the
old table left behind that affect the new table even though the new
table has a different table id?*  I ended up wiping out hdfs and
recreating the accumulo instance. 


Accumulo uses monotonically increasing IDs to identify tables. The 
human-readable names are only there for your benefit. Creating a table 
with the same name would not cause a problem. It sounds like you got the 
metadata table in a bad state or have tabletservers in a bad state (if 
you haven't restarted them).



It seems that a small bug, writing 1 byte value instead of 8 bytes,
caused us to dump the whole accumulo instance.  Luckily the data wasn’t
that important, but this whole episode makes us wonder why doing things
the right way (assuming there is a right way) wasn’t obvious or if
Accumulo is just very fragile.



Causing Accumulo to be unable to flush data from memory to disk in a 
minor compaction is a very bad idea, and one that it cannot automatically 
recover from because of the combiner configuration you set.


If you can provide logs and stack traces from the Accumulo services, we 
can try to help you further. This is not normal. If you don't believe 
me, take a look at the distributed tests we run each release where we 
write hundreds of gigabytes of data across many servers while randomly 
killing Accumulo processes.




Please ask away any questions/clarification you might have. We’ll
appreciate any input you might have so we make educated decisions about
using Accumulo going forward.

__ __

Thank you,

Jayesh




Re: Optimize Accumulo scan speed

2016-04-10 Thread Josh Elser



Mario Pastorelli wrote:

Hi,

I'm currently having some scan speed issues with Accumulo and I would
like to understand why and how I can solve them. I have geographical data
and I use as primary key the day and then the geohex, which is a
linearisation of lat and lon. The reason for this key is that I always
query the data for one day but for a set of geohexes which represent a
zone, so with this schema I can use a single scan to read all the
data for one day with few seeks. My problem is that the scan is
painfully slow: for instance, to read 5617019 rows it takes around 17
seconds and the scan speed is 13MB/s, less than 750k scan entries/s and
around 300 seeks. I enabled the tracer and this is what I got


13MB/s sounds like you're only actually querying one TabletServer. Dave 
and Andrew hit the nail on the head suggesting some sharding on the 
rowId. That will help get more servers involved in servicing your query.


You can also try turning on TRACE logging via log4j on 
org.apache.accumulo.core.client.impl. That should give you some insight 
about what the client is actually doing WRT RPCs.
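
For example, a minimal log4j snippet on the client side (assuming a log4j 1.x properties file) would be:

```
# client-side log4j.properties: show what the Accumulo client is doing over RPC
log4j.logger.org.apache.accumulo.core.client.impl=TRACE
```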



17325+0 Dice@srv1 Dice.query
11+1 Dice@srv1 scan 11+1 Dice@srv1 scan:location
5+13 Dice@srv1 scan 5+13 Dice@srv1 scan:location
4+19 Dice@srv1 scan 4+19 Dice@srv1 scan:location
5+23 Dice@srv1 scan 4+24 Dice@srv1 scan:location
I'm not sure how to speed up the scanning. I have the following questions:
   - is this speed normal?
   - can I involve more servers in the scan? Right now only two servers
have the ranges but with a cluster of 15 machines it would be nice to
involve more of them. Is it possible?

Thanks,
Mario




Re: Fwd: why compaction failure on one table brings other tables offline, how to recover

2016-04-11 Thread Josh Elser
Do you mean that after an OOME, the tserver process didn't die and got 
into this bad state with a permanently offline tablet?


Christopher wrote:

You might be seeing https://issues.apache.org/jira/browse/ACCUMULO-4160

On Mon, Apr 11, 2016 at 5:52 PM Jayesh Patel mailto:jpa...@keywcorp.com>> wrote:

There really aren't a lot of log messages that can explain why
tablets for other tables went offline except the following:

2016-04-11 13:32:18,258
[tserver.TabletServerResourceManager$AssignmentWatcher] WARN :
tserver:instance-accumulo-3 Assignment for 2<< has been running for
at least 973455566ms
java.lang.Exception: Assignment of 2<<
 at sun.misc.Unsafe.park(Native Method)
 at java.util.concurrent.locks.LockSupport.park(Unknown Source)
 at

java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown
Source)
 at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(Unknown
Source)
 at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(Unknown 
Source)
 at
java.util.concurrent.locks.ReentrantLock$FairSync.lock(Unknown Source)
 at java.util.concurrent.locks.ReentrantLock.lock(Unknown Source)
 at

org.apache.accumulo.tserver.TabletServer.acquireRecoveryMemory(TabletServer.java:2230)
 at
org.apache.accumulo.tserver.TabletServer.access$2600(TabletServer.java:252)
 at

org.apache.accumulo.tserver.TabletServer$AssignmentHandler.run(TabletServer.java:2150)
 at
org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
 at

org.apache.accumulo.tserver.ActiveAssignmentRunnable.run(ActiveAssignmentRunnable.java:61)
 at
org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
 at
org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
 at java.lang.Thread.run(Unknown Source)

Table 2<< here doesn't have the issue with minc failing and so
shouldn’t be offline.  These messages happened on a restart of a
tserver if that offers any clues.  All the nodes were rebooted at
that time due to a power failure.  I'm assuming that its tablet
went offline soon after this message first appeared in the logs.

Another tidbit of note is that Accumulo operates for hours/days
without taking the tablets offline even though minc is failing, and
it's the crash of a tserver due to an OutOfMemory situation in one case
that seems to have taken the tablet offline.  Is it safe to assume
that other tservers are not able to pick up the tablets that are
failing minc from a crashed tserver?

-Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com
<mailto:josh.el...@gmail.com>]
Sent: Friday, April 08, 2016 10:52 AM
To: user@accumulo.apache.org <mailto:user@accumulo.apache.org>
Subject: Re: Fwd: why compaction failure on one table brings other
tables offline, how to recover



Billie Rinaldi wrote:
 > *From:* Jayesh Patel
 > *Sent:* Thursday, April 07, 2016 4:36 PM
 > *To:* 'user@accumulo.apache.org <mailto:user@accumulo.apache.org>
<mailto:user@accumulo.apache.org <mailto:user@accumulo.apache.org>>'
 > mailto:user@accumulo.apache.org>
<mailto:user@accumulo.apache.org <mailto:user@accumulo.apache.org>>>
 > *Subject:* RE: why compaction failure on one table brings other
tables
 > offline, how to recover
 >
 > __ __
 >
 > I have a 3 node Accumulo 1.7 cluster with a few small tables (few MB
 > in size at most).
 >
 > __ __
 >
 > I had one of those table fail minc because I had configured a
 > SummingCombiner with FIXEDLEN but had smaller values:
 >
 > MinC failed (trying to convert to long, but byte array isn't long
 > enough, wanted 8 found 1) to create
 >
hdfs://instance-accumulo:8020/accumulo/tables/1/default_tablet/F0002bc
 > s.rf_tmp
 > retrying ...
 >
 > __ __
 >
 > I have learned since to set the ‘lossy’ parameter to true to
avoid this.
 > *Why is the default value for it false* if it can cause catastrophic
 > failure that you’ll read about ahead.

I'm pretty sure I told you this on StackOverflow, but if you're not
writing 8-byte long values, don't used FIXEDLEN. Use VARLEN instead.

 > However, this brought other the tablets for other tables offline
 > without any apparent errors or warnings. *Can someone please exp

Re: Fwd: why compaction failure on one table brings other tables offline, how to recover

2016-04-11 Thread Josh Elser

Sorry, I meant that to Jayesh, not to you, Christopher :)

Christopher wrote:

I just meant that if there is a problem loading one tablet, other
tablets may stay indefinitely in an offline state due to ACCUMULO-4160,
however it got to that point.

On Mon, Apr 11, 2016 at 6:35 PM Josh Elser mailto:josh.el...@gmail.com>> wrote:

Do you mean that after an OOME, the tserver process didn't die and got
into this bad state with an permanently offline tablet?

Christopher wrote:
 > You might be seeing
https://issues.apache.org/jira/browse/ACCUMULO-4160
 >
 > On Mon, Apr 11, 2016 at 5:52 PM Jayesh Patel mailto:jpa...@keywcorp.com>
 > <mailto:jpa...@keywcorp.com <mailto:jpa...@keywcorp.com>>> wrote:
 >
 > There really aren't a lot of log messages that can explain why
 > tablets for other tables went offline except the following:
 >
 > 2016-04-11 13:32:18,258
 > [tserver.TabletServerResourceManager$AssignmentWatcher] WARN :
 > tserver:instance-accumulo-3 Assignment for 2<< has been
running for
 > at least 973455566ms
 > java.lang.Exception: Assignment of 2<<
 >  at sun.misc.Unsafe.park(Native Method)
 >  at java.util.concurrent.locks.LockSupport.park(Unknown
Source)
 >  at
 >
  
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown
 > Source)
 >  at
 >
  
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(Unknown
 > Source)
 >  at
 >
  java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(Unknown 
Source)
 >  at
 >
  java.util.concurrent.locks.ReentrantLock$FairSync.lock(Unknown Source)
 >  at java.util.concurrent.locks.ReentrantLock.lock(Unknown
Source)
 >  at
 >
  
org.apache.accumulo.tserver.TabletServer.acquireRecoveryMemory(TabletServer.java:2230)
 >  at
 >
  
org.apache.accumulo.tserver.TabletServer.access$2600(TabletServer.java:252)
 >  at
 >
  
org.apache.accumulo.tserver.TabletServer$AssignmentHandler.run(TabletServer.java:2150)
 >  at
 >
  org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
 >  at
 >
  
org.apache.accumulo.tserver.ActiveAssignmentRunnable.run(ActiveAssignmentRunnable.java:61)
 >  at
 >
  org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
 >  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
 > Source)
 >  at
java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
 > Source)
 >  at
 >
  org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
 >  at java.lang.Thread.run(Unknown Source)
 >
 > Table 2<< here doesn't have the issue with minc failing and so
 > shouldn’t be offline.  These messages happened on a restart of a
 > tserver if that offers any clues.  All the nodes were rebooted at
 > that time due to a power failure.  I'm assuming that it's tablet
 > went offline soon after this message first appeared in the logs.
 >
 > Other tidbit of note is that the Accumulo operates for hours/days
 > without taking the tablets offline even though minc is
failing and
 > it's the crash of a tserver due to OutOfMemory situation in
one case
 > that seems to have taken the tablet offline.  Is it safe to
assume
 > that other tservers are not able to pick up the tablets that are
 > failing minc from a crashed tserver?
 >
 > -Original Message-
 > From: Josh Elser [mailto:josh.el...@gmail.com
<mailto:josh.el...@gmail.com>
 > <mailto:josh.el...@gmail.com <mailto:josh.el...@gmail.com>>]
 > Sent: Friday, April 08, 2016 10:52 AM
 > To: user@accumulo.apache.org
<mailto:user@accumulo.apache.org> <mailto:user@accumulo.apache.org
<mailto:user@accumulo.apache.org>>
 > Subject: Re: Fwd: why compaction failure on one table brings
other
 > tables offline, how to recover
 >
 >
 >
 > Billie Rinaldi wrote:
 > > *From:* Jayesh Patel
 > > *Sent:* Thursday, April 07, 2016 4:36 PM
 > > *To:* 'user@accumulo.apache.org
<mailto:user@accumulo.apache.org> <mailto:user@accumulo.apache.org
<mailto:user@accumulo.apache.org>>
 > <mailto:user@accumulo.apache.org
<mailto:user@accumulo.a

Re: Fwd: why compaction failure on one table brings other tables offline, how to recover

2016-04-12 Thread Josh Elser

Jayesh Patel wrote:

Josh, The OOM tserver process was killed by the kernel, it didn't hang
around.  I tried restarting it manually, but it ran out of memory right
away and was killed again leaving the tablet offline.  It must have a
huge "recovery" log to go through.  HDFS
/accumulo/wal/instance-accumulo+9997/24e08581-a081-4b41-afc5-d75bdda6cf15 is
about 42MB, and the machine has about 300MB free, which apparently is not
enough for the tserver.



Ok, cool. If you're that constrained on resources, you can also try 
reducing the property tserver.sort.buffer.size in accumulo-site.xml. It 
defaults to 200M, you could try 25M or 50M instead.


This is a buffer size that is used for sorting log edits during the 
recovery process. This might help if you never make it through the 
recovery process.
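
For example, a sketch of the accumulo-site.xml entry (using the property named above):

```
<!-- lower the WAL sort buffer so log recovery uses less memory -->
<property>
  <name>tserver.sort.buffer.size</name>
  <value>50M</value>
</property>
```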


300MB is a little low in general as far as headroom goes (especially 
when you're already not giving Accumulo enough RAM). Typically, you want 
to ensure that you give the operating system at least 1G of memory for 
itself.


Re: Querying Accumulo Data Using Java

2016-04-28 Thread Josh Elser

Hi Ben,

Looks like you're on the right track. Iterator priorities are a little 
obtuse at first glance; you probably want to change the 1 to 15 (we can 
touch on the "why" later).


As far as Iterators/Filters that you should feel comfortable using, 
check out: 
https://github.com/apache/accumulo/tree/master/core/src/main/java/org/apache/accumulo/core/iterators/user


The "system" package are not meant for public consumption (and often are 
used to implement internal functionality). This is probably why you're 
having a hard time figuring out how to use it.


Don't miss the methods on Scanner: fetchColumnFamily(Text), 
fetchColumn(Text, Text), and fetchColumn(Column). These are how you can 
easily do column family or column family + qualifier filtering.


For example, if you wanted to filter on the column family "foo" and the 
column qualifier "bar":


```
Scanner scan = connector.createScanner("table", auths);
scan.fetchColumn(new Text("foo"), new Text("bar"));
RowIterator rowIterator = new RowIterator(scan);
while (...) { ... }
```

The fetch*() methods are also accumulative. If you want to fetch 
multiple cf's (or cf+cq pairs), you can invoke the method multiple times.


Ben Craig wrote:

Hey Guys I'm new to Accumulo and trying to learn how to query data.  I
think I've got the basics down like:

//create a scanner
Scanner scan = connector.createScanner( "table", auths );

//create a filter
IteratorSetting itr1 = new IteratorSetting( 1, "TimeFilter",
AgeOffFilter.class );
itr1.addOption( TTL, Long.toString( DEFAULT_QUERY_TIME ) );
scan.addScanIterator( itr1 );

//iterate over the resulting rows
RowIterator rowIterator = new RowIterator( scan );
while ( rowIterator.hasNext() )
{
}

I've been playing around with some of the built in filters and have been
able to apply multiple filters on top of each other.  Some of the
filters I'm having issues with are those that take a complex Java object and
not just options.

For example ColumnQualifierFilter.java


When we use Iterator Settings the class is implicitly created but if I
want to use the ColumnQualifierFilter I need to create one and pass it a
set of columns.  I've been playing around with it for a while and haven't
been able to learn how to use it properly.

The constructor takes a sorted key value iterator.  How do I get this
sorted key value iterator?  Do I start with a scanner or do you start
with another type of scanner?  Do I just make one?
new ArrayList>(); ? And the data goes
into it?



I've read through this Accumulo
<http://shop.oreilly.com/product/0636920032304.do> book but it just
shows how you can use the Scanner/Iterator Settings to query.

If anyone has any suggestions / documentation / examples it would be much
appreciated.

Thanks,

Ben


Re: Querying Accumulo Data Using Java

2016-04-28 Thread Josh Elser
A common pattern we have (but by no means the only tool) is static 
methods on the Iterator/Filter that help serialize domain-specific 
configuration into string=>string format.


e.g. LongCombiner.setEncodingType(IteratorSetting, LongCombiner.Type)
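
A rough sketch of that pattern (a hypothetical MinLengthFilter, not a real Accumulo class):

```
import java.io.IOException;
import java.util.Map;
import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.Filter;
import org.apache.accumulo.core.iterators.IteratorEnvironment;
import org.apache.accumulo.core.iterators.SortedKeyValueIterator;

public class MinLengthFilter extends Filter {
  private static final String MIN_LENGTH_OPT = "minLength";
  private int minLength;

  // Static helper mirrors LongCombiner.setEncodingType(..): flatten the
  // domain-specific configuration into a String option.
  public static void setMinLength(IteratorSetting setting, int minLength) {
    setting.addOption(MIN_LENGTH_OPT, Integer.toString(minLength));
  }

  @Override
  public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options,
      IteratorEnvironment env) throws IOException {
    super.init(source, options, env);
    // Deserialize the option back into a typed field on the server side.
    minLength = Integer.parseInt(options.get(MIN_LENGTH_OPT));
  }

  @Override
  public boolean accept(Key k, Value v) {
    return v.get().length >= minLength;
  }
}
```

The client would then call something like MinLengthFilter.setMinLength(setting, 4) before addScanIterator(setting). (A fuller version would also override deepCopy and describeOptions, but the serialization idea is the same.)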

Dan Blum wrote:

You can’t pass objects because the iterator stack will be run in a
different JVM; anything that isn’t a string or coercible to a string
would need to be serialized. So if you have something complex you will
need to serialize it into a string somehow and then deserialize it in
your filter class, which is quite doable.

*From:*Ben Craig [mailto:ben.cra...@gmail.com]
*Sent:* Thursday, April 28, 2016 12:54 PM
*To:* user@accumulo.apache.org
*Subject:* Re: Querying Accumulo Data Using Java

Hey Josh,


Thanks for the response. I was getting pretty lost trying to use the
Internal Iterator. I will try using the fetchColumn on the scanner. I
guess I have one last question: is there any possible way to pass a Java
object to a custom filter, or are we limited to the property map of
String,String?  I think the String,String will work in almost all
cases I need to do, but am more curious than anything.

Thanks,


Ben

On Thu, Apr 28, 2016 at 1:34 PM, Josh Elser mailto:josh.el...@gmail.com>> wrote:

Hi Ben,

Looks like you're on the right track. Iterator priorities are a little
obtuse at first glance; you probably want to change the 1 to 15 (we can
touch on the "why" later).

As far as Iterators/Filters that you should feel comfortable using,
check out:
https://github.com/apache/accumulo/tree/master/core/src/main/java/org/apache/accumulo/core/iterators/user

The "system" package are not meant for public consumption (and often are
used to implement internal functionality). This is probably why you're
having a hard time figuring out how to use it.

Don't miss the methods on Scanner: fetchColumnFamily(Text),
fetchColumn(Text, Text), and fetchColumn(Column). These are how you can
easily do column family or column family + qualifier filtering.

For example, if you wanted to filter on the column family "foo" and the
column qualifier "bar":

```
Scanner scan = connector.createScanner("table", auths);
scan.fetchColumn(new Text("foo"), new Text("bar"));
RowIterator rowIterator = new RowIterator(scan);
while (...) { ... }
```

The fetch*() methods are also accumulative. If you want to fetch
multiple cf's (or cf+cq pairs), you can invoke the method multiple times.

Ben Craig wrote:

Hey Guys I'm new to Accumulo and trying to learn how to query data. I
think I've got the basics down like:

//create a scanner
Scanner scan = connector.createScanner( "table", auths );

//create a filter
IteratorSetting itr1 = new IteratorSetting( 1, "TimeFilter",
AgeOffFilter.class );
itr1.addOption( TTL, Long.toString( DEFAULT_QUERY_TIME ) );
scan.addScanIterator( itr1 );

//iterate over the resulting rows
RowIterator rowIterator = new RowIterator( scan );
while ( rowIterator.hasNext() )
{
}

I've been playing around with some of the built in filters and have been
able to apply multiple filters on top of each other. Some of the
filters I'm having issues with where they take a complex java object and
not just option

For example ColumnQualifierFilter.java
<https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/iterators/system/ColumnQualifierFilter.java>

When we use Iterator Settings the class is implicitly created but if I
want to use the ColumnQualifierFilter I need to create one and pass it a
set of columns. I've been playing around with it for a while and havn't
been able to learn how to use it properly.

The constructor takes a sorted key value iterator. How do I get this
sorted key value iterator? Do I start with a scanner or do you start
with another type of scanner? Do I just make one?
new ArrayList>(); ? And the data goes
into it?



I've read through this Accumulo
<http://shop.oreilly.com/product/0636920032304.do> book but it just
shows how you can use the Scanner/Iterator Settings to query.

If anyone has any suggestions / documentation / examples it be much
appreciated.

Thanks,

Ben



Re: Reuse Accumulo lexicographical ordering

2016-05-10 Thread Josh Elser

Hi Mario,

I'm not sure I 100% understand your question. Are you asking about the 
code which sorts Accumulo Keys?


If so, Key implements the Comparable interface (the `compareTo(Key)` 
method). You might be able to make use of the `compareTo(Key, 
PartialKey)` method as well. You can use this with standard sorting 
implementations (e.g. Collections.sort(..) or any SortedMap implementation).
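
A small sketch of what that looks like:

```
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.PartialKey;
import org.apache.hadoop.io.Text;

// Keys sort with Accumulo's lexicographic ordering via Comparable
List<Key> keys = new ArrayList<Key>();
keys.add(new Key(new Text("row2"), new Text("cf"), new Text("cq")));
keys.add(new Key(new Text("row1"), new Text("cf"), new Text("cq")));
Collections.sort(keys);  // uses Key.compareTo(Key)

// Compare only the row portion of two keys
int rowCmp = keys.get(0).compareTo(keys.get(1), PartialKey.ROW);
```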


- Josh

Mario Pastorelli wrote:

Hi,
I would like to reuse the ordering of byte arrays that Accumulo uses for
the keys. Is it exposed to the users? Where can I find it?

Thanks,
Mario

--
Mario Pastorelli| TERALYTICS

*software engineer*

Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland
phone:+41794381682
email: mario.pastore...@teralytics.ch

www.teralytics.net 

Company registration number: CH-020.3.037.709-7 | Trade register Canton
Zurich
Board of directors: Georg Polzer, Luciano Franceschina, Mark Schmitz,
Yann de Vries

This e-mail message contains confidential information which is for the
sole attention and use of the intended recipient. Please notify us at
once if you think that it may not be intended for you and delete it
immediately.



Re: tableOperations().create hangs

2016-05-19 Thread Josh Elser

FYI, you're still missing a few :)

http://accumulo.apache.org/1.7/accumulo_user_manual#_network

I don't think that we're missing any though (the few I don't recognize 
look like other services on your system, HDFS, ZK, etc)


David Boyd wrote:

Boy I feel stupid.  I thought I had opened up all the required ports in
IPTables for this to work.

Thanks Sven and Michael

The following is what I put in IP tables to make everything accessible:

# Firewall configuration written by system-config-firewall
# Manual customization of this file is not recommended.
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 2181 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 4560 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 9000 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 9997 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport  -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50010 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50020 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50070 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50075 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50090 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50091 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50095 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT


On 5/19/16 10:50 AM, Michael Wall wrote:

David,

If you are still having problems, can you take a jstack of the process
while it is hung and send it?

Mike

On Thu, May 19, 2016 at 9:59 AM, Sven Hodapp
mailto:sven.hod...@scai.fraunhofer.de>> wrote:

Hi David,

I had the same issue. I've found that to modify table configuration
from client code, the master server must be available.
You said that you already altered the configuration, but
maybe the master is nevertheless inaccessible to the client?

Regards,
Sven

--
Sven Hodapp, M.Sc.,
Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
Department of Bioinformatics
Schloss Birlinghoven, 53754 Sankt Augustin, Germany
sven.hod...@scai.fraunhofer.de 
www.scai.fraunhofer.de 

- Ursprüngliche Mail -
> Von: "David Boyd" mailto:db...@incadencecorp.com>>
> An: "user" mailto:user@accumulo.apache.org>>
> CC: "Jing Yang" mailto:jy...@incadencecorp.com>>
> Gesendet: Donnerstag, 19. Mai 2016 15:22:26
> Betreff: tableOperations().create hangs

> Hello:
>
> I have the following code in one of my methods:
>
> if(!dbConnector.tableOperations().exists(coalesceTable)) {
> System.err.println("creating table " +
coalesceTable);
> dbConnector.tableOperations().create(coalesceTable);
> System.err.println("created table " +
coalesceTable);
> }
>
> The exists call works just fine.  If the table exists (e.g. I
created it
> from the accumulo shell)
> the code moves on and writes to the table just fine.
>
> If the table does not exist the create call is called and the entire
> process hangs.
>
> This same code works just fine using MiniAccumuloCluster in my
junit test.
>
> My Accumulo is a single node 1.6.1 instance running in a separate
VM on
> my laptop.
>
> I saw a similar thread
>

(http://mail-archives.apache.org/mod_mbox/accumulo-dev/201310.mbox/%3c1382562693449-5858.p...@n5.nabble.com%3E)
> where the user set all the accumulo conf entries to the IP versus
> localhost, but I had already done that.  I reverified that
nothing in my
> accumulo configuration
> uses localhost.
>
> Any help would be appreciated.
>
>
>
> --
> = mailto:db...@incadencecorp.com
 
> David W. Boyd
> VP,  Data Solutions
> 10432 Balls Ford, Suite 240
> Manassas, VA 20109
> office: +1-703-552-2862 
> cell: +1-703-402-7908 
> == http://www.incadencecorp.com/ 
> ISO/IEC JTC1 WG9, editor ISO/IEC 20547 Big Data Reference
Architecture
> Chair ANSI/INCITS TC Big Data
> Co-chair NIST Big Data Public Working Group Reference Architecture
> First Robotic Mentor - FRC, FTC -
www.iliterobotics.org
> Board Member- USSTEM Foundation -
www.usstem.org
>
> The information contained in this message may be privileged
> and/or confidential and

Accumulo folks at Hadoop Summit San Jose

2016-05-19 Thread Josh Elser
Out of curiosity, are there going to be any Accumulo-folks at Hadoop 
Summit in San Jose, CA at the end of June?


- Josh


Re: Feedback about techniques for tuning batch scanning for my problem

2016-05-23 Thread Josh Elser
Hi Mario,

If you have a finite number of locations, you could also try configuring a
locality group for each location. This would prune out a significant amount
of data.
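
A rough sketch of that (hypothetical table, group, and family names, with a Connector named connector in scope):

```
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import org.apache.hadoop.io.Text;

// One locality group per location column family, so a scan for one location
// only has to read that group's portion of the RFiles.
Map<String,Set<Text>> groups = new HashMap<String,Set<Text>>();
groups.put("locA", Collections.singleton(new Text("locationA")));
groups.put("locB", Collections.singleton(new Text("locationB")));
connector.tableOperations().setLocalityGroups("mytable", groups);

// Existing files get rewritten into the new groups by a major compaction
connector.tableOperations().compact("mytable", null, null, true, false);
```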

I also wonder if you might have better performance by making the row a
concatenation of your location and entity identifier. I think actual
performance of this compared to what you have now would depend on the
number of entities and locations per day. This might be something you can
experiment with. Instead of adding some shard bit, you could reduce the
split threshold to get more parallelism.

I don't have the manual in front of me, but there is a property which
controls the server side batch size (how much data will be collected by a
server before it's sent back to your batchscanner). If you have a lot of
processing by the client, you could lower that buffer to receive smaller
batches of data more frequently.
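
For example, rough shell sketches for those knobs (hypothetical table name; I believe the
batch-size property referred to above is table.scan.max.memory, but check the manual for your version):

```
root@instance> config -t mytable -s table.split.threshold=256M
root@instance> config -t mytable -s table.scan.max.memory=256K
```
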
On May 19, 2016 11:08 AM, "Mario Pastorelli" 
wrote:

> Hey people,
> I'm trying to tune a bit the query performance to see how fast it can go
> and I thought it would be great to have comments from the community. The
> problem that I'm trying to solve in Accumulo is the following: we want to
> store the entities that have been in a certain location in a certain day.
> The location is a Long and the entity id is a Long. I want to be able to
> scan ~1M of rows in few seconds, possibly less than one. Right now, I'm
> doing the following things:
>
>1. I'm using a sharding byte at the start of the rowId to keep the
>data in the same range distributed in the cluster
>2. all the records are encoded, one single record is composed by
>   1. rowId: 1 shard byte + 3 bytes for the day
>   2. column family: 8 byte for the long corresponding to the hash of
>   the location
>   3. column qualifier: 8 byte corresponding to the identifier of the
>   entity
>   4. value: 2 bytes for some additional information
>3. I use a batch scanner because I don't need sorting and it's faster
>
> As expected, it takes few seconds to scan 1M rows but now I'm wondering if
> I can improve it. My ideas are the following:
>
>1. set table.compaction.major.ratio to 1 because I don't care about
>the ingestion performance and this should improve the query performance
>2. pre-split tables to match the number of servers and then use a byte
>of shard as first byte of the rowId. This should improve both writing and
>reading the data because both should work in parallel for what I understood
>3. enable bloom filter on the table
>
> Do you think those ideas make sense? Furthermore, I have two questions:
>
>1. considering that a single entry is only 22 bytes but I'm going to
>scan ~1M records per query, do you think I should change the BatchScanner
>buffers somehow?
>2. anything else to improve the scan speed? Again, I don't care about
>the ingestion time
>
> Thanks for the help!
>
> --
> Mario Pastorelli | TERALYTICS
>
> *software engineer*
>
> Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland
> phone: +41794381682
> email: mario.pastore...@teralytics.ch
> www.teralytics.net
>
> Company registration number: CH-020.3.037.709-7 | Trade register Canton
> Zurich
> Board of directors: Georg Polzer, Luciano Franceschina, Mark Schmitz, Yann
> de Vries
>
> This e-mail message contains confidential information which is for the
> sole attention and use of the intended recipient. Please notify us at once
> if you think that it may not be intended for you and delete it immediately.
>


Re: Feedback about techniques for tuning batch scanning for my problem

2016-05-23 Thread Josh Elser
This probably isn't a big issue unless you're running into stability issues
with Accumulo. They're both designed to scale horizontally. Unless you have
a reason that they can't be colocated, it's fine.
On May 21, 2016 2:29 PM, "David Medinets"  wrote:

> Why are you sharing the machines between Accumulo and Spark? Does Spark give you
> any kind of data locality like Accumulo does? Could it be better to use the
> full amount of memory for each?
> On May 21, 2016 1:15 PM, "Mario Pastorelli" <
> mario.pastore...@teralytics.ch> wrote:
>
>> Currently, setting the number of threads to both the number of servers and
>> the number of cores yields similar performance for scanning with a
>> BatchScanner. Thanks for the advice, I will try to use half of the cores of
>> each machine on the cluster.
>>
>> Anything else?
>>
>> On Sat, May 21, 2016 at 5:03 AM, David Medinets > > wrote:
>>
>>> It's been a few years so I don't remember the specific property names.
>>> Set one thread count to the number of servers times the number of cores to
>>> start. Divide by .5 if spark is equally as active as  accumulo. Look in
>>> properties.java for the property names.
>>>
>>> On Fri, May 20, 2016 at 10:09 AM, Mario Pastorelli <
>>> mario.pastore...@teralytics.ch> wrote:
>>>
 Machines have 32 cores shared between Accumulo and Spark. Each machine
 has 5 disks on which there is HDFS and that Accumulo can use. How many
 threads should I use?

 On Fri, May 20, 2016 at 3:49 PM, David Medinets <
 david.medin...@gmail.com> wrote:

> How many cores are on your servers? There are several thread counts
> you can change. Even +1 thread per server counts at some point if you have
> enough servers in the cluster.
>
> On Fri, May 20, 2016 at 2:54 AM, Mario Pastorelli <
> mario.pastore...@teralytics.ch> wrote:
>
>> You mean the BatchScanner number of threads? I've made it parametric
>> and usually I use 1 or 2 threads per tablet server. Going up doesn't seem
>> to do anything for the performance.
>>
>> On Thu, May 19, 2016 at 6:21 PM, David Medinets <
>> david.medin...@gmail.com> wrote:
>>
>>> Have you tuned thread counts?
>>> On May 19, 2016 11:08 AM, "Mario Pastorelli" <
>>> mario.pastore...@teralytics.ch> wrote:
>>>
 Hey people,
 I'm trying to tune a bit the query performance to see how fast it
 can go and I thought it would be great to have comments from the 
 community.
 The problem that I'm trying to solve in Accumulo is the following: we 
 want
 to store the entities that have been in a certain location in a certain
 day. The location is a Long and the entity id is a Long. I want to be 
 able
 to scan ~1M of rows in few seconds, possibly less than one. Right now, 
 I'm
 doing the following things:

1. I'm using a sharding byte at the start of the rowId to keep
the data in the same range distributed in the cluster
2. all the records are encoded, one single record is composed by
   1. rowId: 1 shard byte + 3 bytes for the day
   2. column family: 8 byte for the long corresponding to the
   hash of the location
   3. column qualifier: 8 byte corresponding to the identifier
   of the entity
   4. value: 2 bytes for some additional information
3. I use a batch scanner because I don't need sorting and it's
faster

 As expected, it takes few seconds to scan 1M rows but now I'm
 wondering if I can improve it. My ideas are the following:

1. set table.compaction.major.ration to 1 because I don't care
about the ingestion performance and this should improve the query
performance
2. pre-split tables to match the number of servers and then use
a byte of shard as first byte of the rowId. This should improve both
writing and reading the data because both should work in parallel 
 for what
I understood
3. enable bloom filter on the table

 Do you think those ideas make sense? Furthermore, I have two
 questions:

1. considering that a single entry is only 22 bytes but I'm
going to scan ~1M records per query, do you think I should change 
 the
BatchScanner buffers somehow?
2. anything else to improve the scan speed? Again, I don't care
about the ingestion time

 Thanks for the help!

 --
 Mario Pastorelli | TERALYTICS

 *software engineer*

 Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland
 phone: +41794381682
 email: mario.pastore...@teralytics.ch
 www.teralytics.net

 Company registr

Re: I have a problem about change HDFS address

2016-05-25 Thread Josh Elser
While the context of 
http://accumulo.apache.org/1.7/accumulo_user_manual#_migrating_accumulo_from_non_ha_namenode_to_ha_namenode 
isn't quite the same as yours, the steps for handling a namenode address 
change are the same (it has more practical steps on how to configure 
instance.volumes.replacements).
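
A rough accumulo-site.xml sketch (hypothetical namenode hostnames; each replacement pair is "old new", with multiple pairs separated by commas):

```
<property>
  <name>instance.volumes</name>
  <value>hdfs://new-namenode:8020/accumulo</value>
</property>
<property>
  <name>instance.volumes.replacements</name>
  <value>hdfs://old-namenode:8020/accumulo hdfs://new-namenode:8020/accumulo</value>
</property>
```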


Christopher wrote:

I believe you need to configure instance.volumes.replacements
http://accumulo.apache.org/1.7/accumulo_user_manual#_instance_volumes_replacements
to map your metadata from the old location to the new one.

On Wed, May 25, 2016 at 11:23 AM Keith Turner mailto:ke...@deenlo.com>> wrote:

Do you seen any errors in the Accumulo master log?

On Wed, May 25, 2016 at 11:17 AM, Lu Qin mailto:luq.j...@gmail.com>> wrote:


I only use the new HDFS; I changed instance.volumes to the new
volumes, and set instance.volumes.replacements.
When Accumulo starts, I exec ./bin/accumulo
org.apache.accumulo.server.util.FindOfflineTablets, and it shows the
accumulo.root table UNASSIGNED



On 2016-05-25, at 22:04, Keith Turner <ke...@deenlo.com> wrote:

Accumulo stores data in HDFS and ZooKeeper.   Are you using
new ZooKeeper servers?  If so, did you copy ZooKeeper's data?

On Wed, May 25, 2016 at 4:08 AM, Lu Qin mailto:luq.j...@gmail.com>> wrote:

I have an Accumulo 1.7.1 instance working with an old HDFS 2.6. Now I
have a new HDFS 2.6, and changed the Accumulo volume to the new one.

I used distcp to move the data from the old HDFS to the new
HDFS, and started Accumulo up.

Now the ‘Accumulo Overview' shows Tables as 0 and
Tablets as 0 with a red background, but in 'Table Status’ I
can see all the tables I have.
I use bin/accumulo shell and the tables command; it also shows
all tables, but I cannot scan any of them.

How can I resolve it? Thanks







Re: Problem with IntersectingIterator and IndexDocIterator not returning results.

2016-05-26 Thread Josh Elser

Hi David,

Generally, I think you're confusing the type of table with what these 
Iterators are meant to run over.


Remember, "shard" or "sharded" refers to distributing some amount of 
data across many servers by some hash partitioning (commonly, at least). 
This involves setting some "salt" or bit in the rowId to distribute your 
records across many servers instead of on a single server.


The table you describe is what's referred to as an "inverted index". The 
term is the primary sort order. This makes it very quick to find all 
pointers to "documents" which contain the given term.


The iterators you're trying to use are designed to operate over what's 
referred to as a "local index". In this form, the index records are 
co-located with the data records in a separate column. So, for each 
rowId, one column (family) is devoted to storing index records, while 
another is devoted to storing the actual data records. This structure is 
what the iterators are designed to work over. These iterators are novel 
because of some of the assumptions they can make on the physical data 
model of Accumulo tables, but let's ignore that for now :)


I know this isn't super helpful to you as-is. I'll see if I can find any 
time to make a better write-up for you.


Finally, the iterator javadocs not being published was an 
intentional change, but one I believe we should revert. <-- **ping 
Christopher**


- Josh

David Boyd wrote:

All:

I am using accumulo 1.6.1.  I am using a sharded index to search for
data
that matches the values of certain fields.  Here is the situation:

I have four fields EntityId, EntityIdType, EntityName, EntitySource.

Sometimes I need all records which match EntityId and EntityIdType
Othertimes I need all records which match all four fields.

The plan was to use ranges in the scanner to determine which fields to
match against.  I have tried both subsets of ranges and setting all ranges, and
gotten the same result.

I created an Index as follows:
RowId = fieldname
ColumnFamily = fieldvalue
ColumnQualifier = the overall record id (RowID) of my main record in
another table.

Here is the output of a scan of my index table:


entityid
1707945d-34d8-455d-85b1-55610739ce62:1707945d-34d8-455d-85b1-55610739ce62
[]
entityidtype GUID:1707945d-34d8-455d-85b1-55610739ce62 []
name TestEntity:1707945d-34d8-455d-85b1-55610739ce62 []
source Unit Test:1707945d-34d8-455d-85b1-55610739ce62 []


NOTE:  While in this case the entityid equals the overall RowID from the
other table that is not always true

When I run the code below it does not return any rows in the scanner.
In the debugger when running the code below terms show as follows:
[1707945d-34d8-455d-85b1-55610739ce62, GUID, TestEntity, Unit Test]

I have tried both IntersectingIterator and IndexDocIterator; both have
the same results.
For whatever reason the API docs for these classes is not showing up on
the Apache
Accumulo site.

Am I missing the purpose/function of this iterator?

Do I have to call IndexedDocIterator.setColfs with some values so I get
the column qualifiers back?

Below is my code:

public List<String> getCoalesceEntityKeysForEntityId(String entityId,
  String
entityIdType,
  String entityName,
  String
entitySource) throws CoalescePersistorException
 {
 // Use are sharded term index to find the merged keys
 Connector dbConnector = null;

 ArrayList<String> keys = new ArrayList<String>();

 Text[] terms = {new Text(entityId), new Text(entityIdType),
 new Text(entityName), new Text(entitySource)};


 try {
 dbConnector = AccumuloDataConnector.getDBConnector();

 BatchScanner keyscanner =
dbConnector.createBatchScanner(AccumuloDataConnector.coalesceEntityIndex, 
Authorizations.EMPTY,
4);

 // Set up an IntersectingIterator for the values
 IteratorSetting iter = new IteratorSetting(1, "intersect",
IndexedDocIterator.class);
 IndexedDocIterator.setColumnFamilies(iter,terms);
 keyscanner.addScanIterator(iter);

 // Use ranges to limit the bins searched
 //ArrayList<Range> ranges = new ArrayList<Range>();
 // May not be necessary to restrict ranges but will do it
to be safe
 //ranges.add(new Range("entityid"));
 //ranges.add(new Range("entityitype"));
 //ranges.add(new Range("entityname"));
// ranges.add(new Range("source"));
 //keyscanner.setRanges(ranges);
 keyscanner.setRanges(Collections.singleton(new Range()));

 // Return the list of keys
 for (Entry<Key,Value> entry : keyscanner) {
 keys.add(entry.getKey().getColumnQualifier().toString());
 }

 } catch (TableNotFoundException ex) {
 System.err.println(ex

Re: Problem with IntersectingIterator and IndexDocIterator not returning results.

2016-05-26 Thread Josh Elser
FWIW, you can take the same general approach on the client side to 
intersect results in an inverted index. This is pretty close to your 
standard sort-merge-join.


You can create a scanner over the row for each term you want to 
intersect. Every time the top elements of the scanners have equal IDs 
(the cq), that's a match. If the top elements are not equal, advance the 
scanner with the lowest-sorting (lexicographic) ID.


The only difference is that you have to do this at your client instead 
of pushing it down in an SKVI to Accumulo (but this is still a very 
efficient approach).
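
A rough sketch of that client-side intersection (hypothetical table and term names, assuming row = term and column qualifier = document id with one entry per document, and a Connector named connector in scope):

```
import java.util.Iterator;
import java.util.Map.Entry;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.io.Text;

Scanner s1 = connector.createScanner("termIndex", Authorizations.EMPTY);
s1.setRange(Range.exact(new Text("term1")));
Scanner s2 = connector.createScanner("termIndex", Authorizations.EMPTY);
s2.setRange(Range.exact(new Text("term2")));

Iterator<Entry<Key,Value>> i1 = s1.iterator();
Iterator<Entry<Key,Value>> i2 = s2.iterator();
Entry<Key,Value> e1 = i1.hasNext() ? i1.next() : null;
Entry<Key,Value> e2 = i2.hasNext() ? i2.next() : null;
while (e1 != null && e2 != null) {
  Text id1 = e1.getKey().getColumnQualifier();
  Text id2 = e2.getKey().getColumnQualifier();
  int cmp = id1.compareTo(id2);
  if (cmp == 0) {
    System.out.println("match: " + id1);      // document appears under both terms
    e1 = i1.hasNext() ? i1.next() : null;
    e2 = i2.hasNext() ? i2.next() : null;
  } else if (cmp < 0) {
    e1 = i1.hasNext() ? i1.next() : null;     // advance the lower-sorting side
  } else {
    e2 = i2.hasNext() ? i2.next() : null;
  }
}
```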


David Boyd wrote:

Josh:

 Thanks for the reply.

As I thought through it I realized the incorrect assumption I made, as
the "anding" only happens within a rowId.   So, time to come up with
another approach.

FYI - Love to hear the reasoning on taking down documentation ;-) I did
find the javadoc at:
https://static.javadoc.io/org.apache.accumulo/accumulo-core/1.6.1



On 5/26/16 11:57 AM, Josh Elser wrote:

Hi David,

Generally, I think you're confusing the type of table with what these
Iterators are meant to run over.

Remember, "shard" or "sharded" refers to distributing some amount of
data across many servers by some hash partitioning (commonly, at
least). This involves setting some "salt" or bit in the rowId to
distribute your records across many servers instead of on a single
server.

The table you describe is what's referred to as an "inverted index".
The term is the primary sort order. This makes it very quick to find
all pointers to "documents" which contain the given term.

The iterators you're trying to use as designed to operate over what's
referred to as a "local index". In this form, the index records are
co-located with the data records in a separate column. So, for each
rowId, one column (family) is devoted to storing index records, while
another is devote to storing the actual data records. This structure
is what the iterators are designed to work over. These iterators are
novel because of some of the assumptions they can make on the physical
data model of Accumulo tables, but let's ignore that for now :)

I know this isn't super helpful to you as-is. I'll see if I can find
any time to make a better write-up for you.

Finally, as far as the iterator javadocs not being published was an
intentional change, but one I believe we should revert. <-- **ping
Chistopher**

- Josh

David Boyd wrote:

All:

I am using accumulo 1.6.1.  I am using a sharded index to search for
data
that matches the values of certain fields.  Here is the situation:

I have four fields EntityId, EntityIdType, EntityName, EntitySource.

Sometimes I need all records which match EntityId and EntityIdType
Othertimes I need all records which match all four fields.

The plan was use uses ranges in the scanner to determine which fields to
match against.  I have tried both subsets of ranges, setting all
ranges, and
gotten the same result.

I created an Index as follows:
RowId = fieldname
ColumnFamily = fieldvalue
ColumnQualifier = the overall record id (RowID) of my main record in
another table.

Here is the output of a scan of my index table:


entityid
1707945d-34d8-455d-85b1-55610739ce62:1707945d-34d8-455d-85b1-55610739ce62

[]
entityidtype GUID:1707945d-34d8-455d-85b1-55610739ce62 []
name TestEntity:1707945d-34d8-455d-85b1-55610739ce62 []
source Unit Test:1707945d-34d8-455d-85b1-55610739ce62 []


NOTE:  While in this case the entityid equals the overall RowID from the
other table that is not always true

When I run the code below it does not return any rows in the scanner.
In the debugger when running the code below terms show as follows:
[1707945d-34d8-455d-85b1-55610739ce62, GUID, TestEntity, Unit Test]

I have tried both IntersectingIterator and IndexDocIterator both have
the same results.
For whatever reason the API docs for these classes is not showing up on
the Apache
Accumulo site.

Am I missing the purpose/function of this iterator?

Do I have to call IndexedDocIterator.setColfs with some values so I get
the column qualifiers back?

Below is my code:

public List getCoalesceEntityKeysForEntityId(String entityId,
  String
entityIdType,
  String
entityName,
  String
entitySource) throws CoalescePersistorException
 {
 // Use are sharded term index to find the merged keys
 Connector dbConnector = null;

 ArrayList keys = new ArrayList();

 Text[] terms = {new Text(entityId), new Text(entityIdType),
 new Text(entityName), new Text(entitySource)};


 try {
 dbConnector = AccumuloDataConnector.getDBConnector();

 BatchScanner keyscanner =
dbConnector.createBatchScanner(Ac

Re: walog consumes all the disk space on power failure

2016-05-31 Thread Josh Elser

Hi Jayesh,

Can you quantify some rough size numbers for us? Are you seeing 
exceptions in the Accumulo tserver/master logs?


One thought is that when Accumulo creates new WAL files, it sets the 
blocksize to be 1G (as a trick to force HDFS into making some 
"non-standard" guarantees for us). As a result, it will appear that 
there are a number of very large WAL files (but they're essentially empty).


If your instance is in some situation where Accumulo is repeatedly 
failing to write to a WAL, it might think the WAL is bad, abandon it, 
and try to create a new one. If this is happening each time, I could see 
it explain the situation you described. However, you should see the 
TabletServers complaining loudly that they cannot write to the WALs.


Jayesh Patel wrote:

We have a 3 node Accumulo 1.7 cluster running as VMware VMs with a minute
amount of data compared to Accumulo standards.

We have run into a situation multiple times now where all the nodes have
a power failure and when they are trying to recover from it
simultaneously, walog grows exponentially and fills up all the available
disk space. We have confirmed that the walog folder under /accumulo in
hdfs is consuming 99% of the disk space.

We have tried freeing enough space to be able to run Accumulo processes
in the hopes of it burning through walog without success. Walog just
grew to take up the freed space.

Given that we need to better manage the power situation, we’re trying to
understand what could be causing this and if there’s anything we can do
to avoid this situation.

We have some heartbeat data being written to a table at a very small
constant rate, which is not sufficient to cause such a large write-ahead
log even if HDFS was pulled out from under Accumulo’s feet, so to speak,
during the power failure, in case you’re wondering.

Thank you,

Jayesh



Re: walog consumes all the disk space on power failure

2016-06-01 Thread Josh Elser

Oh. Why do you only have 16GB of space...

You might be able to tweak some of the configuration properties so that 
Accumulo is more aggressive in removing files, but I think you'd just 
kick the can down the road for another ~30minutes.


Jayesh Patel wrote:

All 3 nodes have 16GB disk space, which was 98% consumed when we looked at
them a few hours after the power failed and was restored.  Normally it's
only 33% or about 5GB.
Once it got into this state Zookeeper couldn't even start because it
couldn't create some logfiles that it needs to create.  So the disk space
usage was real, not sure if you meant that or not.  Ended up wiping away the
hdfs data folder and reformatting it to reclaim the space.

Definitely didn't see complaints about writing to WALs.  Only exception is
the following that showed up because namenode wasn't in the right state due
to constrained resources:

2016-05-23 07:06:17,599 [recovery.HadoopLogCloser] WARN : Error recovering
lease on hdfs://instance-accumul
o:8020/accumulo/wal/instance-accumulo-3+9997/530f663b-2d6b-42a5-92d6-e8fbb9b
55c2e
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode
.SafeModeException): Cannot rec
over the lease of
/accumulo/wal/instance-accumulo-3+9997/530f663b-2d6b-42a5-92d6-e8fbb9b55c2e.
Name node is
  in safe mode.
Resources are low on NN. Please add or free up more resources then turn off
safe mode manually. NOTE:  If y
ou turn off safe mode before adding resources, the NN will immediately
return to safe mode. Use "hdfs dfsad
min -safemode leave" to turn safe mode off.
 at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FS
Namesystem.java:1327
)
 at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLease(FSNamesyste
m.java:2828)
 at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.recoverLease(NameNo
deRpcServer.java:667
)
 at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslator
PB.recoverLease(Clie
ntNamenodeProtocolServerSideTranslatorPB.java:663)
 at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNam
enodeProtocol$2.call
BlockingMethod(ClientNamenodeProtocolProtos.java)
 at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(Proto
bufRpcEngine.java:61
6)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Unknown Source)
 at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.ja
va:1657)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

 at org.apache.hadoop.ipc.Client.call(Client.java:1476)
 at org.apache.hadoop.ipc.Client.call(Client.java:1407)
 at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.jav
a:229)
 at com.sun.proxy.$Proxy15.recoverLease(Unknown Source)
 at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.recover
Lease(ClientNamenode

-----Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: Tuesday, May 31, 2016 6:54 PM
To: user@accumulo.apache.org
Subject: Re: walog consumes all the disk space on power failure

Hi Jayesh,

Can you quantify some rough size numbers for us? Are you seeing exceptions
in the Accumulo tserver/master logs?

One thought is that when Accumulo creates new WAL files, it sets the
blocksize to be 1G (as a trick to force HDFS into making some "non-standard"
guarantees for us). As a result, it will appear that there are a number of
very large WAL files (but they're essentially empty).

If your instance is in some situation where Accumulo is repeatedly failing
to write to a WAL, it might think the WAL is bad, abandon it, and try to
create a new one. If this is happening each time, I could see it explain the
situation you described. However, you should see the TabletServers
complaining loudly that they cannot write to the WALs.

Jayesh Patel wrote:

We have a 3 node Accumulo 1.7 cluster running as VMWare VMs with
minute amount of data compared to Accumulo standards.

We have run into a situation multiple times now where all the nodes
have a power failure and when they are trying to recover from it
simultaneously, walog grows exponentially and fills up all the
available disk space. We have confirmed that the walog folder under
/accumulo in hdfs is consuming 99% of the disk space.

We have tried freeing enough space to be able to run Accumulo
processes in the hopes of it burning through walog without success.
Walog just grew to take up the freed space.

Given that we need to better manage the power situation, we're trying
to understand what could be 

Re: walog consumes all the disk space on power failure

2016-06-03 Thread Josh Elser

It depends on how much data you're writing. I can't answer that for ya.

Generally for hadoop, you want to avoid that 80-90% utilization (HDFS 
will limit you to 90 or 95% capacity usage by default, IIRC).


If you're running things like MapReduce, you'll need more headroom to 
account for temporary output, jars being copied, etc. Accumulo has some 
lag in free'ing disk space (e.g. during compaction, you'll have double 
space usage for the files you're re-writing), as does HDFS in actually 
deleting the blocks for files that were deleted.


Jayesh Patel wrote:

So what would you consider a safe minimum amount of disk space in this case?

Thank you,
Jayesh

-----Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: Thursday, June 02, 2016 1:08 AM
To: user@accumulo.apache.org
Subject: Re: walog consumes all the disk space on power failure

Oh. Why do you only have 16GB of space...

You might be able to tweak some of the configuration properties so that
Accumulo is more aggressive in removing files, but I think you'd just kick
the can down the road for another ~30minutes.

Jayesh Patel wrote:

All 3 nodes have 16GB disk space which was 98% consumed when we looked
at them after few hours after the power failed and was restored.
Normally it's only 33% or about 5GB.
Once it got into this state Zookeeper couldn't even start because it
couldn't create some logfiles that it needs to create.  So the disk
space usage was real, not sure if you meant that or not.  Ended up
wiping away hdfs data folder and reformatting it to reclaim the space.

Definitely didn't see complaints about writing to WALs.  Only
exception is the following that showed up because namenode wasn't in
the right state due to constrained resources:

2016-05-23 07:06:17,599 [recovery.HadoopLogCloser] WARN : Error
recovering lease on hdfs://instance-accumul
o:8020/accumulo/wal/instance-accumulo-3+9997/530f663b-2d6b-42a5-92d6-e
8fbb9b
55c2e
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.na
menode
.SafeModeException): Cannot rec
over the lease of


/accumulo/wal/instance-accumulo-3+9997/530f663b-2d6b-42a5-92d6-e8fbb9b55c2e.

Name node is
   in safe mode.
Resources are low on NN. Please add or free up more resources then
turn off safe mode manually. NOTE:  If y ou turn off safe mode before
adding resources, the NN will immediately return to safe mode. Use
"hdfs dfsad min -safemode leave" to turn safe mode off.
  at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeM
ode(FS
Namesystem.java:1327
)
  at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLease(FSNam
esyste
m.java:2828)
  at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.recoverLease(
NameNo
deRpcServer.java:667
)
  at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTran
slator
PB.recoverLease(Clie
ntNamenodeProtocolServerSideTranslatorPB.java:663)
  at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$Cli
entNam
enodeProtocol$2.call
BlockingMethod(ClientNamenodeProtocolProtos.java)
  at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call
(Proto
bufRpcEngine.java:61
6)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Unknown Source)
  at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformat
ion.ja
va:1657)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

  at org.apache.hadoop.ipc.Client.call(Client.java:1476)
  at org.apache.hadoop.ipc.Client.call(Client.java:1407)
  at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngi
ne.jav
a:229)
  at com.sun.proxy.$Proxy15.recoverLease(Unknown Source)
  at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.r
ecover
Lease(ClientNamenode

-Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: Tuesday, May 31, 2016 6:54 PM
To: user@accumulo.apache.org
Subject: Re: walog consumes all the disk space on power failure

Hi Jayesh,

Can you quantify some rough size numbers for us? Are you seeing
exceptions in the Accumulo tserver/master logs?

One thought is that when Accumulo creates new WAL files, it sets the
blocksize to be 1G (as a trick to force HDFS into making some "non-standard"
guarantees for us). As a result, it will appear that there are a
number of very large WAL files (but they're essentially empty).

If your instance is in some situation where Accumulo is repeatedly
failing to write to a WAL, it might think the WAL is bad, abandon it,
and try to 

Re: Completed 2nd long run of Fluo

2016-06-03 Thread Josh Elser

Great stuff, Keith! Looking forward to picking it apart in more detail:

bq. In the previous run Fluo worker processes were constantly being 
killed by YARN for exceeding memory limits


I remember stumbling across some (surprising) recommendations which were 
to just completely disable the vmem/pmem monitoring to avoid this. Nice 
that you were actually able to work around it "properly".


Keith Turner wrote:

A 2nd multi-day test of Fluo was completed on EC2 a week or two ago.  I
just got around to finishing writing it up. The test was run using
Accumulo 1.7.1.

http://fluo.io/blog/2016/05/17/webindex-long-run-2


Re: Unbalanced tablets or extra rfiles

2016-06-07 Thread Josh Elser
re #1, you can try grep'ing over the Accumulo metadata table to see if 
there are references to the file. It's possible that some files might be 
kept around for table snapshots (but these should eventually be 
compacted per Mike's point in #3, I believe).


Mike Drob wrote:

1) Is your Accumulo Garbage Collector process running? It will delete
un-referenced files.
2) I've heard it said that 200 tablets per tserver is the sweet spot,
but it depends a lot on your read and write patterns.
3)
https://accumulo.apache.org/1.7/accumulo_user_manual#_table_compaction_major_everything_idle

On Tue, Jun 7, 2016 at 4:03 PM, Andrew Hulbert mailto:ahulb...@ccri.com>> wrote:

Hi all,

A few questions on behavior if you have any time...

1. When looking in accumulo's HDFS directories I'm seeing a
situation where "tablets" aka "directories" for a table have more
than the default 1G split threshold worth of rfiles in them. In one
large instance, we have 400G worth of rfiles in the default_tablet
directory (a mix of A, C, and F-type rfiles). We took one of these
tables and compacted it and now there are appropriately ~1G worth of
files in HDFS. On an unrelated table we have tablets with 100+G of
bulk imported rfiles in the tablet's HDFS directory.

This seems to be common across multiple clouds. All the ingest is
done via batch writing. Is anyone aware of why this would happen or
if it is even important? Perhaps these are leftover rfiles from some
process. Their timestamps cover large date ranges.

2. There's been some discussion on the number of files per tserver
for efficiency. Are there any limits on the size of rfiles for
efficiency? For instance, I assume that compacting all the files
into a single rfile per 1G split is more efficient bc it avoids
merging (but maybe decreases concurrency). However, would it be
better to have 500 tablets per node on a table with 1G splits versus
having 50 tablets with 10G splits. Assuming HDFS and Accumulo don't
mind 10G files!

3. Is there any way to force idle tablets to actually major compact
other than the shell? Seems like it never happens.

Thanks!

Andrew




Re: Unbalanced tablets or extra rfiles

2016-06-07 Thread Josh Elser



Keith Turner wrote:



On Tue, Jun 7, 2016 at 5:48 PM, Andrew Hulbert mailto:ahulb...@ccri.com>> wrote:

Yeah, it looks like in both cases there are files that have ~del
markers but are also referenced as entries for tablets. I assume
there's no problem with both? Most are many many months old.


Yeah, nothing inherently wrong with it. It's easier to create the ~del 
entry when we know one tablet is done with it. The GC still checks the 
tablet row-space to make sure no tablets still have a reference (to 
Keith's point about how multiple tablets can refer to the same file).



Many actually seem to have multiple file: assignments (multiple rows
in metadata table) ...which shouldn't happen, right?


It's ok for multiple tablets (rows in the metadata table) to reference the
same file.  When a tablet splits, both children may reference some of
the parent's files.  When a file is bulk imported, it may go to multiple
tablets.


I also assume that the files in the directory don't particularly
matter since they are assigned to other tablets in the metadata table.

Cool & thanks again. Fun to learn the internals.

-Andrew



    On 06/07/2016 05:34 PM, Josh Elser wrote:

re #1, you can try grep'ing over the Accumulo metadata table to
see if there are references to the file. It's possible that some
files might be kept around for table snapshots (but these should
eventually be compacted per Mike's point in #3, I believe).

Mike Drob wrote:

1) Is your Accumulo Garbage Collector process running? It
will delete
un-referenced files.
2) I've heard it said that 200 tablets per tserver is the
sweet spot,
but it depends a lot on your read and write patterns.
3)

https://accumulo.apache.org/1.7/accumulo_user_manual#_table_compaction_major_everything_idle


On Tue, Jun 7, 2016 at 4:03 PM, Andrew Hulbert
mailto:ahulb...@ccri.com>
<mailto:ahulb...@ccri.com <mailto:ahulb...@ccri.com>>> wrote:

 Hi all,

 A few questions on behavior if you have any time...

 1. When looking in accumulo's HDFS directories I'm seeing a
 situation where "tablets" aka "directories" for a table
have more
 than the default 1G split threshold worth of rfiles in
them. In one
 large instance, we have 400G worth of rfiles in the
default_tablet
 directory (a mix of A, C, and F-type rfiles). We took
one of these
 tables and compacted it and now there are appropriately
~1G worth of
 files in HDFS. On an unrelated table we have tablets
with 100+G of
 bulk imported rfiles in the tablet's HDFS directory.

 These seems to be common across multiple clouds. All
the ingest is
 done via batch writing. Is anyone aware of why this
would happen or
 if it is even important? Perhaps these are leftover
rfiles from some
 process. Their timestamps cover large date ranges.

 2. There's been some discussion on the number of files
per tserver
 for efficiency. Are there any limits on the size of
rfiles for
 efficiency? For instance, I assume that compacting all
the files
 into a single rfile per 1G split is more efficient bc
it avoids
 merging (but maybe decreases concurrency). However,
would it be
 better to have 500 tablets per node on a table with 1G
splits versus
 having 50 tablets with 10G splits. Assuming HDFS and
Accumulo don't
 mind 10G files!

 3. Is there any way to force idle tablets to actually
major compact
 other than the shell? Seems like it never happens.

 Thanks!

 Andrew






Re: Accumulo Ingest JMX metric

2016-06-08 Thread Josh Elser

Yeah, this looks wrong.

The numbers the monitor gets are from the TabletServer polling each Tablet

https://github.com/apache/accumulo/blob/1.6/server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServer.java#L3676-L3677

The JMX metrics reported by the TServer should also likely do the same.

Care to file a JIRA issue?

Roshan Punnoose wrote:

I am using Accumulo 1.6.5 and trying to monitor ingest rates through JMX
endpoints. The ingest reported through JMX does not seem to match with
the ingest rates reported on the accumulo monitor. Looking at the source
code, I noticed that the TabletServerMBean is counting all the entries
in memory for online tablets on that TabletServer:

result+=tablet.getNumEntriesInMemory()


Is this is a good metric to use for ingest rates? Just want a good way
to measure ingest rates as I change configurations and run the
continuous ingest.

Thanks!
Roshan


Re: Accumulo Ingest JMX metric

2016-06-08 Thread Josh Elser

Boss. Thanks, Roshan.

Roshan Punnoose wrote:

Just filed: https://issues.apache.org/jira/browse/ACCUMULO-4334

Thanks Josh!

On Wed, Jun 8, 2016 at 11:13 AM Josh Elser mailto:josh.el...@gmail.com>> wrote:

Yeah, this looks wrong.

The numbers the monitor gets are from the TabletServer polling each
Tablet


https://github.com/apache/accumulo/blob/1.6/server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServer.java#L3676-L3677

The JMX metrics reported by the TServer should also likely do the same.

Care to file a JIRA issue?

Roshan Punnoose wrote:
 > I am using Accumulo 1.6.5 and trying to monitor ingest rates
through JMX
 > endpoints. The ingest reported through JMX does not seem to match
with
 > the ingest rates reported on the accumulo monitor. Looking at the
source
 > code, I noticed that the TabletServerMBean is counting all the
entries
 > in memory for online tablets on that TabletServer:
 >
 > result+=tablet.getNumEntriesInMemory()
 >
 >
 > Is this is a good metric to use for ingest rates? Just want a
good way
 > to measure ingest rates as I change configurations and run the
 > continuous ingest.
 >
 > Thanks!
 > Roshan



Re: visibility constraint performance

2016-06-09 Thread Josh Elser



Keith Turner wrote:



On Thu, Jun 9, 2016 at 2:49 PM, Jonathan Wonders mailto:jwonder...@gmail.com>> wrote:

Hi All,

I've been tracking down some performance issues on a few 1.6.x
environments and noticed some interesting and potentially
undesirable behavior in the visibility constraint.  When the
associated visibility evaluator checks a token to see if the user
has a matching authorization, it uses the security operations from
the tablet server's constraint environment which ends up
authenticating the user's credentials on each call.  This will end
up flooding logs with Audit messages corresponding to these
authentications if the Audit logging is enabled.  It also consumes a
non-negligible amount of CPU, produces a lot of garbage (maybe
50-60% of that generated under a heavy streaming ingest load), and
can cause some contention between client pool threads when accessing
the ZooCache.

My initial measurements indicate a 25-30% decrease in ingest rate
(entries/s and MB/s) for my environment and workload when this
constraint is enabled.  This is with the Audit logging disabled.

Is this intended behavior?  It seems like the authentication is
redundant with the authentication that is performed at the beginning
of the update session.


No. It would be best to avoid that behavior.


Agreed. Want to open up something on JIRA? It sounds like there might be 
a few things we can investigate.


* Synchronization/concurrency on ZooCache
* Excessive object creation when using the VisibilityConstraint
* Noticeable time spent creating Audit messages which are not logged 
(Auditing is disabled)


Did I miss any points?


Re: visibility constraint performance

2016-06-09 Thread Josh Elser



Jonathan Wonders wrote:


On Thu, Jun 9, 2016 at 3:58 PM, Sean Busbey mailto:bus...@cloudera.com>> wrote:

On Thu, Jun 9, 2016 at 2:47 PM, Josh Elser mailto:josh.el...@gmail.com>> wrote:
>
>  Agreed. Want to open up something on JIRA? It sounds like there
might be a
>  few things we can investigate.


I'm happy to open up a JIRA issue for this.



Boss. Thanks!


>
>  * Synchronization/concurrency on ZooCache
>  * Excessive object creation when using the VisibilityConstraint
>  * Noticeable time spent creating Audit messages which are not logged
>  (Auditing is disabled)
>
>  I miss any points?

sounds like duplicative authentication checks

--
busbey


I believe eliminating the redundant authentication checks would fix all
of these symptoms.

--Jonathan


Excellent.


Re: Bulk Ingest

2016-06-16 Thread Josh Elser
There are two big things that are required to really scale up bulk 
loading. Sadly (I guess) they are both things you would need to
implement on your own:


1) Avoid lots of small files. Target as large of files as you can, 
relative to your ingest latency requirements and your max file size (set 
on your instance or table)


2) Avoid having to import one file to multiple tablets. Remember that
the majority of the metadata update for Accumulo is updating the tablet
row with the new file. When you have one file which spans many tablets,
you now create N metadata updates instead of just one. When you
create the files, take into account the split points of your table, and
use them to try to target one file per tablet (see the sketch below).
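
For illustration only (not from the original message), a rough sketch of
bucketing keys by the table's current split points so each output file maps
to a single tablet. The class and method names are made up; listSplits() is
the standard TableOperations call.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.accumulo.core.client.Connector;
import org.apache.hadoop.io.Text;

public class SplitAwarePartitioner {
  private final List<Text> splits;

  public SplitAwarePartitioner(Connector conn, String table) throws Exception {
    // Snapshot the table's current split points; rows that land in the same
    // bucket can share an output file without that file spanning tablets.
    this.splits = new ArrayList<>(conn.tableOperations().listSplits(table));
    Collections.sort(splits);
  }

  public int partitionForRow(Text row) {
    int pos = Collections.binarySearch(splits, row);
    return pos >= 0 ? pos : -(pos + 1);
  }
}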


Roshan Punnoose wrote:

We are trying to perform bulk ingest at scale and wanted to get some
quick thoughts on how to increase performance and stability. One of the
problems we have is that we sometimes import thousands of small files,
and I don't believe there is a good way around this in the architecture
as of yet. Already I have run into an rpc timeout issue because the
import process is taking longer than 5m. And another issue where we have
so many files after a bulk import that we have had to bump the
tserver.scan.files.open.max to 1K.

Here are some other configs that we have been toying with:
- master.fate.threadpool.size: 20
- master.bulk.threadpool.size: 20
- master.bulk.timeout: 20m
- tserver.bulk.process.threads: 20
- tserver.bulk.assign.threads: 20
- tserver.bulk.timeout: 20m
- tserver.compaction.major.concurrent.max: 20
- tserver.scan.files.open.max: 1200
- tserver.server.threads.minimum: 64
- table.file.max: 64
- table.compaction.major.ratio: 20

(HDFS)
- dfs.namenode.handler.count: 100
- dfs.datanode.handler.count: 50

Just want to get any quick ideas for performing bulk ingest at scale.
Thanks guys

p.s. This is on Accumulo 1.6.5


Re: [ANNOUNCE] Timely - Secure Time Series Database

2016-06-22 Thread Josh Elser

Awesome!

dlmar...@comcast.net wrote:

Timely is a time series database application that provides secure access to 
time series data. It is designed to be used with Apache Accumulo for 
persistence and Grafana for visualization. Timely is located at 
https://github.com/NationalSecurityAgency/timely .



Re: no visibility parameter in TableOperations

2016-07-01 Thread Josh Elser

Hi Jayesh,

deleteRows(...) is a tablet-level operation: instead of issuing updates (deletes, in this
case) to specific "cells" of the table, entire portions of the tablets are
dropped. deleteRows() can be very efficient for deleting a large contiguous portion of your
table. If you need to selectively delete a certain cell (key-value pair) based on its
visibility, just use the putDelete() method with a Mutation and the BatchWriter. The
BatchDeleter is also an option if you haven't seen it yet.
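
As a minimal sketch (the table, row, column, and visibility values here are
made up for illustration):

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.security.ColumnVisibility;

public class CellDeleteExample {
  // Selectively delete a single cell; the delete carries the same visibility
  // as the entry it shadows, so normal visibility filtering still applies.
  static void deleteOneCell(Connector conn) throws Exception {
    BatchWriter bw = conn.createBatchWriter("mytable", new BatchWriterConfig());
    Mutation m = new Mutation("row1");
    m.putDelete("cf", "cq", new ColumnVisibility("A&B"));
    bw.addMutation(m);
    bw.close();
  }
}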

Does that make sense?

Jayesh Patel wrote:
Why is there no visibility parameter in TableOperations, especially 
something like deleteRows()?


All the reads and writes obviously require visibility, and
Mutation.putDelete() does also, so it feels like we’re able to cheat
and deleteRows() rows that I might not have visibility into.


Thank you,
Jayesh



Re: default for tserver.total.mutation.queue.max increased from 1M to 50M in 1.7

2016-07-07 Thread Josh Elser

I think there's an order-minutes property which would cause a flush on idle.

table.compaction.minor.idle IIRC
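
If that property name is right, adjusting it per table is a one-liner (a
sketch; 'mytable' and the 2m value are placeholders):

import org.apache.accumulo.core.client.Connector;

public class IdleFlushTuning {
  static void shortenIdleFlush(Connector conn) throws Exception {
    // Assumes the property name above is correct for this Accumulo version.
    conn.tableOperations().setProperty("mytable", "table.compaction.minor.idle", "2m");
  }
}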

Jeff Kubina wrote:

Interesting, is it only flushed when the buffer is full or is there a
time limit on it also? For example, if 25M of mutations are written and
no more when is the buffer flushed?

--
Jeff Kubina
410-988-4436


On Thu, Jul 7, 2016 at 11:50 AM, Christopher mailto:ctubb...@apache.org>> wrote:

The change was introduced in
https://issues.apache.org/jira/browse/ACCUMULO-1950, and it's an
entirely new property. The old property was a per-session property.
The new one is per-tserver, and is a better strategy, because it
reduces the risk of multiple writers exhausting tserver
memory, while still giving the user control over how frequently
flushes/sync's occur.

On Thu, Jul 7, 2016 at 10:32 AM Jeff Kubina mailto:jeff.kub...@gmail.com>> wrote:

I noticed that the default value for
tserver.total.mutation.queue.max in 1.7 is 50M but in 1.6 it is
1M (tserver.mutation.queue.max). Is this increase to compensate
for the performance hit of moving the WALs to the HDFS or some
other factor?

Is there a way to compute the number of times the buffer is
flushed to calculate how this effects performance?


--
Jeff Kubina





Re: java.lang.NoClassDefFoundError with fields of custom Filter

2016-07-07 Thread Josh Elser
Beware using the HDFS classloader in any Accumulo release that does not 
contain commons-vfs-2.1 as a dependency.


Commons-vfs-2.0 has multiple known issues which prevent it from being
usable in even the most basic sense.


Presently, there is only one release which contains the fix already: 
Accumulo 1.7.2. The upcoming 1.6.6 and 1.8.0 releases will also have the 
updated dependency.


Massimilian Mattetti wrote:

Hi Jim,

the approach of using namespaces from HDFS looks promising. I need to
investigate a little on how it works but I guess I will take your advice.
Thank you.

Cheers,
Massimiliano




From: James Hughes 
To: user@accumulo.apache.org
Date: 07/07/2016 08:28 PM
Subject: Re: java.lang.NoClassDefFoundError with fields of custom Filter




Hi Massimiliano,

I'm a fan of producing uber jars for this kind of thing; we do that for
GeoMesa. There is one gotcha which can come up: if you have several uber
jars in lib/ext, they can collide in rather unexpected ways.

There are two options to call out:

First, Accumulo has support for loading jars from HDFS into namespaces.
With that, you could have various namespaces for different versions or
different collections of iterator projects. If you are sharing a dev
cloud with other projects or co-workers working on the same project that
can be helpful since it would avoid restarts, etc. Big thumbs-up for
this approach!

Second, rather than having an uber jar, you could build up zip files
with the various jars you need for your iterators and unzip them in
lib/ext. If you did that for multiple competing iterator projects, you'd
avoid duplication of code inside uber-jars. Also, you'd be able to see
if there are 8 versions of Log4J and Guava in lib/ext...;) It wouldn't
be as powerful as the namespace, but there's something nice about having
a low-tech approach.
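
For reference, the first option corresponds to Accumulo's classpath contexts.
A sketch of wiring one up programmatically follows; the property names are
recalled from the 1.7 manual, and the context name, HDFS path, and table are
placeholders, so treat the details as assumptions.

import org.apache.accumulo.core.client.Connector;

public class ClasspathContextSetup {
  static void useHdfsContext(Connector conn) throws Exception {
    // Define a VFS classpath context backed by jars stored in HDFS...
    conn.instanceOperations().setProperty(
        "general.vfs.context.classpath.myapp",
        "hdfs://namenode:8020/apps/myapp/iterators/.*.jar");
    // ...and point one table at that context.
    conn.tableOperations().setProperty("mytable", "table.classpath.context", "myapp");
  }
}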

Others will likely have varied experiences; I'm not sure if there's an
established 'best practice' here.

Cheers,

Jim

On Thu, Jul 7, 2016 at 12:56 PM, Massimilian Mattetti
<_massi...@il.ibm.com_ > wrote:
Thanks for your prompt response, you are right Jim. There is a static
dependency to Log4J in my lexicoder. Adding the Log4J Jar to the
classpath solved the problem.
Would you suggest to use an Uber Jar to avoid this kind of problems?

Regards,
Massimiliano




From: James Hughes <_jnh5y@virginia.edu_ >
To: _user@accumulo.apache.org_ 
Date: 07/07/2016 06:25 PM
Subject: Re: java.lang.NoClassDefFoundError with fields of custom Filter





Hi Massimilian,

As a quick note, your error says that it could not initialize class
accumulo.lexicoders.MyLexicoder. Did you provide all the dependencies
for your class on Accumulo's classpath?

That exception (or similar) can occur if there is a static block in your
MyLexicoder class which can't run properly.
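
To make that concrete, a contrived sketch (not Massimiliano's actual code): a
static field like the one below is initialized when the tserver first loads
the class, and if Log4J is missing from the classpath that initialization
fails; later uses of the class then surface as "Could not initialize class ...".

import org.apache.log4j.Logger;

public class MyLexicoder /* implements Lexicoder<...>, elided here */ {
  // Runs at class-load time inside the tserver. If org.apache.log4j.Logger
  // cannot be resolved, loading the class fails and it is marked as erroneous.
  private static final Logger LOG = Logger.getLogger(MyLexicoder.class);
}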

Cheers,

Jim


On Thu, Jul 7, 2016 at 11:19 AM, Massimilian Mattetti
<_massi...@il.ibm.com_ > wrote:
Hi,

I have implemented a custom filter and a custom lexicoder. Both of these
classes are packed in the same jar, which has been deployed under the
directory $ACCUMULO_HOME/lib/ext on my Accumulo servers (version
1.7.1). The lexicoder is used by the filter to get the real object from
the accumulo value and test some conditions on it. When I tried to scan
the table applying this filter I got the following exception:

Caused by: java.lang.NoClassDefFoundError: Could not initialize class
accumulo.lexicoders.MyLexicoder
at accumulo.filters.MyFilter.<init>(MyFilter.java:24)
at sun.reflect.GeneratedConstructorAccessor9.newInstance(Unknown Source)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at java.lang.Class.newInstance(Class.java:442)
at
org.apache.accumulo.core.iterators.IteratorUtil.loadIterators(IteratorUtil.java:261)
at
org.apache.accumulo.core.iterators.IteratorUtil.loadIterators(IteratorUtil.java:237)
at
org.apache.accumulo.core.iterators.IteratorUtil.loadIterators(IteratorUtil.java:218)
at
org.apache.accumulo.core.iterators.IteratorUtil.loadIterators(IteratorUtil.java:205)
at
org.apache.accumulo.tserver.tablet.ScanDataSource.createIterator(ScanDataSource.java:193)
at
org.apache.accumulo.tserver.tablet.ScanDataSource.iterator(ScanDataSource.java:127)
at
org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.seek(SourceSwitchingIterator.java:180)
at org.apache.accumulo.tserver.tablet.Tablet.nextBatch(Tablet.java:880)
at org.apache.accumulo.tserver.tablet.Scanner.read(Scanner.java:98)
at org.apache.accumulo.tserver.scan.NextBatchTask.run(NextBatchTask.java:69)
at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
... 4 more


I do not understand how it is possible th

Re: Unable to import RFile produced by AccumuloFileOutputFormat

2016-07-08 Thread Josh Elser

Interesting! I have not run into this one before.

You could use `accumulo rfile-info`, but I'd guess that would net the 
same exception you see below.


Let me see if I can dig a little into the code and come up with a 
plausible explanation.


Russ Weeks wrote:

Hi, folks,

Has anybody ever encountered a problem where the RFiles that are
generated by AccumuloFileOutputFormat can't be imported using
TableOperations.importDirectory?

I'm seeing this problem very frequently for small RFiles and
occasionally for larger RFiles. The errors shown in the monitor's log UI
suggest a corrupt file, to me. For instance, the stack trace below shows
a case where the BCFileVersion was incorrect, but sometimes it will
complain about an invalid length, negative offset, or invalid codec.

I'm using HDP Accumulo 1.7.0 (1.7.0.2.3.4.12-1) on an encrypted HDFS
volume, with Kerberos turned on. The RFiles are generated by
AccumuloFileOutputFormat from a Spark job.

A very small RFile that exhibits this problem is available here:
http://firebar.newbrightidea.com/downloads/bad_rfiles/Iwaz.rf

I'm pretty confident that the keys are being written to the RFile in
order. Are there any tools I could use to inspect the internal structure
of the RFile?

Thanks,
-Russ

Unable to find tablets that overlap file
hdfs://[redacted]/accumulo/data/tables/f/b-ze9/Izeb.rf
java.lang.RuntimeException: Incompatible BCFile fileBCFileVersion.
at
org.apache.accumulo.core.file.rfile.bcfile.BCFile$Reader.<init>(BCFile.java:828)
at
org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.init(CachableBlockFile.java:246)
at
org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBCFile(CachableBlockFile.java:257)
at
org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.access$100(CachableBlockFile.java:137)
at
org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader$MetaBlockLoader.get(CachableBlockFile.java:209)
at
org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBlock(CachableBlockFile.java:313)
at
org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:368)
at
org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:137)
at org.apache.accumulo.core.file.rfile.RFile$Reader.<init>(RFile.java:843)
at
org.apache.accumulo.core.file.rfile.RFileOperations.openReader(RFileOperations.java:79)
at
org.apache.accumulo.core.file.DispatchingFileFactory.openReader(DispatchingFileFactory.java:69)
at
org.apache.accumulo.server.client.BulkImporter.findOverlappingTablets(BulkImporter.java:644)
at
org.apache.accumulo.server.client.BulkImporter.findOverlappingTablets(BulkImporter.java:615)
at
org.apache.accumulo.server.client.BulkImporter$1.run(BulkImporter.java:146)
at
org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at
org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
at java.lang.Thread.run(Thread.java:745)


Re: Unable to import RFile produced by AccumuloFileOutputFormat

2016-07-08 Thread Josh Elser
Yeah, I'd lean towards something corrupting the file as well. We 
presently have two BCFile versions: 2.0 and 1.0. Both are presently 
supported by the code so it should not be possible to create a bad RFile 
using our APIs (assuming correctness from the filesystem, anyways)


I'm reminded of HADOOP-11674, but a quick check does show that is fixed 
in your HDP-2.3.4 version (sorry for injecting $vendor here).


Some other thoughts on how you could proceed:

* Can Spark write the file to local fs? Maybe you can rule out HDFS w/ 
encryption as a contributing issue by just writing directly to local 
disk and then upload them to HDFS after the fact (as a test)
* `accumulo rfile-info` should fail in the same way if the metadata is 
busted as a way to verify things
* You can use rfile-info on both files in HDFS and local fs (tying into 
the first point)
* If you can share one of these files that is invalid, we can rip it 
apart and see what's going on.


William Slacum wrote:

I wonder if the file isn't being decrypted properly. I don't see why it
would write out incompatible file versions.

On Fri, Jul 8, 2016 at 3:02 PM, Josh Elser mailto:josh.el...@gmail.com>> wrote:

Interesting! I have not run into this one before.

You could use `accumulo rfile-info`, but I'd guess that would net
the same exception you see below.

Let me see if I can dig a little into the code and come up with a
plausible explanation.


Russ Weeks wrote:

Hi, folks,

Has anybody ever encountered a problem where the RFiles that are
generated by AccumuloFileOutputFormat can't be imported using
TableOperations.importDirectory?

I'm seeing this problem very frequently for small RFiles and
occasionally for larger RFiles. The errors shown in the
monitor's log UI
suggest a corrupt file, to me. For instance, the stack trace
below shows
a case where the BCFileVersion was incorrect, but sometimes it will
complain about an invalid length, negative offset, or invalid codec.

I'm using HDP Accumulo 1.7.0 (1.7.0.2.3.4.12-1) on an encrypted HDFS
volume, with Kerberos turned on. The RFiles are generated by
AccumuloFileOutputFormat from a Spark job.

A very small RFile that exhibits this problem is available here:
http://firebar.newbrightidea.com/downloads/bad_rfiles/Iwaz.rf

I'm pretty confident that the keys are being written to the RFile in
order. Are there any tools I could use to inspect the internal
structure
of the RFile?

Thanks,
-Russ

Unable to find tablets that overlap file
hdfs://[redacted]/accumulo/data/tables/f/b-ze9/Izeb.rf
java.lang.RuntimeException: Incompatible BCFile fileBCFileVersion.
at

org.apache.accumulo.core.file.rfile.bcfile.BCFile$Reader.(BCFile.java:828)
at

org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.init(CachableBlockFile.java:246)
at

org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBCFile(CachableBlockFile.java:257)
at

org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.access$100(CachableBlockFile.java:137)
at

org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader$MetaBlockLoader.get(CachableBlockFile.java:209)
at

org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBlock(CachableBlockFile.java:313)
at

org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:368)
at

org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:137)
at
org.apache.accumulo.core.file.rfile.RFile$Reader.(RFile.java:843)
at

org.apache.accumulo.core.file.rfile.RFileOperations.openReader(RFileOperations.java:79)
at

org.apache.accumulo.core.file.DispatchingFileFactory.openReader(DispatchingFileFactory.java:69)
at

org.apache.accumulo.server.client.BulkImporter.findOverlappingTablets(BulkImporter.java:644)
at

org.apache.accumulo.server.client.BulkImporter.findOverlappingTablets(BulkImporter.java:615)
at

org.apache.accumulo.server.client.BulkImporter$1.run(BulkImporter.java:146)
at

org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
at
org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at

java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
a

Re: Configuring batch writers

2016-07-19 Thread Josh Elser
It's very dependent on the requirements of your application and the
amount of data your application is serving. A general recommendation
which should be universal is try to limit each server to hundreds of
tablets. This, like everything else, is also a loose recommendation.

Likely, this will require experimentation on your end. If you can
share more details about the specifics of your data set and
requirements, we might be able to give you some more direction.

On Tue, Jul 19, 2016 at 12:35 PM, Jamie Johnson  wrote:
> Thank you, this was helpful.  What about the number of splits for a table.
> Is there a general rule of thumb for how many splits and what size they
> should be when trying to balance ingest/query performance?
>
> On Fri, Jul 15, 2016 at 2:38 PM, Emilio Lahr-Vivaz 
> wrote:
>>
>> Another thing to consider is how many tablet servers the mutations are
>> being sent to - if they're all going to a single split, that's going to
>> reduce your throughput a lot.
>>
>>
>> On 07/15/2016 02:33 PM, dlmar...@comcast.net wrote:
>>
>> The batch writer has several knobs (latency time, memory buffer, etc) that
>> you can tune to meet your requirements. The values for those settings will
>> depend on a lot of variables, to include:
>>
>>   - number of tablet servers
>>   - size of mutations
>>   - desired latency
>>   - memory buffer
>>   - configuration settings on the table(s) and tablet servers.
>>
>>  Suggest picking a starting point and see how it works for you, such as
>>
>>   threads - equal to the number of tablet servers (unless you have a
>> really large number of tablet servers)
>>   buffer - 100MB
>>   latency - 10 seconds
>>
>>  If you are hitting a wall with those settings, you could increase the
>> buffer and latency and/or change some settings on the server side that have
>> to do with the write ahead logs.
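
As a concrete rendering of the starting point suggested above (a sketch; the
connector, table name, and thread count are placeholders):

import java.util.concurrent.TimeUnit;
import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;

public class WriterSetup {
  static BatchWriter openWriter(Connector conn) throws Exception {
    BatchWriterConfig cfg = new BatchWriterConfig()
        .setMaxWriteThreads(8)                 // roughly the number of tablet servers
        .setMaxMemory(100 * 1024 * 1024L)      // 100MB client-side buffer
        .setMaxLatency(10, TimeUnit.SECONDS);  // flush at least every 10 seconds
    return conn.createBatchWriter("mytable", cfg);
  }
}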
>>
>> 
>> From: "Jamie Johnson" 
>> To: user@accumulo.apache.org
>> Sent: Friday, July 15, 2016 2:16:40 PM
>> Subject: Configuring batch writers
>>
>> Is there any documentation that outlines reasonable settings for batch
>> writers given a known ingest rate?  For instance if I have a source that is
>> producing in the neighborhood of 15MB of mutations per second, what would a
>> reasonable configuration for the batch writer be to handle an ingest at this
>> rate? What are reasonable rules of thumb to follow to ensure that the
>> writers don't block, etc?
>>
>>
>


Re: Configuring batch writers

2016-07-19 Thread Josh Elser

Hundreds of tablets per server.

The number of tablets in your system divided by the number of
tabletservers should likely be in the hundreds.


Jamie Johnson wrote:

hundreds of tablets for a particular table or all tables?

On Tue, Jul 19, 2016 at 1:43 PM, Josh Elser mailto:josh.el...@gmail.com>> wrote:

It's very dependent on the requirements of your application and the
amount of data your application is serving. A general recommendation
which should be universal is try to limit each server to hundreds of
tablets. This, like everything else, is also a loose recommendation.

Likely, this will require experimentation on your end. If you can
share more details about the specifics of your data set and
requirements, we might be able to give you some more direction.

On Tue, Jul 19, 2016 at 12:35 PM, Jamie Johnson mailto:jej2...@gmail.com>> wrote:
 > Thank you, this was helpful.  What about the number of splits for
a table.
 > Is there a general rule of thumb for how many splits and what
size they
 > should be when trying to balance ingest/query performance?
 >
 > On Fri, Jul 15, 2016 at 2:38 PM, Emilio Lahr-Vivaz
mailto:elahrvi...@ccri.com>>
 > wrote:
 >>
 >> Another thing to consider is how many tablet servers the
mutations are
 >> being sent to - if they're all going to a single split, that's
going to
 >> reduce your throughput a lot.
 >>
 >>
 >> On 07/15/2016 02:33 PM, dlmar...@comcast.net
<mailto:dlmar...@comcast.net> wrote:
 >>
 >> The batch writer has several knobs (latency time, memory buffer,
etc) that
 >> you can tune to meet your requirements. The values for those
settings will
 >> depend on a lot of variables, to include:
 >>
 >>   - number of tablet servers
 >>   - size of mutations
 >>   - desired latency
 >>   - memory buffer
 >>   - configuration settings on the table(s) and tablet servers.
 >>
 >>  Suggest picking a starting point and see how it works for you,
such as
 >>
 >>   threads - equal to the number of tablet servers (unless you have a
 >> really large number of tablet servers)
 >>   buffer - 100MB
 >>   latency - 10 seconds
 >>
 >>  If you are hitting a wall with those settings, you could
increase the
 >> buffer and latency and/or change some settings on the server
side that have
 >> to do with the write ahead logs.
 >>
 >> 
 >> From: "Jamie Johnson" mailto:jej2...@gmail.com>>
 >> To: user@accumulo.apache.org <mailto:user@accumulo.apache.org>
 >> Sent: Friday, July 15, 2016 2:16:40 PM
 >> Subject: Configuring batch writers
 >>
 >> Is there any documentation that outlines reasonable settings for
batch
 >> writers given a known ingest rate?  For instance if I have a
source that is
 >> producing in the neighborhood of 15MB of mutations per second,
what would a
 >> reasonable configuration for the batch writer be to handle an
ingest at this
 >> rate? What are reasonable rules of thumb to follow to ensure
that the
 >> writers don't block, etc?
 >>
 >>
 >




Re: Testing Spark Job that uses the AccumuloInputFormat

2016-08-03 Thread Josh Elser
MockAccumulo is also on its way out. It predates MiniAccumuloCluster 
and, for a while, was a very useful tool for running simple tests 
against Accumulo. However, there are numerous edge cases where 
MockAccumulo doesn't actually act like "real" Accumulo (whereas 
MiniAccumuloCluster does -- because it's the same exact code).


Would recommend focusing on MAC. Let us know how we can help further -- 
if you have examples for us to try that exhibit the problems you're 
facing, that would be very helpful.
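
For anyone landing here later, a minimal MAC harness looks roughly like this
(a sketch; the directory, password, and table name are placeholders):

import java.io.File;
import java.nio.file.Files;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.minicluster.MiniAccumuloCluster;

public class MacSmokeTest {
  public static void main(String[] args) throws Exception {
    // MAC wants an empty directory to lay down its config, logs, and data.
    File dir = Files.createTempDirectory("mac-test").toFile();
    MiniAccumuloCluster mac = new MiniAccumuloCluster(dir, "test-password");
    mac.start();
    try {
      Connector conn = new ZooKeeperInstance(mac.getInstanceName(), mac.getZooKeepers())
          .getConnector("root", new PasswordToken("test-password"));
      conn.tableOperations().create("smoke");  // sanity check against the live instance
    } finally {
      mac.stop();
    }
  }
}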


Keith Turner wrote:

Mario

A little bit of background, miniaccumulo cluster launches external
Accumulo processes.   To launch these processes it needs a classpath
for the JVM.  It tries to obtain that from the current classloader,
but thats failing.  The following code is causing your problem.   We
should open a bug about it not working w/ sbt.

https://github.com/apache/accumulo/blob/rel/1.7.2/minicluster/src/main/java/org/apache/accumulo/minicluster/impl/MiniAccumuloClusterImpl.java#L250

One possible work around is to use the accumulo maven plugin.  This
will launch mini accumulo outside of your test, then your test just
use that launched instance. Do you think this might work for you?  If
not maybe we can come up with another workaround.

http://accumulo.apache.org/release_notes/1.6.0#maven-plugin

As for the MiniDFSCluster issue, that should be ok.   We use mini
accumulo cluster to test Accumulo itself.  Some of this ends up
bringing in a dependency on MiniDFSCluster, even thought the public
API for mini cluster does not support using it.  We need to fix this,
so that there is no dependency on MiniDFSCluster.


Keith

On Wed, Aug 3, 2016 at 6:51 AM, Mario Pastorelli
  wrote:

I'm trying to test a spark job that uses the AccumuloInputFormat but I'm
having many issues with both MockInstance and MiniAccumuloCluster.

1) MockInstance doesn't work with Spark jobs in my environment because it
looks like every task has a different instance of the MockInstance in
memory; if I add records from the driver, the executors can't find this
data. Is there a way to fix this?

2) MiniAccumuloCluster keeps giving strange errors. Two of them I can't
really fix:
   a. using sbt to run the tests throws  IllegalArgumentException Unknown
classloader type : sbt.classpath.NullLoader when MiniAccumuloCluster is
instantiated. Anybody knows how to fix this? It's basically preventing me
from using Spark and Accumulo together.
   b. there is a warning that MiniDFSCluster is not found and a stub is used. I
have all the dependencies needed, including hdfs test. Is this warning ok?

Thanks for the help,
Mario

--
Mario Pastorelli | TERALYTICS

software engineer

Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland
phone: +41794381682
email: mario.pastore...@teralytics.ch
www.teralytics.net

Company registration number: CH-020.3.037.709-7 | Trade register Canton
Zurich
Board of directors: Georg Polzer, Luciano Franceschina, Mark Schmitz, Yann
de Vries

This e-mail message contains confidential information which is for the sole
attention and use of the intended recipient. Please notify us at once if you
think that it may not be intended for you and delete it immediately.


Re: Persistent outstanding migrations message

2016-08-04 Thread Josh Elser
FWIW, migrations that never go away have been a symptom of bugs in the 
Master before. The master gets into a state where it either stops 
processing migrations or it doesn't realize that there is a migration to 
process. You might be able to grep over the Master log and find 
information about migrations. Sorry I don't have anything more specific.


The lock without a FATE op also seems problematic, but might be 
unrelated to the migration? You might be able to find more information 
in the master log about that FATE transaction ID.


Michael Wall wrote:

Are you currently experiencing 1 outstanding migration?  Does it go away
on its own?  Unless servers are going down, tablets will migrate when
their split threshold is reached.  Is it possible you are constantly
splitting a table?

If all the tservers appear to be in good shape, maybe it is an issue
with the master.  What does the jstack look like for that?

On Thu, Aug 4, 2016 at 12:06 PM, Tim I mailto:t...@timisrael.com>> wrote:

Hi Mike,

Thanks for the direction.

Empty result set from the scan you suggested

There was a lock without an associated FATE operation.

The following locks did not have an associated FATE operation
txid: 667becf32c0fe544  locked: [R:+default]


No recoveries stuck currently, and no long running scans.

Otherwise, the system seems fine.

Is it possible this is just benign?  Should we monitor for locks
that don't have FATE operations and delete them from time to time?

Thanks,

Tim

On Thu, Aug 4, 2016 at 11:44 AM, Michael Wall mailto:mjw...@gmail.com>> wrote:

Hi Tim,

You can try scanning the metadata table for a future colfam.
Something like

scan -t accumulo.metadata -c fut

If you find one, look at the tabletserver that is slated to host
that tablet.  There could be an issue with that server
preventing assignment from completing.  Get a jstack and save
the logs so you can further troubleshoot.  Killing that tserver
will cause the assignment to go elsewhere, but make sure you get
as much info as you can before killing it.

What else is going on with the system?  Do you have any
recoveries that are stuck?  Are there any fate transactions that
have been running for a while?  Any long running scans?

HTH

Mike

On Thu, Aug 4, 2016 at 11:04 AM, Tim I mailto:t...@timisrael.com>> wrote:

Hi all,

We're running accumulo 1.6.5

One of the issues we're seeing on a consistent basis is this
message:

"Not balancing due to 1 outstanding migrations".


Is there a simple way to see the number of outstanding
migrations?  Based on what we've read and experienced, it
eventually means we have to bounce the master to get things
to a better state, however the message comes back within
about 1 hour.

Any thoughts and suggestions would be greatly appreciated.

Thanks,

Tim






Re: reboot autostart

2016-08-05 Thread Josh Elser
Most of the time, your operating system can do this for you via init.d 
scripts (chkconfig on RHEL6, I forget if they moved to systemd in RHEL7).


Most mechanisms also have some sort of "rc.local" script which you can 
provide your own commands to that is automatically run when the OS boots.


Michael Wall wrote:

What do you mean auto reboot sequence?  Are you asking about the service
start order?  Start dfs, then yarn, then zookeeper, then accumulo's
start-all.  Shutdown is the reverse.

On Fri, Aug 5, 2016 at 4:20 PM, Kevin Cho mailto:kcho...@gmail.com>> wrote:

Thanks again for helping on the last ticket.  I'm trying to create an auto
reboot sequence for Accumulo but it's not working right.  Has anyone
done this before? Tried googling but couldn't find much.



