Re: Merging smaller/empty tablets [SEC=UNOFFICIAL]

2017-01-16 Thread Mike Drob
I can't find this in the docs, but IIRC the merge command can take a
start/end range for what to merge. So the best option might be to try it on
a smaller slice and see what happens. At a guess, queries won't block but
indexing will.
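
If memory serves, the ranged form looks something like this (flag names may be
slightly off, check "help merge" in the shell):

root@myinstance> merge -t myTable -s 100M -b rowA -e rowM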

Mike

On Mon, Jan 16, 2017 at 5:23 PM, Dickson, Matt MR <
matt.dick...@defence.gov.au> wrote:

> *UNOFFICIAL*
> That looks like a great option.  Before using it, what's the cost/impact
> of running this on a massive table in a system with other large bulk
> ingests/queries running?  In the past when I have used that (which was in
> 2013 so things may have changed) all ingests were blocked and it took days
> to complete.
>
> With 1.07T tablets to work on this may take some time?
>
>
> ------
> *From:* Mike Drob [mailto:md...@mdrob.com]
> *Sent:* Tuesday, 17 January 2017 09:37
> *To:* user@accumulo.apache.org
> *Subject:* Re: Merging smaller/empty tablets [SEC=UNOFFICIAL]
>
> http://accumulo.apache.org/1.8/accumulo_user_manual.html#_merging_tablets
>
> In order to merge small tablets, you can ask Accumulo to merge sections of
> a table smaller than a given size.
>
> root@myinstance> merge -t myTable -s 100M
>
>
>
> On Mon, Jan 16, 2017 at 4:31 PM, Dickson, Matt MR <
> matt.dick...@defence.gov.au> wrote:
>
>> *UNOFFICIAL*
>> I have a table that has evolved to have 1.07T tablets and I'm fairly
>> confident a large portion of these are now empty or very small.  I'd like
>> to merge smaller tablets and delete empty tablets, is there a smart way to
>> do this?
>>
>> My thought was to query the metadata table for all tablets under a
>> certain size for the table and then merge these tablets.
>>
>> Is the first number in the value the size of the tablet, i.e.
>>
>> >  scan -b 1xk -e 1xk\xff -c file
>> 1xk;34234 file:hdfs://name/accumulo/tables/1xk/t-er23423/M423432.rf []
>> *213134*,234234
>>
>> Also, are there any side effects of this that I need to be aware of when
>> doing this on a massive table?
>>
>> Thanks in advance,
>> Matt
>>
>
>


Re: Merging smaller/empty tablets [SEC=UNOFFICIAL]

2017-01-16 Thread Mike Drob
http://accumulo.apache.org/1.8/accumulo_user_manual.html#_merging_tablets

In order to merge small tablets, you can ask Accumulo to merge sections of
a table smaller than a given size.

root@myinstance> merge -t myTable -s 100M



On Mon, Jan 16, 2017 at 4:31 PM, Dickson, Matt MR <
matt.dick...@defence.gov.au> wrote:

> *UNOFFICIAL*
> I have a table that has evolved to have 1.07T tablets and I'm fairly
> confident a large portion of these are now empty or very small.  I'd like
> to merge smaller tablets and delete empty tablets, is there a smart way to
> do this?
>
> My thought was to query the metadata table for all tablets under a
> certain size for the table and then merge these tablets.
>
> Is the first number in the value the size of the tablet, i.e.
>
> >  scan -b 1xk -e 1xk\xff -c file
> 1xk;34234 file:hdfs://name/accumulo/tables/1xk/t-er23423/M423432.rf []
> *213134*,234234
>
> Also, are there any side effects of this that I need to be aware of when
> doing this on a massive table?
>
> Thanks in advance,
> Matt
>


Re: [ANNOUNCE] Apache Accumulo 1.7.2 Released

2016-06-23 Thread Mike Drob
Whoops, meant to say that we are proud to announce the release of Accumulo
version 1.7.2!

On Thu, Jun 23, 2016 at 10:47 AM, Mike Drob <md...@apache.org> wrote:

> The Accumulo team is proud to announce the release of Accumulo version
> 1.7.1!
>
> This release contains over 30 bugfixes and improvements over 1.7.1, and is
> backwards-compatible with 1.7.0 and 1.7.1. Existing users of 1.7.1 are
> encouraged to
> upgrade immediately.
>
> This version is now available in Maven Central, and at:
> https://accumulo.apache.org/downloads/
>
> The full release notes can be viewed at:
> https://accumulo.apache.org/release_notes/1.7.2.html
>
> The Apache Accumulo™ sorted, distributed key/value store is a robust,
> scalable, high performance data storage system that features cell-based
> access control and customizable server-side processing. It is based on
> Google's BigTable design and is built on top of Apache Hadoop, Apache
> ZooKeeper, and Apache Thrift.
>
> --
> The Apache Accumulo Team
>


Re: Accumulo GC and Hadoop trash settings

2015-08-17 Thread Mike Drob
If something goes wrong (i.e. somebody accidentally issues a big delete),
then having the Trash around makes recovery plausible.
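
For what it's worth, if you ever do decide the Trash isn't worth keeping, the
property can be flipped from the shell, something like this (the GC may need a
restart to pick it up):

root@myinstance> config -s gc.trash.ignore=true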

On Mon, Aug 17, 2015 at 2:57 PM, James Hughes jn...@virginia.edu wrote:

 Hi all,

 From reading about the Accumulo GC, it sounds like temporary files are
 routinely deleted during GC cycles.  In a small testing environment, I've
 seen the HDFS Accumulo user's .Trash folder have 10s of gigabytes of data.

 Is there any reason that the default value for gc.trash.ignore is false?
 Is there any downside to deleting GC'ed files completely?

 Thanks in advance,

 Jim

 http://accumulo.apache.org/1.6/accumulo_user_manual.html#_gc_trash_ignore



Re: TSDB on Accumulo row key

2015-07-20 Thread Mike Drob
Our very own Eric Newton has a port of OpenTSDB running on Accumulo, might
be what you're looking for.

https://github.com/ericnewton/accumulo-opentsdb

On Mon, Jul 20, 2015 at 5:25 PM, Ranjan Sen ranjan_...@hotmail.com wrote:

 Hi All,
 Is there something like TSDB (Time series database) on Accumulo?

 Thanks
 Ranjan



Re: How to generate UUID in real time environment for Accumulo

2015-06-23 Thread Mike Drob
This sounds super close to a type 1 UUID -
https://en.wikipedia.org/wiki/Universally_unique_identifier#Version_1_.28MAC_address_.26_date-time.29
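
A quick, untested sketch of the scheme Keith describes below, in case it helps.
The client id allocation (ZooKeeper, conditional writer, etc.) is left out and
the padding widths are arbitrary; the point is just that string order matches
time order:

  import java.util.concurrent.atomic.AtomicLong;

  public class RowIdGenerator {
    private final String clientId;                        // unique per client instance
    private final AtomicLong counter = new AtomicLong();  // per-client counter

    public RowIdGenerator(String clientId) {
      this.clientId = clientId;
    }

    public String next() {
      // zero-pad so lexicographic order matches numeric (time) order
      return String.format("%013d_%s_%010d",
          System.currentTimeMillis(), clientId, counter.getAndIncrement());
    }
  }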

On Tue, Jun 23, 2015 at 8:14 AM, Keith Turner ke...@deenlo.com wrote:

 Would something like the following work?

 row=time_client id_client counter

 Where the client id is a unique id per client instance, it would be
 allocated once using Zookeeper or an Accumulo Conditional writer when the
 client starts.   The client counter would be an AtomicLong in the client.

 On Tue, Jun 23, 2015 at 8:08 AM, mohit.kaushik mohit.kaus...@orkash.com
 wrote:

  Hi All,

 I have an application which can index data at very high rate from
 multiple clients. I need to generate a unique id to store documents.
 It Should
 (1) use the current system time in millis.
 (2) it should be designed to sort lexicographically on the basis of time.
 (3) if I just store the currentTimeInMillies then I can just index 1000
 unique docs per sec. It should be able to generate millions of UUID's per
 sec.

 I am searching for the best possible approach to implement, any help?
 Regards

 * Mohit Kaushik*
 Software Engineer
 A Square,Plot No. 278, Udyog Vihar, Phase 2, Gurgaon 122016, India
 *Tel:* +91 (124) 4969352 | *Fax:* +91 (124) 4033553


 *This message including the attachments, if any, is a confidential
 business communication. If you are not the intended recipient it may be
 unlawful for you to read, copy, distribute, disclose or otherwise use the
 information in this e-mail. If you have received it in error or are not the
 intended recipient, please destroy it and notify the sender immediately.
 Thank you *






Re: CHANGES files

2015-06-10 Thread Mike Drob
Why not ask committers add a line to the CHANGES file about the change when
committing? Good place to highlight contributors too. Instead of an
auto-generated one or sticking the RM with it, we build it up over the
course of development.

Individual subtasks could be ignored, larger tasks could be included by
discretion of the author.

Ex: ACCUMULO-21224 Add more metrics to the monitor. (Jim Contributor via
Mike Drob)

On Wed, Jun 10, 2015 at 1:43 PM, Keith Turner ke...@deenlo.com wrote:



 On Wed, Jun 10, 2015 at 2:32 PM, Christopher ctubb...@apache.org wrote:

 Okay Accumulators, I have a minor rant about the CHANGES files in
 Accumulo, and I want to get feedback on this file from the user@ and
 dev@ lists.

 The summary is:

 * I think this CHANGES file is nearly worthless, and a release manager
 shouldn't have to bother with it. We should just delete it.


 +1

 We could drop the file from releases and have a link to a jira query in
 the release notes on the web site.



 The justification is:

 * The CHANGES file is tedious to prepare (requires manual copy/paste
 from JIRA, after clicking the right buttons in the right order).
 * We now have release notes which complement the full JIRA search and
 git history, to highlight particular changes, which is far more
 useful.
 * The file is just so big and contains material of questionable
 utility (do we really need to enumerate all sub-tasks for each issue,
 especially when they aren't even grouped with the parent issue?)
 * It's very easy for the CHANGES file to be wrong, by either including
 a JIRA issue which was incorrectly marked, or by omitting an issue
 which was inadvertently left open. The release manager can triage
 these things, but that's a lot of extra work, and it doesn't seem to
 matter whether it is actually wrong or not (it has been wrong in the
 past, and nobody has ever voiced a complaint or indicated any concern
 at all).
 * The CHANGES file is ugly. It follows no markup standard to render it
 in a presentable way (Markdown, APT, asciidoc, etc.). Any
 prettification must be done manually.
 * Issue numbers and subject lines rarely convey adequate information
 to satisfy curious readers wishing to inform themselves of what
 changed. Looking at the actual JIRA issues is necessary to do that,
 and these links are not clickable.
 * Because it is generated from the fixVersion in JIRA, it's often the
 case that we must omit useful fixVersions from JIRA in order to avoid
 confusing inclusions in the CHANGES file (like the JIRA pertaining to
 the release itself). And sometimes people add/remove the wrong
 fixVersion. We can fix this later when we discover it, but it's
 usually too late for the CHANGES file already bundled in a release.
 * Updating the CHANGES file creates unnecessary commits which are
 tedious and painful to merge forward (and usually risky, because it
 would involve -sours type merges) and pollute the git history without
 much use.
 * The convention for a CHANGES file seems to be born of an era prior
 to ubiquitous version control, and I don't think having one is
 required in any way.

 Sure, we could automate generating this file (maybe?), which would
 alleviate some of the burden. However, many of these problems would
 still exist, and in the end, I'm not really sure what the benefits
 are. It doesn't seem to be that useful, and especially not compared to
 the amount of work it takes to maintain it. Instead of deleting it, we
 could leave it in place with a generic comment referring the user to
 JIRA and git. But, even that seems to be unnecessary (these resources
 are already prominently linked on the Accumulo site and in the project
 pom.xml in the official source release, and it is already well
 understood that a project is going to have an SCM history and an issue
 tracker).

 But, what do you think? Is this file really useful to anybody? Does
 its utility outweigh the burden it places on release managers, which
 can slow down and complicate the release process?

 --
 Christopher L Tubbs II
 http://gravatar.com/ctubbsii





Re: admin and web dashboard

2015-06-08 Thread Mike Drob
Also might need to run a 'flush -t $table -w'
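
Putting that together with Josh's steps below, the whole sequence would look
roughly like this (table name and id are just examples):

root@myinstance> tables -l
root@myinstance> flush -t mytable -w
root@myinstance> compact -t mytable -w
$ hadoop fs -ls -R /accumulo/tables/2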

On Mon, Jun 8, 2015 at 1:39 PM, Josh Elser josh.el...@gmail.com wrote:

 Since 1.5, all of Accumulo's files are stored in HDFS: RFiles and WALs.

 Tables have the name you provide, but also maintain an internal unique ID
 to make operations like renaming easy. You can see this mapping via `tables
 -l` in the Accumulo shell.

 Given the ID for a table, you should be able to find all rfiles for a
 table /accumulo/tables/$id/**/*.rf. If you don't see any rfiles there,
 run a `compact -t $table -w` and then check HDFS again.


 z11373 wrote:

 That makes sense, thanks Josh!
 Btw, where can I find the .rf files? I looked at under Accumulo install
 folder and also /tmp, and couldn't find them. I also look at hdfs, and
 only
 found the folder, i.e. /accumulo/tables/n/default_tablet (where 'n' is a
 number), and no files under that hdfs dir. I want to try the command
 'accumulo rfile-info' you mentioned earlier.

 Thanks again,
 zainal







Re: Possible information leak

2015-05-08 Thread Mike Drob
Value will contain whatever the user provided on the command line, so
printing it back out to them shouldn't result in exposing something secret.

On Fri, May 8, 2015 at 12:29 PM, Rodrigo Andrade rodrigo...@gmail.com
wrote:

 Hi,

 In this commit:

 https://github.com/apache/accumulo/commit/27d79c2651277c465a497c68ec238771692a6fa0

 Does value contain private information?

 Regards,
 Rodrigo



Re: Unassigned, but not offline, tablets

2015-05-05 Thread Mike Drob
What version?

Could be
https://github.com/apache/accumulo/blob/master/docs/src/main/asciidoc/chapters/troubleshooting.txt#L314

On Tue, May 5, 2015 at 8:54 AM, Bill Slacum wsla...@gmail.com wrote:

 After a catastrophic failure, the Master Server section of the monitor
 will report that there are 16 unassigned tablets (out of thousands), but
 no table shows any offline tablets.

 There were corrupt files under the recovery directory. These were
 removed.

 Otherwise, things seem fine with the cluster (we are having ingest
 processes hang, which may or may not be related).

 What should I do, as an operator, when Accumulo is in this state?

 I have no logs provide, unfortunately.



Re: Q4A Project

2015-04-27 Thread Mike Drob
Andrew,

This is a cool thing to work on, I hope you have great success!

A couple of questions about the motivations behind this, if you don't mind -
- There are several SQL implementations already in the Hadoop ecosystem. In
what ways do you expect this to improve upon
Hive/Impala/Phoenix/Presto/Spark SQL? I haven't looked at the code, so it
is quite possible you're already using one of those technologies.
- In a conversation with some HP engineers earlier this year, they
mentioned that building a SQL-92 layer is the easy part, and that a mature
optimization engine is the really hard part. This is where Oracle may still
be leaps and bounds ahead of its nearest competitors. Do you have plans for
a query planner? If not, you might be back to writing MapReduce jobs sooner
than you think.

Look forward to seeing more!

Mike

On Mon, Apr 27, 2015 at 7:37 PM, Andrew Wells awe...@clearedgeit.com
wrote:

 I have been working on a project, tentatively called Q4A (Query for
 Accumulo). Another possible name is ASQ (Accumulo Streaming Query) [discuss].

 This is a streaming query as the query is completed via a stream, should
 never group data in memory. To batch, intermediate results would be written
 back to Accumulo temporarily.


 The *primary goal* is to have a complete SQL implementation native to
 Accumulo.

 *Why do this?*
 I am getting tired of writing bad java code to query a database. I would
 rather write bad SQL code. Also, people should be able to get queries out
 faster and it shouldn't take a developer.


 *Native To Accumulo*:

- There should be no special format to read a database created by Q4A
- There should be no special format for Q4A to query a table
- All tables are tables available to Q4A
- Any special tables, are stored away from the users databases
(indexes, column definitions, etc)

 *Other Goals*:

- Implement the entire SQL definition (currently all of SQLite)
- Create JDBC Driver/Server
- Push down Expressions to the Tablet Servers
- Install-less queries, use Q4A jar directly against any Accumulo
Cluster ( less push-down expressions)
- documentation :o
- testing ;)

 *Does it work?*
 Not yet, the project is still a work in progress. and I will be working on
 it at the Accumulo Summit this year. Progress is slow as I am getting
 married in about a month and some change.

 *Questions:*
 If you have questions about Q4A, ask here; I will be at the Accumulo Summit
 @ ClearEdgeIT Table and Hackathon.

 *WHERE IS TEH LINK?!1!*
 Oh here: https://github.com/agwells0714/q4a

 --
 *Andrew George Wells*
 *Software Engineer*
 *awe...@clearedgeit.com awe...@clearedgeit.com*




Re: Approach to hold the output of an iterator in memory to do further operations

2015-04-27 Thread Mike Drob
Check out the MinCombiner

https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/iterators/user/MinCombiner.java
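
An untested sketch of attaching it to a table from Java, in case that helps
(table and column names are made up; pick the encoding type that matches how
you store the values):

  import java.util.Collections;
  import org.apache.accumulo.core.client.IteratorSetting;
  import org.apache.accumulo.core.iterators.Combiner;
  import org.apache.accumulo.core.iterators.LongCombiner;
  import org.apache.accumulo.core.iterators.user.MinCombiner;

  IteratorSetting setting = new IteratorSetting(15, "min", MinCombiner.class);
  // only combine the column(s) that hold the values you want the minimum of
  Combiner.setColumns(setting,
      Collections.singletonList(new IteratorSetting.Column("stats", "value")));
  LongCombiner.setEncodingType(setting, LongCombiner.Type.STRING);
  connector.tableOperations().attachIterator("mytable", setting);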

On Mon, Apr 27, 2015 at 12:19 PM, vaibhav thapliyal 
vaibhav.thapliyal...@gmail.com wrote:

 Hello everyone.

 I am trying to carry out max and min kind of operations using accumulo.

 But since the Accumulo iterators only operate on the entries that are
 locally hosted, I get the local max and local min instead of a global
 max and min.

 To get this global max and min, I have to calculate this client side.  I
 want to ask if there is some way to store this local max and min in memory
 using iterator. So that a global max and min can be calculated server side
 only.

 I tried to do this by writing the result in another table and using another
 iterator to return me the global max and min.

 I want to ask if there is a way to store this in memory so as to avoid
 writing it to a table?

 Thanks
 Vaibhav



Re: Accumulo 1.6.2 with Hadoop 2.2.0 Installation issues

2015-03-12 Thread Mike Drob
Can you verify that once the processes started, they stayed up?

ps -C java -fww | grep accumulo

Also check your log directory for .err files

On Thu, Mar 12, 2015 at 9:53 AM, Madabhattula Rajesh Kumar 
mrajaf...@gmail.com wrote:

 Hi Team,

 I'm not able to log in to the accumulo shell. It is giving "There are no
 tablet servers: check that zookeeper and accumulo are running." Could you
 please help me how to resolve this issue.

 *rajesh@rajesh-VirtualBox:~/accumulo-1.6.2$ ./bin/start-all.sh *
 Starting monitor on localhost
 WARN : Max open files on localhost is 1024, recommend 32768
 Starting tablet servers  done
 Starting tablet server on localhost
 WARN : Max open files on localhost is 1024, recommend 32768
 OpenJDK 64-Bit Server VM warning: You have loaded library
 /home/rajesh/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0 which might have
 disabled stack guard. The VM will try to fix the stack guard now.
 It's highly recommended that you fix the library with 'execstack -c
 libfile', or link it with '-z noexecstack'.
 2015-03-12 18:30:31,722 [util.NativeCodeLoader] WARN : Unable to load
 native-hadoop library for your platform... using builtin-java classes where
 applicable
 2015-03-12 18:30:35,779 [fs.VolumeManagerImpl] WARN :
 dfs.datanode.synconclose set to false in hdfs-site.xml: data loss is
 possible on hard system reset or power loss
 2015-03-12 18:30:35,791 [server.Accumulo] INFO : Attempting to talk to
 zookeeper
 2015-03-12 18:30:36,036 [server.Accumulo] INFO : ZooKeeper connected and
 initialized, attempting to talk to HDFS
 2015-03-12 18:30:36,328 [server.Accumulo] INFO : Connected to HDFS
 Starting master on localhost
 WARN : Max open files on localhost is 1024, recommend 32768
 Starting garbage collector on localhost
 WARN : Max open files on localhost is 1024, recommend 32768
 Starting tracer on localhost
 WARN : Max open files on localhost is 1024, recommend 32768
 *rajesh@rajesh-VirtualBox:~/accumulo-1.6.2$ ./bin/accumulo shell -u root*
 OpenJDK 64-Bit Server VM warning: You have loaded library
 /home/rajesh/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0 which might have
 disabled stack guard. The VM will try to fix the stack guard now.
 It's highly recommended that you fix the library with 'execstack -c
 libfile', or link it with '-z noexecstack'.
 2015-03-12 18:32:43,567 [util.NativeCodeLoader] WARN : Unable to load
 native-hadoop library for your platform... using builtin-java classes where
 applicable
 Password: **
 2015-03-12 18:32:52,533 [impl.ServerClient] WARN : There are no tablet
 servers: check that zookeeper and accumulo are running.

 Regards,
 Rajesh



Re: Failed to connect to zookeeper within 2x ZK timeout period 30000

2015-02-02 Thread Mike Drob
Can you verify that zookeeper is running and accepting connections?

nc [zk-host] [zk-port]
> stat

And see that it does not result in error.

On Mon, Feb 2, 2015 at 2:58 PM, Wyatt Frelot wyatt.fre...@altamiracorp.com
wrote:

  Good afternoon all,

  I just literally started having this problem on Friday. My code worked
 previously (1mo ago) but I came back to it and I have not been able to
 resolve this problem since I have started experiencing it. So, I am seeking
 guidance and assistance.

  I have a Vagrant cluster setup with the following environment:
 Hadoop 2.4.1, ZK 3.3.6, Accumulo 1.6.1, and Java 7

  I am able to ping the zookeeper node and there appears to be nothing in
 the ZK logs nor the Accumulo logs…I am not sure where to go from here.

  This is the only error that I can find:

  Exception in thread main
 org.apache.accumulo.core.client.AccumuloException:
 java.lang.RuntimeException: Failed to connect to zookeeper (mnode) within
 2x zookeeper timeout period 30000
  at org.apache.accumulo.core.client.impl.ServerClient.execute(
 ServerClient.java:67)
  at org.apache.accumulo.core.client.impl.ConnectorImpl.init(
 ConnectorImpl.java:70)
  at org.apache.accumulo.core.client.ZooKeeperInstance.getConnector(
 ZooKeeperInstance.java:240)
  at accumulo101.solutions.writing.TableAdministration.main(
 TableAdministration.java:66)
  Caused by: java.lang.RuntimeException: Failed to connect to zookeeper
 (mnode) within 2x zookeeper timeout period 30000
  at org.apache.accumulo.fate.zookeeper.ZooSession.connect(
 ZooSession.java:117)
  at org.apache.accumulo.fate.zookeeper.ZooSession.getSession(
 ZooSession.java:161)
  at org.apache.accumulo.fate.zookeeper.ZooReader.getSession(
 ZooReader.java:35)
  at org.apache.accumulo.fate.zookeeper.ZooReader.getZooKeeper(
 ZooReader.java:39)
  at org.apache.accumulo.fate.zookeeper.ZooCache.getZooKeeper(
 ZooCache.java:58)
  at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:150)
  at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:277)
  at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:224)
  at org.apache.accumulo.core.client.ZooKeeperInstance.getInstanceID(
 ZooKeeperInstance.java:161)
  at org.apache.accumulo.core.zookeeper.ZooUtil.getRoot(ZooUtil.java:38)
  at org.apache.accumulo.core.client.impl.ServerClient.getConnection(
 ServerClient.java:128)
  at org.apache.accumulo.core.client.impl.ServerClient.getConnection(
 ServerClient.java:118)
  at org.apache.accumulo.core.client.impl.ServerClient.getConnection(
 ServerClient.java:113)
  at org.apache.accumulo.core.client.impl.ServerClient.executeRaw(
 ServerClient.java:95)
  at org.apache.accumulo.core.client.impl.ServerClient.execute(
 ServerClient.java:61)
  ... 3 more




Re: why a error about replicated

2015-01-22 Thread Mike Drob
Has this error come up before? Is there room for us to intercept that stack
trace and provide a "check that HDFS has space left" message? This might be
especially relevant after we've removed the hadoop info box on the monitor.

On Thu, Jan 22, 2015 at 8:30 AM, Josh Elser josh.el...@gmail.com wrote:

 How much free space do you still have in HDFS? If hdfs doesn't have enough
 free space to make the file, I believe you'll see the error that you have
 outlined. The way we create the file will also end up requiring at least
 one GB with the default configuration.

 Also make sure to take into account any reserved percent of hdfs when
 considering the hdfs usage.
 On Jan 22, 2015 1:46 AM, Lu.Qin luq.j...@gmail.com wrote:


 Hi, I have an Accumulo cluster and it has run for 10 days, but it shows me many
 errors now.

 2015-01-22 13:04:21,161 [hdfs.DFSClient] WARN : Error while syncing
 org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
 /accumulo/wal/+9997/226dce4f-4e14-4704-b811-532afe0b0fb3 could only be
 replicated to 0 nodes instead
  of minReplication (=1).  There are 3 datanode(s) running and no node(s)
 are excluded in this operation.
 at
 org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1471)
 at
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2791)
 at
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:606)
 at
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:455)
 at
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

 at org.apache.hadoop.ipc.Client.call(Client.java:1411)
 at org.apache.hadoop.ipc.Client.call(Client.java:1364)
 at
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
 at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
 at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
 at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
 at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
 at
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:368)
 at
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1449)
 at
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1270)
 at
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:526)

 I use hadoop fs to put a file into hadoop, and it works fine, and the file
 has 2 replicas. Why can accumulo not work?

 And I see there are so many files of only 0B in /accumulo/wal/***/, why?

 Thanks.




Re: TableOperations.setProperty() not setting property

2015-01-08 Thread Mike Drob
Ara,

There is sometimes a propagation delay in setting the properties, since they
have to go through zookeeper and then out to the tablet servers.

Try waiting 30 or 60 seconds before checking, and see if that changes
things.
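
If you want to confirm the change took, something like this (untested, "conn"
is your Connector) should show the new value once it has propagated:

  conn.tableOperations().setProperty("mytable", "table.split.threshold", "512M");
  for (java.util.Map.Entry<String,String> e :
      conn.tableOperations().getProperties("mytable")) {
    if (e.getKey().equals("table.split.threshold")) {
      System.out.println(e.getKey() + " = " + e.getValue());
    }
  }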

Mike

On Thu, Jan 8, 2015 at 6:07 PM, Ara Ebrahimi ara.ebrah...@argyledata.com
wrote:

 Hi,

 I’m trying to set a few properties for a table “programmatically” right
 after I create that table (using accumulo shell). I use
 TableOperations.setProperty(). But then when I use the config command from
 accumulo shell I don’t see anything reflected there. It still holds the old
 default/site values. If I pass invalid values it does fail, so seems like
 it actually receives the request and validates it. But it does’t seem to
 persist it. I’ve tried a few different properties and none seem to stick.
 Do I need to flush the config somehow?

 I assume I can set these properties right after creating table only. I
 mean, what happens if I set a property like table.split.threshold after
 populating the table? There’s no command for setting these properties from
 accumulo shell other than the option to copy config from another table when
 executing create table.

 Thanks,
 Ara.



 

 This message is for the designated recipient only and may contain
 privileged, proprietary, or otherwise confidential information. If you have
 received it in error, please notify the sender immediately and delete the
 original. Any other use of the e-mail by you is prohibited. Thank you in
 advance for your cooperation.

 



Re: Recursive Import Directory

2014-11-25 Thread Mike Drob
Ariel,

There is not an easy way to do this recursively. Your best option is going
to be writing your own wrapper around the import command. If you're using
shell commands, this could be as easy as feeding the results of 'find .
-type d' into a script, or in Java you might want to look at
DirectoryWalker in Apache Commons as possible solutions.
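
A rough, untested sketch of the wrapper idea in Java, going one level deep
(paths and table name are placeholders; for deeper trees you would recurse or
use DirectoryWalker as mentioned above):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  FileSystem fs = FileSystem.get(new Configuration());
  for (FileStatus status : fs.listStatus(new Path("/bulk/input"))) {
    if (status.isDirectory()) {
      String dir = status.getPath().toString();
      Path failures = new Path(dir + "_failures");
      fs.mkdirs(failures);  // importDirectory expects an existing, empty failure dir
      connector.tableOperations().importDirectory("mytable", dir,
          failures.toString(), false);
    }
  }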

Mike

On Tue, Nov 25, 2014 at 10:22 AM, Ariel Valentin ar...@arielvalentin.com
wrote:

 Hello!

 We are running a couple of experiments using importDirectory and are
 curious if there is a simple way to import directories recursively. Based
 on looking at the source code it does not look like it currently supports
 that feature:

 (
 https://github.com/apache/accumulo/blob/1835c27ca41426ddd570cde14f9612c45680b917/core/src/main/java/org/apache/accumulo/core/client/admin/TableOperationsImpl.java
 )

 Are there plans to add it in the future? Or is there a simple way to do
 this right now?

 Thanks,
 Ariel Valentin
 e-mail: ar...@arielvalentin.com
 website: http://blog.arielvalentin.com
 skype: ariel.s.valentin
 twitter: arielvalentin
 linkedin: http://www.linkedin.com/profile/view?id=8996534
 ---
 *simplicity *communication
 *feedback *courage *respect



Re: Recursive Import Directory

2014-11-25 Thread Mike Drob
Name collision of failures and I think name collision of successes might
cause problems sometimes too. Or maybe that's just with older versions.
Regardless, having to write your own code puts it out of the realm of easy
into at least middling territory - if import directory could natively
handle recursion then it would become easy.

On Tue, Nov 25, 2014 at 10:44 AM, Josh Elser josh.el...@gmail.com wrote:

 What's the difficulty, Mike? Handling name collision of failures?

 Mike Drob wrote:

 Ariel,

 There is not an easy way to do this recursively. Your best option is
 going to be writing your own wrapper around the import command. If
 you're using shell commands, this could be as easy as feeding the
 results of 'find . -type d' into a script, or in Java you might want to
 look at DirectoryWalker in Apache Commons as possible solutions.

 Mike

 On Tue, Nov 25, 2014 at 10:22 AM, Ariel Valentin
 ar...@arielvalentin.com mailto:ar...@arielvalentin.com wrote:

 Hello!

 We are running a couple of experiments using importDirectory and are
 curious if there is a simple way to import directories recursively.
 Based on looking at the source code it does not look like it
 currently supports that feature:

 (https://github.com/apache/accumulo/blob/
 1835c27ca41426ddd570cde14f9612c45680b917/core/src/main/java/
 org/apache/accumulo/core/client/admin/TableOperationsImpl.java)

 Are there plans to add it in the future? Or is there a simple way to
 do this right now?

 Thanks,
 Ariel Valentin
 e-mail: ar...@arielvalentin.com mailto:ar...@arielvalentin.com
 website: http://blog.arielvalentin.com
 skype: ariel.s.valentin
 twitter: arielvalentin
 linkedin: http://www.linkedin.com/profile/view?id=8996534
 ---
 *simplicity *communication
 *feedback *courage *respect





Re: comparing different rfile densities

2014-11-11 Thread Mike Drob
I'm not sure how to quantify this and give you a way to verify, but in my
experience you want to be producing rflies that load into a single tablet.
Typically, this means number of reducers equal to the number of tablets in
the table that you will be importing and perhaps a custom partitioner. I
think your intuition is spot on, here.
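
For reference, the knobs for that look roughly like the following. This mirrors
the bulk ingest example that ships with Accumulo; "job" is your Hadoop Job,
"numSplitPoints" is the number of split points in splits.txt, and the paths are
placeholders:

  import org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat;
  import org.apache.accumulo.core.client.mapreduce.lib.partition.RangePartitioner;
  import org.apache.hadoop.fs.Path;

  // one reducer per tablet so each rfile lands in a single tablet
  job.setNumReduceTasks(numSplitPoints + 1);
  job.setPartitionerClass(RangePartitioner.class);
  RangePartitioner.setSplitFile(job, "/tmp/splits.txt");
  job.setOutputFormatClass(AccumuloFileOutputFormat.class);
  AccumuloFileOutputFormat.setOutputPath(job, new Path("/tmp/bulk-output"));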


Of course, if that means that you have a bunch of tiny files, then maybe
it's time to rethink your split strategy.

On Tue, Nov 11, 2014 at 5:56 AM, Jeff Turner sjtsp2...@gmail.com wrote:

 is there a good way to compare the overall system effect of
 bulk loading different sets of rfiles that have the same data,
 but very different densities?

 i've been working on a way to re-feed a lot of data in to a table,
 and have started to believe that our default scheme for creating
 rfiles - mapred in to ~100-200 splits, sampled from 50k tablets -
 is actually pretty bad.  subjectively, it feels like rfiles that span
 300 or 400 tablets is bad in at least two ways for the tservers -
 until the files are compacted, all of the potential tservers have
 to check the file, right?  and then, during compaction, do portions
 of that rfile get volleyed around the cloud until all tservers
 have grabbed their portion?  (so, there's network overhead, repeatedly
 reading files and skipping most of the data, ...)

 if my new idea works, i will have a lot more control over the density
 of rfiles, and most of them will span just one or two tablets.

 so, is there a way to measure/simulate overall system benefit or cost
 of different approaches to building bulk-load data (destined for an
 established table, across N tservers, ...)?

 i guess that a related question would be are 1000 smaller and denser
 bulk files better than 100 larger bulk files produced under a typical
 getSplits() scheme?

 thanks,
 jeff



Re: Accumulo version at runtime?

2014-10-23 Thread Mike Drob
Unfortunately, I don't think we have a way to do this. Are you trying to
check for the existence of a particular feature, or what is your goal?

On Thu, Oct 23, 2014 at 6:44 PM, Dylan Hutchison dhutc...@stevens.edu
wrote:

 Easy question Accumulators:

 Is there an easy way to grab the version of a running Accumulo instance
 programmatically from Java code in a class that connects to the instance?

 Something like:

 Instance instance = new
 ZooKeeperInstance(instanceName,zookeeper_address);
 String version = instance.getInstanceVersion();


 Thanks, Dylan

 --
 www.cs.stevens.edu/~dhutchis



Re: Removing 'accumulo' from Zookeeper

2014-10-02 Thread Mike Drob
Michael,

These are great ZK instructions. Have you considered contributing them to
the project upstream? We can converse about this off-list if you'd prefer,
since it's not particularly germane to this topic.

Mike

On Thu, Oct 2, 2014 at 12:50 PM, Michael Allen mich...@sqrrl.com wrote:

 I cut and paste a little fast there at the end, so obviously no one
 outside of Sqrrl has the zk-digest.sh script.  Here's that in all its
 gory detail:

 #!/bin/bash

 if [ -z ${ZOOKEEPER_HOME} ]; then
 echo Set \$ZOOKEEPER_HOME before running this script
 exit 4747
 fi

 if [ -z ${JAVA_HOME} ]; then
 echo Set \$JAVA_HOME before running this script
 exit 4747
 fi

 if [ $# -eq 0 ]; then
 echo usage: zk-digest.sh digest string
 echo 
 echo   Utility to produce authentication digests, such as you might see
 in ZooKeeper node ACL entries
 echo 
 echo   Example: zk-digest.sh sqrrl:secret
 exit 4747
 fi

 ZK_CLASSPATH=\
 ${ZOOKEEPER_HOME}/build/classes:\
 ${ZOOKEEPER_HOME}/build/lib/*.jar:\
 ${ZOOKEEPER_HOME}/lib/slf4j-log4j12-1.6.1.jar:\
 ${ZOOKEEPER_HOME}/lib/slf4j-api-1.6.1.jar:\
 ${ZOOKEEPER_HOME}/lib/netty-3.2.2.Final.jar:\
 ${ZOOKEEPER_HOME}/lib/log4j-1.2.15.jar:\
 ${ZOOKEEPER_HOME}/lib/jline-0.9.94.jar:\
 ${ZOOKEEPER_HOME}/zookeeper-3.4.5.jar:\
 ${ZOOKEEPER_HOME}/src/java/lib/*.jar:\
 ${ZOOKEEPER_HOME}/conf\
 

 ${JAVA_HOME}/bin/java -Dzookeeper.log.dir=. \
 -Dzookeeper.root.logger=INFO,CONSOLE \
 -cp ${ZK_CLASSPATH} \
 -Dcom.sun.management.jmxremote \
 -Dcom.sun.management.jmxremote.local.only=false \
 org.apache.zookeeper.server.auth.DigestAuthenticationProvider $*

 On Thu, Oct 2, 2014 at 1:48 PM, Michael Allen mich...@sqrrl.com wrote:

 Hi Ranjan.  If you're doing this on your own development node, or a
 production node you're in full control of, you can add a root password to
 ZooKeeper in order to blow away any nodes you like. Here's a little writeup
 I did about it:

 ZooKeeper has security features built into it by way of access control
 lists (ACLs) on nodes.  Once set, these ACLs can be very hard to get rid
 of, especially if errant code has set up nodes that you no longer have any
 password for.  This how-to guide shows you how to set up a root user inside
 of ZooKeeper that can wipe out any ACLed node.
 Step-by-step guide



1. Stop your currently running ZooKeeper.  This is either a direct 
 $ZOOKEEPER_HOME/bin/zkServer.sh
stop command or a sudo service zookeeper-server stop command on some
systest boxes.
2.

Edit zkServer.sh and in the following section:

start)
echo -n "Starting zookeeper ... "
if [ -f $ZOOPIDFILE ]; then
  if kill -0 `cat $ZOOPIDFILE` > /dev/null 2>&1; then
 echo $command already running as process `cat $ZOOPIDFILE`.
 exit 0
  fi
fi
nohup $JAVA -Dzookeeper.log.dir=${ZOO_LOG_DIR} \
-Dzookeeper.root.logger=${ZOO_LOG4J_PROP} \
-cp $CLASSPATH $JVMFLAGS $ZOOMAIN $ZOOCFG > $_ZOO_DAEMON_OUT 2>&1 < /dev/null &

Add the line 
 -Dzookeeper.DigestAuthenticationProvider.superDigest=super:lK75jTNcA+U9vtVEw5vB51mj/w4=
\ within the $JAVA invocation such that the resulting section looks
like this:

start)
echo -n "Starting zookeeper ... "
if [ -f $ZOOPIDFILE ]; then
  if kill -0 `cat $ZOOPIDFILE` > /dev/null 2>&1; then
 echo $command already running as process `cat $ZOOPIDFILE`.
 exit 0
  fi
fi
nohup $JAVA -Dzookeeper.log.dir=${ZOO_LOG_DIR} \
-Dzookeeper.root.logger=${ZOO_LOG4J_PROP} \
-Dzookeeper.DigestAuthenticationProvider.superDigest=super:lK75jTNcA+U9vtVEw5vB51mj/w4= \
-cp $CLASSPATH $JVMFLAGS $ZOOMAIN $ZOOCFG > $_ZOO_DAEMON_OUT 2>&1 < /dev/null &

3. Start ZooKeeper again.
4. Log into ZooKeeper via zkCli.sh
5. Declare yourself the root user with the following addauth command:

addauth digest super:secret

6. You should now be able to delete any node and/or change any ACL
within the ZooKeeper system.


 Note that you should *NOT* set this setting up on any production
 system.  If you need to set up a root user on a production system, you need
 to create a different digest (the super:lK75jTNcA+U9vtVEw5vB51mj/w4=stuff
 above is a digest) linked to a better password than secret.  To make
 your own digest, use the $SQRRL_HOME/tools/useful-scripts/zk-digest.sh
  script.

 On Thu, Oct 2, 2014 at 11:39 AM, Keith Turner ke...@deenlo.com wrote:

 Accumulo will work properly if you do not clean it before installing,
 because each time you init Accumulo it stores the information for the new
 instance under a new random uuid.  For the purpose of cleaning out old
 UUIDs, its possible each old UUID could have been created with a different
 password.   Maybe thats what happening in your case?  I can not remember if
 the syntax of your addauth command is correct.


 On Wed, Oct 1, 2014 at 11:06 PM, Ranjan Sen ranjan_...@hotmail.com
 wrote:

 Let me describe the scenario. Accumulo was installed 

Re: rf_tmp file [SEC=UNOFFICIAL]

2014-08-07 Thread Mike Drob
Which version of Accumulo are you seeing these files in?

They should be getting cleaned up automatically after
https://issues.apache.org/jira/browse/ACCUMULO-1452 was added to 1.4.5,
1.5.1, and 1.6.0.

The brief explanation of their purpose is that they are the temporary files
for minor/major compactions while the data is still being written and then
they are moved in to place.

Mike


On Wed, Aug 6, 2014 at 11:34 PM, Dickson, Matt MR 
matt.dick...@defence.gov.au wrote:

  *UNOFFICIAL*
 What purpose does the .rf_tmp file on hdfs serve?  It's often 0 kb in size
 and there appear to be a lot of these older than the table ageoff filter.

 Can we safely remove these?

 Thanks in advance,
 Matt



Re: accumulo 1.6 and HDFS non-HA conversion to HDFS HA

2014-08-05 Thread Mike Drob
Hi Craig!

Part of the HA transition is described at
https://issues.apache.org/jira/browse/ACCUMULO-2793 although you'll have to
read through the comments to get the actual steps. I don't have a concise
summary of what needs to be done because I haven't had a chance to try it
myself.

Mike


On Tue, Aug 5, 2014 at 12:06 PM, craig w codecr...@gmail.com wrote:

 I've setup an Accumulo 1.6 cluster with Hadoop 2.4.0 (with a secondary
 namenode). I wanted to convert the secondary namenode to be a standby
 (hence HDFS HA).

 After getting HDFS HA up and making sure the hadoop configuration files
 were accessible by Accumulo, I started up Accumulo. I noticed some reports
 of tablet servers failing to connect, however, they were failing to connect
 to HDFS over port 9000. That port is not configured/used with HDFS HA so
 I'm unsure why they are still trying to talk to HDFS using the old
 configuration.

 Any thoughts ideas? I know Accumulo 1.6 works with HDFS HA, but I'm
 curious if the tests have ever been run against a non-HA cluster that was
 converted to HA (with data in it).

 --

 https://github.com/mindscratch
 https://www.google.com/+CraigWickesser

 https://twitter.com/mind_scratch

 https://twitter.com/craig_links




Re: Do Accumulo 1.5.1 and 1.4.4 work with ZooKeeper 3.4.5?

2014-07-31 Thread Mike Drob
I've seen several vendors offering newer versions of zookeeper with
Accumulo without issue. Cloudera has tested versions not too far off from
Accumulo 1.4.5 and Accumulo 1.6.0 with CDH4, which uses ZK 3.4.5.
Similarly, I just checked Hortonworks' documents on HDP 2.1 and that
includes both Accumulo 1.5.1 and ZK 3.4.5.

I think the general answer is that yes, it will work. From what I've seen
the ZK team does a good job of semantic versioning, and minor releases should
be backwards compatible. I've not seen any testing on ZK 3.5, but that will
likely be fine too.

Mike


On Thu, Jul 31, 2014 at 9:42 AM, Hunter Provyn f...@ccri.com wrote:

 We are contemplating upgrading to ZooKeeper 3.4.5 for a project that
 depends on Accumulo 1.5.1 and 1.4.4.

 In 1.5.1 pom there is the note:
 <!-- ZooKeeper 3.4.x works also, but we're not using new features yet;
 this ensures 3.3.x compatibility. -->

 In the 1.4.4 pom there is the dependency on 3.3.1 with no note.

 Are there are any known issues with using ZooKeeper 3.4.x with 1.4.4?
 Is it known to work or not work given that no new ZK features are being
 used?

 Thanks!



Re: loaded family in METADATA table [SEC=UNOFFICIAL]

2014-07-31 Thread Mike Drob
Filed a JIRA to update the docs, thanks for pointing this out to us, Matt!

https://issues.apache.org/jira/browse/ACCUMULO-3032


On Thu, Jul 31, 2014 at 1:32 AM, Sean Busbey bus...@cloudera.com wrote:

 those are the markers that a tablet server has bulk loaded:


 https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/metadata/schema/MetadataSchema.java#L138


 On Wed, Jul 30, 2014 at 11:05 PM, Dickson, Matt MR 
 matt.dick...@defence.gov.au wrote:

  *UNOFFICIAL*
 Investigating the metadata table I have noticed a family of 'loaded' and
 looking at all the doco, including the new O'Reilly Accumulo book, there is
 no description of it.

 The column looks like

 table_id;row  loaded;rfile_path  value

 Any insights would be much appreciated.

 Thanks in advance,
 Matt




 --
 Sean



Re: Request for Configuration Help for basic test. Tservers dying and only one tablet being used

2014-07-29 Thread Mike Drob
You should double-check your data, you might find that it's null padded or
something like that which would screw up the splits. You can do a scan from
the shell which might give you hints.


On Tue, Jul 29, 2014 at 3:53 PM, Pelton, Aaron A. aaron.pel...@gd-ais.com
wrote:

 I agree with the idea of pooling the writers.

 As for the discussion of the keys. I get what you are saying with choosing
 better keys for distribution based on frequency of the chars in the English
 language. But, for this test I'm just using apache RandomStringUtils to
 create a 2 char random alpha sequence to prepend, so it should be a
 moderately distributed sampling of chars. However, let me emphasize that I
 mean I'm seeing 1 tablet getting millions of entries in it, compared to the
 remaining 35 tablets having no entries or just like 1k. To me that says
 something isn't right.


 -Original Message-
 From: Josh Elser [mailto:josh.el...@gmail.com]
 Sent: Tuesday, July 29, 2014 4:20 PM
 To: user@accumulo.apache.org
 Subject: Re: Request for Configuration Help for basic test. Tservers dying
 and only one tablet being used

 On 7/29/14, 3:20 PM, Pelton, Aaron A. wrote:
  To followup to two of your statements/questions:
 
  1. Good, pre-splitting your table should help with random data, but if
 you're only writing data to one tablet, you're stuck (very similar to
 hot-spotting reducers in MapReduce jobs).
 
  - OK so its good that the data is presplitting, but maybe this is
 conceptually something that I'm not grasping about accumulo yet, but I
 thought specifying the pre-splits is what causes the table to span multiple
 tablets on the various tserver initially.  However, the core of the data
 appears to be in one specific tablet on on tserver. Each tserver appears to
 have a few tablets allocated to it for the table I'm working out of. So,
 I'm confused as to how to get the data to write to more than just the one
 tablet/partition.  I would almost think my keys I specified aren't being
 matched correctly against incoming data then?

 No, it sounds like you have the idea correctly. Many tablets make up a
 table, the split points for a table are what defines those tablet
 boundaries. Consider you have a table where the rowID are English words (
 http://en.wikipedia.org/wiki/Letter_frequency#Relative_frequencies_of_the_first_letters_of_a_word_in_the_English_language
 ).

 If you split your table on each letter (a-z), you would still see much
 more activity to the tablets which host words starting with 'a', 't', and
 's' because you have significantly more data being ingested into those
 tablets.

 When designing a table (specifically the rowID of the key), it's desirable
 to try to make the rowID as distributed as possible across the entire
 table. This helps ensure even processing across all of your nodes. Does
 that make sense?

  2. What do you actually do when you receive an HTTP request to write to
 Accumulo. It sounds like you're reading data and then writing? Is each HTTP
 request creating its own BatchWriter? More insight to what a write looks
 like in your system (in terms of Accumulo API calls) would help us make
 recommendations about more efficient things you can do.
 
  Yes each http request gets its own reference to a writer or scanner,
 which is closed when the result is returned from the http request.  There
 are two rest services. One transforms the data and performs some indexing
 based on it and then sends both data and index to a BatchWriter. The sample
 code for the data being written is below. The indexes being written are
 similar but use different family and qualifier values.
 
   Text rowId = new Text(id + ":" + time);
   Text fam = new Text(COLUMN_FAMILY_KLV);
   Text qual = new Text();
   Value val = new Value(data.getBytes());
 
   Mutation mut = new Mutation(rowId);
   mut.put(fam, qual, val);
 
   long memBuf = 1_000_000L;
   long timeout = 1000L;
   int numThreads = 10;
 
   BatchWriter writer = null;
   try
   {
   writer = conn.createBatchWriter(TABLE_NAME, memBuf,
 timeout, numThreads);
   writer.addMutation(mut);
   }
   catch (Exception x)
   {
   // x.printStackTrace();
   logger.error(x.toString(), x);
   result = ERROR;
   }
   finally
   {
   try
   {
   if (writer != null)
   {
   writer.close();
   }
   }
   catch (Exception x)
   {
   // x.printStackTrace();
   logger.error(x.toString(), x);
   result = ERROR;
   }
   }

 You could try to make a threadpool for BatchWriters instead of creating a
 new one for each HTTP thread. This might help amortize the RPC cost by
 sending more than one mutation 

Re: Forgot SECRET, how to delete zookeeper nodes?

2014-07-13 Thread Mike Drob
Another option would have been to pick a different instance name when
rebuilding your cluster. Not that it helps you much now...


On Sun, Jul 13, 2014 at 11:28 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:

 Thanks for the help. I think I might better re-ingest the data I need. :(

 Jianshi


 On Mon, Jul 14, 2014 at 12:05 PM, Sean Busbey bus...@cloudera.com wrote:

 If you want to recover the data stored in tables from the old instance,
 it'll be more straightforward to follow the advanced troubleshooting
 section of the user manual.

 In there is a what if zookeeper fails section:

 http://accumulo.apache.org/1.6/accumulo_user_manual.html#zookeeper_failure

 Take note of the caveats in that section about potential data issues.

 --
 Sean
 On Jul 13, 2014 11:02 PM, Vicky Kak vicky@gmail.com wrote:

 Here is the example about the import/export
 http://accumulo.apache.org/1.6/examples/export.html


 On Mon, Jul 14, 2014 at 9:27 AM, William Slacum 
 wilhelm.von.cl...@accumulo.net wrote:

 If the zookeeper data is gone, your best bet is try and identify which
 directories under /accumulo/tables points to which tables you had. You can
 then bulk import the files into a new instance's tables.


 On Sun, Jul 13, 2014 at 11:54 PM, Vicky Kak vicky@gmail.com
 wrote:

 I am not sure if the tables could be recovered seamlessly, the tables
 are stored in undelying hdfs.
 I was thinking of using
 http://accumulo.apache.org/1.6/examples/bulkIngest.html to recover
 the tables, the better would be if we could update the zookeeper data
 pointing to the existing hdfs table data.
 I don't have more information about it as of now, we need someone else
 to help us here.


 On Mon, Jul 14, 2014 at 9:06 AM, Jianshi Huang 
 jianshi.hu...@gmail.com wrote:

 It's too deleted... so the only option I have is to delete the
 zookeeper nodes and reinitialize accumulo.

 You're right, I deleted the zk nodes and now Accumulo complains
 nonode error.

 Can I recover the tables for a new instance?

 Jianshi


 On Mon, Jul 14, 2014 at 11:28 AM, Vicky Kak vicky@gmail.com
 wrote:

 Can't you get the secret from the corresponding accumulo-site.xml or
 this is too deleted?

 Deletion from the zookeeper should be done using the rmr /accumulo
 command, you will have to use zkCli.sh to use zookeeper client. I have 
 been
 doing this sometime back, have not used it recently.
 I would not recommend to delete the information in zookeeper unless
 there is no other option, you may lose the data IMO.



 On Mon, Jul 14, 2014 at 8:40 AM, Jianshi Huang 
 jianshi.hu...@gmail.com wrote:

 Clusters got updated and user home files lost... I tried to
 reinstall accumulo but I forgot the secret I put before.

 So how can I delete /accumulo in Zookeeper?

 Or is there a way to rename instance_id?

 --
 Jianshi Huang

 LinkedIn: jianshi
 Twitter: @jshuang
 Github  Blog: http://huangjs.github.com/





 --
 Jianshi Huang

 LinkedIn: jianshi
 Twitter: @jshuang
 Github  Blog: http://huangjs.github.com/







 --
 Jianshi Huang

 LinkedIn: jianshi
 Twitter: @jshuang
 Github  Blog: http://huangjs.github.com/



Re: How does Accumulo compare to HBase

2014-07-10 Thread Mike Drob
At the risk of derailing the original thread, I'll take a moment to explain
my methodology. Since each entry can be thought of as a 6 dimensional
vector (r, cf, cq, vis, ts, val) there's a lot of room for fiddling with
the specifics of it. YCSB gives you several knobs, but unfortunately, not
absolutely everything was tunable.

The things that are configurable:
- # of rows
- # of column qualifiers
- length of value
- number of operations per client
- number of clients

Things that are not configurable:
- row length (it's a long int)
- # of column families (constant at one per row)
- length of column qualifier (basically a one-up counter per row)
- visibilities

In all of my experiments, the goal was to keep data size constant. This can
be approximated by (number of entries * entry size). Number of entries is
intuitively rows (configurable) * column families (1) * columns qualifiers
per family (configurable), while entry size is key overhead (about 40
bytes) + configured length of value. So to keep total size constant, we
have three easy knobs. However, tweaking three values at a time produces
really messy data where you're not always going to be sure where the
causality arrow lies. Even doing two at a time can cause issues but then
the choice is between tweaking two properties of the data, or one property
of the data and the total size (which is also a relevant attribute).

Whew. So why did I use two different independent variables between the two
halves?

Partly, because I'm not comparing the two tests to each other, so they
don't have to be duplicative. I ran them on different hardware from each
other, with different number of clients, disks, cores, etc. There's no
meaningful comparisons to be drawn, so I wanted to remove the temptation to
compare results against each other. I'll admit that I might be wrong in
this regard.

The graphs are not my complete data sets. For the Accumulo v Accumulo
tests, we have about ten more data points varying rows and data size as
well. Trying to show three independent variables on a graph was pretty
painful, so they didn't make it into the presentation. The short version of
the story is that nothing scaled linearly (some things were better, some
things were worse) but the general trend lines were approximately what you
would expect.

Let me know if you have more questions, but we can probably start a new
thread for future search posterity! (This applies to everybody).

Mike


On Thu, Jul 10, 2014 at 9:26 AM, Kepner, Jeremy - 0553 - MITLL 
kep...@ll.mit.edu wrote:

 Mike Drob put together a great talk at the Accumulo Summit (
 http://www.slideshare.net/AccumuloSummit/10-30-drob) discussing Accumulo
 performance and HBase performance.  This is exactly the kind of work the
 entire Hadoop community needs to continue to move forward.

 I had one question about the talk which I was wondering if someone might
 be able to shed light on.  In the Accumulo part of the talk the experiments
 varied #rows while keeping the #cols fixed, while in the Accumulo/HBase
 part of the talk the experiments varied #cols while keeping #rows fixed?


Re: Table entries in Accumulo removed ?

2014-06-25 Thread Mike Drob
More likely: Are you inserting data with visibility labels that your scan
user does not have?
Less likely, but possible: Are you pushing any kind of deletes? Do you have
an AgeOffIterator configured?
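
From the shell, something along these lines should narrow it down (user and
table names are examples):

root@myinstance> getauths -u scanuser
root@myinstance> config -t mytable -f iterator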

Mike


On Wed, Jun 25, 2014 at 2:21 PM, Sivan sivan...@gmail.com wrote:

 I'm using storm to push data into Accumulo and the emitted data is getting
 populated to Accumulo in real time. The table entry count matches
 the emitter count. But in the course of time the entries are getting
 removed from Accumulo ? A scan on the table returns none .. And the UI
 shows 0 entries in the table ! What would be the possible reason ?
 Accumulo 1.5.1
 Storm 0.8.2
 Cdh4.5

 Thanks
 Sent from my iPhone


Re: Updating Metadata of a Table

2014-05-27 Thread Mike Drob
I'm not sure I understand what you are trying to do. Can you give us an
example and a use case?

The metadata table is just like any other table where you can do
inserts/deletes/etc.


On Tue, May 27, 2014 at 4:49 PM, Tiffany Reid tr...@eoir.com wrote:

  Or even via the Java API?   I haven’t found any examples to update the
 MetaData for a specific table, only read the current entries.



 *From:* Tiffany Reid
 *Sent:* Tuesday, May 27, 2014 5:39 PM
 *To:* user@accumulo.apache.org
 *Subject:* Updating Metadata of a Table



 Hi,



 Does anyone know how to go about updating the metadata for a table in
 Accumulo via shell command tool?  I’m using 1.4 and I cannot upgrade to the
 latest due to project requirements.



 Thanks,

 Tiffany





Re: RFiles not referenced in !METADATA [SEC=UNOFFICIAL]

2014-05-22 Thread Mike Drob
Is your GC running? It should be catching the unreferenced files.

I think you are safe to manually delete any files not references in the
!METADATA table.

What version of Accumulo are you running?


On Wed, May 21, 2014 at 9:00 PM, Dickson, Matt MR 
matt.dick...@defence.gov.au wrote:

  *UNOFFICIAL*
 I've run scan on hdfs under /accumulo/tables/table_id for all rfiles
 older than our ageoff filter on that table.  When I then scan for these
 rfiles in the metadata table most are not listed.

 Should all rfiles be referenced in the metadata table?  My goal had been
 to get the rowid from the metadata and then force a compaction on that
 range.  Eg for row 4n;234234234 file:/fdi-2342/234234.rf, run a
 compaction for 234234234 to 234234234~

 Thanks in advance.
 Matt





Re: Embedded Mutations: Is this kind of thing done?

2014-04-25 Thread Mike Drob
Large rows are only an issue if you are going to try to put the entire row
in memory at once. As long as you have small enough entries in the row, and
can treat them individually, you should be fine.

The qualifier is anything that you want to use to determine uniqueness
across keys. So yes, this sounds fine, although possibly not fine-grained
enough.
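
As a rough illustration of that layout (the row id, family, and type URIs
below are made up, and a BatchWriter "writer" is assumed):

    Mutation m = new Mutation("graph-0001");      // one wide row per graph
    m.put(new Text("attr"),
          new Text("http://example.org/types#Integer"),   // type URI as qualifier
          new Value("42".getBytes()));
    m.put(new Text("attr"),
          new Text("http://example.org/types#String"),
          new Value("hello".getBytes()));
    writer.addMutation(m);

Note that two values with the same type URI would collide under this scheme,
which is what "possibly not fine-grained enough" refers to; appending a node
id or counter to the qualifier avoids that.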

Mike


On Fri, Apr 25, 2014 at 9:11 AM, Geoffry Roberts threadedb...@gmail.comwrote:

 Interesting, multiple mutations that is.  Are we talking multiples on the
 same row id?

 Upon reflection, I realized the embedded thing is nothing special.  I
 think I'll keep adding columns to a single mutation.  This will make for a
 wide row, but I'm not seeing that as a problem.  Am I being naive?

 Another question if I may.  As I walk my graph, I must keep track of the
 type of the value being persisted.  I am using the qualifier for this,
 putting in it a URI that indicates the type.  Is this a proper use for the
 qualifier?

 Thanks for the discussion


 On Thu, Apr 24, 2014 at 11:23 PM, William Slacum 
 wilhelm.von.cl...@accumulo.net wrote:

 Depending on your table schema, you'll probably want to translate an
 object graph into multiple mutations.


 On Thu, Apr 24, 2014 at 8:40 PM, David Medinets david.medin...@gmail.com
  wrote:

 If the sub-document changes, you'll need to search the values of every
 Accumulo entry?


 On Thu, Apr 24, 2014 at 5:31 PM, Geoffry Roberts threadedb...@gmail.com
  wrote:

 The use case is, I am walking a complex object graph and persisting
 what I find there.  Said object graph in my case is always EMF (eclipse
 modeling framework) compliant.  An EMF graph can have in it references
 to--brace yourself--a non-cross document containment reference.  When using
 Mongo, these were persisted as a DBObject embedded into a containing
 DBObject.  I'm trying to decide whether I want to follow suit.

 Any thoughts?


 On Thu, Apr 24, 2014 at 4:03 PM, Sean Busbey bus...@cloudera.comwrote:

 Can you describe the use case more? Do you know what the purpose for
 the embedded changes are?


 On Thu, Apr 24, 2014 at 2:59 PM, Geoffry Roberts 
 threadedb...@gmail.com wrote:

 All,

 I am in the throes of converting someone else's code from MongoDB to
 Accumulo.  I am seeing a situation where one DBObject is being embedded
 into another DBObject.  I see that Mutation supports a method called
 getRow()  that returns a byte array.  I gather I can use this to achieve 
 a
 similar result if I were so inclined.

 Am I so inclined?  i.e. Is this the way we do things in Accumulo?

 DBObject, roughly speaking, is Mongo's counterpart to Mutation.

 Thanks mucho

 --
 There are ways and there are ways,

 Geoffry Roberts




 --
 Sean




 --
 There are ways and there are ways,

 Geoffry Roberts






 --
 There are ways and there are ways,

 Geoffry Roberts



Re: Write to table from Accumulo iterator

2014-04-25 Thread Mike Drob
Can you share a little more about what you are trying to achieve? My first
thought would be to try looking at the Conditional Mutations present in
1.6.0 (not yet released) as either a ready implementation or a starting
point for your own code.
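
For reference, the 1.6 conditional-write API looks roughly like this (a
sketch only; 1.6.0 was unreleased at the time, and the table, column, and
value names are placeholders):

    ConditionalWriter cw = connector.createConditionalWriter(
        "mytable", new ConditionalWriterConfig());
    ConditionalMutation cm = new ConditionalMutation("row1");
    cm.addCondition(new Condition("fam", "qual"));   // applies only if fam:qual is absent
    cm.put(new Text("fam"), new Text("qual"), new Value("v".getBytes()));
    ConditionalWriter.Result result = cw.write(cm);  // check result.getStatus()
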
On Apr 25, 2014 10:13 PM, BlackJack76 justin@gmail.com wrote:

 I am trying to figure out the best way to write to the table from inside
 the
 seek method of a class that implements SortedKeyValueIterator.  I
 originally
 tried to create a BatchWriter and just use that to write data.  However, if
 the tablet moved during a flush then it would hang.

 Any other recommendations on how to write back to the table?  Thanks!



 --
 View this message in context:
 http://apache-accumulo.1065345.n5.nabble.com/Write-to-table-from-Accumulo-iterator-tp9412.html
 Sent from the Users mailing list archive at Nabble.com.



Re: Accumulo and OSGi

2014-04-23 Thread Mike Drob
Geoffry,

Fixing our logging libraries is an open issue -
https://issues.apache.org/jira/browse/ACCUMULO-1242

I hope to see it resolved soon. It's a pretty big task, so if you feel
inspired to help, it would be appreciated as well!

Thanks,
Mike


On Wed, Apr 23, 2014 at 9:39 AM, Geoffry Roberts threadedb...@gmail.comwrote:

 I thought I'd check in.

 After some encouragement from this group, I found some time and now have
 an Accumulo client running in OSGi (Felix).  It's rather primitive, at this
 juncture, in that it is little more than a wrap job.  I was, however,
 forced to hack Zookeeper to get things to work.  Zookeeper needed to import
 an additional package.  I used the servicemix bundle for Hadoop.

 Josh, You asked if there was anything that could be done upstream to make
 osgification go better.  One thing, and it's not a huge deal, but getting
 everything on the same logging library would be nice.  So far, I see both
 log4j and slf4j.  Are there more?



 On Thu, Apr 10, 2014 at 12:49 PM, Russ Weeks rwe...@newbrightidea.comwrote:

 On Thu, Apr 10, 2014 at 7:18 AM, Geoffry Roberts 
 threadedb...@gmail.comwrote:

 You say the community would be well-accepting of bundling up the
 Accumulo client.  If that's the case, I'd like to hear from them.


 +1!




 --
 There are ways and there are ways,

 Geoffry Roberts



Re: Accumulo not starting anymore

2014-04-16 Thread Mike Drob
Can you verify that the accumulo files are still present in HDFS?

hadoop fs -ls /accumulo


On Wed, Apr 16, 2014 at 4:15 PM, Geoffry Roberts threadedb...@gmail.comwrote:

 All,

 Suddenly, Accumulo will no longer start.  Log files are not helpful.  Is
 there a way to troubleshoot this?

 The back story is I upgraded from OSX 10.7 to 10.9.  Everything was
 working with 10.7.  But with 10.9 Accumulo began to complain of
 insufficient file limits and recommended setting maxfiles to 65536, which I
 did.

 Hadoop starts -- version 2.3.0
 Zookeeper starts -- version 3.4.6
 Java -- version 1.7.55

 I've included part of a log file just in case.

 Thanks mucho

 From:  master_abend.home.debug.log

 t2014-04-16 15:59:07,250 [server.Accumulo] INFO : master starting
 2014-04-16 15:59:07,251 [server.Accumulo] INFO : Instance
 d9f3a06a-ef06-4860-a08d-9cff805a9249
 2014-04-16 15:59:07,254 [server.Accumulo] INFO : Data Version 5
 2014-04-16 15:59:07,254 [server.Accumulo] INFO : Attempting to talk to
 zookeeper
 2014-04-16 15:59:07,264 [zookeeper.ZooSession] DEBUG: Connecting to
 localhost:2181 with timeout 3 with auth
 2014-04-16 15:59:07,271 [zookeeper.ZooKeeper] INFO : Client
 environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
 2014-04-16 15:59:07,271 [zookeeper.ZooKeeper] INFO : Client environment:
 host.name=abend.home
 2014-04-16 15:59:07,271 [zookeeper.ZooKeeper] INFO : Client
 environment:java.version=1.7.0_55
 2014-04-16 15:59:07,271 [zookeeper.ZooKeeper] INFO : Client
 environment:java.vendor=Oracle Corporation
 2014-04-16 15:59:07,271 [zookeeper.ZooKeeper] INFO : Client
 environment:java.home=/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre
 2014-04-16 15:59:07,271 [zookeeper.ZooKeeper] INFO : Client
 environment:java.class.path=/usr/local/accumulo/conf:/usr/local/accumulo/lib/accumulo-start.jar:/usr/local/hadoop/share/hadoop/common/lib/log4j-1.2.17.jar
 2014-04-16 15:59:07,271 [zookeeper.ZooKeeper] INFO : Client
 environment:java.library.path=/usr/local/hadoop/lib/native
 2014-04-16 15:59:07,271 [zookeeper.ZooKeeper] INFO : Client
 environment:java.io.tmpdir=/var/folders/sb/g6bpj4cd401c1sw566x2r41mgn/T/
 2014-04-16 15:59:07,271 [zookeeper.ZooKeeper] INFO : Client
 environment:java.compiler=NA
 2014-04-16 15:59:07,271 [zookeeper.ZooKeeper] INFO : Client environment:
 os.name=Mac OS X
 2014-04-16 15:59:07,271 [zookeeper.ZooKeeper] INFO : Client
 environment:os.arch=x86_64
 2014-04-16 15:59:07,271 [zookeeper.ZooKeeper] INFO : Client
 environment:os.version=10.9.2
 2014-04-16 15:59:07,271 [zookeeper.ZooKeeper] INFO : Client environment:
 user.name=gcr
 2014-04-16 15:59:07,271 [zookeeper.ZooKeeper] INFO : Client
 environment:user.home=/Users/gcr
 2014-04-16 15:59:07,271 [zookeeper.ZooKeeper] INFO : Client
 environment:user.dir=/Users/gcr
 2014-04-16 15:59:07,272 [zookeeper.ZooKeeper] INFO : Initiating client
 connection, connectString=localhost:2181 sessionTimeout=3
 watcher=org.apache.accumulo.fate.zookeeper.ZooSession$ZooWatcher@14731467
 2014-04-16 15:59:07,288 [zookeeper.ClientCnxn] INFO : Opening socket
 connection to server localhost/127.0.0.1:2181. Will not attempt to
 authenticate using SASL (unknown error)
 2014-04-16 15:59:07,289 [zookeeper.ClientCnxn] INFO : Socket connection
 established to localhost/127.0.0.1:2181, initiating session
 2014-04-16 15:59:07,294 [zookeeper.ClientCnxn] INFO : Session
 establishment complete on server localhost/127.0.0.1:2181, sessionid =
 0x1456c089e9b0008, negotiated timeout = 3
 2014-04-16 15:59:07,394 [watcher.MonitorLog4jWatcher] INFO : Set watch for
 Monitor Log4j watcher
 2014-04-16 15:59:07,399 [server.Accumulo] INFO : Waiting for accumulo to
 be initialized
 2014-04-16 15:59:08,401 [server.Accumulo] INFO : Waiting for accumulo to
 be initialized

 --
 There are ways and there are ways,

 Geoffry Roberts



Re: How to delete all rows in Accumulo table?

2014-04-14 Thread Mike Drob
All commands are from memory, so typos might exist. Deleting all rows can
be a very lengthy operation. It will likely be much faster to delete the
table and create a new one.

 droptable foo
 createtable foo

If you had configuration settings on the table that you wanted to keep,
then it might be easier to create, delete, and rename.

 createtable temp -cs foo -cc foo
 droptable foo
 renametable temp foo

If you really need to delete all the rows, you can do it using

 deleterows -t foo -f



On Mon, Apr 14, 2014 at 3:13 PM, Tiffany Reid tr...@eoir.com wrote:

 Hi,



 How do I delete all rows in a table via Accumulo Shell?



 Thanks,

 Tiffany







[ANN] Accumulo 1.4.5 Released

2014-04-07 Thread Mike Drob
Users,

I am pleased to announce that Accumulo 1.4.5 has been released. The bits
are available on our downloads page [1].

Notable improvements of this release include:
* Support for Hadoop 2
* Resilience to zookeeper node failure
* Provide static utility for resource cleanup for web containers
* Automatic cleanup of files resulting from failed flush/compaction

Happy Accumulating!
Mike

[1]: http://accumulo.apache.org/downloads/index.html


Re: NOT operator in visibility string

2014-03-19 Thread Mike Drob
Wait, I'm really confused by what you are describing, Jeff. Sorry if these
are obvious questions, but can you help me get a better grasp of your use
case?

You have a large amount of data, that is generally readable by all users.
Users create their own sandbox, from which they can later exclude portions
of the global data set.
User can share their sandbox with others, so really we are talking about
sandbox permissions and not so much user permissions.
Sandboxes are created often. Or, at least much more often than the data
changes.

Are those all accurate statements? If so, can you clarify the following
points:

Do users typically remove large amounts of data from their sandbox? 1%?
10%? 99%?
Assuming data is removed via rules, are the rules applied automatically to
new data under ingest?

Thanks,
Mike


On Wed, Mar 19, 2014 at 12:54 PM, Jeff Kunkle kunkl...@gmail.com wrote:

 Hi John,

 Yes it's accurate that the system controls the label and who is associated
 with it; there are no Accumulo-internal user accounts. But I don't think
 it's feasible to remove a sandbox label from something that should be
 hidden. Such a scenario would imply that all data is tagged with the
 labels of every sandbox that is allowed to see the data, which would be
 most. It would also imply that the creation of a new sandbox would
 necessitate changing the visibility of everything in Accumulo to include
 the new sandbox label, effectively rewriting the entire database. Sandboxes
 are created and deleted all the time in our application, so it doesn't seem
 like a feasible solution to me.

 -Jeff

 On Mar 19, 2014, at 12:16 PM, Josh Elser josh.el...@gmail.com wrote:

  It kind of sounds like you could manage this much easier by controlling
 the authorizations a user gets (notably the workspace name) and the
 grant/revoke above the Accumulo level.
 
  A sandbox has a unique label and the external system controls which
 users are granted that label. This way, each sandbox can be modified
 individually (using authorizations that contain the data visibility and the
 sandbox label) or the original data set could be modified (by omitting a
 sandbox label in the authorizations used).
 
  Is that accurate?
 
  On 3/19/14, 12:05 PM, Jeff Kunkle wrote:
  I attempted to simplify the scenario to facilitate discussion, which on
  second thought may have been a mistake. Here's the whole scenario:
 
  Different users have access to different subsets of the data depending
  on their authorizations and the visibility of the data. Users work
  with the data in what we call a sandbox. Sanboxes can be shared with
  other users (this is the group creation I was talking about earlier).
  Deletes to the data would be scoped to the sandbox by changing the
  visibility to add & !workspace_name so that people viewing the
  workspace wouldn't see the data but everyone else would.
 
  On Mar 19, 2014, at 11:48 AM, Sean Busbey busbey+li...@cloudera.com
  mailto:busbey+li...@cloudera.com wrote:
 
  On Wed, Mar 19, 2014 at 10:43 AM, Jeff Kunkle kunkl...@gmail.com
  mailto:kunkl...@gmail.com wrote:
 
 New groups are created on the fly by our application when needed.
 Under the scenario you describe we'd have to go through all the
 data in Accumulo whenever a group is created so that users in the
 group can see the existing data.
 
 
 
 
  Ah! So your use case is that all data defaults to world readable and
  then users have the option of opting out of seeing subsets. Right?
 
  In your scenario user groups also get to opt-out of seeing data on the
  fly, yes? Both require rewriting the data. Does the group creation
  happen more often?
 




Re: Filters and ScannerBase.fetchColumn

2014-03-19 Thread Mike Drob
Yes, you are running into the same issue described in
https://issues.apache.org/jira/browse/ACCUMULO-1801
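
As a rough sketch of the behaviour discussed below (auths, the table name,
and MyRowFilter are placeholders; the point is that column selection happens
server-side before scan-time iterators run, so every column the filter
inspects has to be fetched explicitly):

    Scanner s = connector.createScanner("docs", auths);
    s.fetchColumn(new Text("meta"), new Text("size"));
    s.fetchColumn(new Text("meta"), new Text("source"));     // needed only because the filter reads it
    s.fetchColumn(new Text("meta"), new Text("filename"));
    s.addScanIterator(new IteratorSetting(50, "srcFilter", MyRowFilter.class));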


On Wed, Mar 19, 2014 at 6:41 PM, John Vines vi...@apache.org wrote:

 Yes, column level filtering happens before any client iterators get a
 chance to touch the results.


 On Wed, Mar 19, 2014 at 6:36 PM, Russ Weeks rwe...@newbrightidea.comwrote:

 Sorry for the flood of e-mails... I'm not trying to spam the list, I'm
 just getting deeper into accumulo, and loving it, and I'm kind of stumped
 by it at the same time.

 Is it true that if a scanner restricts the column families/qualifiers to
 be returned, that these columns are not visible to any iterators? ie. that
 this restriction is applied at a higher priority than any of the
 iterators?

 I have some rows that look like this:

 00021cdaac30 meta:size []656
 00021cdaac30 meta:source []data2
 00021cfaac30 meta:filename []doc04484522
 00021cfaac30 meta:size []565
 00021cfaac30 meta:source []data2
 00021dcaac30 meta:filename []doc03342958

 I have a couple of RowFilters chained together to filter based on source
 and filename. If I just run scan --columns meta:size I get no results. I
 have to specify scan --columns meta:size,meta:source,meta:filename to get
 any results, which implies that I need to know beforehand which columns are
 required for any active iterators.

 Is this expected behaviour?

 Thanks,
 -Russ





Re: Installing with Hadoop 2.2.0

2014-03-18 Thread Mike Drob
   $HADOOP_PREFIX/share/hadoop/common/.*.jar,
   $HADOOP_PREFIX/share/hadoop/common/lib/.*.jar,
   $HADOOP_PREFIX/share/hadoop/hdfs/.*.jar,
   $HADOOP_PREFIX/share/hadoop/mapreduce/.*.jar,
   $HADOOP_PREFIX/share/hadoop/yarn/.*.jar,
   /usr/lib/hadoop/.*.jar,
   /usr/lib/hadoop/lib/.*.jar,
   /usr/lib/hadoop-hdfs/.*.jar,
   /usr/lib/hadoop-mapreduce/.*.jar,
   /usr/lib/hadoop-yarn/.*.jar,
   $ACCUMULO_HOME/server/target/classes/,
   $ACCUMULO_HOME/lib/accumulo-server.jar,
   $ACCUMULO_HOME/core/target/classes/,
   $ACCUMULO_HOME/lib/accumulo-core.jar,
   $ACCUMULO_HOME/start/target/classes/,
   $ACCUMULO_HOME/lib/accumulo-start.jar,
   $ACCUMULO_HOME/fate/target/classes/,
   $ACCUMULO_HOME/lib/accumulo-fate.jar,
   $ACCUMULO_HOME/proxy/target/classes/,
   $ACCUMULO_HOME/lib/accumulo-proxy.jar,
   $ACCUMULO_HOME/lib/[^.].*.jar,
   $ZOOKEEPER_HOME/zookeeper[^.].*.jar,
   $HADOOP_CONF_DIR,
   $HADOOP_PREFIX/[^.].*.jar,
   $HADOOP_PREFIX/lib/[^.].*.jar,
 /value
 descriptionClasspaths that accumulo checks for updates and class
 files.
   When using the Security Manager, please remove the
 .../target/classes/ values.
 /description
   /property
 /configuration


 On Sun, Mar 16, 2014 at 9:06 PM, Josh Elser josh.el...@gmail.comwrote:

 Posting your accumulo-site.xml (filtering out instance.secret and
 trace.password before you post) would also help us figure out what exactly
 is going on.


 On 3/16/14, 8:41 PM, Mike Drob wrote:

 Which version of Accumulo are you using?

 You might be missing the hadoop libraries from your classpath. For
 this,
 you would check your accumulo-site.xml and find the comment about
 Hadoop
 2 in the file.


 On Sun, Mar 16, 2014 at 8:28 PM, Benjamin Parrish
 benjamin.d.parr...@gmail.com mailto:benjamin.d.parr...@gmail.com
 wrote:

 I have a couple of issues when trying to use Accumulo on Hadoop
 2.2.0

 1) I start with accumulo init and everything runs through just fine,
 but I can't find '/accumulo' using 'hadoop fs -ls /'

 2) I try to run 'accumulo shell -u root' and it says that Hadoop and
 ZooKeeper are not started, but if I run 'jps' on each cluster node it
 shows all the necessary processes for both in the JVM.  Is there
 something I am missing?

 --
 Benjamin D. Parrish
 H: 540-597-7860 tel:540-597-7860





 --
 Benjamin D. Parrish
 H: 540-597-7860




 --
 Benjamin D. Parrish
 H: 540-597-7860





 --
 Benjamin D. Parrish
 H: 540-597-7860



Re: Installing with Hadoop 2.2.0

2014-03-18 Thread Mike Drob
 and
   limitations under the License.
 --
 ?xml-stylesheet type=text/xsl href=configuration.xsl?

 configuration
   !-- Put your site-specific accumulo configurations here. The
 available configuration values along with their defaults are documented 
 in
 docs/config.html Unless
 you are simply testing at your workstation, you will most
 definitely need to change the three entries below. --

   property
 nameinstance.zookeeper.host/name

 valuehadoop-node-1:2181,hadoop-node-2:2181,hadoop-node-3:2181,hadoop-node-4:2181,hadoop-node-5:2181/value
 descriptioncomma separated list of zookeeper
 servers/description
   /property

   property
 namelogger.dir.walog/name
 valuewalogs/value
 descriptionThe property only needs to be set if upgrading from
 1.4 which used to store write-ahead logs on the local
   filesystem. In 1.5 write-ahead logs are stored in DFS.  When
 1.5 is started for the first time it will copy any 1.4
   write ahead logs into DFS.  It is possible to specify a
 comma-separated list of directories.
 /description
   /property

   property
 nameinstance.secret/name
 value/value
 descriptionA secret unique to a given instance that all servers
 must know in order to communicate with one another.
   Change it before initialization. To
   change it later use ./bin/accumulo
 org.apache.accumulo.server.util.ChangeSecret --old [oldpasswd] --new
 [newpasswd],
   and then update this file.
 /description
   /property

   property
 nametserver.memory.maps.max/name
 value1G/value
   /property

   property
 nametserver.cache.data.size/name
 value128M/value
   /property

   property
 nametserver.cache.index.size/name
 value128M/value
   /property

   property
 nametrace.token.property.password/name
 !-- change this to the root user's password, and/or change the
 user below --
 value/value
   /property

   property
 nametrace.user/name
 valueroot/value
   /property

   property
 namegeneral.classpaths/name
 value
   $HADOOP_PREFIX/share/hadoop/common/.*.jar,
   $HADOOP_PREFIX/share/hadoop/common/lib/.*.jar,
   $HADOOP_PREFIX/share/hadoop/hdfs/.*.jar,
   $HADOOP_PREFIX/share/hadoop/mapreduce/.*.jar,
   $HADOOP_PREFIX/share/hadoop/yarn/.*.jar,
   /usr/lib/hadoop/.*.jar,
   /usr/lib/hadoop/lib/.*.jar,
   /usr/lib/hadoop-hdfs/.*.jar,
   /usr/lib/hadoop-mapreduce/.*.jar,
   /usr/lib/hadoop-yarn/.*.jar,
   $ACCUMULO_HOME/server/target/classes/,
   $ACCUMULO_HOME/lib/accumulo-server.jar,
   $ACCUMULO_HOME/core/target/classes/,
   $ACCUMULO_HOME/lib/accumulo-core.jar,
   $ACCUMULO_HOME/start/target/classes/,
   $ACCUMULO_HOME/lib/accumulo-start.jar,
   $ACCUMULO_HOME/fate/target/classes/,
   $ACCUMULO_HOME/lib/accumulo-fate.jar,
   $ACCUMULO_HOME/proxy/target/classes/,
   $ACCUMULO_HOME/lib/accumulo-proxy.jar,
   $ACCUMULO_HOME/lib/[^.].*.jar,
   $ZOOKEEPER_HOME/zookeeper[^.].*.jar,
   $HADOOP_CONF_DIR,
   $HADOOP_PREFIX/[^.].*.jar,
   $HADOOP_PREFIX/lib/[^.].*.jar,
 /value
 descriptionClasspaths that accumulo checks for updates and
 class files.
   When using the Security Manager, please remove the
 .../target/classes/ values.
 /description
   /property
 /configuration


 On Sun, Mar 16, 2014 at 9:06 PM, Josh Elser josh.el...@gmail.comwrote:

 Posting your accumulo-site.xml (filtering out instance.secret and
 trace.password before you post) would also help us figure out what 
 exactly
 is going on.


 On 3/16/14, 8:41 PM, Mike Drob wrote:

 Which version of Accumulo are you using?

 You might be missing the hadoop libraries from your classpath. For
 this,
 you would check your accumulo-site.xml and find the comment about
 Hadoop
 2 in the file.


 On Sun, Mar 16, 2014 at 8:28 PM, Benjamin Parrish
 benjamin.d.parr...@gmail.com mailto:benjamin.d.parr...@gmail.com
 wrote:

 I have a couple of issues when trying to use Accumulo on Hadoop
 2.2.0

 1) I start with accumulo init and everything runs through just fine,
 but I can't find '/accumulo' using 'hadoop fs -ls /'

 2) I try to run 'accumulo shell -u root' and it says that Hadoop and
 ZooKeeper are not started, but if I run 'jps' on each cluster node it
 shows all the necessary processes for both in the JVM.  Is there
 something I am missing?

 --
 Benjamin D. Parrish
 H: 540-597-7860 tel:540-597-7860





 --
 Benjamin D. Parrish
 H: 540-597-7860




 --
 Benjamin D. Parrish
 H: 540-597-7860





 --
 Benjamin D. Parrish
 H: 540-597-7860





 --
 Benjamin D. Parrish
 H: 540-597-7860



Re: HDFS caching w/ Accumulo?

2014-02-25 Thread Mike Drob
First instinct is to use it for the root/metadata tablets.


On Tue, Feb 25, 2014 at 10:49 AM, Donald Miner dmi...@clearedgeit.comwrote:

 HDFS caching is part of the new Hadoop 2.3 release. From what I
 understand, it allows you to mark specific files to be held in memory for
 faster reads.

 Has anyone thought about how Accumulo could leverage this?



Re: Error stressing with pyaccumulo app

2014-02-11 Thread Mike Drob
For uuid4 keys, you might want to do [00, 01, 02, ..., 0e, 0f, 10, ..., fd,
fe, ff] to cover the full range.
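
A sketch of generating that full range from Java (the table name "stress"
and the Connector "connector" are placeholders):

    SortedSet<Text> splits = new TreeSet<Text>();
    for (int i = 0; i < 256; i++) {
        splits.add(new Text(String.format("%02x", i)));   // "00" .. "ff"
    }
    connector.tableOperations().addSplits("stress", splits);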


On Tue, Feb 11, 2014 at 9:16 AM, Josh Elser josh.el...@gmail.com wrote:

 Ok. Even so, try adding some split points to the tables before you begin
 (if you aren't already) as it will *greatly* smooth the startup.

 Something like [00, 01, 02, ... 10, 11, 12, .. 97, 98, 99] would be good.
 You can easily dump this to a file on local disk and run the `addsplits`
 command in the Accumulo shell and provide it that file with the -sf (I
 think) option.


 On 2/11/14, 12:00 PM, Diego Woitasen wrote:

 I'm using random keys for this tests. They are uuid4 keys.

 On Tue, Feb 11, 2014 at 1:04 PM, Josh Elser josh.el...@gmail.com wrote:

 The other thing I thought about.. what's the distribution of Key-Values
 that you're writing? Specifically, do many of the Keys sort near each
 other. Similarly, do you notice excessive load on some tservers, but not
 all
 (the Tablet Servers page on the Monitor is a good check)?

 Consider the following: you have 10 tservers and you have 10 proxy
 servers.
 The first thought is that 10 tservers should be plenty to balance the
 load
 of those 10 proxy servers. However, a problem arises when if the data
 that
 each of those proxy servers is writing happens to reside on a _small
 number
 of tablet servers_. Thus, your 10 proxy servers might only be writing to
 one
 or two tabletservers.

 If you notice that you're getting skew like this (or even just know that
 you're apt to have a situation where multiple clients might write data
 that
 sorts close to one another), it would be a good idea to add splits to
 your
 table before starting your workload.

 e.g. if you consider that your Key-space is the numbers from 1 to 10, and
 you have ten tservers, it would be a good idea to add splits 1, 2, ...
 10,
 so that each tservers hosts at least one tablet (e.g. [1,2), [2,3)...
 [10,+inf)). Having at least 5 or 10 tablets per tserver per table (split
 according to the distribution of your data) might help ease the load.


 On 2/11/14, 10:47 AM, Diego Woitasen wrote:


 Same results with 2G tserver.memory.maps.max.

 May be we just reached the limit :)

 On Mon, Feb 10, 2014 at 7:08 PM, Diego Woitasen
 diego.woita...@vhgroup.net wrote:


 On Mon, Feb 10, 2014 at 6:21 PM, Josh Elser josh.el...@gmail.com
 wrote:


 I assume you're running a datanode along side the tserver on that
 node?
 That
 may be stretching the capabilities of that node (not to mention ec2
 nodes
 tend to be a little flakey in general). 2G for the
 tserver.memory.maps.max
 might be a little safer.

 You got an error in a tserver log about that IOException in
 internalReader.
 After that, the tserver was still alive? And the proxy client was
 dead -
 quit normally?



 Yes, everything is still alive.


 If that's the case, the proxy might just be disconnecting in a noisy
 manner?



 Right!

 I'll try with 2G  tserver.memory.maps.max.




 On 2/10/14, 3:38 PM, Diego Woitasen wrote:



 Hi,
 I tried increasing the tserver.memory.maps.max to 3G and failed
 again, but with other error. I have a heap size of 3G and 7.5 GB of
 total ram.

 The error that I've found in the crashed tserver is:

 2014-02-08 03:37:35,497 [util.TServerUtils$THsHaServer] WARN : Got
 an
 IOException in internalRead!

 The tserver haven't crashed, but the client was disconnected during
 the
 test.

 Another hint is welcome :)

 On Mon, Feb 3, 2014 at 3:58 PM, Josh Elser josh.el...@gmail.com
 wrote:



 Oh, ok. So that isn't quite as bad as it seems.

 The commits are held exception is thrown when the tserver is
 running
 low
 on memory. The tserver will block new mutations coming in until it
 can
 process the ones it already has and free up some memory. This makes
 sense
 that you would see this more often when you have more proxy servers
 as
 the
 total amount of Mutations you can send to your Accumulo instance is
 increased. With one proxy server, your tserver had enough memory to
 process
 the incoming data. With many proxy servers, your tservers would
 likely
 fall
 over eventually because they'll get bogged down in JVM garbage
 collection.

 If you have more memory that you can give the tservers, that would
 help.
 Also, you should make sure that you're using the Accumulo native
 maps
 as
 this will use off-JVM-heap space instead of JVM heap which should
 help
 tremendously with your ingest rates.

 Native maps should be on by default unless you turned them off using
 the
 property 'tserver.memory.maps.native.enabled' in accumulo-site.xml.
 Additionally, you can try increasing the size of the native maps
 using
 'tserver.memory.maps.max' in accumulo-site.xml. Just be aware that
 with
 the
 native maps, you need to ensure that total_ram  JVM_heap +
 tserver.memory.maps.max

 - Josh


 On 2/3/14, 1:33 PM, Diego Woitasen wrote:




 I've launched the cluster again and I was able to reproduce the
 error:

 In the proxy I had the same error 

Re: force tablet assignment to tablet server?

2014-02-04 Thread Mike Drob
You can implement your own Balancer. Or kill all the other tablet servers.
:)


On Tue, Feb 4, 2014 at 10:47 AM, Donald Miner dmi...@clearedgeit.comwrote:

 Is there a way to force a particular tablet to be hosted off of a
 particular tablet server?

 There is some tricky stuff I want to do with data locality alongside of
 another system and I think this would help.




Re: force tablet assignment to tablet server?

2014-02-04 Thread Mike Drob
For future search indexing - we are referring to creating a custom
implementation of
http://accumulo.apache.org/1.5/apidocs/org/apache/accumulo/server/master/balancer/TabletBalancer.html
and loading it onto your cluster.


On Tue, Feb 4, 2014 at 11:07 AM, Donald Miner dmi...@clearedgeit.comwrote:

 Balancer is going to do exactly what i need. The second option sounds much
 more fun though.  Thanks!

 On Feb 4, 2014, at 10:49 AM, Mike Drob mad...@cloudera.com wrote:

 You can implement your own Balancer. Or kill all the other tablet servers.
 :)


 On Tue, Feb 4, 2014 at 10:47 AM, Donald Miner dmi...@clearedgeit.comwrote:

 Is there a way to force a particular tablet to be hosted off of a
 particular tablet server?

 There is some tricky stuff I want to do with data locality alongside of
 another system and I think this would help.





Re: Using Java7, fetch instance name or uuid WITHOUT Connector class?

2014-01-28 Thread Mike Drob
Tangential note - In Java 7, I thought that Swing was deprecated in favour
of JavaFX[1][2]?

[1]: http://www.oracle.com/technetwork/java/javafx/overview/faq-1446554.html
[2]: http://docs.oracle.com/javafx/2/swing/jfxpub-swing.htm


On Tue, Jan 28, 2014 at 1:59 PM, Ott, Charles H.
charles.h@leidos.comwrote:

 That'll work.  I'm already prompting the user to enter the zookeeper host
 when adding the 'server' to the tool, so I'll just add a field to support
 the instance name as well.  Thanks.



 *From:* user-return-3667-CHARLES.H.OTT=leidos@accumulo.apache.org[mailto:
 user-return-3667-CHARLES.H.OTT=leidos@accumulo.apache.org] *On Behalf
 Of *Keith Turner
 *Sent:* Tuesday, January 28, 2014 1:41 PM
 *To:* user@accumulo.apache.org
 *Subject:* Re: Using Java7, fetch instance name or uuid WITHOUT Connector
 class?







 On Tue, Jan 28, 2014 at 12:26 PM, Ott, Charles H. 
 charles.h@leidos.com wrote:

 I am making a more user friendly (Swing) tool for performing
 import/exports of table data via hadoop.io.sequencefile. (Currently using
 Accumulo 1.5.0 w/ cdh4u5)

 The first thing I do is load a list of tables into a swing component using
 the http://monitorURL/xml URL and JAXB:

 private void loadTables() {
 try {
 jaxbContext = JAXBContext.newInstance(Stats.class);
 jaxbUnmarshaller = jaxbContext.createUnmarshaller();
 jaxbMarshaller = jaxbContext.createMarshaller();

 Stats stats = (Stats) jaxbUnmarshaller.unmarshal(
 new URL(http://;
 + associatedHost.getHostname()
 + :
 + associatedHost.getUi_port()
 + /xml));
 String results = new String();
 for (Table t : stats.getTables().getTable()) {
 results = results.concat(t.getTablename() + \r\n);
 }
 jEditorPane1.setText(results);
 } catch (Exception err) {
 err.printStackTrace();
 }
 }

 Then I create a ZooKeeperInstance, and call the 'getConnector' method to
 get a connector used for scanning:
 try {
 connector = zooInstance.getConnector(username,
 password.getBytes());
 getUserAuths();
 } catch (Exception err) {
 err.printStackTrace();
 }

 Since, I now have the connector, I can get the 'user' Authorizations class
 for the export tool's client.Scanner:

 this.authorizations =
 connector.securityOperations().getUserAuthorizations(username);

 The part I am not sure how to do is automatically determine the 'instance
 name' or 'instance uuid' when constructing the ZooKeeperInstance.  I can
 see both strings displayed on the header of the Accumulo Monitor:

 div
 id='subheader'Instancenbsp;Name:nbsp;gmnbsp;nbsp;nbsp;Version:nbsp;1.5.0
 brspan
 class='smalltext'Instancenbsp;ID:nbsp;a85286bf-031c-4e24-9b47-f6aca34401b8/span
 brspan
 class='smalltext'Tuenbsp;Jannbsp;28nbsp;12:15:41nbsp;ESTnbsp;2014/span/div
 /div

 But I do not see any 'clean' way to retrieve it using the Java API,
 without doing a parse of the monitor's HTML.  Which feels dirty.

 This leaves me with one option, for the user to specify the instance name
 before clicking 'Export Tables'.  Which I think is a bit silly considering
 the user has already entered and saved the MonitorURL, dbUsername and
 dbPassword within the tool.  Thoughts?



 Maybe start off asking the user for the instance name and zookeepers
 instead of the monitor URL.  Once you create a connector you can use
 connector.tableOperations() to get a list of tables w/o accessing the
 monitor.
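
 Something along these lines (a rough sketch, not from the original reply;
 the instance name matches the "gm" shown in the monitor header above, and
 the zookeeper host, user, and password are placeholders) would remove the
 dependency on the monitor entirely:

     Instance inst = new ZooKeeperInstance("gm", "zookeeper-host:2181");
     Connector conn = inst.getConnector("root", new PasswordToken("secret"));
     StringBuilder results = new StringBuilder();
     for (String table : conn.tableOperations().list()) {
         results.append(table).append("\r\n");
     }
     jEditorPane1.setText(results.toString());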






 Thanks in advance to anyone who read this far!





Re: accumulo startup issue: Accumulo not initialized, there is no instance id at /accumulo/instance_id

2014-01-16 Thread Mike Drob
The tracer does performance metrics logging, and stores the data internally
in accumulo. It needs a tablet server running to persist everything and
will complain until it finds one.

Are your tablet servers and loggers running? I would stop your tracer app
until you have everything else up.


On Thu, Jan 16, 2014 at 1:41 PM, Steve Kruse skr...@adaptivemethods.comwrote:

 Straightened out HDFS, now having problem getting accumulo to start.  Get
 the following exception in my tracer log repeatedly.



 2014-01-16 13:06:49,131 [impl.ServerClient] DEBUG: ClientService request
 failed null, retrying ...

 org.apache.thrift.transport.TTransportException: Failed to connect to a
 server

 at
 org.apache.accumulo.core.client.impl.ThriftTransportPool.getAnyTransport(ThriftTransportPool.java:437)

 at
 org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:152)

 at
 org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:128)

 at
 org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:123)

 at
 org.apache.accumulo.core.client.impl.ServerClient.executeRaw(ServerClient.java:105)

 at
 org.apache.accumulo.core.client.impl.ServerClient.execute(ServerClient.java:71)

 at
 org.apache.accumulo.core.client.impl.ConnectorImpl.init(ConnectorImpl.java:64)

 at
 org.apache.accumulo.server.client.HdfsZooInstance.getConnector(HdfsZooInstance.java:154)

 at
 org.apache.accumulo.server.client.HdfsZooInstance.getConnector(HdfsZooInstance.java:149)

 at
 org.apache.accumulo.server.trace.TraceServer.init(TraceServer.java:185)

 at
 org.apache.accumulo.server.trace.TraceServer.main(TraceServer.java:260)

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

 at java.lang.reflect.Method.invoke(Method.java:597)

 at org.apache.accumulo.start.Main$1.run(Main.java:101)

 at java.lang.Thread.run(Thread.java:619)



 When I look at my processes, I have a gc and tracer app running.  I also
 can’t seem to run accumulo init again because it says that it’s already
 been initialized.



 Steve



 *From:* Sean Busbey [mailto:busbey+li...@cloudera.com]
 *Sent:* Wednesday, January 15, 2014 4:43 PM

 *To:* Accumulo User List
 *Subject:* Re: accumulo startup issue: Accumulo not initialized, there is
 no instance id at /accumulo/instance_id



 On Wed, Jan 15, 2014 at 3:36 PM, Steve Kruse skr...@adaptivemethods.com
 wrote:

 Sean,



 The classpath for HDFS was incorrect and that definitely helped when I
 corrected it.  Now it seems I’m having a hadoop issue where the datanodes
 are not running.  I’m going to keep plugging away.







 Glad to hear you made progress. Generally, I recommend people run through
 teragen / terasort to validate their HDFS and MR set up before the move on
 to installing Accumulo.



 Let us know when you get back to trying to get Accumulo going.
 --





Re: accumulo startup issue: Accumulo not initialized, there is no instance id at /accumulo/instance_id

2014-01-15 Thread Mike Drob
What do you get when you try to run accumulo init?


On Wed, Jan 15, 2014 at 2:39 PM, Steve Kruse skr...@adaptivemethods.comwrote:

 Hello,



 I'm new to accumulo and I am trying to get it up and running.  I currently
 have hadoop 2.2.0 and zookeeper 3.4.5 installed and running.  I have gone
 through the installation steps on the following page and I now am running
 into a problem when I try to start accumulo up.  The error I receive is
 the following:

 Thread org.apache.accumulo.server.master.state.SetGoalState died null
 java.lang.reflect.InvocationTargetException
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at java.lang.reflect.Method.invoke(Method.java:597)
         at org.apache.accumulo.start.Main$1.run(Main.java:101)
         at java.lang.Thread.run(Thread.java:619)
 Caused by: java.lang.RuntimeException: Accumulo not initialized, there is
 no instance id at /accumulo/instance_id
         at org.apache.accumulo.core.client.ZooKeeperInstance.getInstanceIDFromHdfs(ZooKeeperInstance.java:293)
         at org.apache.accumulo.server.client.HdfsZooInstance._getInstanceID(HdfsZooInstance.java:126)
         at org.apache.accumulo.server.client.HdfsZooInstance.getInstanceID(HdfsZooInstance.java:119)
         at org.apache.accumulo.core.zookeeper.ZooUtil.getRoot(ZooUtil.java:24)
         at org.apache.accumulo.server.master.state.SetGoalState.main(SetGoalState.java:46)
         ... 6 more

 I have tried to run accumulo init several times but I still get the same
 result every single time.  Any help would be much appreciated.



 Thanks,



 Steve



 *H. Stephen Kruse*

 Software Engineer

 Adaptive Methods

 5860 Trinity Parkway, Suite 200

 Centreville, VA 20121

 phone: (703) 968-6132

 email: skr...@adaptivemethods.com





Re: Multiple masters in which version?

2014-01-06 Thread Mike Drob
Joe,

Stand-by master functionality has existed for a while now (since before
1.4), so you should be good!

Let us know if you run into any issues.

Mike
On Jan 6, 2014 3:13 AM, Joe Gresock jgres...@gmail.com wrote:

 I seem to remember reading in one of the user guides that you can
 configure multiple Masters in your cluster by adding more than one IP to
 the masters file.  Is this available in Accumulo 1.4.3, or only later
 versions?

 Thanks!
 Joe

 --
 I know what it is to be in need, and I know what it is to have plenty.  I
 have learned the secret of being content in any and every situation,
 whether well fed or hungry, whether living in plenty or in want.  I can
 do all this through him who gives me strength.*-Philippians 4:12-13*



Re: Re-applying split file to a table?

2014-01-06 Thread Mike Drob
Aaron,

If you attempt to apply the same splits file, then you are attempting to
add already existing splits. Since the data is already split on those
points, there are no changes, and nothing happens, exactly as you observed.

If you apply a different split file to the existing data (after it already
had the initial and natural splits), then you will likely get more split
points. The data might not split immediately, but you can prompt it to do
so by issuing a major compaction. Your underlying data will not change, but
you should see more tablets in your table via the monitor interface.
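
The same compaction can be kicked off from the Java API (a sketch; the table
name and the Connector are placeholders), here over the whole table:

    // flush=true writes in-memory data first; wait=false returns immediately
    connector.tableOperations().compact("mytable", null, null, true, false);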

Mike


On Mon, Jan 6, 2014 at 10:47 AM, Aaron aarongm...@gmail.com wrote:

 To set the stage:

 We create a table and pre-split it..then we start to ingest some data.
  During the ingest, the table splits a few more times maybe, and after the
 ingest is done the table balances itself out across the tablet servers.

 What happens if we apply the split file again to the same table?  From
 what I can tell, nothing appears to change, but, just wanted to double
 check..make sure I wasn't missing anything.

 Same question, but if we use a completely different split file, with
 different splits?  Same result..nothing changes?



Re: Is accumulo supported on centos (6.x)

2013-12-23 Thread Mike Drob
Which version of the Cloudera Quickstart VM are you running?

To install a 1.4.4 Accumulo RPM, you will indeed have to build it from
source. 1.5.0 RPMs are available as downloads on the site, like Josh said.

Thanks,
Mike


On Sat, Dec 21, 2013 at 10:28 AM, ashili kash...@yahoo.com wrote:



 I see from downloads accumulo is supported on debian.
 From following readme link, I see accumulo is supported on linux

 https://git-wip-us.apache.org/repos/asf?p=accumulo.git;a=blob_plain;f=README;hb=419aacc45279a3cd6b3b5bf61baf486f082a450a

 My question: are accumulo binaries available for centos? or should I build
 it from the source/git?

 I am trying to integrate  accumulo with cloudera demo VM (centos, X64);



Re: 1.5 on cdh4u5

2013-12-11 Thread Mike Drob
It looks like you are running with an improperly configured Java Security
Policy. In the example accumulo-env.sh files there are some lines that look
like:



if [ -f ${ACCUMULO_CONF_DIR}/accumulo.policy ]
then
   POLICY="-Djava.security.manager -Djava.security.policy=${ACCUMULO_CONF_DIR}/accumulo.policy"
fi
test -z "$ACCUMULO_MONITOR_OPTS" && export ACCUMULO_MONITOR_OPTS="${POLICY} -Xmx1g -Xms256m"

Does $ACCUMULO_CONF_DIR/accumulo.policy exist on your system? If so, it
looks like you're missing PropertyPermission for the accumulo code. Compare
to line 112 of the example policy file. [2]

Mike

[1]:
https://github.com/apache/accumulo/blob/master/conf/examples/3GB/standalone/accumulo-env.sh?source=c
[2]:
https://github.com/apache/accumulo/blob/master/conf/accumulo.policy.example?source=cc




On Wed, Dec 11, 2013 at 7:32 AM, Ott, Charles H.
charles.h@leidos.comwrote:

 I am having a few issues getting 1.5 to run with cdh4u5 parcels
 installation.

 The baseline Accumulo-site.xml did not seem to point to a proper
 classpath, so I have made some modifications to the configuration.

 I was able to initialize the database (./accumulo init) and did not
 receive any errors when doing so.



 # vars from my accumulo-env.sh

 HADOOP_PREFIX=/opt/cloudera/parcels/CDH/lib/hadoop

 HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop

 ZOOKEEPER_HOME=/opt/cloudera/parcels/CDH/lib/zookeeper



 # cdh4 stuff I added to accumulo-env.sh

 export HADOOP_HDFS_HOME=/opt/cloudera/parcels/CDH/lib/hadoop-hdfs

 export HADOOP_MAPREDUCE_HOME=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce



 # ACCUMULO_HOME is set as an env var for the ‘hdfs’ user.  Which has
 ownership of the Accumulo home and walogs folder.



 # the accumulo-site.xml general.classpath info

 property

 namegeneral.classpaths/name

 value

   $ACCUMULO_HOME/server/target/classes/,

   $ACCUMULO_HOME/lib/accumulo-server.jar,

   $ACCUMULO_HOME/core/target/classes/,

   $ACCUMULO_HOME/lib/accumulo-core.jar,

   $ACCUMULO_HOME/start/target/classes/,

   $ACCUMULO_HOME/lib/accumulo-start.jar,

   $ACCUMULO_HOME/fate/target/classes/,

   $ACCUMULO_HOME/lib/accumulo-fate.jar,

   $ACCUMULO_HOME/proxy/target/classes/,

   $ACCUMULO_HOME/lib/accumulo-proxy.jar,

   $ACCUMULO_HOME/lib/[^.].*.jar,

   $ZOOKEEPER_HOME/zookeeper[^.].*.jar,

   $HADOOP_CONF_DIR,

   $HADOOP_PREFIX/[^.].*.jar,

   $HADOOP_PREFIX/lib/[^.].*.jar,

   $HADOOP_HDFS_HOME/.*.jar,

   $HADOOP_HDFS_HOME/lib/.*.jar,

   $HADOOP_MAPREDUCE_HOME/.*.jar,

   $HADOOP_MAPREDUCE_HOME/lib/.*.jar

 /value

 descriptionClasspaths that accumulo checks for updates and class
 files.

   When using the Security Manager, please remove the
 .../target/classes/ values.

 /description

   /property



 I have also disabled ipv6, selinux, and dfs.permissions.

 Also increases ulimit to 65536, swapiness set to 10, ntpd installed and
 running.

 Trying to start Accumulo as the ‘hdfs’ user as my current 1.4.4 cluster is
 running on cdh3u6.



 But, when I run ./start-all.sh I have the following issues:



 1.)Monitor thread dies:

 *Thread monitor died null*

 *java.lang.reflect.InvocationTargetException*

 *at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)*

 *at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)*

 *at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)*

 *at java.lang.reflect.Method.invoke(Method.java:606)*

 *at org.apache.accumulo.start.Main$1.run(Main.java:101)*

 *at java.lang.Thread.run(Thread.java:744)*

 *Caused by: java.security.AccessControlException: access denied
 (java.util.PropertyPermission * read,write)*

 *at
 java.security.AccessControlContext.checkPermission(AccessControlContext.java:372)*

 *at
 java.security.AccessController.checkPermission(AccessController.java:559)*

 *at
 java.lang.SecurityManager.checkPermission(SecurityManager.java:549)*

 *at
 java.lang.SecurityManager.checkPropertiesAccess(SecurityManager.java:1269)*

 *at java.lang.System.getProperties(System.java:624)*

 *at
 org.apache.commons.configuration.SystemConfiguration.init(SystemConfiguration.java:38)*

 *at
 org.apache.accumulo.core.conf.Property.getDefaultValue(Property.java:384)*

 *at
 org.apache.accumulo.core.conf.DefaultConfiguration.iterator(DefaultConfiguration.java:52)*

 *at
 org.apache.accumulo.core.conf.ConfigSanityCheck.validate(ConfigSanityCheck.java:29)*

 *at
 org.apache.accumulo.core.conf.DefaultConfiguration.getInstance(DefaultConfiguration.java:37)*

 *at
 org.apache.accumulo.core.conf.AccumuloConfiguration.getDefaultConfiguration(AccumuloConfiguration.java:153)*

 *at
 org.apache.accumulo.core.conf.AccumuloConfiguration.getSiteConfiguration(AccumuloConfiguration.java:163)*

 *at
 

Re: HBase rowkey design guidelines

2013-12-03 Thread Mike Drob
Well, yes and no.

Smaller keys still mean less network traffic, potentially less IO, and
maybe faster operations if you're trying to do application logic. Using
"data" or "default" or just "d" probably doesn't matter in the long term
(although there are certainly cases where it might).
On Dec 3, 2013 11:57 PM, David Medinets david.medin...@gmail.com wrote:

 http://hbase.apache.org/book/rowkey.design.html - unless I am
 misunderstanding much of the advice given for HBase simply doesn't apply to
 Accumulo. For example Try to keep the ColumnFamily names as small as
 possible, preferably one character (e.g. d for data/default).



Re: How to reduce number of entries in memory

2013-10-28 Thread Mike Drob
What are you trying to accomplish by reducing the number of entries in
memory? A tablet server will not minor compact (flush) until the native map
fills up, but keeping things in memory isn't really a performance concern.

You can force a one-time minor compaction via the shell using the 'flush'
command.
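
The Java equivalent (a sketch using the 1.5-era signature; the table name and
the Connector are placeholders) is:

    // null start/end rows cover the whole table; wait=true blocks until done
    connector.tableOperations().flush("mytable", null, null, true);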


On Mon, Oct 28, 2013 at 5:19 PM, Terry P. texpi...@gmail.com wrote:

 Greetings all,
 For a growing table that grew from zero to 70 million entries this
 weekend, I'm seeing 4.4 million entries still in memory, though the client
 programs are supposed to be flushing their entries.

 Is there a server-side setting to help reduce the number of entries that
 are in memory (not yet flushed to disk)?  Our system has fairly light
 performance requirements, so I'm okay if a tweak may result in reduced
 ingest performance.

 Thanks in advance,
 Terry



Re: Deleting many rows that match a given criterion

2013-10-23 Thread Mike Drob
Thanks for the feedback, Aru and Keith.

I've had some more time to play around with this, and here's some
additional observations.

My existing process is very slow. I think this is due to each deletemany
command starting up a new scanner and batchwriter, and creating a lot of
rpc overhead. I didn't initially think that it would be a significant
amount of data, but maybe I just had the wrong idea of what significant
is in this case.

I'm not sure the RowDeletingIterator would work in this case because I do
use empty rows for other purposes. The RowFilter at compaction is a great
option, except I had hoped to avoid writing actual java code. Looking back
at this, I might have to bite that bullet.
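
For the record, the compaction-time filter is not much code. A rough sketch
(class, column, and expression names are made up; it would be attached at the
majc scope and then a full compaction issued so the table is rewritten
without the matching rows):

    import java.io.IOException;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
    import org.apache.accumulo.core.iterators.user.RowFilter;

    public class DropMatchingRows extends RowFilter {
        @Override
        public boolean acceptRow(SortedKeyValueIterator<Key,Value> row) throws IOException {
            while (row.hasTop()) {
                if (row.getTopKey().getColumnFamily().toString().equals("col")
                        && row.getTopValue().toString().contains("EXPRESSION"))
                    return false;   // returning false drops the entire row
                row.next();
            }
            return true;            // keep rows that never matched
        }
    }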

Again, thanks both for the suggestions!

Mike


On Tue, Oct 22, 2013 at 12:04 PM, Keith Turner ke...@deenlo.com wrote:

 If its a significant amount of data, you could create a class that extends
 row filter and set it as a compaction iterator.


 On Tue, Oct 22, 2013 at 11:45 AM, Mike Drob md...@mdrob.com wrote:

 I'm attempting to delete all rows from a table that contain a specific
 word in the value of a specified column. My current process looks like:

 accumulo shell -e 'egrep .*EXPRESSION.* -np -t tab -c col' | awk 'BEGIN
 {print "table tab"}; {print "deletemany -f -np -r " $1}; END {print "exit"}'
 > rows.out
 accumulo shell -f rows.out

 I tried playing around with scan iterators and various options on
 deletemany and deleterows but wasn't able to find a more straightforward
 way to do this. Does anybody have any suggestions?

 Mike





Re: Cancel a compact [SEC=UNOFFICIAL]

2013-10-01 Thread Mike Drob
Depending on the version that you are running, compactions can be cancelled
with varying degrees of difficulty and perseverance (and tablet server
restarts).


On Tue, Oct 1, 2013 at 10:09 PM, Dickson, Matt MR 
matt.dick...@defence.gov.au wrote:

 **

 *UNOFFICIAL*
 Can a compact process be cancelled?



Re: Tunneling over SSH

2013-09-05 Thread Mike Drob
There is some development going on as part of ACCUMULO-1585 [1] to allow
tservers to store the hostname instead of the ip address. That seems like a
good place to start, although I'm not sure if this is the same problem that
you're seeing.

[1]: https://issues.apache.org/jira/browse/ACCUMULO-1585

Mike


On Thu, Sep 5, 2013 at 7:14 PM, stbil...@gmail.com wrote:

 I'm trying to tunnel via SSH to a single Hadoop, Zoo, Accumulo stand-alone
 installation. The internal IP of the machine is on a local subnet behind an
 SSH-only firewall - 192.168.182.22. I use static host names in all of the
 config files (Accumulo, Zoo, Hadoop) that resolve to 192.168.182.22 for all
 the servers. There is no problem connecting when I'm directly connected to
 the subnet inside the firewall.

 However, when I try to connect via the JAVA API from outside the firewall,
 I get an error: Failed to find an available server in the list of servers:
 [192.168.182.22:9997:9997 (12)]. I've created a Windows Loopback
 interface that allows me to forward unlimited ports directly through the
 SSH tunnel to the internal network - there is no issue with connecting to
 Hadoop via Java or the web interface, and I can view the Accumulo status
 page at 50095 by just setting my Windows box to resolve the hostname to the
 loopback local IP - SSH - 192.168.182.22:50095.

 I think the problem is that Zookeeper is telling my Java process to try
 and make a connection directly to 192.168.182.22:9997. If Zoo would use the
 hostname, there'd be no problem as it'd resolve to the loopback, and get
 tunneled along with everything else. But since it uses the actual IP, the
 Windows box won't route that back through the SSH tunnel as it considers it
 a local subnet outside of the firewall.

 Anyone experienced this issue and have a solution? I guess one solution
 might be to 'trick' Windows into forwarding the 192.168.x.y subnet back
 through the loopback (- SSH), but I'm not seeing a good way to do that.

 Thanks



Re: Using Accumulo shell to add column visibility to cells containing Unicode values

2013-08-26 Thread Mike Drob
What version are you using? According to ACCUMULO-241, you should be able
to quote any UTF-8 characters for visibility using the Java API. The shell
will likely have parsing issues, however.

[1]: https://issues.apache.org/jira/browse/ACCUMULO-241
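
A rough sketch of what the Java-side write might look like (the label, value,
and BatchWriter "writer" are all placeholders, and quoting non-ASCII tokens
is only expected to work in versions that include ACCUMULO-241):

    Mutation m = new Mutation("row1");
    m.put(new Text("fam"), new Text("qual"),
          new ColumnVisibility("\"groupé\"|admin"),
          new Value("värde".getBytes(StandardCharsets.UTF_8)));
    writer.addMutation(m);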


On Mon, Aug 26, 2013 at 3:56 PM, John Vines vi...@apache.org wrote:

 The java API is the most feature rich way of interfacing with Accumulo.
 The shell is a utility built on it, but occasionally issues get hit with
 parsing user input. It seems you have hit one of these cases. You may be
 able to quote your fields, etc.

 However, it is more important to note that the visibilities are very
 strict for the character set allowed. Only a-z, A-Z, 0-9, and a few
 additional characters are allowed (- and _ if I remember correctly). So
 unicode won't work, and you'd get an error indicating that if you could get
 the shell to accept them.


 On Mon, Aug 26, 2013 at 3:52 PM, Celeste Hofer celesteho...@gmail.comwrote:


 Hello,

 I am trying to add column visibility (a label) to cells containing
 Unicode values, using an Accumulo shell.
 However, I receive this
 ERROR: java.lang.IllegalArgumentException: Expected 4 arguments.  There
 was 6.


 Is the use of the Accumulo shell supported for applying column visibility
 when the value is Unicode?

 If it is supported, please provide a simple example, or more information.

 If it is not supported via the Accumulo shell, is there another supported
 approach, for example, using the Java API?

 Thanks,
 Celeste H





Re: Okay to purge trace table?

2013-06-06 Thread Mike Drob
David,

I already created a ticket for it -
https://issues.apache.org/jira/browse/ACCUMULO-1501

-Mike


On Thu, Jun 6, 2013 at 9:00 PM, David Medinets david.medin...@gmail.comwrote:

 Does it make sense to create a JIRA ticket asking for an age-off iterator
 to be the default on the trace table? Maybe set for something like two
 weeks? If we don't add a default age-off iterator, where should the
 documentation be changed to talk about this topic? Does the user manual
 talk about the trace table?


 On Thu, Jun 6, 2013 at 3:54 PM, Eric Newton eric.new...@gmail.com wrote:

 You could put an age-off iterator on it, or just purge it from
 time-to-time.

 I probably should have configured the trace table with an age-off filter
 by default.  But for now, you need to manually manage the data.
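
 A sketch of that age-off configuration from the Java API (the two-week TTL
 is just an example, and a Connector "connector" is assumed):

     IteratorSetting ageoff = new IteratorSetting(30, "traceAgeoff", AgeOffFilter.class);
     AgeOffFilter.setTTL(ageoff, 14L * 24 * 60 * 60 * 1000);   // 14 days in ms
     connector.tableOperations().attachIterator("trace", ageoff);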

 You can use delete rows to wipe the table efficiently:

   shell> deleterows -f -t trace

 -Eric


 On Thu, Jun 6, 2013 at 2:35 PM, Terry P. texpi...@gmail.com wrote:

 Greetings all,
 We have a million entries in the trace table in one of our Accumulo
 clusters and a million and a half in another.  We haven't manually enabled
 any tracing activities, and in looking at the entries, they seem to be
 generated by Accumulo does on its own (compact, wal, getFileStatus,
 minorCompactionStarted, minorCompaction, prep, commit, etc).

 Does Accumulo maintain this table or do I / should I manually purge it
 from time to time?  If it's on us to maintain it, are there any guidelines
 or a procedure for doing so?

 Thanks in advance,
 Terry






Re: master fails to start

2013-05-20 Thread Mike Drob
Looks like you might be running with a Java Security Policy in place.


On Mon, May 20, 2013 at 4:28 PM, Chris Retford chris.retf...@gmail.comwrote:

 Accumulo 1.4.3. Hadoop is CDH3u6 (0.20.2). I can manually list files in
 Hadoop. Accumulo was able to run the init script. All accumulo directories
 in HDFS are world readable and executable.


 On Mon, May 20, 2013 at 2:20 PM, Christopher ctubb...@apache.org wrote:

 What version of Accumulo are you running?
 Can you manually query HDFS as the same user Accumulo is running as?

 --
 Christopher L Tubbs II
 http://gravatar.com/ctubbsii


 On Mon, May 20, 2013 at 4:14 PM, Chris Retford chris.retf...@gmail.com
 wrote:
  I searched the archive before posting and didn't find anything. I have
 a new
  system with 12 nodes (3 ZK), and a single user in the hadoop group. The
  master fails to start. It looks to me like it is unable to read
  /accumulo/instance_id in HDFS, but I can't think why that would be.
 Thanks
  in advance for any advice on how to run this down. Here are the
 contents of
  master.err log:
 
  Thread master died null
  java.lang.reflect.InvocationTargetException
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at
 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:601)
  at org.apache.accumulo.start.Main$1.run(Main.java:89)
  at java.lang.Thread.run(Thread.java:722)
  Caused by: java.lang.ExceptionInInitializerError
  at
 
  org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:469)
  at
  org.apache.hadoop.fs.FileSystem$Cache$Key.init(FileSystem.java:1757)
  at
  org.apache.hadoop.fs.FileSystem$Cache$Key.init(FileSystem.java:1750)
  at
 org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1618)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:255)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:124)
  at
  org.apache.accumulo.core.file.FileUtil.getFileSystem(FileUtil.java:554)
  at
 
 org.apache.accumulo.core.client.ZooKeeperInstance.getInstanceIDFromHdfs(ZooKeeperInstance.java:258)
  at
 
 org.apache.accumulo.server.conf.ZooConfiguration.getInstance(ZooConfiguration.java:65)
  at
 
 org.apache.accumulo.server.conf.ServerConfiguration.getZooConfiguration(ServerConfiguration.java:49)
  at
 
 org.apache.accumulo.server.conf.ServerConfiguration.getSystemConfiguration(ServerConfiguration.java:58)
  at
 
 org.apache.accumulo.server.client.HdfsZooInstance.init(HdfsZooInstance.java:62)
  at
 
 org.apache.accumulo.server.client.HdfsZooInstance.getInstance(HdfsZooInstance.java:70)
  at org.apache.accumulo.server.Accumulo.init(Accumulo.java:132)
  at
 org.apache.accumulo.server.master.Master.init(Master.java:534)
  at
 org.apache.accumulo.server.master.Master.main(Master.java:2190)
  ... 6 more
  Caused by: java.security.AccessControlException: access denied
  (java.lang.RuntimePermission getenv.HADOOP_JAAS_DEBUG)
  at
 
 java.security.AccessControlContext.checkPermission(AccessControlContext.java:366)
  at
 
 java.security.AccessController.checkPermission(AccessController.java:560)
  at
  java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
  at java.lang.System.getenv(System.java:883)
  at
 
  org.apache.hadoop.security.UserGroupInformation$HadoopConfiguration.<clinit>(UserGroupInformation.java:392)
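
The AccessControlException on getenv.HADOOP_JAAS_DEBUG is the giveaway that a SecurityManager with a restrictive policy is active in the JVM running the master. The usual fix is to remove the -Djava.security.manager / policy options from whatever launches Accumulo; if the policy has to stay, a grant along these lines in the java.policy being loaded would cover the environment lookup shown in the stack trace (illustrative, not a vetted policy):

 grant {
     // minimum shown in the stack trace; "getenv.*" would cover the rest
     permission java.lang.RuntimePermission "getenv.HADOOP_JAAS_DEBUG";
 };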





Cancelling queued compactions in Accumulo 1.4

2013-05-15 Thread Mike Drob
Somebody (totally not me) accidentally kicked off a full table compaction
using Accumulo 1.4.3.

There's a large number of them waiting and the queue is decreasing very
slowly - what are my options for improving the situation. Ideally, I would
be able to just cancel everything and then come back with a more precise
approach.

Thanks,
Mike


Re: Cancelling queued compactions in Accumulo 1.4

2013-05-15 Thread Mike Drob
Can I leave the ones that are already running and just dispose of the
queued compactions? If not, that seems like a pretty serious limitation.


On Wed, May 15, 2013 at 2:51 AM, John Vines vi...@apache.org wrote:

 I'm not sure if it's possible. Scheduling a compaction is an entry in the
 metadata table. But once it gets triggered, there are then compactions
 scheduled locally for the tserver. You might be able to delete the flag and
 bounce all the tservers to stop it, but I can't say for certain.

 Sent from my phone, please pardon the typos and brevity.
 On May 14, 2013 11:48 PM, Mike Drob md...@mdrob.com wrote:

 Somebody (totally not me) accidentally kicked off a full table compaction
 using Accumulo 1.4.3.

 There's a large number of them waiting and the queue is decreasing very
 slowly - what are my options for improving the situation. Ideally, I would
 be able to just cancel everything and then come back with a more precise
 approach.

 Thanks,
 Mike




Re: Cancelling queued compactions in Accumulo 1.4

2013-05-15 Thread Mike Drob
Some progress on this issue -

If I stop the master then I can delete the fate transaction from zookeeper.
First I used accumulo org.apache.accumulo.server.fate.Admin print | grep
CompactRange to find the transactions and then accumulo o.a.a.s.f.Admin
delete id to delete it. Started the master back up, manually peeked in
zookeeper, and the transaction was gone. That said, looking at the monitor
page there are still all of my compactions queued up, so I don't think that
actually did anything. Is there another place that I need to look?

I saw that zk has /accumulo/id/tables/tid/compact-id entry, but I don't
know how that relates.

Mike
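
For readers following along, the fate commands described above look like this when run from the Accumulo install directory (1.4-era class name; <txid> stands in for an id reported by print, and the master should be stopped first as noted):

 ./bin/accumulo org.apache.accumulo.server.fate.Admin print | grep CompactRange
 ./bin/accumulo org.apache.accumulo.server.fate.Admin delete <txid>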



On Wed, May 15, 2013 at 2:56 AM, John Vines vi...@apache.org wrote:

 I do not believe there is a way to tell a tserver to cancel all
 compactions. It would be a nice feature though. Mind putting on a ticket?

 Sorry for the dupe mike, missed hitting reply all

 Sent from my phone, please pardon the typos and brevity.
 On May 14, 2013 11:54 PM, Mike Drob md...@mdrob.com wrote:

 Can I leave the ones that are already running and just dispose of the
 queued compactions? If not, that seems like a pretty serious limitation.


 On Wed, May 15, 2013 at 2:51 AM, John Vines vi...@apache.org wrote:

 I'm not sure if it's possible. Scheduling a compaction is an entry in
 the metadata table. But once it gets triggered, there are then compactions
 scheduled locally for the tserver. You might be able to delete the flag and
 bounce all the tservers to stop it, but I can't say for certain.

 Sent from my phone, please pardon the typos and brevity.
 On May 14, 2013 11:48 PM, Mike Drob md...@mdrob.com wrote:

 Somebody (totally not me) accidentally kicked off a full table
 compaction using Accumulo 1.4.3.

 There's a large number of them waiting and the queue is decreasing very
 slowly - what are my options for improving the situation. Ideally, I would
 be able to just cancel everything and then come back with a more precise
 approach.

 Thanks,
 Mike





Re: [VOTE] 1.5.0-RC2

2013-05-10 Thread Mike Drob
I noticed that ACCUMULO-970 still has 8 open issues. I would like to see
those all resolved before 1.5 is actually released.


On Thu, May 9, 2013 at 5:36 PM, Keith Turner ke...@deenlo.com wrote:




 On Thu, May 9, 2013 at 5:23 PM, Christopher ctubb...@apache.org wrote:

 Keith, I assume you mean the docs/apidocs directory is missing? Or did
 you mean the javadoc jars (which were intentionally omitted)?


 I was referring to docs/apidocs.  The documentation available through the
 monitor references this.



 --
 Christopher L Tubbs II
 http://gravatar.com/ctubbsii


 On Thu, May 9, 2013 at 4:05 PM, Keith Turner ke...@deenlo.com wrote:
 
 
 
  On Thu, May 9, 2013 at 3:42 PM, Christopher ctubb...@apache.org
 wrote:
 
  On Thu, May 9, 2013 at 2:34 PM, Keith Turner ke...@deenlo.com wrote:
   Are you thinking of manually renaming the tar, rpm, and debs,
 replacing
   accumulo-assemble w/ accumulo, when these are pushed out to mirrors?
   For
   the tar this would require untar, rename, tar and recomputing the
 sigs
   and
   hashes.
 
  I was thinking about renaming the RPM and DEBs to conform to their
  respective naming conventions, but I see no reason to change the
  tarball names, or contents. No recalculations of sigs or hashes would
  be necessary for just a filename change.
 
 
 
  When I untar accumulo-assemble-1.5.0-bin.tar.gz and end up with a dir
 named
  accumulo-assemble-1.5.0, I find that really screwy.  I understand how
 this
  came to be.  But the name does not make sense from the perspective of an
  outsider.  I would be happy to reroll this tarball with a dir name of
  accumulo-1.5.0.  That would change the tar's contents and require
 re-signing.
  I can do this, post it, and we can include that in the vote.
 
  Also, the bin.tar.gz does not include the javadocs.
 
  Voting -1 based on the javadocs.
 
 
  If there is an artifact that should be built in a different way, with
  a different naming convention, please let me know, and I'll make Maven
  do it (though I think the docs currently specify the names as they are
  right now).
 
   On Wed, May 8, 2013 at 8:31 PM, Christopher ctubb...@apache.org
 wrote:
  
   1.5.0-RC2 for review. Might as well vote, also, as it's easily
   recalled if it's not up to par.
  
  
  
 https://repository.apache.org/content/repositories/orgapacheaccumulo-024/
  
   --
   Christopher L Tubbs II
   http://gravatar.com/ctubbsii
  
  
  
   -- Forwarded message --
   From: Nexus Repository Manager ne...@repository.apache.org
   Date: Wed, May 8, 2013 at 8:26 PM
   Subject: Nexus: Staging Completed.
   To: Christopher Tubbs ctubb...@gmail.com
  
  
   Description:
  
   1.5.0-RC2
  
   Details:
  
   The following artifacts have been staged to the
   org.apache.accumulo-024 (u:ctubbsii, a:173.66.3.39) repository.
  
   archetype-catalog.xml
   accumulo-1.5.0-source-release.zip
   accumulo-1.5.0-source-release.tar.gz.asc
   accumulo-1.5.0.pom
   accumulo-1.5.0-site.xml
   accumulo-1.5.0.pom.asc
   accumulo-1.5.0-source-release.zip.asc
   accumulo-1.5.0-source-release.tar.gz
   accumulo-1.5.0-site.xml.asc
   accumulo-examples-1.5.0.pom.asc
   accumulo-examples-1.5.0.pom
   accumulo-core-1.5.0.pom.asc
   accumulo-core-1.5.0-javadoc.jar
   accumulo-core-1.5.0-sources.jar
   accumulo-core-1.5.0-javadoc.jar.asc
   accumulo-core-1.5.0.pom
   accumulo-core-1.5.0.jar
   accumulo-core-1.5.0-sources.jar.asc
   accumulo-core-1.5.0.jar.asc
   accumulo-examples-simple-1.5.0.jar
   accumulo-examples-simple-1.5.0.jar.asc
   accumulo-examples-simple-1.5.0-javadoc.jar.asc
   accumulo-examples-simple-1.5.0.pom.asc
   accumulo-examples-simple-1.5.0-sources.jar
   accumulo-examples-simple-1.5.0-javadoc.jar
   accumulo-examples-simple-1.5.0-sources.jar.asc
   accumulo-examples-simple-1.5.0.pom
   accumulo-test-1.5.0-sources.jar.asc
   accumulo-test-1.5.0.pom
   accumulo-test-1.5.0.jar.asc
   accumulo-test-1.5.0.pom.asc
   accumulo-test-1.5.0-javadoc.jar.asc
   accumulo-test-1.5.0-sources.jar
   accumulo-test-1.5.0.jar
   accumulo-test-1.5.0-javadoc.jar
   accumulo-assemble-1.5.0.pom
   accumulo-assemble-1.5.0-test.deb
   accumulo-assemble-1.5.0-test.deb.asc
   accumulo-assemble-1.5.0-native.deb
   accumulo-assemble-1.5.0-bin.rpm.asc
   accumulo-assemble-1.5.0-bin.tar.gz.asc
   accumulo-assemble-1.5.0-bin.deb
   accumulo-assemble-1.5.0.pom.asc
   accumulo-assemble-1.5.0-bin.deb.asc
   accumulo-assemble-1.5.0-bin.rpm
   accumulo-assemble-1.5.0-native.deb.asc
   accumulo-assemble-1.5.0-bin.tar.gz
   accumulo-assemble-1.5.0-native.rpm.asc
   accumulo-assemble-1.5.0-native.rpm
   accumulo-proxy-1.5.0-javadoc.jar.asc
   accumulo-proxy-1.5.0-sources.jar
   accumulo-proxy-1.5.0.pom.asc
   accumulo-proxy-1.5.0-javadoc.jar
   accumulo-proxy-1.5.0.jar.asc
   accumulo-proxy-1.5.0.jar
   accumulo-proxy-1.5.0-sources.jar.asc
   accumulo-proxy-1.5.0.pom
   accumulo-trace-1.5.0.jar.asc
   accumulo-trace-1.5.0-javadoc.jar
   accumulo-trace-1.5.0.pom.asc
   accumulo-trace-1.5.0-sources.jar.asc
 

Re: Using supervisor to monitor Accumulo

2013-05-08 Thread Mike Drob
I've seen people use puppet to achieve the same goal with reasonable
amounts of success.


On Wed, May 8, 2013 at 6:33 PM, Phil Eberhardt p...@sqrrl.com wrote:

 Hello,

 I was looking into using supervisor (http://supervisord.org/index.html)
 to monitor a daemon running on top of Accumulo. I heard that Jason Trost
 may have mentioned, in a presentation at Hadoop World, using supervisor to
 monitor Accumulo and restart it if it stopped running. I was wondering if
 anyone was monitoring the Accumulo daemon and restarting it successfully
 using supervisor so I could do something similar.

 Thanks,

 Phil Eberhardt
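
No confirmed recipe came out of this thread, but a supervisord stanza for a tablet server would look roughly like the following. Everything here (program name, paths, user, environment values) is a placeholder, and it has not been validated against Accumulo's own start/stop scripts, so treat it as a starting point only. bin/accumulo runs the process in the foreground, which is what supervisord expects:

 ; all values below are placeholders
 [program:accumulo-tserver]
 command=/opt/accumulo/bin/accumulo tserver
 directory=/opt/accumulo
 environment=JAVA_HOME="/usr/lib/jvm/default-java",HADOOP_HOME="/usr/lib/hadoop"
 user=accumulo
 autostart=true
 autorestart=true
 startsecs=30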



Re: 807 tablets, for the same table, on one tserver?

2013-04-11 Thread Mike Drob
Grepping the master logs for "balance" usually gives some clue.
On Apr 11, 2013 7:45 AM, David Medinets david.medin...@gmail.com wrote:

 From behaviour that I've witnessed before, on v1.4.1, Accumulo spreads
 tablets across the cluster. However, this morning I am seeing 807 tablets
 for the same table on one tserver which was unexpected. What affects the
 movement of tablets? Or perhaps more importantly, what might prevent the
 movement?
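
A concrete version of Mike's suggestion (the log location is assumed to be the default $ACCUMULO_HOME/logs; adjust to your install):

 grep -i balanc $ACCUMULO_HOME/logs/master_*.log | tail -n 50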



Re: Custom Iterators - behavior when switching tablets

2013-01-22 Thread Mike Drob
David,

This doesn't answer your design questions, but it might help shed some
light on how to properly handle losing the sort order. Brian did a lot of
work on this in https://issues.apache.org/jira/browse/ACCUMULO-956 so I
highly recommend looking there and comparing to what you've developed.

It's a tricky problem, but I think it is good to get your insights and
experience from it added to the collective knowledge.

Mike


On Tue, Jan 22, 2013 at 11:55 AM, Slater, David M.
david.sla...@jhuapl.eduwrote:

 In designing some of my own custom iterators, I was noticing some
 interesting behavior. Note: my iterator does not return the original key,
 but instead returns a computed value that is not necessarily in
 lexicographic order.


 So far as I can tell, when the Scanner switches between tablets, it checks
 the key that is returned in the new tablet and compares it (I think it
 compares key.row()) with the last key from the previous tablet. If the new
 key is greater than the previous one, then it proceeds normally. If,
 however, the new key is less than or equal to the previous key, then the
 Scanner does not return the value. It does, however, continue to iterate
 through the tablet, continuing to compare until it finds a key greater than
 the last one. Once it finds one, however, it progresses through the rest of
 that tablet without doing a check. (It implicitly assumes that everything
 in a tablet will be correctly ordered). 


 Now if I was to return the original key, it would work fine (since it
 would always be in order), but that also limits the functionality of my
 custom iterator. 


 My primary question is: why would it be designed this way? When switching
 between tablets, are there potential problems that might crop up if this
 check isn’t done?


 Thanks,
 David
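
The behavior described above appears to come from the client re-seeking with the last key it received whenever it crosses a tablet (or batch) boundary, so an iterator that emits keys out of sorted order can silently drop results. The common workaround is to leave the key untouched and carry the computed result in the value; a minimal sketch, where the "computation" is just the value length, purely for illustration:

import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.WrappingIterator;

public class ComputedValueIterator extends WrappingIterator {
  @Override
  public Key getTopKey() {
    // Returning the source key unchanged keeps output sorted, so client-side
    // re-seeking across tablet boundaries still works.
    return getSource().getTopKey();
  }

  @Override
  public Value getTopValue() {
    // Illustrative computation: replace the value with its length.
    byte[] original = getSource().getTopValue().get();
    return new Value(Integer.toString(original.length).getBytes());
  }
}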



Re: Satisfying Zookeper dependency when installing Accumulo in CentOS

2012-12-19 Thread Mike Drob
RPM is looking for a zookeeper package on the system to satisfy the
automatic dependency management. The installation instructions you linked
to for ZK seem to imply using a downloaded tar.

If that's the case then you'll need to either find a ZK RPM, install
Accumulo using a tar, or install Accumulo via RPM using the --nodeps option.
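
The --nodeps route, spelled out (filename as given in the original post; /opt/accumulo is the default install location mentioned in the follow-up):

 sudo rpm -ivh --nodeps Downloads/accumulo-1.4.2-1.amd64.rpm
 ls /opt/accumulo*/conf/accumulo-env.sh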


On Wed, Dec 19, 2012 at 5:03 PM, Kevin Pauli ke...@thepaulis.com wrote:

 I'm trying to install Accumulo in CentOS.  I have installed the jdk and
 hadoop, but can't seem to make Accumulo install happy wrt zookeeper.

 I installed ZooKeeper according to the instructions here:
 http://zookeeper.apache.org/doc/r3.4.5/zookeeperStarted.html#sc_InstallingSingleMode

 And Zookeeper is running:

 $ sudo bin/zkServer.sh start
 JMX enabled by default
 Using config: /usr/lib/zookeeper-3.4.5/bin/../conf/zoo.cfg
 Starting zookeeper ... STARTED

 But when trying to install Accumulo, this is what I get:

 $ sudo rpm -ivh Downloads/accumulo-1.4.2-1.amd64.rpm
 error: Failed dependencies:
 zookeeper is needed by accumulo-1.4.2-1.amd64

 --
 Regards,
 Kevin Pauli



Re: Satisfying Zookeper dependency when installing Accumulo in CentOS

2012-12-19 Thread Mike Drob
Default install is under /opt/accumulo

If locate doesn't find something, you can also try updatedb


On Wed, Dec 19, 2012 at 5:33 PM, Kevin Pauli ke...@thepaulis.com wrote:

 I was worried about forcing the rpm installation with --nodeps b/c I
 wasn't sure if there was some kind of linkage that would be formed from
 accumulo to the zookeeper package, which, due to zookeeper not being a true
 package, would cause accumulo to fail at runtime.

 But, based on your advice, I went ahead and installed Accumulo via rpm
 with the --nodeps option.  It completed without errors, and I was about to
 proceed with the next step of modifying conf/accumulo-env.sh, but I can't seem
 to find where it is!  locate accumulo-env.sh is resulting in no hits.
  Where would the rpm installation have put Accumulo?

 On Wed, Dec 19, 2012 at 4:10 PM, Mike Drob md...@mdrob.com wrote:

 RPM is looking for a zookeeper package on the system to satisfy the
 automatic dependency management. The installation instructions you linked
 to for ZK seem to imply using a downloaded tar.

 If that's the case then you'll need to either find a ZK RPM, install
 Accumulo using a tar, or install Accumulo via RPM using the --nodeps option.


 On Wed, Dec 19, 2012 at 5:03 PM, Kevin Pauli ke...@thepaulis.com wrote:

 I'm trying to install Accumulo in CentOS.  I have installed the jdk and
 hadoop, but can't seem to make Accumulo install happy wrt zookeeper.

  I installed ZooKeeper according to the instructions here:
 http://zookeeper.apache.org/doc/r3.4.5/zookeeperStarted.html#sc_InstallingSingleMode

 And Zookeeper is running:

 $ sudo bin/zkServer.sh start
 JMX enabled by default
 Using config: /usr/lib/zookeeper-3.4.5/bin/../conf/zoo.cfg
 Starting zookeeper ... STARTED

 But when trying to install Accumulo, this is what I get:

 $ sudo rpm -ivh Downloads/accumulo-1.4.2-1.amd64.rpm
 error: Failed dependencies:
 zookeeper is needed by accumulo-1.4.2-1.amd64

 --
 Regards,
 Kevin Pauli





 --
 Regards,
 Kevin Pauli



Re: Authentication - Kerberose

2012-11-01 Thread Mike Drob
There are a couple tickets that involve making Accumulo and Kerberos play
nice -

https://issues.apache.org/jira/browse/ACCUMULO-404 was to get accumulo
running on a kerberized HDFS
https://issues.apache.org/jira/browse/ACCUMULO-259 is for potentially
delegating the authentications to an external system (i.e. KRB)

It looks like it was planned for 1.5, but John can probably chime in and
let us know of the status.

Mike

On Thu, Nov 1, 2012 at 2:20 PM, Michael Peterson 
mike.peter...@ptech-llc.com wrote:

  To whom it may concern:

 Can you please provide a schedule of if/when Accumulo will use Kerberos
 for authentication? I’m working with multiple customers that are collecting
 data to decide whether to use Accumulo or other technologies. The absence
 of strong authentication with Accumulo is a major concern (and less
 subjective). 


 Also, is there a POC for new features that will be available in 1.5?

 Thanks,

 Mike Peterson, Owner

 Peterson Technologies, LLC

 240-456-0094, ext 111

 240-456-0096 fax

 410-218-4004 cell

 Certified 8(a), SDVOSB, MBE/A




Re: Accumulo and Java 7

2012-08-29 Thread Mike Drob
Turns out I had an errant security policy in place, thanks all!

On Tue, Aug 28, 2012 at 7:50 PM, Eric Newton eric.new...@gmail.com wrote:

 Check that the memory configuration you are using is appropriate for your
 system.  The master/monitor are relatively small processes in 1.4.

 Make sure the write-ahead log directory exists on all nodes.

 Be sure to check the .err/.out files.

 If you don't have .err/.out files, double check your ssh configuration.

 -Eric

 On Tue, Aug 28, 2012 at 7:41 PM, Gabe Bell christiang...@gmail.comwrote:

 I have Accumulo 1.5 HEAD running on JDK 1.7 on Ubuntu 12.04 x64. It runs
 fine

 On Aug 28, 2012, at 7:08 PM, Mike Drob md...@mdrob.com wrote:

  Does anybody have experience with running Accumulo on top of Java 7?
 The mailing list archives show that David Medinets tried compiling 1.3.5 on
 the openjdk implementation back in December, but it doesn't look like there
 was much follow up on it.
 
  When I'm trying to use the 1.4.1 dist tarball on CDH3, my gc and tracer
 start fine but the master and monitor silently fail. I haven't yet tried to
 fire up tablet servers. All logs are painfully bare.
 
  Any ideas from the wisdom of the internet?
 
  Mike
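
Eric's checklist above, roughly in shell form (the walog path is whatever logger.dir.walog points to in accumulo-site.xml, and $ACCUMULO_HOME/logs is assumed to be the log directory; both are hedged defaults, not authoritative):

 grep Xmx $ACCUMULO_HOME/conf/accumulo-env.sh              # heap sizes sane for this box?
 ls -ld <value of logger.dir.walog>                        # walog dir exists on every node?
 ls -l $ACCUMULO_HOME/logs/*.err $ACCUMULO_HOME/logs/*.out # any startup errors?
 ssh localhost true                                        # passwordless ssh configured?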





Accumulo and Java 7

2012-08-28 Thread Mike Drob
Does anybody have experience with running Accumulo on top of Java 7? The
mailing list archives show that David Medinets tried compiling 1.3.5 on the
openjdk implementation back in December, but it doesn't look like there was
much follow up on it.

When I'm trying to use the 1.4.1 dist tarball on CDH3, my gc and tracer
start fine but the master and monitor silently fail. I haven't yet tried to
fire up tablet servers. All logs are painfully bare.

Any ideas from the wisdom of the internet?

Mike