Re: Merging smaller/empty tablets [SEC=UNOFFICIAL]
I can't find this in the docs, but IIRC the merge command can take a start/end range for what to merge. So the best option might be to try it on a smaller slice and see what happens. At a guess, queries won't block but indexing will.

Mike

On Mon, Jan 16, 2017 at 5:23 PM, Dickson, Matt MR <matt.dick...@defence.gov.au> wrote:
> *UNOFFICIAL*
> That looks like a great option. Before using it, what's the cost/impact of running this on a massive table in a system with other large bulk ingests/queries running? In the past when I have used that (which was in 2013 so things may have changed) all ingests were blocked and it took days to complete.
>
> With 1.07T tablets to work on this may take some time?
>
> ------
> *From:* Mike Drob [mailto:md...@mdrob.com]
> *Sent:* Tuesday, 17 January 2017 09:37
> *To:* user@accumulo.apache.org
> *Subject:* Re: Merging smaller/empty tablets [SEC=UNOFFICIAL]
>
> http://accumulo.apache.org/1.8/accumulo_user_manual.html#_merging_tablets
>
> In order to merge small tablets, you can ask Accumulo to merge sections of a table smaller than a given size.
>
> root@myinstance> merge -t myTable -s 100M
>
> On Mon, Jan 16, 2017 at 4:31 PM, Dickson, Matt MR <matt.dick...@defence.gov.au> wrote:
>> *UNOFFICIAL*
>> I have a table that has evolved to have 1.07T tablets and I'm fairly confident a large portion of these are now empty or very small. I'd like to merge smaller tablets and delete empty tablets; is there a smart way to do this?
>>
>> My thought was to query the metadata table for all tablets under a certain size for the table and then merge these tablets.
>>
>> Is the first number in the value the size of the tablet, ie
>>
>> > scan -b 1xk -e 1xk\xff -c file
>> 1xk;34234 file:hdfs://name/accumulo/tables/1xk/t-er23423/M423432.rf [] *213134*,234234
>>
>> Also, are there any side effects of this that I need to be aware of when doing this on a massive table?
>>
>> Thanks in advance,
>> Matt
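Matt's plan (scan the metadata table for tablet sizes, then merge runs of small tablets) can be sketched as follows. This is an illustration only, not code from the thread: `merge_ranges` and its thresholds are hypothetical names, and the resulting ranges would be fed to the shell's `merge -t <table> -b <begin> -e <end>` command one slice at a time, as Mike suggests.

```python
def merge_ranges(tablets, small=100 << 20, target=1 << 30):
    """tablets: sorted list of (end_row, size_bytes) pairs, one per tablet.
    Returns (begin, end) row ranges covering runs of small tablets; a begin
    of None means "from the start of the table"."""
    ranges = []
    active = False        # currently inside a run of small tablets?
    run_start = None      # end row of the tablet just before the run
    run_bytes = 0
    prev_end = None
    for end_row, size in tablets:
        if size < small:
            if not active:
                active, run_start, run_bytes = True, prev_end, 0
            run_bytes += size
            if run_bytes >= target:   # cap each merge near the target size
                ranges.append((run_start, end_row))
                active, run_bytes = False, 0
        elif active:                  # big tablet ends the current run
            ranges.append((run_start, prev_end))
            active = False
        prev_end = end_row
    if active:
        ranges.append((run_start, prev_end))
    return ranges
```

Issuing many bounded merges rather than one table-wide merge keeps each operation short, which matters on a busy system where merges can block other work.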
Re: Merging smaller/empty tablets [SEC=UNOFFICIAL]
http://accumulo.apache.org/1.8/accumulo_user_manual.html#_merging_tablets

In order to merge small tablets, you can ask Accumulo to merge sections of a table smaller than a given size.

root@myinstance> merge -t myTable -s 100M

On Mon, Jan 16, 2017 at 4:31 PM, Dickson, Matt MR <matt.dick...@defence.gov.au> wrote:
> *UNOFFICIAL*
> I have a table that has evolved to have 1.07T tablets and I'm fairly confident a large portion of these are now empty or very small. I'd like to merge smaller tablets and delete empty tablets; is there a smart way to do this?
>
> My thought was to query the metadata table for all tablets under a certain size for the table and then merge these tablets.
>
> Is the first number in the value the size of the tablet, ie
>
> > scan -b 1xk -e 1xk\xff -c file
> 1xk;34234 file:hdfs://name/accumulo/tables/1xk/t-er23423/M423432.rf [] *213134*,234234
>
> Also, are there any side effects of this that I need to be aware of when doing this on a massive table?
>
> Thanks in advance,
> Matt
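To answer the embedded question: yes, the value of a `file` entry in the metadata table is `<file size in bytes>,<number of entries>`, so the first number is the size. A small sketch of summing those sizes per tablet from `scan -c file` output; the line format is assumed from the example in the thread, and `tablet_sizes` is an illustrative name, not an Accumulo utility.

```python
import re
from collections import defaultdict

# A metadata "file" entry looks like:
#   <tablet> file:<path> [<visibility>] <size>,<entries>
# Sum the file sizes per tablet to find merge candidates.
LINE = re.compile(r"^(\S+) file:(\S+) \[\S*\] (\d+),(\d+)$")

def tablet_sizes(lines):
    sizes = defaultdict(int)
    for line in lines:
        m = LINE.match(line.strip())
        if m:
            tablet, _path, size, _entries = m.groups()
            sizes[tablet] += int(size)
    return dict(sizes)
```

Note this only counts data already flushed to rfiles; recently written data still in memory or write-ahead logs is invisible to this scan.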
Re: [ANNOUNCE] Apache Accumulo 1.7.2 Released
Whoops, meant to say that we are proud to announce the release of Accumulo version 1.7.2!

On Thu, Jun 23, 2016 at 10:47 AM, Mike Drob <md...@apache.org> wrote:
> The Accumulo team is proud to announce the release of Accumulo version 1.7.1!
>
> This release contains over 30 bugfixes and improvements over 1.7.1, and is backwards-compatible with 1.7.0 and 1.7.1. Existing users of 1.7.1 are encouraged to upgrade immediately.
>
> This version is now available in Maven Central, and at:
> https://accumulo.apache.org/downloads/
>
> The full release notes can be viewed at:
> https://accumulo.apache.org/release_notes/1.7.2.html
>
> The Apache Accumulo™ sorted, distributed key/value store is a robust, scalable, high performance data storage system that features cell-based access control and customizable server-side processing. It is based on Google's BigTable design and is built on top of Apache Hadoop, Apache ZooKeeper, and Apache Thrift.
>
> --
> The Apache Accumulo Team
Re: Accumulo GC and Hadoop trash settings
If something goes wrong (i.e. somebody accidentally issues a big delete), then having the Trash around makes recovery plausible.

On Mon, Aug 17, 2015 at 2:57 PM, James Hughes <jn...@virginia.edu> wrote:
> Hi all,
>
> From reading about the Accumulo GC, it sounds like temporary files are routinely deleted during GC cycles. In a small testing environment, I've seen the HDFS Accumulo user's .Trash folder grow to tens of gigabytes of data.
>
> Is there any reason that the default value for gc.trash.ignore is false? Is there any downside to deleting GC'ed files completely?
>
> Thanks in advance,
> Jim
>
> http://accumulo.apache.org/1.6/accumulo_user_manual.html#_gc_trash_ignore
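For anyone wanting the trade-off the other way (disk space over recoverability), the knob can be flipped from the shell. This is a sketch based on the manual section linked above; verify the property name and restart requirements against your version.

```shell
# In the Accumulo shell, as an administrator. With gc.trash.ignore=false
# (the default), the GC moves deleted files into the HDFS Trash, where they
# count against quota until Trash expiry. Setting it to true makes the GC
# delete files outright; a GC process restart may be needed to pick it up.
config -s gc.trash.ignore=true

# Alternatively, keep the safety net and reclaim space periodically from HDFS:
# hadoop fs -expunge
```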
Re: TSDB on Accumulo row key
Our very own Eric Newton has a port of OpenTSDB running on Accumulo; it might be what you're looking for. https://github.com/ericnewton/accumulo-opentsdb

On Mon, Jul 20, 2015 at 5:25 PM, Ranjan Sen <ranjan_...@hotmail.com> wrote:
> Hi All,
> Is there something like TSDB (Time series database) on Accumulo?
> Thanks
> Ranjan
Re: How to generate UUID in real time environment for Accumulo
This sounds super close to a type 1 UUID - https://en.wikipedia.org/wiki/Universally_unique_identifier#Version_1_.28MAC_address_.26_date-time.29

On Tue, Jun 23, 2015 at 8:14 AM, Keith Turner <ke...@deenlo.com> wrote:
> Would something like the following work?
>
> row = <time>_<client id>_<client counter>
>
> Where the client id is a unique id per client instance; it would be allocated once using ZooKeeper or an Accumulo ConditionalWriter when the client starts. The client counter would be an AtomicLong in the client.
>
> On Tue, Jun 23, 2015 at 8:08 AM, mohit.kaushik <mohit.kaus...@orkash.com> wrote:
>> Hi All,
>>
>> I have an application which can index data at a very high rate from multiple clients. I need to generate a unique id to store documents. It should
>> (1) use the current system time in millis,
>> (2) be designed to sort lexicographically on the basis of time, and
>> (3) scale: if I just store currentTimeMillis then I can only index 1000 unique docs per sec. It should be able to generate millions of UUIDs per sec.
>>
>> I am searching for the best possible approach to implement this; any help?
>>
>> Regards,
>> Mohit Kaushik
>> Software Engineer, Orkash
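Keith's scheme can be sketched in a few lines. This is an illustration under stated assumptions: the client id is assumed to have been allocated elsewhere (e.g. via ZooKeeper), `IdGenerator` is a hypothetical name, and an `itertools.count` stands in for Java's AtomicLong. Fixed-width, zero-padded fields are what make the ids sort lexicographically by time.

```python
import itertools
import time

class IdGenerator:
    """row = <time millis>_<client id>_<client counter>, zero-padded so that
    lexicographic order matches generation order."""

    def __init__(self, client_id):
        self.client_id = client_id
        self.counter = itertools.count()   # per-client monotonic counter

    def next_id(self, now_ms=None):
        if now_ms is None:
            now_ms = int(time.time() * 1000)
        # 13 digits of millis covers dates well past the year 2200
        return "%013d_%04d_%012d" % (now_ms, self.client_id, next(self.counter))
```

With the counter in the suffix, a single client can mint many ids within the same millisecond and they still sort correctly; distinct client ids keep concurrent clients from colliding.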
Re: CHANGES files
Why not ask committers to add a line to the CHANGES file about the change when committing? It's a good place to highlight contributors too. Instead of an auto-generated file, or sticking the RM with it, we build it up over the course of development. Individual subtasks could be ignored; larger tasks could be included at the discretion of the author.

Ex: ACCUMULO-21224 Add more metrics to the monitor. (Jim Contributor via Mike Drob)

On Wed, Jun 10, 2015 at 1:43 PM, Keith Turner <ke...@deenlo.com> wrote:
> On Wed, Jun 10, 2015 at 2:32 PM, Christopher <ctubb...@apache.org> wrote:
>> Okay Accumulators,
>>
>> I have a minor rant about the CHANGES files in Accumulo, and I want to get feedback on this file from the user@ and dev@ lists. The summary is:
>>
>> * I think this CHANGES file is nearly worthless, and a release manager shouldn't have to bother with it. We should just delete it.
>
> +1
>
> We could drop the file from releases and have a link to a JIRA query in the release notes on the web site.

>> The justification is:
>>
>> * The CHANGES file is tedious to prepare (requires manual copy/paste from JIRA, after clicking the right buttons in the right order).
>> * We now have release notes which complement the full JIRA search and git history, to highlight particular changes, which is far more useful.
>> * The file is just so big and contains material of questionable utility (do we really need to enumerate all sub-tasks for each issue, especially when they aren't even grouped with the parent issue?)
>> * It's very easy for the CHANGES file to be wrong, by either including a JIRA issue which was incorrectly marked, or by omitting an issue which was inadvertently left open. The release manager can triage these things, but that's a lot of extra work, and it doesn't seem to matter whether it is actually wrong or not (it has been wrong in the past, and nobody has ever voiced a complaint or indicated any concern at all).
>> * The CHANGES file is ugly.
>> It follows no markup standard to render it in a presentable way (Markdown, APT, asciidoc, etc.). Any prettification must be done manually.
>> * Issue numbers and subject lines rarely convey adequate information to satisfy curious readers wishing to inform themselves of what changed. Looking at the actual JIRA issues is necessary to do that, and these links are not clickable.
>> * Because it is generated from the fixVersion in JIRA, it's often the case that we must omit useful fixVersions from JIRA in order to avoid confusing inclusions in the CHANGES file (like the JIRA pertaining to the release itself). And sometimes people add/remove the wrong fixVersion. We can fix this later when we discover it, but it's usually too late for the CHANGES file already bundled in a release.
>> * Updating the CHANGES file creates unnecessary commits which are tedious and painful to merge forward (and usually risky, because it would involve `-s ours`-type merges) and pollute the git history without much use.
>> * The convention for a CHANGES file seems to be born of an era prior to ubiquitous version control, and I don't think having one is required in any way.
>>
>> Sure, we could automate generating this file (maybe?), which would alleviate some of the burden. However, many of these problems would still exist, and in the end, I'm not really sure what the benefits are. It doesn't seem to be that useful, and especially not compared to the amount of work it takes to maintain it.
>>
>> Instead of deleting it, we could leave it in place with a generic comment referring the user to JIRA and git. But even that seems to be unnecessary (these resources are already prominently linked on the Accumulo site and in the project pom.xml in the official source release, and it is already well understood that a project is going to have an SCM history and an issue tracker).
>>
>> But, what do you think? Is this file really useful to anybody? Does its utility outweigh the burden it places on release managers, which can slow down and complicate the release process?
>>
>> --
>> Christopher L Tubbs II
>> http://gravatar.com/ctubbsii
Re: admin and web dashboard
Also might need to run a 'flush -t $table -w'.

On Mon, Jun 8, 2015 at 1:39 PM, Josh Elser <josh.el...@gmail.com> wrote:
> Since 1.5, all of Accumulo's files are stored in HDFS: RFiles and WALs. Tables have the name you provide, but also maintain an internal unique ID to make operations like renaming easy. You can see this mapping via `tables -l` in the Accumulo shell.
>
> Given the ID for a table, you should be able to find all rfiles for a table under /accumulo/tables/$id/**/*.rf. If you don't see any rfiles there, run a `compact -t $table -w` and then check HDFS again.
>
> z11373 wrote:
>> That makes sense, thanks Josh! Btw, where can I find the .rf files? I looked under the Accumulo install folder and also /tmp, and couldn't find them. I also looked at HDFS, and only found the folder, i.e. /accumulo/tables/n/default_tablet (where 'n' is a number), and no files under that hdfs dir. I want to try the command 'accumulo rfile-info' you mentioned earlier.
>>
>> Thanks again,
>> zainal
Re: Possible information leak
Value will contain whatever the user provided on the command line, so printing it back out to them shouldn't result in exposing something secret. On Fri, May 8, 2015 at 12:29 PM, Rodrigo Andrade rodrigo...@gmail.com wrote: Hi, In this commit: https://github.com/apache/accumulo/commit/27d79c2651277c465a497c68ec238771692a6fa0 Does value contain private information? Regards, Rodrigo
Re: Unassigned, but not offline, tablets
What version? Could be https://github.com/apache/accumulo/blob/master/docs/src/main/asciidoc/chapters/troubleshooting.txt#L314

On Tue, May 5, 2015 at 8:54 AM, Bill Slacum <wsla...@gmail.com> wrote:
> After a catastrophic failure, the Master Server section of the monitor will report that there are 16 unassigned tablets (out of thousands), but no table shows any offline tablets.
>
> There were corrupt files under the recovery directory. These were removed.
>
> Otherwise, things seem fine with the cluster (we are having ingest processes hang, which may or may not be related). What should I do, as an operator, when Accumulo is in this state?
>
> I have no logs to provide, unfortunately.
Re: Q4A Project
Andrew,

This is a cool thing to work on, I hope you have great success! A couple of questions about the motivations behind this, if you don't mind -

- There are several SQL implementations already in the Hadoop ecosystem. In what ways do you expect this to improve upon Hive/Impala/Phoenix/Presto/Spark SQL? I haven't looked at the code, so it is quite possible you're already using one of those technologies.
- In a conversation with some HP engineers earlier this year, they mentioned that building a SQL-92 layer is the easy part, and that a mature optimization engine is the really hard part. This is where Oracle may still be leaps and bounds ahead of its nearest competitors. Do you have plans for a query planner? If not, you might be back to writing MapReduce jobs sooner than you think.

Look forward to seeing more!

Mike

On Mon, Apr 27, 2015 at 7:37 PM, Andrew Wells <awe...@clearedgeit.com> wrote:
> I have been working on a project, tentatively called Q4A (Query for Accumulo). Another possible name is ASQ (Accumulo Streaming Query) [discuss]. This is a streaming query engine: the query is completed via a stream and should never group data in memory. To batch, intermediate results would be written back to Accumulo temporarily.
>
> The *primary goal* is to have a complete SQL implementation native to Accumulo.
>
> *Why do this?* I am getting tired of writing bad Java code to query a database. I would rather write bad SQL code. Also, people should be able to get queries out faster, and it shouldn't take a developer.
> *Native to Accumulo*:
> - There should be no special format to read a database created by Q4A
> - There should be no special format for Q4A to query a table
> - All tables are tables available to Q4A
> - Any special tables are stored away from the users' databases (indexes, column definitions, etc.)
>
> *Other Goals*:
> - Implement the entire SQL definition (currently all of SQLite)
> - Create JDBC Driver/Server
> - Push down expressions to the tablet servers
> - Install-less queries: use the Q4A jar directly against any Accumulo cluster (less push-down expressions)
> - documentation :o
> - testing ;)
>
> *Does it work?* Not yet, the project is still a work in progress, and I will be working on it at the Accumulo Summit this year. Progress is slow as I am getting married in about a month and some change.
>
> *Questions:* If you have questions about Q4A, ask here; I will also be at the Accumulo Summit @ ClearEdgeIT table and hackathon.
>
> *WHERE IS TEH LINK?!1!* Oh here: https://github.com/agwells0714/q4a
>
> --
> *Andrew George Wells*
> *Software Engineer*
> *awe...@clearedgeit.com*
Re: Approach to hold the output of an iterator in memory to do further operations
Check out the MinCombiner: https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/iterators/user/MinCombiner.java

On Mon, Apr 27, 2015 at 12:19 PM, vaibhav thapliyal <vaibhav.thapliyal...@gmail.com> wrote:
> Hello everyone,
>
> I am trying to carry out max and min kind of operations using Accumulo. But since the Accumulo iterators only operate on the entries that are locally hosted, I get a local max and local min instead of a global max and min. To get the global max and min, I have to calculate this client side.
>
> I want to ask if there is some way to store this local max and min in memory using an iterator, so that a global max and min can be calculated server side only. I tried to do this by writing the result in another table and using another iterator to return me the global max and min. I want to ask if there is a way to store this in memory so as to avoid writing it to a table?
>
> Thanks
> Vaibhav
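The pattern here is that a server-side combiner (e.g. the MinCombiner linked above) reduces values within each tablet, and a cheap client-side fold turns the per-tablet partials into the global answer. A sketch of that final fold, with the scan faked as a list of (row, local_min) pairs such as a batch scanner might return; the function name is illustrative, not an Accumulo API.

```python
def global_min(partials):
    """partials: iterable of (row, local_min) pairs, one per tablet.
    Returns the global minimum, or None if the scan returned nothing."""
    it = iter(partials)
    try:
        _, best = next(it)          # seed with the first tablet's minimum
    except StopIteration:
        return None
    for _, value in it:
        if value < best:
            best = value
    return best
```

The client-side work is O(number of tablets), not O(number of entries), which is usually why this two-level reduction is preferred over shipping raw entries to the client.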
Re: Accumulo 1.6.2 with Hadoop 2.2.0 Installation issues
Can you verify that once the processes started, they stayed up?

ps -C java -fww | grep accumulo

Also check your log directory for .err files.

On Thu, Mar 12, 2015 at 9:53 AM, Madabhattula Rajesh Kumar <mrajaf...@gmail.com> wrote:
> Hi Team,
>
> I'm not able to log in to the Accumulo shell. It is giving "There are no tablet servers: check that zookeeper and accumulo are running." Could you please help me resolve this issue?
>
> *rajesh@rajesh-VirtualBox:~/accumulo-1.6.2$ ./bin/start-all.sh*
> Starting monitor on localhost
> WARN : Max open files on localhost is 1024, recommend 32768
> Starting tablet servers done
> Starting tablet server on localhost
> WARN : Max open files on localhost is 1024, recommend 32768
> OpenJDK 64-Bit Server VM warning: You have loaded library /home/rajesh/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now. It's highly recommended that you fix the library with 'execstack -c libfile', or link it with '-z noexecstack'.
> 2015-03-12 18:30:31,722 [util.NativeCodeLoader] WARN : Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2015-03-12 18:30:35,779 [fs.VolumeManagerImpl] WARN : dfs.datanode.synconclose set to false in hdfs-site.xml: data loss is possible on hard system reset or power loss
> 2015-03-12 18:30:35,791 [server.Accumulo] INFO : Attempting to talk to zookeeper
> 2015-03-12 18:30:36,036 [server.Accumulo] INFO : ZooKeeper connected and initialized, attempting to talk to HDFS
> 2015-03-12 18:30:36,328 [server.Accumulo] INFO : Connected to HDFS
> Starting master on localhost
> WARN : Max open files on localhost is 1024, recommend 32768
> Starting garbage collector on localhost
> WARN : Max open files on localhost is 1024, recommend 32768
> Starting tracer on localhost
> WARN : Max open files on localhost is 1024, recommend 32768
>
> *rajesh@rajesh-VirtualBox:~/accumulo-1.6.2$ ./bin/accumulo shell -u root*
> OpenJDK 64-Bit Server VM warning: You have loaded library /home/rajesh/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now. It's highly recommended that you fix the library with 'execstack -c libfile', or link it with '-z noexecstack'.
> 2015-03-12 18:32:43,567 [util.NativeCodeLoader] WARN : Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Password: **
> 2015-03-12 18:32:52,533 [impl.ServerClient] WARN : There are no tablet servers: check that zookeeper and accumulo are running.
>
> Regards,
> Rajesh
Re: Failed to connect to zookeeper within 2x ZK timeout period 30000
Can you verify that ZooKeeper is running and accepting connections?

echo stat | nc [zk-host] [zk-port]

And see that it does not result in an error.

On Mon, Feb 2, 2015 at 2:58 PM, Wyatt Frelot <wyatt.fre...@altamiracorp.com> wrote:
> Good afternoon all,
>
> I just literally started having this problem on Friday. My code worked previously (1 mo ago) but I came back to it and I have not been able to resolve this problem since I started experiencing it. So, I am seeking guidance and assistance.
>
> I have a Vagrant cluster setup with the following environment: Hadoop 2.4.1, ZK 3.3.6, Accumulo 1.6.1, and Java 7.
>
> I am able to ping the zookeeper node and there appears to be nothing in the ZK logs nor the Accumulo logs… I am not sure where to go from here. This is the only error that I can find:
>
> Exception in thread "main" org.apache.accumulo.core.client.AccumuloException: java.lang.RuntimeException: Failed to connect to zookeeper (mnode) within 2x zookeeper timeout period 30000
>     at org.apache.accumulo.core.client.impl.ServerClient.execute(ServerClient.java:67)
>     at org.apache.accumulo.core.client.impl.ConnectorImpl.<init>(ConnectorImpl.java:70)
>     at org.apache.accumulo.core.client.ZooKeeperInstance.getConnector(ZooKeeperInstance.java:240)
>     at accumulo101.solutions.writing.TableAdministration.main(TableAdministration.java:66)
> Caused by: java.lang.RuntimeException: Failed to connect to zookeeper (mnode) within 2x zookeeper timeout period 30000
>     at org.apache.accumulo.fate.zookeeper.ZooSession.connect(ZooSession.java:117)
>     at org.apache.accumulo.fate.zookeeper.ZooSession.getSession(ZooSession.java:161)
>     at org.apache.accumulo.fate.zookeeper.ZooReader.getSession(ZooReader.java:35)
>     at org.apache.accumulo.fate.zookeeper.ZooReader.getZooKeeper(ZooReader.java:39)
>     at org.apache.accumulo.fate.zookeeper.ZooCache.getZooKeeper(ZooCache.java:58)
>     at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:150)
>     at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:277)
>     at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:224)
>     at org.apache.accumulo.core.client.ZooKeeperInstance.getInstanceID(ZooKeeperInstance.java:161)
>     at org.apache.accumulo.core.zookeeper.ZooUtil.getRoot(ZooUtil.java:38)
>     at org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:128)
>     at org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:118)
>     at org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:113)
>     at org.apache.accumulo.core.client.impl.ServerClient.executeRaw(ServerClient.java:95)
>     at org.apache.accumulo.core.client.impl.ServerClient.execute(ServerClient.java:61)
>     ... 3 more
Re: why a error about replicated
Has this error come up before? Is there room for us to intercept that stack trace and provide a "check that HDFS has space left" message? This might be especially relevant after we've removed the Hadoop info box on the monitor.

On Thu, Jan 22, 2015 at 8:30 AM, Josh Elser <josh.el...@gmail.com> wrote:
> How much free space do you still have in HDFS? If HDFS doesn't have enough free space to make the file, I believe you'll see the error that you have outlined. The way we create the file will also end up requiring at least one GB with the default configuration. Also make sure to take into account any reserved percent of HDFS when considering the HDFS usage.
>
> On Jan 22, 2015 1:46 AM, Lu.Qin <luq.j...@gmail.com> wrote:
>> Hi, I have an Accumulo cluster and it has run for 10 days, but it shows me many errors now.
>>
>> 2015-01-22 13:04:21,161 [hdfs.DFSClient] WARN : Error while syncing
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /accumulo/wal/+9997/226dce4f-4e14-4704-b811-532afe0b0fb3 could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1471) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2791) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:606) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:455) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) at org.apache.hadoop.ipc.Client.call(Client.java:1411) at org.apache.hadoop.ipc.Client.call(Client.java:1364) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy20.addBlock(Unknown Source) at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy20.addBlock(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:368) at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1449)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1270)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:526)

I use hadoop fs to put a file into Hadoop, and it works fine, and the file has 2 replicas. Why can Accumulo not work? And I see there are so many files of only 0 B in /accumulo/wal/***/. Why?

Thanks.
Re: TableOperations.setProperty() not setting property
Ara,

There is sometimes propagation delay in setting the properties, since they have to go through ZooKeeper and then out to the tablet servers. Try waiting 30 or 60 seconds before checking, and see if that changes things.

Mike

On Thu, Jan 8, 2015 at 6:07 PM, Ara Ebrahimi <ara.ebrah...@argyledata.com> wrote:
> Hi,
>
> I'm trying to set a few properties for a table "programmatically" right after I create that table (using the accumulo shell). I use TableOperations.setProperty(). But then when I use the config command from the accumulo shell I don't see anything reflected there. It still holds the old default/site values. If I pass invalid values it does fail, so it seems like it actually receives the request and validates it. But it doesn't seem to persist it. I've tried a few different properties and none seem to stick. Do I need to flush the config somehow?
>
> I assume I can set these properties right after creating the table only. I mean, what happens if I set a property like table.split.threshold after populating the table? There's no command for setting these properties from the accumulo shell other than the option to copy config from another table when executing create table.
>
> Thanks,
> Ara.
Re: Recursive Import Directory
Ariel,

There is not an easy way to do this recursively. Your best option is going to be writing your own wrapper around the import command. If you're using shell commands, this could be as easy as feeding the results of 'find . -type d' into a script, or in Java you might want to look at DirectoryWalker in Apache Commons as a possible solution.

Mike

On Tue, Nov 25, 2014 at 10:22 AM, Ariel Valentin <ar...@arielvalentin.com> wrote:
> Hello!
>
> We are running a couple of experiments using importDirectory and are curious if there is a simple way to import directories recursively. Based on looking at the source code, it does not look like it currently supports that feature: https://github.com/apache/accumulo/blob/1835c27ca41426ddd570cde14f9612c45680b917/core/src/main/java/org/apache/accumulo/core/client/admin/TableOperationsImpl.java
>
> Are there plans to add it in the future? Or is there a simple way to do this right now?
>
> Thanks,
> Ariel Valentin
> e-mail: ar...@arielvalentin.com
> website: http://blog.arielvalentin.com
Re: Recursive Import Directory
Name collision of failures, and I think name collision of successes might cause problems sometimes too. Or maybe that's just with older versions. Regardless, having to write your own code puts it out of the realm of "easy" into at least "middling" territory; if import directory could natively handle recursion then it would become easy.

On Tue, Nov 25, 2014 at 10:44 AM, Josh Elser <josh.el...@gmail.com> wrote:
> What's the difficulty, Mike? Handling name collision of failures?
>
> Mike Drob wrote:
>> Ariel,
>> There is not an easy way to do this recursively. Your best option is going to be writing your own wrapper around the import command. If you're using shell commands, this could be as easy as feeding the results of 'find . -type d' into a script, or in Java you might want to look at DirectoryWalker in Apache Commons as possible solutions.
>> Mike
Re: comparing different rfile densities
I'm not sure how to quantify this and give you a way to verify, but in my experience you want to be producing rfiles that load into a single tablet. Typically, this means a number of reducers equal to the number of tablets in the table that you will be importing into, and perhaps a custom partitioner. I think your intuition is spot on here. Of course, if that means that you have a bunch of tiny files, then maybe it's time to rethink your split strategy.

On Tue, Nov 11, 2014 at 5:56 AM, Jeff Turner <sjtsp2...@gmail.com> wrote:
> is there a good way to compare the overall system effect of bulk loading different sets of rfiles that have the same data, but very different densities?
>
> i've been working on a way to re-feed a lot of data in to a table, and have started to believe that our default scheme for creating rfiles - mapred in to ~100-200 splits, sampled from 50k tablets - is actually pretty bad.
>
> subjectively, it feels like rfiles that span 300 or 400 tablets are bad in at least two ways for the tservers - until the files are compacted, all of the potential tservers have to check the file, right? and then, during compaction, do portions of that rfile get volleyed around the cloud until all tservers have grabbed their portion? (so, there's network overhead, repeatedly reading files and skipping most of the data, ...)
>
> if my new idea works, i will have a lot more control over the density of rfiles, and most of them will span just one or two tablets.
>
> so, is there a way to measure/simulate overall system benefit or cost of different approaches to building bulk-load data (destined for an established table, across N tservers, ...)? i guess that a related question would be "are 1000 smaller and denser bulk files better than 100 larger bulk files produced under a typical getSplits() scheme?"
>
> thanks,
> jeff
Re: Accumulo version at runtime?
Unfortunately, I don't think we have a way to do this. Are you trying to check for the existence of a particular feature, or what is your goal? On Thu, Oct 23, 2014 at 6:44 PM, Dylan Hutchison dhutc...@stevens.edu wrote: Easy question Accumulators: Is there an easy way to grab the version of a running Accumulo instance programmatically from Java code in a class that connects to the instance? Something like: Instance instance = new ZooKeeperInstance(instanceName,zookeeper_address); String version = instance.getInstanceVersion(); Thanks, Dylan -- www.cs.stevens.edu/~dhutchis
Re: Removing 'accumulo' from Zookeeper
Michael, These are great ZK instructions. Have you considered contributing them to the project upstream? We can converse about this off-list if you'd prefer, since it's not particularly germane to this topic. Mike On Thu, Oct 2, 2014 at 12:50 PM, Michael Allen mich...@sqrrl.com wrote: I cut and paste a little fast there at the end, so obviously no one outside of Sqrrl has the zk-digest.sh script. Here's that in all its gory detail:

#!/bin/bash
if [ -z "${ZOOKEEPER_HOME}" ]; then
  echo "Set \$ZOOKEEPER_HOME before running this script"
  exit 4747
fi
if [ -z "${JAVA_HOME}" ]; then
  echo "Set \$JAVA_HOME before running this script"
  exit 4747
fi
if [ $# -eq 0 ]; then
  echo "usage: zk-digest.sh digest string"
  echo
  echo "Utility to produce authentication digests, such as you might see in ZooKeeper node ACL entries"
  echo
  echo "Example: zk-digest.sh sqrrl:secret"
  exit 4747
fi
ZK_CLASSPATH=\
${ZOOKEEPER_HOME}/build/classes:\
${ZOOKEEPER_HOME}/build/lib/*.jar:\
${ZOOKEEPER_HOME}/lib/slf4j-log4j12-1.6.1.jar:\
${ZOOKEEPER_HOME}/lib/slf4j-api-1.6.1.jar:\
${ZOOKEEPER_HOME}/lib/netty-3.2.2.Final.jar:\
${ZOOKEEPER_HOME}/lib/log4j-1.2.15.jar:\
${ZOOKEEPER_HOME}/lib/jline-0.9.94.jar:\
${ZOOKEEPER_HOME}/zookeeper-3.4.5.jar:\
${ZOOKEEPER_HOME}/src/java/lib/*.jar:\
${ZOOKEEPER_HOME}/conf

${JAVA_HOME}/bin/java -Dzookeeper.log.dir=. \
  -Dzookeeper.root.logger=INFO,CONSOLE \
  -cp ${ZK_CLASSPATH} \
  -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.local.only=false \
  org.apache.zookeeper.server.auth.DigestAuthenticationProvider $*

On Thu, Oct 2, 2014 at 1:48 PM, Michael Allen mich...@sqrrl.com wrote: Hi Ranjan. If you're doing this on your own development node, or a production node you're in full control of, you can add a root password to ZooKeeper in order to blow away any nodes you like. Here's a little writeup I did about it: ZooKeeper has security features built into it by way of access control lists (ACLs) on nodes.
Once set, these ACLs can be very hard to get rid of, especially if errant code has set up nodes that you no longer have any password for. This how-to guide shows you how to set up a root user inside of ZooKeeper that can wipe out any ACLed node. Step-by-step guide 1. Stop your currently running ZooKeeper. This is either a direct $ZOOKEEPER_HOME/bin/zkServer.sh stop command or a sudo service zookeeper-server stop command on some systest boxes. 2. Edit zkServer.sh and in the following section:

start)
  echo -n "Starting zookeeper ... "
  if [ -f $ZOOPIDFILE ]; then
    if kill -0 `cat $ZOOPIDFILE` > /dev/null 2>&1; then
      echo $command already running as process `cat $ZOOPIDFILE`.
      exit 0
    fi
  fi
  nohup $JAVA -Dzookeeper.log.dir=${ZOO_LOG_DIR} -Dzookeeper.root.logger=${ZOO_LOG4J_PROP} \
    -cp $CLASSPATH $JVMFLAGS $ZOOMAIN $ZOOCFG > $_ZOO_DAEMON_OUT 2>&1 < /dev/null &

Add the line

-Dzookeeper.DigestAuthenticationProvider.superDigest=super:lK75jTNcA+U9vtVEw5vB51mj/w4= \

within the $JAVA invocation such that the resulting section looks like this:

start)
  echo -n "Starting zookeeper ... "
  if [ -f $ZOOPIDFILE ]; then
    if kill -0 `cat $ZOOPIDFILE` > /dev/null 2>&1; then
      echo $command already running as process `cat $ZOOPIDFILE`.
      exit 0
    fi
  fi
  nohup $JAVA -Dzookeeper.log.dir=${ZOO_LOG_DIR} -Dzookeeper.root.logger=${ZOO_LOG4J_PROP} \
    -Dzookeeper.DigestAuthenticationProvider.superDigest=super:lK75jTNcA+U9vtVEw5vB51mj/w4= \
    -cp $CLASSPATH $JVMFLAGS $ZOOMAIN $ZOOCFG > $_ZOO_DAEMON_OUT 2>&1 < /dev/null &

3. Start ZooKeeper again. 4. Log into ZooKeeper via zkCli.sh 5. Declare yourself the root user with the following addauth command: addauth digest super:secret 6. You should now be able to delete any node and/or change any ACL within the ZooKeeper system. Note that you should *NOT* set this setting up on any production system. If you need to set up a root user on a production system, you need to create a different digest (the super:lK75jTNcA+U9vtVEw5vB51mj/w4= stuff above is a digest) linked to a better password than secret.
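The digest format ZooKeeper expects here is not magic: DigestAuthenticationProvider computes base64(SHA-1("user:password")) and prefixes the user name. A JDK-only sketch of that transformation, so you can see what a digest tool produces under the hood -- treat it as illustrative rather than a drop-in replacement for ZooKeeper's own class:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class ZkDigest {
    // Same transformation ZooKeeper's DigestAuthenticationProvider applies:
    // "user:password" -> "user:" + base64(SHA-1("user:password")).
    static String digest(String idPassword) throws NoSuchAlgorithmException {
        byte[] hash = MessageDigest.getInstance("SHA-1")
                .digest(idPassword.getBytes(StandardCharsets.UTF_8));
        String user = idPassword.split(":", 2)[0];
        return user + ":" + Base64.getEncoder().encodeToString(hash);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Prints a digest suitable for the superDigest system property.
        System.out.println(digest("super:secret"));
    }
}
```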
To make your own digest, use the $SQRRL_HOME/tools/useful-scripts/zk-digest.sh script. On Thu, Oct 2, 2014 at 11:39 AM, Keith Turner ke...@deenlo.com wrote: Accumulo will work properly if you do not clean it before installing, because each time you init Accumulo it stores the information for the new instance under a new random uuid. For the purpose of cleaning out old UUIDs, it's possible each old UUID could have been created with a different password. Maybe that's what's happening in your case? I can not remember if the syntax of your addauth command is correct. On Wed, Oct 1, 2014 at 11:06 PM, Ranjan Sen ranjan_...@hotmail.com wrote: Let me describe the scenario. Accumulo was installed
Re: rf_tmp file [SEC=UNOFFICIAL]
Which version of Accumulo are you seeing these files in? They should be getting cleaned up automatically after https://issues.apache.org/jira/browse/ACCUMULO-1452 was added to 1.4.5, 1.5.1, and 1.6.0. The brief explanation of their purpose is that they are the temporary files for minor/major compactions while the data is still being written and then they are moved into place. Mike On Wed, Aug 6, 2014 at 11:34 PM, Dickson, Matt MR matt.dick...@defence.gov.au wrote: *UNOFFICIAL* What purpose does the .rf_tmp file on hdfs serve? It's often 0kb in size and there appear to be a lot of these older than the table ageoff filter. Can we safely remove these? Thanks in advance, Matt
Re: accumulo 1.6 and HDFS non-HA conversion to HDFS HA
Hi Craig! Part of the HA transition is described at https://issues.apache.org/jira/browse/ACCUMULO-2793 although you'll have to read through the comments to get the actual steps. I don't have a concise summary of what needs to be done because I haven't had a chance to try it myself. Mike On Tue, Aug 5, 2014 at 12:06 PM, craig w codecr...@gmail.com wrote: I've setup an Accumulo 1.6 cluster with Hadoop 2.4.0 (with a secondary namenode). I wanted to convert the secondary namenode to be a standby (hence HDFS HA). After getting HDFS HA up and making sure the hadoop configuration files were accessible by Accumulo, I started up Accumulo. I noticed some reports of tablet servers failing to connect, however, they were failing to connect to HDFS over port 9000. That port is not configured/used with HDFS HA so I'm unsure why they are still trying to talk to HDFS using the old configuration. Any thoughts ideas? I know Accumulo 1.6 works with HDFS HA, but I'm curious if the tests have ever been run against a non-HA cluster that was converted to HA (with data in it). -- https://github.com/mindscratch https://www.google.com/+CraigWickesser https://twitter.com/mind_scratch https://twitter.com/craig_links
Re: Do Accumulo 1.5.1 and 1.4.4 work with ZooKeeper 3.4.5?
I've seen several vendors offering newer versions of zookeeper with Accumulo without issue. Cloudera has tested versions not too far off from Accumulo 1.4.5 and Accumulo 1.6.0 with CDH4, which uses ZK 3.4.5. Similarly, I just checked Hortonworks' documents on HDP 2.1 and that includes both Accumulo 1.5.1 and ZK 3.4.5. I think the general answer is that yes, it will work. From what I've seen the ZK team does a good job with semantic versioning, and minor releases should be backwards compatible. I've not seen any testing on ZK 3.5, but that will likely be fine too. Mike On Thu, Jul 31, 2014 at 9:42 AM, Hunter Provyn f...@ccri.com wrote: We are contemplating upgrading to ZooKeeper 3.4.5 for a project that depends on Accumulo 1.5.1 and 1.4.4. In the 1.5.1 pom there is the note: <!-- ZooKeeper 3.4.x works also, but we're not using new features yet; this ensures 3.3.x compatibility. --> In the 1.4.4 pom there is the dependency on 3.3.1 with no note. Are there any known issues with using ZooKeeper 3.4.x with 1.4.4? Is it known to work or not work given that no new ZK features are being used? Thanks!
Re: loaded family in METADATA table [SEC=UNOFFICIAL]
Filed a JIRA to update the docs, thanks for pointing this out to us, Matt! https://issues.apache.org/jira/browse/ACCUMULO-3032 On Thu, Jul 31, 2014 at 1:32 AM, Sean Busbey bus...@cloudera.com wrote: those are the markers that a tablet server has bulk loaded: https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/metadata/schema/MetadataSchema.java#L138 On Wed, Jul 30, 2014 at 11:05 PM, Dickson, Matt MR matt.dick...@defence.gov.au wrote: *UNOFFICIAL* Investigating the metadata table I have noticed a family of 'loaded' and looking at all the doco, including the new O'Reilly Accumulo book, there is no description of it. The column looks like *table_id*;*row **loaded*;*rfile_path value* Any insights would be much appreciated. Thanks in advance, Matt -- Sean
Re: Request for Configuration Help for basic test. Tservers dying and only one tablet being used
You should double-check your data, you might find that it's null padded or something like that which would screw up the splits. You can do a scan from the shell which might give you hints. On Tue, Jul 29, 2014 at 3:53 PM, Pelton, Aaron A. aaron.pel...@gd-ais.com wrote: I agree with the idea of pooling the writers. As for the discussion of the keys. I get what you are saying with choosing better keys for distribution based on frequency of the chars in the English language. But, for this test I'm just using apache RandomStringUtils to create a 2 char random alpha sequence to prepend, so it should be a moderately distributed sampling of chars. However, let me emphasize that I mean I'm seeing 1 tablet getting millions of entries in it, compared to the remaining 35 tablets having no entries or just like 1k. To me that says something isn't right. -Original Message- From: Josh Elser [mailto:josh.el...@gmail.com] Sent: Tuesday, July 29, 2014 4:20 PM To: user@accumulo.apache.org Subject: Re: Request for Configuration Help for basic test. Tservers dying and only one tablet being used On 7/29/14, 3:20 PM, Pelton, Aaron A. wrote: To followup to two of your statements/questions: 1. Good, pre-splitting your table should help with random data, but if you're only writing data to one tablet, you're stuck (very similar to hot-spotting reducers in MapReduce jobs). - OK so its good that the data is presplitting, but maybe this is conceptually something that I'm not grasping about accumulo yet, but I thought specifying the pre-splits is what causes the table to span multiple tablets on the various tserver initially. However, the core of the data appears to be in one specific tablet on on tserver. Each tserver appears to have a few tablets allocated to it for the table I'm working out of. So, I'm confused as to how to get the data to write to more than just the one tablet/partition. I would almost think my keys I specified aren't being matched correctly against incoming data then? 
No, it sounds like you have the right idea. Many tablets make up a table, and the split points for a table are what define those tablet boundaries. Consider you have a table where the rowIDs are English words ( http://en.wikipedia.org/wiki/Letter_frequency#Relative_frequencies_of_the_first_letters_of_a_word_in_the_English_language ). If you split your table on each letter (a-z), you would still see much more activity to the tablets which host words starting with 'a', 't', and 's' because you have significantly more data being ingested into those tablets. When designing a table (specifically the rowID of the key), it's desirable to try to make the rowID as distributed as possible across the entire table. This helps ensure even processing across all of your nodes. Does that make sense? 2. What do you actually do when you receive an HTTP request to write to Accumulo? It sounds like you're reading data and then writing? Is each HTTP request creating its own BatchWriter? More insight into what a write looks like in your system (in terms of Accumulo API calls) would help us make recommendations about more efficient things you can do. Yes each http request gets its own reference to a writer or scanner, which is closed when the result is returned from the http request. There are two rest services. One transforms the data and performs some indexes based on it and then sends both data and index to a BatchWriter. The sample code for the data being written is below. The indexes being written are similar but use different family and qualifier values.
Text rowId = new Text(id + ":" + time);
Text fam = new Text(COLUMN_FAMILY_KLV);
Text qual = new Text();
Value val = new Value(data.getBytes());
Mutation mut = new Mutation(rowId);
mut.put(fam, qual, val);

long memBuf = 1_000_000L;
long timeout = 1000L;
int numThreads = 10;
BatchWriter writer = null;
try {
    writer = conn.createBatchWriter(TABLE_NAME, memBuf, timeout, numThreads);
    writer.addMutation(mut);
} catch (Exception x) {
    // x.printStackTrace();
    logger.error(x.toString(), x);
    result = ERROR;
} finally {
    try {
        if (writer != null) {
            writer.close();
        }
    } catch (Exception x) {
        // x.printStackTrace();
        logger.error(x.toString(), x);
        result = ERROR;
    }
}

You could try to make a threadpool for BatchWriters instead of creating a new one for each HTTP thread. This might help amortize the RPC cost by sending more than one mutation
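The threadpool idea can be sketched with a simple blocking pool. A real version would hand out BatchWriters built by conn.createBatchWriter(...), which needs a live cluster, so this sketch substitutes a trivial stand-in object just to show the borrow/return pattern:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class WriterPool {
    // Counts how many writer instances were ever constructed.
    static final AtomicInteger CREATED = new AtomicInteger();

    // Stand-in for an Accumulo BatchWriter; in real code the pool would be
    // filled with conn.createBatchWriter(table, memBuf, timeout, numThreads).
    static final class FakeWriter {
        final int id = CREATED.incrementAndGet();
    }

    private final BlockingQueue<FakeWriter> pool;

    WriterPool(int size) {
        pool = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            pool.add(new FakeWriter()); // create writers up front and reuse them
        }
    }

    FakeWriter borrow() throws InterruptedException { return pool.take(); }

    void giveBack(FakeWriter w) { pool.add(w); }

    public static void main(String[] args) throws InterruptedException {
        WriterPool writers = new WriterPool(2);
        // Simulate five sequential HTTP requests sharing the pooled writers.
        for (int request = 0; request < 5; request++) {
            FakeWriter w = writers.borrow();
            try {
                // w.addMutation(mut);  // real work would go here
            } finally {
                writers.giveBack(w);
            }
        }
        System.out.println("writers created: " + CREATED.get());
    }
}
```

Five requests are served by only two writer instances; with real BatchWriters, the shared instances also batch mutations from many requests into fewer RPCs instead of paying the create/flush/close cost per request.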
Re: Forgot SECRET, how to delete zookeeper nodes?
Another option would have been to pick a different instance name when rebuilding your cluster. Not that it helps you much now... On Sun, Jul 13, 2014 at 11:28 PM, Jianshi Huang jianshi.hu...@gmail.com wrote: Thanks for the help. I think I might better re-ingest the data I need. :( Jianshi On Mon, Jul 14, 2014 at 12:05 PM, Sean Busbey bus...@cloudera.com wrote: If you want to recover the data stored in tables from the old instance, it'll be more straightforward to follow the advanced troubleshooting section of the user manual. In there is a what if zookeeper fails section: http://accumulo.apache.org/1.6/accumulo_user_manual.html#zookeeper_failure Take note of the caveats in that section about potential data issues. -- Sean On Jul 13, 2014 11:02 PM, Vicky Kak vicky@gmail.com wrote: Here is the example about the import/export http://accumulo.apache.org/1.6/examples/export.html On Mon, Jul 14, 2014 at 9:27 AM, William Slacum wilhelm.von.cl...@accumulo.net wrote: If the zookeeper data is gone, your best bet is try and identify which directories under /accumulo/tables points to which tables you had. You can then bulk import the files into a new instance's tables. On Sun, Jul 13, 2014 at 11:54 PM, Vicky Kak vicky@gmail.com wrote: I am not sure if the tables could be recovered seamlessly, the tables are stored in undelying hdfs. I was thinking of using http://accumulo.apache.org/1.6/examples/bulkIngest.html to recover the tables, the better would be if we could update the zookeeper data pointing to the existing hdfs table data. I don't have more information about it as of now, we need someone else to help us here. On Mon, Jul 14, 2014 at 9:06 AM, Jianshi Huang jianshi.hu...@gmail.com wrote: It's too deleted... so the only option I have is to delete the zookeeper nodes and reinitialize accumulo. You're right, I deleted the zk nodes and now Accumulo complains nonode error. Can I recover the tables for a new instance? 
Jianshi On Mon, Jul 14, 2014 at 11:28 AM, Vicky Kak vicky@gmail.com wrote: Can't you get the secret from the corresponding accumulo-site.xml or this is too deleted? Deletion from the zookeeper should be done using the rmr /accumulo command, you will have to use zkCli.sh to use zookeeper client. I have been doing this sometime back, have not used it recently. I would not recommend to delete the information in zookeeper unless there is no other option, you may lose the data IMO. On Mon, Jul 14, 2014 at 8:40 AM, Jianshi Huang jianshi.hu...@gmail.com wrote: Clusters got updated and user home files lost... I tried to reinstall accumulo but I forgot the secret I put before. So how can I delete /accumulo in Zookeeper? Or is there a way to rename instance_id? -- Jianshi Huang LinkedIn: jianshi Twitter: @jshuang Github Blog: http://huangjs.github.com/
Re: How does Accumulo compare to HBase
At the risk of derailing the original thread, I'll take a moment to explain my methodology. Since each entry can be thought of as a 6 dimensional vector (r, cf, cq, vis, ts, val) there's a lot of room for fiddling with the specifics of it. YCSB gives you several knobs, but unfortunately, not absolutely everything was tunable. The things that are configurable: - # of rows - # of column qualifiers - length of value - number of operations per client - number of clients Things that are not configurable: - row length (it's a long int) - # of column families (constant at one per row) - length of column qualifier (basically a one-up counter per row) - visibilities In all of my experiments, the goal was to keep data size constant. This can be approximated by (number of entries * entry size). Number of entries is intuitively rows (configurable) * column families (1) * columns qualifiers per family (configurable), while entry size is key overhead (about 40 bytes) + configured length of value. So to keep total size constant, we have three easy knobs. However, tweaking three values at a time produces really messy data where you're not always going to be sure where the causality arrow lies. Even doing two at a time can cause issues but then the choice is between tweaking two properties of the data, or one property of the data and the total size (which is also a relevant attribute). Whew. So why did I use two different independent variables between the two halves? Partly, because I'm not comparing the two tests to each other, so they don't have to be duplicative. I ran them on different hardware from each other, with different number of clients, disks, cores, etc. There's no meaningful comparisons to be drawn, so I wanted to remove the temptation to compare results against each other. I'll admit that I might be wrong in this regard. The graphs are not my complete data sets. For the Accumulo v Accumulo tests, we have about ten more data points varying rows and data size as well. 
Trying to show three independent variables on a graph was pretty painful, so they didn't make it into the presentation. The short version of the story is that nothing scaled linearly (some things were better, some things were worse) but the general trend lines were approximately what you would expect. Let me know if you have more questions, but we can probably start a new thread for future search posterity! (This applies to everybody). Mike On Thu, Jul 10, 2014 at 9:26 AM, Kepner, Jeremy - 0553 - MITLL kep...@ll.mit.edu wrote: Mike Drob put together a great talk at the Accumulo Summit ( http://www.slideshare.net/AccumuloSummit/10-30-drob) discussing Accumulo performance and HBase performance. This is exactly the kind of work the entire Hadoop community needs to continue to move forward. I had one question about the talk which I was wondering if someone might be able to shed light on. In the Accumulo part of the talk the experiments varied #rows while keeping the #cols fixed, while in the Accumulo/HBase part of the talk the experiments varied #cols while keeping #rows fixed?
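The constant-size bookkeeping described above is easy to mechanize: total bytes is approximately rows x qualifiers-per-row x (key overhead + value length), so any two knob settings with equal products describe the same data volume. A small sketch of that arithmetic -- the 40-byte key overhead is the rough per-entry figure cited above, not an exact constant:

```java
public class YcsbSizing {
    // Approximate on-disk data size: entries * (key overhead + value length),
    // where entries = rows * column qualifiers per row (one family per row).
    static long totalBytes(long rows, long cqsPerRow, long valueLen) {
        final long KEY_OVERHEAD = 40; // rough per-entry key cost
        return rows * cqsPerRow * (KEY_OVERHEAD + valueLen);
    }

    public static void main(String[] args) {
        // Two knob settings that keep total size constant while varying shape:
        long wide = totalBytes(1_000_000, 10, 100);  // fewer, larger entries
        long narrow = totalBytes(1_000_000, 20, 30); // more, smaller entries
        System.out.println(wide + " vs " + narrow);
    }
}
```

Holding this product fixed while moving one knob at a time is what lets a benchmark attribute a throughput change to entry shape rather than to total data volume.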
Re: Table entries in Accumulo removed ?
More likely: Are you inserting data with visibility labels that your scan user does not have? Less likely, but possible: Are you pushing any kind of deletes? Do you have an AgeOffIterator configured? Mike On Wed, Jun 25, 2014 at 2:21 PM, Sivan sivan...@gmail.com wrote: I'm using storm to push data into Accumulo and the number of emitted data is getting populated to Accumulo in real time . The table entities matches with the emitter count. But in course of time the entries are getting removed from Accumulo ? A scan on the table returns none .. And the UI shows 0 entries in the table ! What would be the possible reason ? Accumulo 1.5.1 Storm 0.8.2 Cdh4.5 Thanks Sent from my iPhone
Re: Updating Metadata of a Table
I'm not sure I understand what you are trying to do. Can you give us an example and a use case? The metadata table is just like any other table where you can do inserts/deletes/etc. On Tue, May 27, 2014 at 4:49 PM, Tiffany Reid tr...@eoir.com wrote: Or even via the Java API? I haven’t found any examples to update the MetaData for a specific table, only read the current entries. *From:* Tiffany Reid *Sent:* Tuesday, May 27, 2014 5:39 PM *To:* user@accumulo.apache.org *Subject:* Updating Metadata of a Table Hi, Does anyone know how to go about updating the metadata for a table in Accumulo via shell command tool? I’m using 1.4 and I cannot upgrade to the latest due to project requirements. Thanks, Tiffany
Re: RFiles not referenced in !METADATA [SEC=UNOFFICIAL]
Is your GC running? It should be catching the unreferenced files. I think you are safe to manually delete any files not referenced in the !METADATA table. What version of Accumulo are you running? On Wed, May 21, 2014 at 9:00 PM, Dickson, Matt MR matt.dick...@defence.gov.au wrote: *UNOFFICIAL* I've run a scan on hdfs under /accumulo/tables/table_id for all rfiles older than our ageoff filter on that table. When I then scan for these rfiles in the metadata table most are not listed. Should all rfiles be referenced in the metadata table? My goal had been to get the rowid from the metadata and then force a compaction on that range. Eg for row 4n;234234234 file:/fdi-2342/234234.rf run a compaction for 234234234 to 234234234~ Thanks in advance. Matt
Re: Embedded Mutations: Is this kind of thing done?
Large rows are only an issue if you are going to try to put the entire row in memory at once. As long as you have small enough entries in the row, and can treat them individually, you should be fine. The qualifier is anything that you want to use to determine uniqueness across keys. So yes, this sounds fine, although possibly not fine grain enough. Mike On Fri, Apr 25, 2014 at 9:11 AM, Geoffry Roberts threadedb...@gmail.com wrote: Interesting, multiple mutations that is. Are we talking multiples on the same row id? Upon reflection, I realized the embedded thing is nothing special. I think I'll keep adding columns to a single mutation. This will make for a wide row, but I'm not seeing that as a problem. Am I being naive? Another question if I may. As I walk my graph, I must keep track of the type of the value being persisted. I am using the qualifier for this, putting in it a URI that indicates the type. Is this a proper use for the qualifier? Thanks for the discussion On Thu, Apr 24, 2014 at 11:23 PM, William Slacum wilhelm.von.cl...@accumulo.net wrote: Depending on your table schema, you'll probably want to translate an object graph into multiple mutations. On Thu, Apr 24, 2014 at 8:40 PM, David Medinets david.medin...@gmail.com wrote: If the sub-document changes, you'll need to search the values of every Accumulo entry? On Thu, Apr 24, 2014 at 5:31 PM, Geoffry Roberts threadedb...@gmail.com wrote: The use case is, I am walking a complex object graph and persisting what I find there. Said object graph in my case is always EMF (eclipse modeling framework) compliant. An EMF graph can have in it references to--brace yourself--a non-cross document containment reference. When using Mongo, these were persisted as a DBObject embedded into a containing DBObject. I'm trying to decide whether I want to follow suit. Any thoughts? On Thu, Apr 24, 2014 at 4:03 PM, Sean Busbey bus...@cloudera.com wrote: Can you describe the use case more?
Do you know what the purpose for the embedded changes are? On Thu, Apr 24, 2014 at 2:59 PM, Geoffry Roberts threadedb...@gmail.com wrote: All, I am in the throes of converting someone else's code from MongoDB to Accumulo. I am seeing a situation where one DBObject is being embedded into another DBObject. I see that Mutation supports a method called getRow() that returns a byte array. I gather I can use this to achieve a similar result if I were so inclined. Am I so inclined? i.e. Is this the way we do things in Accumulo? DBObject, roughly speaking, is Mongo's counterpart to Mutation. Thanks mucho -- There are ways and there are ways, Geoffry Roberts -- Sean
Re: Write to table from Accumulo iterator
Can you share a little more about what you are trying to achieve? My first thought would be to try looking at the Conditional Mutations present in 1.6.0 (not yet released) as either a ready implementation or a starting point for your own code. On Apr 25, 2014 10:13 PM, BlackJack76 justin@gmail.com wrote: I am trying to figure out the best way to write to the table from inside the seek method of a class that implements SortedKeyValueIterator. I originally tried to create a BatchWriter and just use that to write data. However, if the tablet moved during a flush then it would hang. Any other recommendations on how to write back to the table? Thanks! -- View this message in context: http://apache-accumulo.1065345.n5.nabble.com/Write-to-table-from-Accumulo-iterator-tp9412.html Sent from the Users mailing list archive at Nabble.com.
Re: Accumulo and OSGi
Geoffry, Fixing our logging libraries is an open issue - https://issues.apache.org/jira/browse/ACCUMULO-1242 I hope to see it resolved soon. It's a pretty big task, so if you feel inspired to help, it would be appreciated as well! Thanks, Mike On Wed, Apr 23, 2014 at 9:39 AM, Geoffry Roberts threadedb...@gmail.com wrote: I thought I'd check in. After some encouragement from this group, I found some time and now have an Accumulo client running in OSGi (Felix). It's rather primitive, at this juncture, in that it is little more than a wrap job. I was, however, forced to hack Zookeeper to get things to work. Zookeeper needed to import an additional package. I used the servicemix bundle for Hadoop. Josh, You asked if there was anything that could be done upstream to make osgification go better. One thing, and it's not a huge deal, but getting everything on the same logging library would be nice. So far, I see both log4j and slf4j. Are there more? On Thu, Apr 10, 2014 at 12:49 PM, Russ Weeks rwe...@newbrightidea.com wrote: On Thu, Apr 10, 2014 at 7:18 AM, Geoffry Roberts threadedb...@gmail.com wrote: You say the community would be well-accepting of bundling up the Accumulo client. If that's the case, I'd like to hear from them. +1! -- There are ways and there are ways, Geoffry Roberts
Re: Accumulo not starting anymore
Can you verify that the accumulo files are still present in HDFS? hdfs dfs -ls /accumulo On Wed, Apr 16, 2014 at 4:15 PM, Geoffry Roberts threadedb...@gmail.com wrote: All, Suddenly, Accumulo will no longer start. Log files are not helpful. Is there a way to troubleshoot this? The back story is I upgraded from OSX 10.7 to 10.9. Everything was working with 10.7. But with 10.9 Accumulo began to complain of insufficient file limits and recommended setting maxfiles to 65536, which I did. Hadoop starts -- version 2.3.0 Zookeeper starts -- version 3.4.6 Java -- version 1.7.0_55 I've included part of a log file just in case. Thanks mucho From: master_abend.home.debug.log 2014-04-16 15:59:07,250 [server.Accumulo] INFO : master starting 2014-04-16 15:59:07,251 [server.Accumulo] INFO : Instance d9f3a06a-ef06-4860-a08d-9cff805a9249 2014-04-16 15:59:07,254 [server.Accumulo] INFO : Data Version 5 2014-04-16 15:59:07,254 [server.Accumulo] INFO : Attempting to talk to zookeeper 2014-04-16 15:59:07,264 [zookeeper.ZooSession] DEBUG: Connecting to localhost:2181 with timeout 3 with auth 2014-04-16 15:59:07,271 [zookeeper.ZooKeeper] INFO : Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT 2014-04-16 15:59:07,271 [zookeeper.ZooKeeper] INFO : Client environment: host.name=abend.home 2014-04-16 15:59:07,271 [zookeeper.ZooKeeper] INFO : Client environment:java.version=1.7.0_55 2014-04-16 15:59:07,271 [zookeeper.ZooKeeper] INFO : Client environment:java.vendor=Oracle Corporation 2014-04-16 15:59:07,271 [zookeeper.ZooKeeper] INFO : Client environment:java.home=/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre 2014-04-16 15:59:07,271 [zookeeper.ZooKeeper] INFO : Client environment:java.class.path=/usr/local/accumulo/conf:/usr/local/accumulo/lib/accumulo-start.jar:/usr/local/hadoop/share/hadoop/common/lib/log4j-1.2.17.jar 2014-04-16 15:59:07,271 [zookeeper.ZooKeeper] INFO : Client environment:java.library.path=/usr/local/hadoop/lib/native
2014-04-16 15:59:07,271 [zookeeper.ZooKeeper] INFO : Client environment:java.io.tmpdir=/var/folders/sb/g6bpj4cd401c1sw566x2r41mgn/T/ 2014-04-16 15:59:07,271 [zookeeper.ZooKeeper] INFO : Client environment:java.compiler=NA 2014-04-16 15:59:07,271 [zookeeper.ZooKeeper] INFO : Client environment: os.name=Mac OS X 2014-04-16 15:59:07,271 [zookeeper.ZooKeeper] INFO : Client environment:os.arch=x86_64 2014-04-16 15:59:07,271 [zookeeper.ZooKeeper] INFO : Client environment:os.version=10.9.2 2014-04-16 15:59:07,271 [zookeeper.ZooKeeper] INFO : Client environment: user.name=gcr 2014-04-16 15:59:07,271 [zookeeper.ZooKeeper] INFO : Client environment:user.home=/Users/gcr 2014-04-16 15:59:07,271 [zookeeper.ZooKeeper] INFO : Client environment:user.dir=/Users/gcr 2014-04-16 15:59:07,272 [zookeeper.ZooKeeper] INFO : Initiating client connection, connectString=localhost:2181 sessionTimeout=3 watcher=org.apache.accumulo.fate.zookeeper.ZooSession$ZooWatcher@14731467 2014-04-16 15:59:07,288 [zookeeper.ClientCnxn] INFO : Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 2014-04-16 15:59:07,289 [zookeeper.ClientCnxn] INFO : Socket connection established to localhost/127.0.0.1:2181, initiating session 2014-04-16 15:59:07,294 [zookeeper.ClientCnxn] INFO : Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x1456c089e9b0008, negotiated timeout = 3 2014-04-16 15:59:07,394 [watcher.MonitorLog4jWatcher] INFO : Set watch for Monitor Log4j watcher 2014-04-16 15:59:07,399 [server.Accumulo] INFO : Waiting for accumulo to be initialized 2014-04-16 15:59:08,401 [server.Accumulo] INFO : Waiting for accumulo to be initialized -- There are ways and there are ways, Geoffry Roberts
Re: How to delete all rows in Accumulo table?
All commands are from memory, so typos might exist. Deleting all rows can be a very lengthy operation. It will likely be much faster to delete the table and create a new one.

droptable foo
createtable foo

If you had configuration settings on the table that you wanted to keep, then it might be easier to create, delete, and rename.

createtable temp -cs foo -cc foo
droptable foo
renametable temp foo

If you really need to delete all the rows, you can do it using

deleterows -t foo -f

On Mon, Apr 14, 2014 at 3:13 PM, Tiffany Reid tr...@eoir.com wrote: Hi, How do I delete all rows in a table via Accumulo Shell? Thanks, Tiffany
[ANN] Accumulo 1.4.5 Released
Users, I am pleased to announce that Accumulo 1.4.5 has been released. The bits are available on our downloads page [1]. Notable improvements of this release include: * Support for Hadoop 2 * Resilience to zookeeper node failure * Provide static utility for resource cleanup for web containers * Automatic cleanup of files resulting from failed flush/compaction Happy Accumulating! Mike [1]: http://accumulo.apache.org/downloads/index.html
Re: NOT operator in visibility string
Wait, I'm really confused by what you are describing, Jeff. Sorry if these are obvious questions, but can you help me get a better grasp of your use case?

- You have a large amount of data that is generally readable by all users.
- Users create their own sandbox, from which they can later exclude portions of the global data set.
- Users can share their sandbox with others, so really we are talking about sandbox permissions and not so much user permissions.
- Sandboxes are created often. Or, at least much more often than the data changes.

Are those all accurate statements? If so, can you clarify the following points: Do users typically remove large amounts of data from their sandbox? 1%? 10%? 99%? Assuming data is removed via rules, are the rules applied automatically to new data under ingest?

Thanks,
Mike

On Wed, Mar 19, 2014 at 12:54 PM, Jeff Kunkle kunkl...@gmail.com wrote:

Hi John,

Yes, it's accurate that the system controls the label and who is associated with it; there are no Accumulo-internal user accounts. But I don't think it's feasible to remove a sandbox label from something that should be hidden. Such a scenario would imply that all data is tagged with the labels of every sandbox that is allowed to see the data, which would be most. It would also imply that the creation of a new sandbox would necessitate changing the visibility of everything in Accumulo to include the new sandbox label, effectively rewriting the entire database. Sandboxes are created and deleted all the time in our application, so it doesn't seem like a feasible solution to me.

-Jeff

On Mar 19, 2014, at 12:16 PM, Josh Elser josh.el...@gmail.com wrote:

It kind of sounds like you could manage this much easier by controlling the authorizations a user gets (notably the workspace name) and the grant/revoke above the Accumulo level. A sandbox has a unique label and the external system controls which users are granted that label. This way, each sandbox can be modified individually (using authorizations that contain the data visibility and the sandbox label) or the original data set could be modified (by omitting a sandbox label in the authorizations used). Is that accurate?

On 3/19/14, 12:05 PM, Jeff Kunkle wrote:

I attempted to simplify the scenario to facilitate discussion, which on second thought may have been a mistake. Here's the whole scenario: Different users have access to different subsets of the data depending on their authorizations and the visibility of the data. Users work with the data in what we call a sandbox. Sandboxes can be shared with other users (this is the group creation I was talking about earlier). Deletes to the data would be scoped to the sandbox by changing the visibility to add !workspace_name so that people viewing the workspace wouldn't see the data but everyone else would.

On Mar 19, 2014, at 11:48 AM, Sean Busbey busbey+li...@cloudera.com wrote:

On Wed, Mar 19, 2014 at 10:43 AM, Jeff Kunkle kunkl...@gmail.com wrote:

New groups are created on the fly by our application when needed. Under the scenario you describe we'd have to go through all the data in Accumulo whenever a group is created so that users in the group can see the existing data.

Ah! So your use case is that all data defaults to world readable and then users have the option of opting out of seeing subsets. Right? In your scenario user groups also get to opt out of seeing data on the fly, yes? Both require rewriting the data. Does the group creation happen more often?
Re: Filters and ScannerBase.fetchColumn
Yes, you are running into the same issue described in https://issues.apache.org/jira/browse/ACCUMULO-1801

On Wed, Mar 19, 2014 at 6:41 PM, John Vines vi...@apache.org wrote:

Yes, column level filtering happens before any client iterators get a chance to touch the results.

On Wed, Mar 19, 2014 at 6:36 PM, Russ Weeks rwe...@newbrightidea.com wrote:

Sorry for the flood of e-mails... I'm not trying to spam the list, I'm just getting deeper into accumulo, and loving it, and I'm kind of stumped by it at the same time.

Is it true that if a scanner restricts the column families/qualifiers to be returned, that these columns are not visible to any iterators? i.e. that this restriction is applied at a higher priority than any of the iterators?

I have some rows that look like this:

00021cdaac30 meta:size [] 656
00021cdaac30 meta:source [] data2
00021cfaac30 meta:filename [] doc04484522
00021cfaac30 meta:size [] 565
00021cfaac30 meta:source [] data2
00021dcaac30 meta:filename [] doc03342958

I have a couple of RowFilters chained together to filter based on source and filename. If I just run scan --columns meta:size I get no results. I have to specify scan --columns meta:size,meta:source,meta:filename to get any results, which implies that I need to know beforehand which columns are required for any active iterators. Is this expected behaviour?

Thanks,
-Russ
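The ordering described above (column selection before client-configured iterators) can be illustrated with a small Python model. This is a hypothetical sketch, not the Accumulo API: `scan`, `rows`, and `source_filter` are made-up names that only mimic the behavior from ACCUMULO-1801.

```python
# Hypothetical model of the behavior in the thread: column selection is
# applied BEFORE a client-configured filter runs, so a filter that
# inspects meta:source never sees that column unless it is also fetched.

rows = {
    "00021cdaac30": {("meta", "size"): "656",
                     ("meta", "source"): "data2"},
    "00021cfaac30": {("meta", "filename"): "doc04484522",
                     ("meta", "size"): "565",
                     ("meta", "source"): "data2"},
}

def scan(rows, fetched_columns, row_filter):
    results = {}
    for row_id, cols in rows.items():
        # 1) column restriction happens first (server side)
        visible = {k: v for k, v in cols.items() if k in fetched_columns}
        # 2) the filter only sees the already-restricted columns
        if row_filter(visible):
            results[row_id] = visible
    return results

# filter: keep rows whose meta:source is "data2"
source_filter = lambda cols: cols.get(("meta", "source")) == "data2"

# fetching only meta:size -> the filter never sees meta:source -> no results
print(scan(rows, {("meta", "size")}, source_filter))  # -> {}

# fetching meta:size AND meta:source -> the filter can match both rows
print(scan(rows, {("meta", "size"), ("meta", "source")}, source_filter))
```

This matches Russ's observation: `scan --columns meta:size` returns nothing, while also fetching the columns the filter depends on returns results.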
Re: Installing with Hadoop 2.2.0
      $HADOOP_PREFIX/share/hadoop/common/.*.jar,
      $HADOOP_PREFIX/share/hadoop/common/lib/.*.jar,
      $HADOOP_PREFIX/share/hadoop/hdfs/.*.jar,
      $HADOOP_PREFIX/share/hadoop/mapreduce/.*.jar,
      $HADOOP_PREFIX/share/hadoop/yarn/.*.jar,
      /usr/lib/hadoop/.*.jar,
      /usr/lib/hadoop/lib/.*.jar,
      /usr/lib/hadoop-hdfs/.*.jar,
      /usr/lib/hadoop-mapreduce/.*.jar,
      /usr/lib/hadoop-yarn/.*.jar,
      $ACCUMULO_HOME/server/target/classes/,
      $ACCUMULO_HOME/lib/accumulo-server.jar,
      $ACCUMULO_HOME/core/target/classes/,
      $ACCUMULO_HOME/lib/accumulo-core.jar,
      $ACCUMULO_HOME/start/target/classes/,
      $ACCUMULO_HOME/lib/accumulo-start.jar,
      $ACCUMULO_HOME/fate/target/classes/,
      $ACCUMULO_HOME/lib/accumulo-fate.jar,
      $ACCUMULO_HOME/proxy/target/classes/,
      $ACCUMULO_HOME/lib/accumulo-proxy.jar,
      $ACCUMULO_HOME/lib/[^.].*.jar,
      $ZOOKEEPER_HOME/zookeeper[^.].*.jar,
      $HADOOP_CONF_DIR,
      $HADOOP_PREFIX/[^.].*.jar,
      $HADOOP_PREFIX/lib/[^.].*.jar,
    </value>
    <description>Classpaths that accumulo checks for updates and class files. When using the Security Manager, please remove the .../target/classes/ values.</description>
  </property>
</configuration>

On Sun, Mar 16, 2014 at 9:06 PM, Josh Elser josh.el...@gmail.com wrote:

Posting your accumulo-site.xml (filtering out instance.secret and trace.password before you post) would also help us figure out what exactly is going on.

On 3/16/14, 8:41 PM, Mike Drob wrote:

Which version of Accumulo are you using? You might be missing the hadoop libraries from your classpath. For this, you would check your accumulo-site.xml and find the comment about Hadoop 2 in the file.

On Sun, Mar 16, 2014 at 8:28 PM, Benjamin Parrish benjamin.d.parr...@gmail.com wrote:

I have a couple of issues when trying to use Accumulo on Hadoop 2.2.0:

1) I start with accumulo init and everything runs through just fine, but I can't find '/accumulo' using 'hadoop fs -ls /'
2) I try to run 'accumulo shell -u root' and it says that Hadoop and ZooKeeper are not started, but if I run 'jps' on each cluster node it shows all the necessary processes for both in the JVM.

Is there something I am missing?

--
Benjamin D. Parrish
H: 540-597-7860
Re: Installing with Hadoop 2.2.0
   and limitations under the License.
-->

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <!-- Put your site-specific accumulo configurations here. The available configuration values along with their defaults are documented in docs/config.html. Unless you are simply testing at your workstation, you will most definitely need to change the three entries below. -->

  <property>
    <name>instance.zookeeper.host</name>
    <value>hadoop-node-1:2181,hadoop-node-2:2181,hadoop-node-3:2181,hadoop-node-4:2181,hadoop-node-5:2181</value>
    <description>comma separated list of zookeeper servers</description>
  </property>

  <property>
    <name>logger.dir.walog</name>
    <value>walogs</value>
    <description>The property only needs to be set if upgrading from 1.4 which used to store write-ahead logs on the local filesystem. In 1.5 write-ahead logs are stored in DFS. When 1.5 is started for the first time it will copy any 1.4 write ahead logs into DFS. It is possible to specify a comma-separated list of directories.</description>
  </property>

  <property>
    <name>instance.secret</name>
    <value></value>
    <description>A secret unique to a given instance that all servers must know in order to communicate with one another. Change it before initialization. To change it later use ./bin/accumulo org.apache.accumulo.server.util.ChangeSecret --old [oldpasswd] --new [newpasswd], and then update this file.</description>
  </property>

  <property>
    <name>tserver.memory.maps.max</name>
    <value>1G</value>
  </property>

  <property>
    <name>tserver.cache.data.size</name>
    <value>128M</value>
  </property>

  <property>
    <name>tserver.cache.index.size</name>
    <value>128M</value>
  </property>

  <property>
    <name>trace.token.property.password</name>
    <!-- change this to the root user's password, and/or change the user below -->
    <value></value>
  </property>

  <property>
    <name>trace.user</name>
    <value>root</value>
  </property>

  <property>
    <name>general.classpaths</name>
    <value>
      $HADOOP_PREFIX/share/hadoop/common/.*.jar,
      $HADOOP_PREFIX/share/hadoop/common/lib/.*.jar,
      $HADOOP_PREFIX/share/hadoop/hdfs/.*.jar,
      $HADOOP_PREFIX/share/hadoop/mapreduce/.*.jar,
      $HADOOP_PREFIX/share/hadoop/yarn/.*.jar,
      /usr/lib/hadoop/.*.jar,
      /usr/lib/hadoop/lib/.*.jar,
      /usr/lib/hadoop-hdfs/.*.jar,
      /usr/lib/hadoop-mapreduce/.*.jar,
      /usr/lib/hadoop-yarn/.*.jar,
      $ACCUMULO_HOME/server/target/classes/,
      $ACCUMULO_HOME/lib/accumulo-server.jar,
      $ACCUMULO_HOME/core/target/classes/,
      $ACCUMULO_HOME/lib/accumulo-core.jar,
      $ACCUMULO_HOME/start/target/classes/,
      $ACCUMULO_HOME/lib/accumulo-start.jar,
      $ACCUMULO_HOME/fate/target/classes/,
      $ACCUMULO_HOME/lib/accumulo-fate.jar,
      $ACCUMULO_HOME/proxy/target/classes/,
      $ACCUMULO_HOME/lib/accumulo-proxy.jar,
      $ACCUMULO_HOME/lib/[^.].*.jar,
      $ZOOKEEPER_HOME/zookeeper[^.].*.jar,
      $HADOOP_CONF_DIR,
      $HADOOP_PREFIX/[^.].*.jar,
      $HADOOP_PREFIX/lib/[^.].*.jar,
    </value>
    <description>Classpaths that accumulo checks for updates and class files. When using the Security Manager, please remove the .../target/classes/ values.</description>
  </property>
</configuration>

On Sun, Mar 16, 2014 at 9:06 PM, Josh Elser josh.el...@gmail.com wrote:

Posting your accumulo-site.xml (filtering out instance.secret and trace.password before you post) would also help us figure out what exactly is going on.

On 3/16/14, 8:41 PM, Mike Drob wrote:

Which version of Accumulo are you using? You might be missing the hadoop libraries from your classpath. For this, you would check your accumulo-site.xml and find the comment about Hadoop 2 in the file.

On Sun, Mar 16, 2014 at 8:28 PM, Benjamin Parrish benjamin.d.parr...@gmail.com wrote:

I have a couple of issues when trying to use Accumulo on Hadoop 2.2.0:

1) I start with accumulo init and everything runs through just fine, but I can't find '/accumulo' using 'hadoop fs -ls /'
2) I try to run 'accumulo shell -u root' and it says that Hadoop and ZooKeeper are not started, but if I run 'jps' on each cluster node it shows all the necessary processes for both in the JVM.

Is there something I am missing?

--
Benjamin D. Parrish
H: 540-597-7860
Re: HDFS caching w/ Accumulo?
First instinct is to use it for the root/metadata tablets.

On Tue, Feb 25, 2014 at 10:49 AM, Donald Miner dmi...@clearedgeit.com wrote:

HDFS caching is part of the new Hadoop 2.3 release. From what I understand, it allows you to mark specific files to be held in memory for faster reads. Has anyone thought about how Accumulo could leverage this?
Re: Error stressing with pyaccumulo app
For uuid4 keys, you might want to do [00, 01, 02, ..., 0e, 0f, 10, ..., fd, fe, ff] to cover the full range.

On Tue, Feb 11, 2014 at 9:16 AM, Josh Elser josh.el...@gmail.com wrote:

Ok. Even so, try adding some split points to the tables before you begin (if you aren't already) as it will *greatly* smooth the startup. Something like [00, 01, 02, ... 10, 11, 12, .. 97, 98, 99] would be good. You can easily dump this to a file on local disk and run the `addsplits` command in the Accumulo shell and provide it that file with the -sf (I think) option.

On 2/11/14, 12:00 PM, Diego Woitasen wrote:

I'm using random keys for these tests. They are uuid4 keys.

On Tue, Feb 11, 2014 at 1:04 PM, Josh Elser josh.el...@gmail.com wrote:

The other thing I thought about: what's the distribution of Key-Values that you're writing? Specifically, do many of the Keys sort near each other? Similarly, do you notice excessive load on some tservers, but not all (the Tablet Servers page on the Monitor is a good check)?

Consider the following: you have 10 tservers and you have 10 proxy servers. The first thought is that 10 tservers should be plenty to balance the load of those 10 proxy servers. However, a problem arises if the data that each of those proxy servers is writing happens to reside on a _small number of tablet servers_. Thus, your 10 proxy servers might only be writing to one or two tabletservers.

If you notice that you're getting skew like this (or even just know that you're apt to have a situation where multiple clients might write data that sorts close to one another), it would be a good idea to add splits to your table before starting your workload. E.g. if you consider that your Key-space is the numbers from 1 to 10, and you have ten tservers, it would be a good idea to add splits 1, 2, ... 10, so that each tserver hosts at least one tablet (e.g. [1,2), [2,3)... [10,+inf)). Having at least 5 or 10 tablets per tserver per table (split according to the distribution of your data) might help ease the load.

On 2/11/14, 10:47 AM, Diego Woitasen wrote:

Same results with 2G tserver.memory.maps.max. Maybe we just reached the limit :)

On Mon, Feb 10, 2014 at 7:08 PM, Diego Woitasen diego.woita...@vhgroup.net wrote:

On Mon, Feb 10, 2014 at 6:21 PM, Josh Elser josh.el...@gmail.com wrote:

I assume you're running a datanode alongside the tserver on that node? That may be stretching the capabilities of that node (not to mention ec2 nodes tend to be a little flaky in general). 2G for tserver.memory.maps.max might be a little safer. You got an error in a tserver log about that IOException in internalRead. After that, the tserver was still alive? And the proxy client was dead - quit normally?

Yes, everything is still alive.

If that's the case, the proxy might just be disconnecting in a noisy manner?

Right! I'll try with 2G tserver.memory.maps.max.

On 2/10/14, 3:38 PM, Diego Woitasen wrote:

Hi, I tried increasing tserver.memory.maps.max to 3G and it failed again, but with another error. I have a heap size of 3G and 7.5 GB of total RAM. The error that I've found in the crashed tserver is:

2014-02-08 03:37:35,497 [util.TServerUtils$THsHaServer] WARN : Got an IOException in internalRead!

The tserver hasn't crashed, but the client was disconnected during the test. Another hint is welcome :)

On Mon, Feb 3, 2014 at 3:58 PM, Josh Elser josh.el...@gmail.com wrote:

Oh, ok. So that isn't quite as bad as it seems. The "commits are held" exception is thrown when the tserver is running low on memory. The tserver will block new mutations coming in until it can process the ones it already has and free up some memory. This makes sense that you would see this more often when you have more proxy servers, as the total amount of Mutations you can send to your Accumulo instance is increased. With one proxy server, your tserver had enough memory to process the incoming data. With many proxy servers, your tservers would likely fall over eventually because they'll get bogged down in JVM garbage collection.

If you have more memory that you can give the tservers, that would help. Also, you should make sure that you're using the Accumulo native maps, as this will use off-JVM-heap space instead of JVM heap, which should help tremendously with your ingest rates. Native maps should be on by default unless you turned them off using the property 'tserver.memory.maps.native.enabled' in accumulo-site.xml. Additionally, you can try increasing the size of the native maps using 'tserver.memory.maps.max' in accumulo-site.xml. Just be aware that with the native maps, you need to ensure that total_ram >= JVM_heap + tserver.memory.maps.max.

- Josh

On 2/3/14, 1:33 PM, Diego Woitasen wrote:

I've launched the cluster again and I was able to reproduce the error: In the proxy I had the same error
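The hex split points suggested for uuid4 keys are easy to generate programmatically. A minimal Python sketch, assuming you then feed the file to the shell's `addsplits` command with the split-file option mentioned in the thread (flag name per the thread's recollection; check your version's shell help):

```python
# Generate two-character hex split points ("00" .. "ff") covering the
# full uuid4 keyspace, and write them one per line to a splits file
# that can be handed to the Accumulo shell's addsplits command.

splits = [format(i, "02x") for i in range(256)]

with open("splits.txt", "w") as f:
    f.write("\n".join(splits))

print(splits[0], splits[15], splits[16], splits[-1])  # 00 0f 10 ff
```

This yields 256 evenly spaced split points, one tablet per leading byte; for more tablets per tserver, extend to three hex characters (4096 points) using `format(i, "03x")` over `range(4096)`.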
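The native-map sizing rule from the thread can be checked with simple arithmetic. A sketch using the poster's numbers (7.5 GB of RAM, 3 GB heap, 3 GB native maps, which are assumptions taken from the thread, not measurements):

```python
# Sizing rule from the thread: with native maps enabled, the in-memory
# map lives OUTSIDE the JVM heap, so you need
#   total_ram >= jvm_heap + tserver.memory.maps.max
# plus headroom for the OS, DataNode, and page cache.

GB = 1024 ** 3
total_ram = 7.5 * GB        # the poster's ec2 node
jvm_heap = 3 * GB           # tserver JVM heap
native_maps_max = 3 * GB    # tserver.memory.maps.max

headroom = total_ram - (jvm_heap + native_maps_max)
print(headroom / GB)  # 1.5

assert jvm_heap + native_maps_max <= total_ram
```

With only 1.5 GB left for everything else on the node, the thread's suggestion to drop tserver.memory.maps.max to 2G (leaving 2.5 GB of headroom) looks reasonable.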
Re: force tablet assignment to tablet server?
You can implement your own Balancer. Or kill all the other tablet servers. :)

On Tue, Feb 4, 2014 at 10:47 AM, Donald Miner dmi...@clearedgeit.com wrote:

Is there a way to force a particular tablet to be hosted off of a particular tablet server? There is some tricky stuff I want to do with data locality alongside of another system and I think this would help.
Re: force tablet assignment to tablet server?
For future search indexing - we are referring to creating a custom implementation of http://accumulo.apache.org/1.5/apidocs/org/apache/accumulo/server/master/balancer/TabletBalancer.html and loading it onto your cluster.

On Tue, Feb 4, 2014 at 11:07 AM, Donald Miner dmi...@clearedgeit.com wrote:

Balancer is going to do exactly what I need. The second option sounds much more fun though. Thanks!

On Feb 4, 2014, at 10:49 AM, Mike Drob mad...@cloudera.com wrote:

You can implement your own Balancer. Or kill all the other tablet servers. :)

On Tue, Feb 4, 2014 at 10:47 AM, Donald Miner dmi...@clearedgeit.com wrote:

Is there a way to force a particular tablet to be hosted off of a particular tablet server? There is some tricky stuff I want to do with data locality alongside of another system and I think this would help.
Re: Using Java7, fetch instance name or uuid WITHOUT Connector class?
Tangential note - In Java 7, I thought that Swing was deprecated in favour of JavaFX [1][2]?

[1]: http://www.oracle.com/technetwork/java/javafx/overview/faq-1446554.html
[2]: http://docs.oracle.com/javafx/2/swing/jfxpub-swing.htm

On Tue, Jan 28, 2014 at 1:59 PM, Ott, Charles H. charles.h@leidos.com wrote:

That'll work. I'm already prompting the user to enter the zookeeper host when adding the 'server' to the tool, so I'll just add a field to support the instance name as well. Thanks.

From: user-return-3667-CHARLES.H.OTT=leidos@accumulo.apache.org [mailto:user-return-3667-CHARLES.H.OTT=leidos@accumulo.apache.org] On Behalf Of Keith Turner
Sent: Tuesday, January 28, 2014 1:41 PM
To: user@accumulo.apache.org
Subject: Re: Using Java7, fetch instance name or uuid WITHOUT Connector class?

On Tue, Jan 28, 2014 at 12:26 PM, Ott, Charles H. charles.h@leidos.com wrote:

I am making a more user friendly (Swing) tool for performing import/exports of table data via hadoop.io.sequencefile. (Currently using Accumulo 1.5.0 w/ cdh4u5.)

The first thing I do is load a list of tables into a swing component using the http://monitorURL/xml URL and JAXB:

private void loadTables() {
    try {
        jaxbContext = JAXBContext.newInstance(Stats.class);
        jaxbUnmarshaller = jaxbContext.createUnmarshaller();
        jaxbMarshaller = jaxbContext.createMarshaller();
        Stats stats = (Stats) jaxbUnmarshaller.unmarshal(new URL(
            "http://" + associatedHost.getHostname() + ":" + associatedHost.getUi_port() + "/xml"));
        String results = new String();
        for (Table t : stats.getTables().getTable()) {
            results = results.concat(t.getTablename() + "\r\n");
        }
        jEditorPane1.setText(results);
    } catch (Exception err) {
        err.printStackTrace();
    }
}

Then I create a ZooKeeperInstance, and call the 'getConnector' method to get a connector used for scanning:

try {
    connector = zooInstance.getConnector(username, password.getBytes());
    getUserAuths();
} catch (Exception err) {
    err.printStackTrace();
}

Since I now have the connector, I can get the 'user' Authorizations class for the export tool's client.Scanner:

this.authorizations = connector.securityOperations().getUserAuthorizations(username);

The part I am not sure how to do is automatically determine the 'instance name' or 'instance uuid' when constructing the ZooKeeperInstance. I can see both strings displayed on the header of the Accumulo Monitor:

<div id='subheader'>Instance&nbsp;Name:&nbsp;gm&nbsp;&nbsp;&nbsp;Version:&nbsp;1.5.0
<br><span class='smalltext'>Instance&nbsp;ID:&nbsp;a85286bf-031c-4e24-9b47-f6aca34401b8</span>
<br><span class='smalltext'>Tue&nbsp;Jan&nbsp;28&nbsp;12:15:41&nbsp;EST&nbsp;2014</span></div>
</div>

But I do not see any 'clean' way to retrieve it using the Java API, without doing a parse of the monitor's HTML. Which feels dirty. This leaves me with one option: for the user to specify the instance name before clicking 'Export Tables'. Which I think is a bit silly considering the user has already entered and saved the MonitorURL, dbUsername and dbPassword within the tool. Thoughts?

Maybe start off asking the user for the instance name and zookeepers instead of the monitor URL. Once you create a connector you can use connector.tableOperations() to get a list of tables w/o accessing the monitor.

Thanks in advance to anyone who read this far!
Re: accumulo startup issue: Accumulo not initialized, there is no instance id at /accumulo/instance_id
The tracer does performance metrics logging, and stores the data internally in accumulo. It needs a tablet server running to persist everything and will complain until it finds one. Are your tablet servers and loggers running? I would stop your tracer app until you have everything else up.

On Thu, Jan 16, 2014 at 1:41 PM, Steve Kruse skr...@adaptivemethods.com wrote:

Straightened out HDFS, now having a problem getting accumulo to start. I get the following exception in my tracer log repeatedly.

2014-01-16 13:06:49,131 [impl.ServerClient] DEBUG: ClientService request failed null, retrying ...
org.apache.thrift.transport.TTransportException: Failed to connect to a server
at org.apache.accumulo.core.client.impl.ThriftTransportPool.getAnyTransport(ThriftTransportPool.java:437)
at org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:152)
at org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:128)
at org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:123)
at org.apache.accumulo.core.client.impl.ServerClient.executeRaw(ServerClient.java:105)
at org.apache.accumulo.core.client.impl.ServerClient.execute(ServerClient.java:71)
at org.apache.accumulo.core.client.impl.ConnectorImpl.<init>(ConnectorImpl.java:64)
at org.apache.accumulo.server.client.HdfsZooInstance.getConnector(HdfsZooInstance.java:154)
at org.apache.accumulo.server.client.HdfsZooInstance.getConnector(HdfsZooInstance.java:149)
at org.apache.accumulo.server.trace.TraceServer.<init>(TraceServer.java:185)
at org.apache.accumulo.server.trace.TraceServer.main(TraceServer.java:260)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.accumulo.start.Main$1.run(Main.java:101)
at java.lang.Thread.run(Thread.java:619)

When I look at my processes, I have a gc and tracer app running. I also can't seem to run accumulo init again because it says that it's already been initialized.

Steve

From: Sean Busbey [mailto:busbey+li...@cloudera.com]
Sent: Wednesday, January 15, 2014 4:43 PM
To: Accumulo User List
Subject: Re: accumulo startup issue: Accumulo not initialized, there is no instance id at /accumulo/instance_id

On Wed, Jan 15, 2014 at 3:36 PM, Steve Kruse skr...@adaptivemethods.com wrote:

Sean, The classpath for HDFS was incorrect and that definitely helped when I corrected it. Now it seems I'm having a hadoop issue where the datanodes are not running. I'm going to keep plugging away.

Glad to hear you made progress. Generally, I recommend people run through teragen / terasort to validate their HDFS and MR setup before they move on to installing Accumulo. Let us know when you get back to trying to get Accumulo going.
Re: accumulo startup issue: Accumulo not initialized, there is no instance id at /accumulo/instance_id
What do you get when you try to run accumulo init?

On Wed, Jan 15, 2014 at 2:39 PM, Steve Kruse skr...@adaptivemethods.com wrote:

Hello,

I'm new to accumulo and I am trying to get it up and running. I currently have hadoop 2.2.0 and zookeeper 3.4.5 installed and running. I have gone through the installation steps on the following page and I now am running into a problem when I try to start accumulo up. The error I receive is the following:

Thread org.apache.accumulo.server.master.state.SetGoalState died null
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.accumulo.start.Main$1.run(Main.java:101)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.RuntimeException: Accumulo not initialized, there is no instance id at /accumulo/instance_id
at org.apache.accumulo.core.client.ZooKeeperInstance.getInstanceIDFromHdfs(ZooKeeperInstance.java:293)
at org.apache.accumulo.server.client.HdfsZooInstance._getInstanceID(HdfsZooInstance.java:126)
at org.apache.accumulo.server.client.HdfsZooInstance.getInstanceID(HdfsZooInstance.java:119)
at org.apache.accumulo.core.zookeeper.ZooUtil.getRoot(ZooUtil.java:24)
at org.apache.accumulo.server.master.state.SetGoalState.main(SetGoalState.java:46)
... 6 more

I have tried to run accumulo init several times but I still get the same result every single time. Any help would be much appreciated.

Thanks,
Steve

H. Stephen Kruse
Software Engineer
Adaptive Methods
5860 Trinity Parkway, Suite 200
Centreville, VA 20121
phone: (703) 968-6132
email: skr...@adaptivemethods.com
Re: Multiple masters in which version?
Joe,

Stand-by master functionality has existed for a while now (since before 1.4), so you should be good! Let us know if you run into any issues.

Mike

On Jan 6, 2014 3:13 AM, Joe Gresock jgres...@gmail.com wrote:

I seem to remember reading in one of the user guides that you can configure multiple Masters in your cluster by adding more than one IP to the masters file. Is this available in Accumulo 1.4.3, or only later versions? Thanks!

Joe

--
I know what it is to be in need, and I know what it is to have plenty. I have learned the secret of being content in any and every situation, whether well fed or hungry, whether living in plenty or in want. I can do all this through him who gives me strength. -Philippians 4:12-13
Re: Re-applying split file to a table?
Aaron,

If you attempt to apply the same splits file, then you are attempting to add already existing splits. Since the data is already split on those points, there are no changes, and nothing happens, exactly as you observed.

If you apply a different split file to the existing data (after it already had the initial and natural splits), then you will likely get more split points. The data might not split immediately, but you can prompt it to do so by issuing a major compaction. Your underlying data will not change, but you should see more tablets in your table via the monitor interface.

Mike

On Mon, Jan 6, 2014 at 10:47 AM, Aaron aarongm...@gmail.com wrote:

To set the stage: we create a table and pre-split it, then we start to ingest some data. During the ingest, the table splits a few more times maybe, and after the ingest is done the table balances itself out across the tablet servers.

What happens if we apply the split file again to the same table? From what I can tell, nothing appears to change, but I just wanted to double check and make sure I wasn't missing anything.

Same question, but if we use a completely different split file, with different splits? Same result, nothing changes?
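The behavior described above follows from split points acting like a set: adding a point that already exists is a no-op, and a different file can only add new points. A minimal Python sketch of that set semantics (the `apply_splits` helper is hypothetical, purely to model the table metadata, not an Accumulo API):

```python
# Model a table's split points as a set: applying a splits file is a
# set union, so re-applying the same file changes nothing, while a
# different file adds points (and therefore tablets).

def apply_splits(existing_splits, split_file_lines):
    return existing_splits | set(split_file_lines)

current = {"g", "m", "t"}  # pre-split points plus natural splits

# same file again: no change, exactly as observed
assert apply_splits(current, ["g", "m", "t"]) == current

# a different file: more split points -> more tablets after compaction
assert apply_splits(current, ["d", "p"]) == {"d", "g", "m", "p", "t"}
```

Note this models only the split-point bookkeeping; as the reply says, the actual tablet data may not be rewritten until a major compaction runs.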
Re: Is accumulo supported on centos (6.x)
Which version of the Cloudera Quickstart VM are you running? To install a 1.4.4 Accumulo RPM, you will indeed have to build it from source. 1.5.0 RPMs are available as downloads on the site, like Josh said.

Thanks,
Mike

On Sat, Dec 21, 2013 at 10:28 AM, ashili kash...@yahoo.com wrote:

I see from the downloads page that accumulo is supported on debian. From the following README link, I see accumulo is supported on linux: https://git-wip-us.apache.org/repos/asf?p=accumulo.git;a=blob_plain;f=README;hb=419aacc45279a3cd6b3b5bf61baf486f082a450a

My question: are accumulo binaries available for centos, or should I build it from the source/git? I am trying to integrate accumulo with the cloudera demo VM (centos, x64).
Re: 1.5 on cdh4u5
It looks like you are running with an improperly configured Java Security Policy. In the example accumulo-env.sh files there are some lines that look like: if [ -f ${ACCUMULO_CONF_DIR}/accumulo.policy ] then POLICY=-Djava.security.manager -Djava.security.policy=${ACCUMULO_CONF_DIR}/accumulo.policy fi test -z $ACCUMULO_MONITOR_OPTS export ACCUMULO_MONITOR_OPTS=${POLICY} -Xmx1g -Xms256m Does $ACCUMULO_CONF_DIR/accumulo.policy exist on your system? If so, it looks like you're missing PropertyPermission for the accumulo code. Compare to line 112 of the example policy file. [2] Mike [1]: https://github.com/apache/accumulo/blob/master/conf/examples/3GB/standalone/accumulo-env.sh?source=c [2]: https://github.com/apache/accumulo/blob/master/conf/accumulo.policy.example?source=cc On Wed, Dec 11, 2013 at 7:32 AM, Ott, Charles H. charles.h@leidos.comwrote: I am having a few issues getting 1.5 to run with cdh4u5 parcels installation. The baseline Accumulo-site.xml did not seem to point to a proper classpath, so I have made some modifications to the configuration. I was able to initialize the database (./accumulo init) and did not receive any errors when doing so. # vars from my accumulo-env.sh HADOOP_PREFIX=/opt/cloudera/parcels/CDH/lib/hadoop HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop ZOOKEEPER_HOME=/opt/cloudera/parcels/CDH/lib/zookeeper # cdh4 stuff I added to accumulo-env.sh export HADOOP_HDFS_HOME=/opt/cloudera/parcels/CDH/lib/hadoop-hdfs export HADOOP_MAPREDUCE_HOME=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce # ACCUMULO_HOME is set as an env var for the ‘hdfs’ user. Which has ownership of the Accumulo home and walogs folder. 
# the accumulo-site.xml general.classpath info property namegeneral.classpaths/name value $ACCUMULO_HOME/server/target/classes/, $ACCUMULO_HOME/lib/accumulo-server.jar, $ACCUMULO_HOME/core/target/classes/, $ACCUMULO_HOME/lib/accumulo-core.jar, $ACCUMULO_HOME/start/target/classes/, $ACCUMULO_HOME/lib/accumulo-start.jar, $ACCUMULO_HOME/fate/target/classes/, $ACCUMULO_HOME/lib/accumulo-fate.jar, $ACCUMULO_HOME/proxy/target/classes/, $ACCUMULO_HOME/lib/accumulo-proxy.jar, $ACCUMULO_HOME/lib/[^.].*.jar, $ZOOKEEPER_HOME/zookeeper[^.].*.jar, $HADOOP_CONF_DIR, $HADOOP_PREFIX/[^.].*.jar, $HADOOP_PREFIX/lib/[^.].*.jar, $HADOOP_HDFS_HOME/.*.jar, $HADOOP_HDFS_HOME/lib/.*.jar, $HADOOP_MAPREDUCE_HOME/.*.jar, $HADOOP_MAPREDUCE_HOME/lib/.*.jar /value descriptionClasspaths that accumulo checks for updates and class files. When using the Security Manager, please remove the .../target/classes/ values. /description /property I have also disabled ipv6, selinux, and dfs.permissions. Also increases ulimit to 65536, swapiness set to 10, ntpd installed and running. Trying to start Accumulo as the ‘hdfs’ user as my current 1.4.4 cluster is running on cdh3u6. 
But, when I run ./start-all.sh I have the following issues:

1.) Monitor thread dies:

Thread monitor died null
java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.accumulo.start.Main$1.run(Main.java:101)
	at java.lang.Thread.run(Thread.java:744)
Caused by: java.security.AccessControlException: access denied (java.util.PropertyPermission * read,write)
	at java.security.AccessControlContext.checkPermission(AccessControlContext.java:372)
	at java.security.AccessController.checkPermission(AccessController.java:559)
	at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
	at java.lang.SecurityManager.checkPropertiesAccess(SecurityManager.java:1269)
	at java.lang.System.getProperties(System.java:624)
	at org.apache.commons.configuration.SystemConfiguration.init(SystemConfiguration.java:38)
	at org.apache.accumulo.core.conf.Property.getDefaultValue(Property.java:384)
	at org.apache.accumulo.core.conf.DefaultConfiguration.iterator(DefaultConfiguration.java:52)
	at org.apache.accumulo.core.conf.ConfigSanityCheck.validate(ConfigSanityCheck.java:29)
	at org.apache.accumulo.core.conf.DefaultConfiguration.getInstance(DefaultConfiguration.java:37)
	at org.apache.accumulo.core.conf.AccumuloConfiguration.getDefaultConfiguration(AccumuloConfiguration.java:153)
	at org.apache.accumulo.core.conf.AccumuloConfiguration.getSiteConfiguration(AccumuloConfiguration.java:163)
	at
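For reference, a grant of the sort Mike describes looks like the following in Java policy-file syntax. This is a hedged sketch rather than the exact line from accumulo.policy.example, and the codeBase URL is a placeholder that has to match the local installation:

```
// Hypothetical accumulo.policy fragment: let the Accumulo jars read/write system properties.
grant codeBase "file:/opt/accumulo/lib/*" {
  permission java.util.PropertyPermission "*", "read, write";
};
```

The AccessControlException above names exactly this permission class, so a missing or mismatched grant entry of this shape is the first thing to check.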
Re: HBase rowkey design guidelines
Well, yes and no. Smaller keys still mean less network traffic, potentially less IO, and maybe faster operations if you're trying to do application logic. Using "data" or "default" or just "d" probably doesn't matter in the long term (although there are certainly cases where it might). On Dec 3, 2013 11:57 PM, David Medinets david.medin...@gmail.com wrote: http://hbase.apache.org/book/rowkey.design.html - unless I am misunderstanding, much of the advice given for HBase simply doesn't apply to Accumulo. For example: "Try to keep the ColumnFamily names as small as possible, preferably one character (e.g. 'd' for data/default)."
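As a rough illustration of the "yes" part: the name of the column family is carried in every key, so its length multiplies across entries. The billion-entry table below is hypothetical, and note that on-disk RFiles prefix-compress repeated key parts, so the cost is mostly in memory and on the wire:

```java
// Back-of-envelope cost of column family name length, uncompressed.
public class FamilyOverhead {
    static long familyBytes(int nameLength, long entries) {
        return (long) nameLength * entries;
    }

    public static void main(String[] args) {
        long entries = 1_000_000_000L; // hypothetical table size
        // "d" carries 1 byte per key; "default" carries 7.
        System.out.println(familyBytes(1, entries));
        System.out.println(familyBytes(7, entries));
    }
}
```

So the HBase advice is not wrong for Accumulo, just less critical than the linked page makes it sound.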
Re: How to reduce number of entries in memory
What are you trying to accomplish by reducing the number of entries in memory? A tablet server will not minor compact (flush) until the native map fills up, but keeping things in memory isn't really a performance concern. You can force a one-time minor compaction via the shell using the 'flush' command. On Mon, Oct 28, 2013 at 5:19 PM, Terry P. texpi...@gmail.com wrote: Greetings all, For a growing table that went from zero to 70 million entries this weekend, I'm seeing 4.4 million entries still in memory, though the client programs are supposed to be flushing their entries. Is there a server-side setting to help reduce the number of entries that are in memory (not yet flushed to disk)? Our system has fairly light performance requirements, so I'm okay if a tweak may result in reduced ingest performance. Thanks in advance, Terry
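The one-time flush, plus a smaller in-memory map bound for anyone who really does want fewer entries resident, look roughly like this from the shell. The table name and 256M figure are only illustrative, and I believe changing tserver.memory.maps.max requires a tserver restart to take effect:

```
root@myinstance> flush -t mytable -w
root@myinstance> config -s tserver.memory.maps.max=256M
```

The -w flag makes flush wait until the minor compactions finish rather than returning immediately.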
Re: Deleting many rows that match a given criterion
Thanks for the feedback, Aru and Keith. I've had some more time to play around with this, and here are some additional observations. My existing process is very slow. I think this is due to each deletemany command starting up a new scanner and batchwriter, and creating a lot of rpc overhead. I didn't initially think that it would be a significant amount of data, but maybe I just had the wrong idea of what significant is in this case. I'm not sure the RowDeletingIterator would work in this case because I do use empty rows for other purposes. The RowFilter at compaction is a great option, except I had hoped to avoid writing actual java code. Looking back at this, I might have to bite that bullet. Again, thanks both for the suggestions! Mike On Tue, Oct 22, 2013 at 12:04 PM, Keith Turner ke...@deenlo.com wrote: If it's a significant amount of data, you could create a class that extends row filter and set it as a compaction iterator. On Tue, Oct 22, 2013 at 11:45 AM, Mike Drob md...@mdrob.com wrote: I'm attempting to delete all rows from a table that contain a specific word in the value of a specified column. My current process looks like:

accumulo shell -e 'egrep .*EXPRESSION.* -np -t tab -c col' | awk 'BEGIN {print "table tab"}; {print "deletemany -f -np -r " $1}; END {print "exit"}' > rows.out
accumulo shell -f rows.out

I tried playing around with scan iterators and various options on deletemany and deleterows but wasn't able to find a more straightforward way to do this. Does anybody have any suggestions? Mike
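To flesh out Keith's compaction-iterator suggestion, a sketch of the java code might look like the following. The class name, hard-coded family "col", and word "EXPRESSION" are all hypothetical stand-ins; a real version would read them from iterator options in init(), be attached at the majc scope (e.g. via setiter -majc), and be exercised by forcing a compact - ideally tested on a clone of the table first:

```java
import java.io.IOException;

import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
import org.apache.accumulo.core.iterators.user.RowFilter;

// Sketch: at compaction, drop any row whose value under family "col"
// contains the word "EXPRESSION".
public class WordRowFilter extends RowFilter {
    @Override
    public boolean acceptRow(SortedKeyValueIterator<Key, Value> row) throws IOException {
        while (row.hasTop()) {
            if (row.getTopKey().getColumnFamily().toString().equals("col")
                    && row.getTopValue().toString().contains("EXPRESSION")) {
                return false; // reject: the whole row is dropped
            }
            row.next();
        }
        return true; // keep the row
    }
}
```

Returning false from acceptRow suppresses every key/value in that row, which is what makes this far cheaper than issuing per-row deletemany commands over rpc.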
Re: Cancel a compact [SEC=UNOFFICIAL]
Depending on the version that you are running, compactions can be cancelled with varying degrees of difficulty and perseverance (and tablet server restarts). On Tue, Oct 1, 2013 at 10:09 PM, Dickson, Matt MR matt.dick...@defence.gov.au wrote: *UNOFFICIAL* Can a compact process be cancelled?
Re: Tunneling over SSH
There is some development going on as part of ACCUMULO-1585 [1] to allow tservers to store the hostname instead of the ip address. That seems like a good place to start, although I'm not sure if this is the same problem that you're seeing.

[1]: https://issues.apache.org/jira/browse/ACCUMULO-1585

Mike

On Thu, Sep 5, 2013 at 7:14 PM, stbil...@gmail.com wrote: I'm trying to tunnel via SSH to a single Hadoop, Zoo, Accumulo stand-alone installation. The internal IP of the machine is on a local subnet behind an SSH-only firewall - 192.168.182.22. I use static host names in all of the config files (Accumulo, Zoo, Hadoop) that resolve to 192.168.182.22 for all the servers. There is no problem connecting when I'm directly connected to the subnet inside the firewall. However, when I try to connect via the Java API from outside the firewall, I get an error: Failed to find an available server in the list of servers: [192.168.182.22:9997:9997 (12)]. I've created a Windows loopback interface that allows me to forward unlimited ports directly through the SSH tunnel to the internal network - there is no issue with connecting to Hadoop via Java or the web interface, and I can view the Accumulo status page at 50095 by just setting my Windows box to resolve the hostname to the loopback local IP - SSH - 192.168.182.22:50095. I think the problem is that Zookeeper is telling my Java process to try and make a connection directly to 192.168.182.22:9997. If Zoo would use the hostname, there'd be no problem as it'd resolve to the loopback and get tunneled along with everything else. But since it uses the actual IP, the Windows box won't route that back through the SSH tunnel as it considers it a local subnet outside of the firewall. Anyone experienced this issue and have a solution?
I guess one solution might be to 'trick' Windows into forwarding the 192.168.x.y subnet back through the loopback (and on through SSH), but I'm not seeing a good way to do that. Thanks
Re: Using Accumulo shell to add column visibility to cells containing Unicode values
What version are you using? According to ACCUMULO-241 [1], you should be able to quote any UTF-8 characters for visibility using the Java API. The shell will likely have parsing issues, however.

[1]: https://issues.apache.org/jira/browse/ACCUMULO-241

On Mon, Aug 26, 2013 at 3:56 PM, John Vines vi...@apache.org wrote: The java API is the most feature-rich way of interfacing with Accumulo. The shell is a utility built on it, but occasionally issues get hit with parsing user input. It seems you have hit one of these cases. You may be able to quote your fields, etc. However, it is more important to note that the visibilities are very strict about the character set allowed. Only a-z, A-Z, 0-9, and a few additional characters are allowed (- and _ if I remember correctly). So unicode won't work, and you'd get an error indicating that if you could get the shell to accept them. On Mon, Aug 26, 2013 at 3:52 PM, Celeste Hofer celesteho...@gmail.com wrote: Hello, I am trying to add column visibility (a label) to cells containing Unicode values, using an Accumulo shell. However, I receive this ERROR: java.lang.IllegalArgumentException: Expected 4 arguments. There was 6. Is the use of the Accumulo shell supported for applying column visibility when the value is Unicode? If it is supported, please provide a simple example, or more information. If it is not supported via the Accumulo shell, is there another supported approach, for example, using the Java API? Thanks, Celeste H
Re: Okay to purge trace table?
David, I already created a ticket for it - https://issues.apache.org/jira/browse/ACCUMULO-1501 -Mike On Thu, Jun 6, 2013 at 9:00 PM, David Medinets david.medin...@gmail.com wrote: Does it make sense to create a JIRA ticket asking for an age-off iterator to be the default on the trace table? Maybe set for something like two weeks? If we don't add a default age-off iterator, where should the documentation be changed to talk about this topic? Does the user manual talk about the trace table? On Thu, Jun 6, 2013 at 3:54 PM, Eric Newton eric.new...@gmail.com wrote: You could put an age-off iterator on it, or just purge it from time to time. I probably should have configured the trace table with an age-off filter by default. But for now, you need to manually manage the data. You can use deleterows to wipe the table efficiently: deleterows -f -t trace -Eric On Thu, Jun 6, 2013 at 2:35 PM, Terry P. texpi...@gmail.com wrote: Greetings all, We have a million entries in the trace table in one of our Accumulo clusters and a million and a half in another. We haven't manually enabled any tracing activities, and in looking at the entries, they seem to be generated by things Accumulo does on its own (compact, wal, getFileStatus, minorCompactionStarted, minorCompaction, prep, commit, etc). Does Accumulo maintain this table or do I / should I manually purge it from time to time? If it's on us to maintain it, are there any guidelines or a procedure for doing so? Thanks in advance, Terry
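Eric's age-off suggestion can be wired up from the shell with the built-in AgeOffFilter. The following is a hedged sketch: the iterator priority (10) is an arbitrary choice, the ttl is David's two weeks expressed in milliseconds, and the same pair of properties would also be set for the minc and majc scopes so aged-off entries are actually dropped at compaction:

```
root@myinstance> config -t trace -s table.iterator.scan.ageoff=10,org.apache.accumulo.core.iterators.user.AgeOffFilter
root@myinstance> config -t trace -s table.iterator.scan.ageoff.opt.ttl=1209600000
```

(1209600000 ms = 14 days x 86400 s x 1000.)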
Re: master fails to start
Looks like you might be running with a Java Security Policy in place. On Mon, May 20, 2013 at 4:28 PM, Chris Retford chris.retf...@gmail.com wrote: Accumulo 1.4.3. Hadoop is CDH3u6 (0.20.2). I can manually list files in Hadoop. Accumulo was able to run the init script. All accumulo directories in HDFS are world readable and executable. On Mon, May 20, 2013 at 2:20 PM, Christopher ctubb...@apache.org wrote: What version of Accumulo are you running? Can you manually query HDFS as the same user Accumulo is running as? -- Christopher L Tubbs II http://gravatar.com/ctubbsii On Mon, May 20, 2013 at 4:14 PM, Chris Retford chris.retf...@gmail.com wrote: I searched the archive before posting and didn't find anything. I have a new system with 12 nodes (3 ZK), and a single user in the hadoop group. The master fails to start. It looks to me like it is unable to read /accumulo/instance_id in HDFS, but I can't think why that would be. Thanks in advance for any advice on how to run this down. Here are the contents of master.err log:

Thread master died null
java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.accumulo.start.Main$1.run(Main.java:89)
	at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.ExceptionInInitializerError
	at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:469)
	at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1757)
	at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1750)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1618)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:255)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:124)
	at org.apache.accumulo.core.file.FileUtil.getFileSystem(FileUtil.java:554)
	at org.apache.accumulo.core.client.ZooKeeperInstance.getInstanceIDFromHdfs(ZooKeeperInstance.java:258)
	at org.apache.accumulo.server.conf.ZooConfiguration.getInstance(ZooConfiguration.java:65)
	at org.apache.accumulo.server.conf.ServerConfiguration.getZooConfiguration(ServerConfiguration.java:49)
	at org.apache.accumulo.server.conf.ServerConfiguration.getSystemConfiguration(ServerConfiguration.java:58)
	at org.apache.accumulo.server.client.HdfsZooInstance.<init>(HdfsZooInstance.java:62)
	at org.apache.accumulo.server.client.HdfsZooInstance.getInstance(HdfsZooInstance.java:70)
	at org.apache.accumulo.server.Accumulo.init(Accumulo.java:132)
	at org.apache.accumulo.server.master.Master.<init>(Master.java:534)
	at org.apache.accumulo.server.master.Master.main(Master.java:2190)
	... 6 more
Caused by: java.security.AccessControlException: access denied (java.lang.RuntimePermission getenv.HADOOP_JAAS_DEBUG)
	at java.security.AccessControlContext.checkPermission(AccessControlContext.java:366)
	at java.security.AccessController.checkPermission(AccessController.java:560)
	at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
	at java.lang.System.getenv(System.java:883)
	at org.apache.hadoop.security.UserGroupInformation$HadoopConfiguration.<clinit>(UserGroupInformation.java:392)
Cancelling queued compactions in Accumulo 1.4
Somebody (totally not me) accidentally kicked off a full table compaction using Accumulo 1.4.3. There's a large number of them waiting and the queue is decreasing very slowly - what are my options for improving the situation? Ideally, I would be able to just cancel everything and then come back with a more precise approach. Thanks, Mike
Re: Cancelling queued compactions in Accumulo 1.4
Can I leave the ones that are already running and just dispose of the queued compactions? If not, that seems like a pretty serious limitation. On Wed, May 15, 2013 at 2:51 AM, John Vines vi...@apache.org wrote: I'm not sure if it's possible. Scheduling a compaction is an entry in the metadata table. But once it gets triggered, there are then compactions scheduled locally for the tserver. You might be able to delete the flag and bounce all the tservers to stop it, but I can't say for certain. Sent from my phone, please pardon the typos and brevity. On May 14, 2013 11:48 PM, Mike Drob md...@mdrob.com wrote: Somebody (totally not me) accidentally kicked off a full table compaction using Accumulo 1.4.3. There's a large number of them waiting and the queue is decreasing very slowly - what are my options for improving the situation. Ideally, I would be able to just cancel everything and then come back with a more precise approach. Thanks, Mike
Re: Cancelling queued compactions in Accumulo 1.4
Some progress on this issue - if I stop the master, then I can delete the fate transaction from zookeeper. First I used accumulo org.apache.accumulo.server.fate.Admin print | grep CompactRange to find the transactions and then accumulo o.a.a.s.f.Admin delete <id> to delete it. Started the master back up, manually peeked in zookeeper, and the transaction was gone. That said, looking at the monitor page there are still all of my compactions queued up, so I don't think that actually did anything. Is there another place that I need to look? I saw that zk has a /accumulo/<id>/tables/<tid>/compact-id entry, but I don't know how that relates. Mike On Wed, May 15, 2013 at 2:56 AM, John Vines vi...@apache.org wrote: I do not believe there is a way to tell a tserver to cancel all compactions. It would be a nice feature though. Mind putting in a ticket? Sorry for the dupe Mike, missed hitting reply all Sent from my phone, please pardon the typos and brevity. On May 14, 2013 11:54 PM, Mike Drob md...@mdrob.com wrote: Can I leave the ones that are already running and just dispose of the queued compactions? If not, that seems like a pretty serious limitation. On Wed, May 15, 2013 at 2:51 AM, John Vines vi...@apache.org wrote: I'm not sure if it's possible. Scheduling a compaction is an entry in the metadata table. But once it gets triggered, there are then compactions scheduled locally for the tserver. You might be able to delete the flag and bounce all the tservers to stop it, but I can't say for certain. Sent from my phone, please pardon the typos and brevity. On May 14, 2013 11:48 PM, Mike Drob md...@mdrob.com wrote: Somebody (totally not me) accidentally kicked off a full table compaction using Accumulo 1.4.3. There's a large number of them waiting and the queue is decreasing very slowly - what are my options for improving the situation? Ideally, I would be able to just cancel everything and then come back with a more precise approach. Thanks, Mike
Re: [VOTE] 1.5.0-RC2
I noticed that ACCUMULO-970 still has 8 open issues. I would like to see those all resolved before 1.5 is actually released. On Thu, May 9, 2013 at 5:36 PM, Keith Turner ke...@deenlo.com wrote: On Thu, May 9, 2013 at 5:23 PM, Christopher ctubb...@apache.org wrote: Keith, I assume you mean the docs/apidocs directory is missing? Or did you mean the javadoc jars (which were intentionally omitted)? I was referring to docs/apidocs. The documentation available through the monitor references this. -- Christopher L Tubbs II http://gravatar.com/ctubbsii On Thu, May 9, 2013 at 4:05 PM, Keith Turner ke...@deenlo.com wrote: On Thu, May 9, 2013 at 3:42 PM, Christopher ctubb...@apache.org wrote: On Thu, May 9, 2013 at 2:34 PM, Keith Turner ke...@deenlo.com wrote: Are you thinking of manually renaming the tar, rpm, and debs, replacing accumulo-assemble w/ accumulo, when these are pushed out to mirrors? For the tar this would require untar, rename, tar and recomputing the sigs and hashes. I was thinking about renaming the RPM and DEBs to conform to their respective naming conventions, but I see no reason to change the tarball names, or contents. No recalculations of sigs or hashes would be necessary for just a filename change. When I untar accumulo-assemble-1.5.0-bin.tar.gz and end up with a dir named accumulo-assemble-1.5.0, I find that really screwy. I understand how this came to be. But the name does not make sense from the perspective of an outsider. I would be happy to reroll this tarball with a dir name of accumulo-1.5.0. That would change the tar's contents and require resigning. I can do this, post it, and we can include that in the vote. Also, the bin.tar.gz does not include the javadocs. Voting -1 based on the javadocs. If there is an artifact that should be built in a different way, with a different naming convention, please let me know, and I'll make Maven do it (though I think the docs currently specify the names as they are right now).
On Wed, May 8, 2013 at 8:31 PM, Christopher ctubb...@apache.org wrote: 1.5.0-RC2 for review. Might as well vote, also, as it's easily recalled if it's not up to par. https://repository.apache.org/content/repositories/orgapacheaccumulo-024/ -- Christopher L Tubbs II http://gravatar.com/ctubbsii -- Forwarded message -- From: Nexus Repository Manager ne...@repository.apache.org Date: Wed, May 8, 2013 at 8:26 PM Subject: Nexus: Staging Completed. To: Christopher Tubbs ctubb...@gmail.com Description: 1.5.0-RC2 Details: The following artifacts have been staged to the org.apache.accumulo-024 (u:ctubbsii, a:173.66.3.39) repository. archetype-catalog.xml accumulo-1.5.0-source-release.zip accumulo-1.5.0-source-release.tar.gz.asc accumulo-1.5.0.pom accumulo-1.5.0-site.xml accumulo-1.5.0.pom.asc accumulo-1.5.0-source-release.zip.asc accumulo-1.5.0-source-release.tar.gz accumulo-1.5.0-site.xml.asc accumulo-examples-1.5.0.pom.asc accumulo-examples-1.5.0.pom accumulo-core-1.5.0.pom.asc accumulo-core-1.5.0-javadoc.jar accumulo-core-1.5.0-sources.jar accumulo-core-1.5.0-javadoc.jar.asc accumulo-core-1.5.0.pom accumulo-core-1.5.0.jar accumulo-core-1.5.0-sources.jar.asc accumulo-core-1.5.0.jar.asc accumulo-examples-simple-1.5.0.jar accumulo-examples-simple-1.5.0.jar.asc accumulo-examples-simple-1.5.0-javadoc.jar.asc accumulo-examples-simple-1.5.0.pom.asc accumulo-examples-simple-1.5.0-sources.jar accumulo-examples-simple-1.5.0-javadoc.jar accumulo-examples-simple-1.5.0-sources.jar.asc accumulo-examples-simple-1.5.0.pom accumulo-test-1.5.0-sources.jar.asc accumulo-test-1.5.0.pom accumulo-test-1.5.0.jar.asc accumulo-test-1.5.0.pom.asc accumulo-test-1.5.0-javadoc.jar.asc accumulo-test-1.5.0-sources.jar accumulo-test-1.5.0.jar accumulo-test-1.5.0-javadoc.jar accumulo-assemble-1.5.0.pom accumulo-assemble-1.5.0-test.deb accumulo-assemble-1.5.0-test.deb.asc accumulo-assemble-1.5.0-native.deb accumulo-assemble-1.5.0-bin.rpm.asc accumulo-assemble-1.5.0-bin.tar.gz.asc 
accumulo-assemble-1.5.0-bin.deb accumulo-assemble-1.5.0.pom.asc accumulo-assemble-1.5.0-bin.deb.asc accumulo-assemble-1.5.0-bin.rpm accumulo-assemble-1.5.0-native.deb.asc accumulo-assemble-1.5.0-bin.tar.gz accumulo-assemble-1.5.0-native.rpm.asc accumulo-assemble-1.5.0-native.rpm accumulo-proxy-1.5.0-javadoc.jar.asc accumulo-proxy-1.5.0-sources.jar accumulo-proxy-1.5.0.pom.asc accumulo-proxy-1.5.0-javadoc.jar accumulo-proxy-1.5.0.jar.asc accumulo-proxy-1.5.0.jar accumulo-proxy-1.5.0-sources.jar.asc accumulo-proxy-1.5.0.pom accumulo-trace-1.5.0.jar.asc accumulo-trace-1.5.0-javadoc.jar accumulo-trace-1.5.0.pom.asc accumulo-trace-1.5.0-sources.jar.asc
Re: Using supervisor to monitor Accumulo
I've seen people use puppet to achieve the same goal with reasonable amounts of success. On Wed, May 8, 2013 at 6:33 PM, Phil Eberhardt p...@sqrrl.com wrote: Hello, I was looking into using supervisor (http://supervisord.org/index.html) to monitor a daemon running on top of Accumulo. I heard that Jason Trost may have mentioned using supervisor to monitor Accumulo and restart it if it stopped running in a presentation at Hadoop World. I was wondering if anyone was monitoring the Accumulo daemon and restarted it successfully using supervisor so I could do something similar. Thanks, Phil Eberhardt
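For anyone trying this, a minimal supervisord program stanza might look like the following. The paths, user, and log locations are assumptions about a typical layout; the key point is to have supervisor run bin/accumulo <service> directly (a foreground process) rather than start-all.sh, since supervisor can only watch processes it started itself:

```
[program:accumulo-tserver]
command=/opt/accumulo/bin/accumulo tserver
directory=/opt/accumulo
user=accumulo
autostart=true
autorestart=true
stdout_logfile=/var/log/accumulo/tserver-supervisor.log
```

One stanza per daemon (master, tserver, gc, monitor, tracer) on each node would replicate what start-all.sh launches, with restart-on-death added.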
Re: 807 tablets, for the same table, on one tserver?
Grepping the master logs for 'balance' usually gives some clue. On Apr 11, 2013 7:45 AM, David Medinets david.medin...@gmail.com wrote: From behaviour that I've witnessed before, on v1.4.1, Accumulo spreads tablets across the cluster. However, this morning I am seeing 807 tablets for the same table on one tserver, which was unexpected. What affects the movement of tablets? Or perhaps more importantly, what might prevent the movement?
Re: Custom Iterators - behavior when switching tablets
David, This doesn't answer your design questions, but it might help shed some light on how to properly handle losing the sort order. Brian did a lot of work on this in https://issues.apache.org/jira/browse/ACCUMULO-956 so I highly recommend looking there and comparing to what you've developed. It's a tricky problem, but I think it is good to get your insights and experience from it added to the collective knowledge. Mike On Tue, Jan 22, 2013 at 11:55 AM, Slater, David M. david.sla...@jhuapl.edu wrote: In designing some of my own custom iterators, I was noticing some interesting behavior. Note: my iterator does not return the original key, but instead returns a computed value that is not necessarily in lexicographic order. So far as I can tell, when the Scanner switches between tablets, it checks the key that is returned in the new tablet and compares it (I think it compares key.row()) with the last key from the previous tablet. If the new key is greater than the previous one, then it proceeds normally. If, however, the new key is less than or equal to the previous key, then the Scanner does not return the value. It does, however, continue to iterate through the tablet, continuing to compare until it finds a key greater than the last one. Once it finds one, however, it progresses through the rest of that tablet without doing a check. (It implicitly assumes that everything in a tablet will be correctly ordered.) Now if I was to return the original key, it would work fine (since it would always be in order), but that also limits the functionality of my custom iterator. My primary question is: why would it be designed this way? When switching between tablets, are there potential problems that might crop up if this check isn't done? Thanks, David
Re: Satisfying Zookeeper dependency when installing Accumulo in CentOS
RPM is looking for a zookeeper package on the system to satisfy the automatic dependency management. The installation instructions you linked to for ZK seem to imply using a downloaded tar. If that's the case then you'll need to either find a ZK RPM, install Accumulo using a tar, or install Accumulo via RPM using the --nodeps option. On Wed, Dec 19, 2012 at 5:03 PM, Kevin Pauli ke...@thepaulis.com wrote: I'm trying to install Accumulo in CentOS. I have installed the jdk and hadoop, but can't seem to make Accumulo install happy wrt zookeeper. I installed Zookeper according to the instructions here: http://zookeeper.apache.org/doc/r3.4.5/zookeeperStarted.html#sc_InstallingSingleMode And Zookeeper is running: $ sudo bin/zkServer.sh start JMX enabled by default Using config: /usr/lib/zookeeper-3.4.5/bin/../conf/zoo.cfg Starting zookeeper ... STARTED But when trying to install Accumulo, this is what I get: $ sudo rpm -ivh Downloads/accumulo-1.4.2-1.amd64.rpm error: Failed dependencies: zookeeper is needed by accumulo-1.4.2-1.amd64 -- Regards, Kevin Pauli
Re: Satisfying Zookeeper dependency when installing Accumulo in CentOS
Default install is under /opt/accumulo If locate doesn't find something, you can also try updatedb On Wed, Dec 19, 2012 at 5:33 PM, Kevin Pauli ke...@thepaulis.com wrote: I was worried about forcing the rpm installation with --nodeps b/c I wasn't sure if there was some kind of linkage that would be formed from accumulo to the zookeeper package, which, due to zookeeper not being a true package, would cause accumulo to fail at runtime. But, based on your advice, I went ahead and installed Accumulo via rpm with the --nodeps option. It completed without errors, and I was about to proceed with the next step of modifying conf/accumulo-env.sh I can't seem to find where it is! locate accumulo-env.sh is resulting in no hits. Where would the rpm installation have put Accumulo? On Wed, Dec 19, 2012 at 4:10 PM, Mike Drob md...@mdrob.com wrote: RPM is looking for a zookeeper package on the system to satisfy the automatic dependency management. The installation instructions you linked to for ZK seem to imply using a downloaded tar. If that's the case then you'll need to either find a ZK RPM, install Accumulo using a tar, or install Accumulo via RPM using the --nodeps option. On Wed, Dec 19, 2012 at 5:03 PM, Kevin Pauli ke...@thepaulis.com wrote: I'm trying to install Accumulo in CentOS. I have installed the jdk and hadoop, but can't seem to make Accumulo install happy wrt zookeeper. I installed Zookeper according to the instructions here: http://zookeeper.apache.org/doc/r3.4.5/zookeeperStarted.html#sc_InstallingSingleMode And Zookeeper is running: $ sudo bin/zkServer.sh start JMX enabled by default Using config: /usr/lib/zookeeper-3.4.5/bin/../conf/zoo.cfg Starting zookeeper ... STARTED But when trying to install Accumulo, this is what I get: $ sudo rpm -ivh Downloads/accumulo-1.4.2-1.amd64.rpm error: Failed dependencies: zookeeper is needed by accumulo-1.4.2-1.amd64 -- Regards, Kevin Pauli -- Regards, Kevin Pauli
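On the "where did the RPM put it" question, rpm itself can answer without updatedb; a quick sketch, assuming the package is named accumulo as in the install above:

```
# list every file the accumulo package installed
rpm -ql accumulo | head
# or locate the config script directly
rpm -ql accumulo | grep accumulo-env.sh
```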
Re: Authentication - Kerberos
There are a couple tickets that involve making Accumulo and Kerberos play nice - https://issues.apache.org/jira/browse/ACCUMULO-404 was to get accumulo running on a kerberized HDFS; https://issues.apache.org/jira/browse/ACCUMULO-259 is for potentially delegating the authentications to an external system (i.e. KRB). It looks like it was planned for 1.5, but John can probably chime in and let us know of the status. Mike On Thu, Nov 1, 2012 at 2:20 PM, Michael Peterson mike.peter...@ptech-llc.com wrote: To whom it may concern: Can you please provide a schedule of if/when Accumulo will use Kerberos for authentication? I'm working with multiple customers that are collecting data to make decisions to use Accumulo or other technologies. The absence of strong authentication with Accumulo is a major concern (and less subjective). Also, is there a POC for new features that will be available in 1.5? Thanks, Mike Peterson, Owner Peterson Technologies, LLC 240-456-0094, ext 111 240-456-0096 fax 410-218-4004 cell Certified 8(a), SDVOSB, MBE/A
Re: Accumulo and Java 7
Turns out I had an errant security policy in place, thanks all! On Tue, Aug 28, 2012 at 7:50 PM, Eric Newton eric.new...@gmail.com wrote: Check that the memory configuration you are using is appropriate for your system. The master/monitor are relatively small processes in 1.4. Make sure the write-ahead log directory exists on all nodes. Be sure to check the .err/.out files. If you don't have .err/.out files, double check your ssh configuration. -Eric On Tue, Aug 28, 2012 at 7:41 PM, Gabe Bell christiang...@gmail.com wrote: I have Accumulo 1.5 HEAD running on JDK 1.7 on Ubuntu 12.04 x64. It runs fine. On Aug 28, 2012, at 7:08 PM, Mike Drob md...@mdrob.com wrote: Does anybody have experience with running Accumulo on top of Java 7? The mailing list archives show that David Medinets tried compiling 1.3.5 on the openjdk implementation back in December, but it doesn't look like there was much follow up on it. When I'm trying to use the 1.4.1 dist tarball on CDH3, my gc and tracer start fine but the master and monitor silently fail. I haven't yet tried to fire up tablet servers. All logs are painfully bare. Any ideas from the wisdom of the internet? Mike
Accumulo and Java 7
Does anybody have experience with running Accumulo on top of Java 7? The mailing list archives show that David Medinets tried compiling 1.3.5 on the openjdk implementation back in December, but it doesn't look like there was much follow up on it. When I'm trying to use the 1.4.1 dist tarball on CDH3, my gc and tracer start fine but the master and monitor silently fail. I haven't yet tried to fire up tablet servers. All logs are painfully bare. Any ideas from the wisdom of the internet? Mike