[ https://issues.apache.org/jira/browse/ACCUMULO-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13764264#comment-13764264 ]
ASF subversion and git services commented on ACCUMULO-1685:
-----------------------------------------------------------

Commit e2bc157134878dcbf74c7a1f075219b07705bf2d in branch refs/heads/master from [~ecn]
[ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=e2bc157 ]

ACCUMULO-1685 noticed a double-slash in computed file names

> bench testing shows that the NN loses the WAL
> ---------------------------------------------
>
> Key: ACCUMULO-1685
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1685
> Project: Accumulo
> Issue Type: Bug
> Components: tserver
> Environment: Hadoop 1.0.4, single node dev't system
> Reporter: Eric Newton
> Assignee: Eric Newton
> Priority: Critical
> Fix For: 1.6.0
>
>
> Doing bench testing; I build accumulo:
> {noformat}
> $ mvn -Pnative package -DskipTests
> {noformat}
> I go into the assembly area and configure and run accumulo
> {noformat}
> $ cd assemble/target/accumulo-1.6.0-SNAPSHOT-dev/accumulo-1.6.0-SNAPSHOT
> $ cp ~/conf/* conf
> $ hadoop fs -rmr /accumulo
> Moved to trash: hdfs://somehost:9000/accumulo
> $ ( echo test ; echo Y ; echo secret ; echo secret ) | ./bin/accumulo init
> $ 2013-09-04 12:23:51,558 [util.Initialize] INFO : Hadoop Filesystem is hdfs://somehost:9000
> 2013-09-04 12:23:51,559 [util.Initialize] INFO : Accumulo data dirs are [hdfs://somehost:9000/accumulo]
> 2013-09-04 12:23:51,559 [util.Initialize] INFO : Zookeeper server is localhost:2181
> 2013-09-04 12:23:51,559 [util.Initialize] INFO : Checking if Zookeeper is available. If this hangs, then you need to make sure zookeeper is running
> Instance name : test
> Instance name "test" exists. Delete existing entry from zookeeper? [Y/N] : Y
> Enter initial password for root (this may not be applicable for your security setup): ******
> Confirm initial password for root: ******
> $ ./bin/start-all.sh
> Starting monitor on localhost
> Starting tablet servers .... done
> Starting tablet server on localhost
> 2013-09-04 12:26:24,545 [server.Accumulo] INFO : Attempting to talk to zookeeper
> 2013-09-04 12:26:24,675 [server.Accumulo] INFO : Zookeeper connected and initialized, attemping to talk to HDFS
> 2013-09-04 12:26:24,679 [server.Accumulo] INFO : Connected to HDFS
> Starting master on localhost
> Starting garbage collector on localhost
> Starting tracer on localhost
> {noformat}
> Next, create a table
> {noformat}
> $ ./bin/accumulo shell -u root -p secret
> 2013-09-04 12:27:01,628 [shell.Shell] WARN : Specifying a raw password is deprecated.
> Shell - Apache Accumulo Interactive Shell
> -
> - version: 1.6.0-SNAPSHOT
> - instance name: test
> - instance id: 1967c1ec-cc0f-439b-b4da-4029debd16e3
> -
> - type 'help' for a list of available commands
> -
> root@test> createtable t
> root@test t>
> {noformat}
> Then I checked the tserver log for the write-ahead log created for this update to the root table:
> {noformat}
> $ fgrep -a /wal/ logs/tserver_*.debug.log
> 2013-09-04 12:26:27,130 [log.DfsLogger] DEBUG: Got new write-ahead log: localhost+9997/hdfs://rd6ul-14706v.tycho.ncsc.mil:9000/accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9
> 2013-09-04 12:26:58,264 [tabletserver.Tablet] DEBUG: Logs for memory compacted: !!R<< localhost+9997/hdfs://somehost:9000/accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9
> {noformat}
> Now, let's check for the file:
> {noformat}
> $ hadoop fs -ls hdfs://somehost:9000/accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9
> ls: Cannot access hdfs://somehost:9000/accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9: No such file or directory.
> {noformat}
> What?
> Check the NN logs:
> {noformat}
> $ fgrep 1dd2727f /some/log/dir/hadoop-ecnewt2-local-namenode-somehost.log
> 2013-09-04 12:26:27,075 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9. blk_-6011963215434912690_971163
> 2013-09-04 12:26:27,113 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.fsync: file /accumulo/wal/localhost+9997/1dd2727f-1de9-417b-a5a2-e56f7d8020a9 for DFSClient_-787226921
> {noformat}
> So, the NN seems to be making the file, but it's not there when we go to look!
> Here's my hdfs-site.xml file:
> {noformat}
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> <!-- Put site-specific property overrides in this file. -->
> <configuration>
>   <property>
>     <name>dfs.replication</name>
>     <value>1</value>
>   </property>
>   <property>
>     <name>dfs.name.dir</name>
>     <value>/local/ecn/data/hadoop/nn</value>
>   </property>
>   <property>
>     <name>dfs.data.dir</name>
>     <value>/disk01/data/hadoop/dn,/disk02/data/hadoop/dn,/disk03/data/hadoop/dn</value>
>   </property>
>   <property>
>     <name>dfs.support.append</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>dfs.data.synconclose</name>
>     <value>true</value>
>   </property>
> </configuration>
> {noformat}
> I have written an integration test that I dumped into RestartIT.java, but that doesn't seem to fail in same way.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
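
Note on the commit message: it mentions a double slash in computed file names, but this notification does not include the diff. The following is only an illustrative sketch of what collapsing such a double slash could look like from the command line; it is not the change made in e2bc157. The function name normalize_path is hypothetical, and the sketch assumes the only "://" in the input is the one after the URI scheme.

```shell
#!/bin/sh
# Hypothetical helper (not from the Accumulo source): collapse repeated
# slashes in a path while keeping the "//" that follows a URI scheme.
# The body runs in a subshell, so its variables do not leak to the caller.
normalize_path() (
  path=$1
  scheme=
  case $path in
    *://*)
      scheme=${path%%://*}://   # e.g. "hdfs://"
      path=${path#*://}         # everything after the scheme separator
      ;;
  esac
  # tr -s squeezes each run of "/" into a single "/"
  printf '%s%s\n' "$scheme" "$(printf '%s' "$path" | tr -s '/')"
)
```

A normalized name could then be passed to the same check used in the report, for example: hadoop fs -ls "$(normalize_path "$wal")", where $wal is assumed to hold a path copied from the tserver log.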