Re: About debugging HDFS and MapReduce...

2013-10-06 Thread Ted Yu
Your image didn't go through. 
You may want to upload it to an image-sharing site.

What version of Hadoop are you using ?

Thanks
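
For what it's worth, one common approach (a sketch, assuming the standard JDWP agent and the usual hadoop-env.sh hooks; the ports below are arbitrary) is to start each daemon with remote-debug options and attach Eclipse to it:

  # in hadoop-env.sh (hypothetical ports)
  export HADOOP_NAMENODE_OPTS="$HADOOP_NAMENODE_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000"
  export HADOOP_DATANODE_OPTS="$HADOOP_DATANODE_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8001"

Client-side threads such as DataStreamer run in the client JVM, so the same -agentlib option would go into HADOOP_CLIENT_OPTS (or the JVM running your job) instead. Then create a "Remote Java Application" debug configuration in Eclipse for each host/port.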

On Oct 6, 2013, at 8:56 PM, 남윤민  wrote:

> Hello!!
> My name is Yoonmin Nam from South Korea.
> 
> Currently, I am trying to debug the NameNode and DataNode using Eclipse in a 
> fully distributed environment in my lab (1 master and 10 nodes).
> 
> However, I can only debug partially; I can step through just some portion of 
> the Hadoop source code.
> 
> This means I cannot see how some daemon threads, such as DataStreamer, work.
> Also, I want to see how the NameNode and DataNode threads work, but I can't.
> 
> As you can see, there is a daemon thread (Thread-2) and it might be the 
> DataStreamer thread, but I can't tell.
> 
>  
> 
> 
> Since everyone on hdfs-dev has a lot of professional experience in 
> developing HDFS and MapReduce, I want to ask how I can watch all of the 
> real daemons at work, for academic reasons.
> 
> Even if you have no idea about that, may I ask how you test your 
> implementation of HDFS or MapReduce?
> 
> 
> Thanks!!
>  
> 
> 
> 
> 
> // Yoonmin Nam
> 


Re: Build failed in Jenkins: Hadoop-Hdfs-trunk #1552

2013-10-14 Thread Ted Yu
I noticed test failures in 'Hadoop HDFS' caused tests in other modules to
be skipped:

[INFO] Apache Hadoop HDFS  FAILURE
[1:44:29.257s]
[INFO] Apache Hadoop HttpFS .. SKIPPED
[INFO] Apache Hadoop HDFS BookKeeper Journal . SKIPPED
[INFO] Apache Hadoop HDFS-NFS  SKIPPED

Should we apply the '--fail-at-end' Maven option so that all tests are run?

See http://maven.apache.org/guides/mini/guide-multiple-modules.html
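
For example, something like this (a sketch; the actual Jenkins job invocation may differ):

  mvn test -fae    # short form of --fail-at-end

With --fail-at-end, Maven continues building and testing the non-impacted modules and only fails the overall build at the end, although modules that depend on the failed one may still be skipped.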

Cheers


On Mon, Oct 14, 2013 at 6:19 AM, Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> See 
>
> Changes:
>
> [sandy] MAPREDUCE-5463. Deprecate SLOTS_MILLIS counters. (Tzuyoshi Ozawa
> via Sandy Ryza)
>
> [sandy] YARN-305. Fair scheduler logs too many "Node offered to app"
> messages. (Lohit Vijayarenu via Sandy Ryza)
>
> [sseth] MAPREDUCE-5329. Allow MR applications to use additional
> AuxServices, which are compatible with the default MapReduce shuffle.
> Contributed by Avner BenHanoch.
>
> --
> [...truncated 11425 lines...]
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 26.954 sec
> - in org.apache.hadoop.hdfs.TestSetrepIncreasing
> Running org.apache.hadoop.hdfs.TestEncryptedTransfer
> Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 79.573
> sec - in org.apache.hadoop.hdfs.TestEncryptedTransfer
> Running org.apache.hadoop.hdfs.TestDFSUpgrade
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 34.229 sec
> - in org.apache.hadoop.hdfs.TestDFSUpgrade
> Running org.apache.hadoop.hdfs.TestCrcCorruption
> Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 32.306 sec
> - in org.apache.hadoop.hdfs.TestCrcCorruption
> Running org.apache.hadoop.hdfs.TestHFlush
> Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 26.177 sec
> - in org.apache.hadoop.hdfs.TestHFlush
> Running org.apache.hadoop.hdfs.TestFileAppendRestart
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.167 sec
> - in org.apache.hadoop.hdfs.TestFileAppendRestart
> Running org.apache.hadoop.hdfs.TestDatanodeReport
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 19.518 sec
> - in org.apache.hadoop.hdfs.TestDatanodeReport
> Running org.apache.hadoop.hdfs.TestShortCircuitLocalRead
> Tests run: 10, Failures: 0, Errors: 0, Skipped: 10, Time elapsed: 0.195
> sec - in org.apache.hadoop.hdfs.TestShortCircuitLocalRead
> Running org.apache.hadoop.hdfs.TestFileInputStreamCache
> Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.205 sec
> - in org.apache.hadoop.hdfs.TestFileInputStreamCache
> Running org.apache.hadoop.hdfs.TestRestartDFS
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.3 sec -
> in org.apache.hadoop.hdfs.TestRestartDFS
> Running org.apache.hadoop.hdfs.TestDFSUpgradeFromImage
> Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 15.577 sec
> - in org.apache.hadoop.hdfs.TestDFSUpgradeFromImage
> Running org.apache.hadoop.hdfs.TestDFSRemove
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 13.897 sec
> - in org.apache.hadoop.hdfs.TestDFSRemove
> Running org.apache.hadoop.hdfs.TestHDFSTrash
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.345 sec
> - in org.apache.hadoop.hdfs.TestHDFSTrash
> Running org.apache.hadoop.hdfs.TestClientReportBadBlock
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 63.52 sec
> - in org.apache.hadoop.hdfs.TestClientReportBadBlock
> Running org.apache.hadoop.hdfs.TestQuota
> Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.236 sec
> - in org.apache.hadoop.hdfs.TestQuota
> Running org.apache.hadoop.hdfs.TestFileLengthOnClusterRestart
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 13.421 sec
> - in org.apache.hadoop.hdfs.TestFileLengthOnClusterRestart
> Running org.apache.hadoop.hdfs.TestDatanodeRegistration
> Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.058 sec
> - in org.apache.hadoop.hdfs.TestDatanodeRegistration
> Running org.apache.hadoop.hdfs.TestAbandonBlock
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 13.214 sec
> - in org.apache.hadoop.hdfs.TestAbandonBlock
> Running org.apache.hadoop.hdfs.TestDFSShell
> Tests run: 23, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 25.489
> sec - in org.apache.hadoop.hdfs.TestDFSShell
> Running org.apache.hadoop.hdfs.TestListFilesInDFS
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.339 sec
> - in org.apache.hadoop.hdfs.TestListFilesInDFS
> Running org.apache.hadoop.hdfs.TestParallelShortCircuitReadUnCached
> Tests run: 4, Failures: 0, Errors: 0, Skipped: 4, Time elapsed: 0.163 sec
> - in org.apache.hadoop.hdfs.TestParallelShortCircuitReadUnCached
> Running org.apache.hadoop.hdfs.TestPeerCache
> Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elap

Re: Intermittent test errors in Hadoop-hdfs project

2013-12-25 Thread Ted Yu
TestOfflineEditsViewer is the one appearing in Jenkins builds.

I ran the other tests locally on a Mac and they passed.

Cheers
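
(For reference, a sketch of re-running a single test locally; the module path and test class are the ones named in this thread:)

  cd hadoop-hdfs-project/hadoop-hdfs
  mvn test -Dtest=TestOfflineEditsViewer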


On Wed, Dec 25, 2013 at 5:38 AM, Abhijeet Apsunde <
abhijeet_apsu...@persistent.co.in> wrote:

> Hi all,
>
> I'm running the JUnit test suite for the hadoop-hdfs project; however, none
> of my 3 run attempts could complete successfully.
> There were random tests failing due to some error, plus one consistent
> failure.
>
> I believe the consistently broken test is due to the following change, since
> the corresponding "editsStored" binary file was not checked in along with the
> modified "editsStored.xml" file in this particular revision.
> ###
> Revision: 1552841
> Author: cmccabe
> Date: Saturday, December 21, 2013 4:57:20 AM
> Message:
> HDFS-5636. Enforce a max TTL per cache pool (awang via cmccabe)
> 
> .
> .
> Modified :
> /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml
> .
> ###
>
> The 'Tests in error' passed when run separately. Test result
> snippets for all 3 runs are attached at the end of this mail.
> I would like to know whether others have noticed something similar, or
> whether there's something wrong with my setup.
> If it's common behaviour, how does the Apache CI handle such scenarios while
> testing a patch?
>
> Thanks,
> Abhijeet.
>
>
>
>
>
> __Test results snippets 
>
> Run1
> Failed tests:
>   TestOfflineEditsViewer.testStored:193 Reference XML edits and parsed to
> XML should be same
>
> Tests in error:
>   TestHDFSFileSystemContract.setUp:39 » IO Failed to save in any storage
> directo...
>   TestDatanodeReport.testDatanodeReport:48 » InconsistentFSState
> Directory /home...
> ###
> Run2
> Failed tests:
>   TestOfflineEditsViewer.testStored:193 Reference XML edits and parsed to
> XML should be same
>
> Tests in error:
>   TestLeaseRecovery.testBlockSynchronization:80 » IO NameNode is not
> formatted.
>   TestDataNodeVolumeFailureToleration.setUp:71 » IO Timed out waiting for
> Mini H...
>   TestDataNodeVolumeFailureToleration.tearDown:83 NullPointer
> ###
> Run 3
> Failed tests:
>   TestOfflineEditsViewer.testStored:193 Reference XML edits and parsed to
> XML should be same
>
> Tests in error:
>   TestAppendDifferentChecksum.setupCluster:52 » IO Cannot lock storage
> /home/abh...
>   TestFileAppend3.testTC12 » Remote No lease on /TC12/foo: File is not
> open for ...
>   TestHDFSFileSystemContract.setUp:39 » InconsistentFSState Directory
> /home/abhi...
> ##
>
>
>
>


Re: New NameNode UI does not support Internet Explorer

2014-02-06 Thread Ted Yu
Attachment didn't go through. 
Consider using a third-party website for the image. 

Cheers

On Feb 6, 2014, at 1:16 AM, Vinayakumar B  wrote:

> Some problem with the image. Attached the same image as a file.
>  
> From: Vinayakumar B [mailto:vinayakuma...@huawei.com] 
> Sent: 06 February 2014 14:15
> To: hdfs-dev@hadoop.apache.org
> Subject: New NameNode UI does not support Internet Explorer
>  
> Hi,
>  
> Currently the new NameNode UI does not work in Internet Explorer, but it looks 
> good in Chrome and Mozilla. I got the page below in IE.
>  
> 
>  
>  
> Is everyone facing this issue, or is it only happening for me?
>  
> If Internet Explorer support is not provided now, is there any plan to 
> provide it in the future?
>  
> Regards,
> Vinayakumar B


Re: New NameNode UI does not support Internet Explorer

2014-02-06 Thread Ted Yu
Could the difference be due to the version of Internet Explorer?


On Thu, Feb 6, 2014 at 11:16 AM, Haohui Mai  wrote:

> Works for me on a windows 8 machine.
>
> http://filebin.ca/1BRAGUqlZZ73/webui-ie.png
>
> ~Haohui
>
>
> On Thu, Feb 6, 2014 at 8:42 AM, Vinayakumar B  >wrote:
>
> > [image: Inline image 1]
> > [image: Inline image 2]
> >
> > Attaching the Image from home.
> >
> >
> > On Thu, Feb 6, 2014 at 6:40 PM, Ted Yu  wrote:
> >
> >> Attachment didn't go through.
> >> Consider using third party website for the image.
> >>
> >> Cheers
> >>
> >> On Feb 6, 2014, at 1:16 AM, Vinayakumar B 
> >> wrote:
> >>
> >> > Some problem with Image. Attached the same image as file.
> >> >
> >> > From: Vinayakumar B [mailto:vinayakuma...@huawei.com]
> >> > Sent: 06 February 2014 14:15
> >> > To: hdfs-dev@hadoop.apache.org
> >> > Subject: New NameNode UI does not support Internet Explorer
> >> >
> >> > Hi,
> >> >
> >> > Currently new UI for namenode does not support Internet Explorer. But
> >> Chrome and Mozilla UI looks good. I got the page as below in IE.
> >> >
> >> >
> >> >
> >> >
> >> > Is everyone facing this issue or only coming for me..?
> >> >
> >> > If Internet explorer support is not provided now, is there any plan to
> >> provide same in future?
> >> >
> >> > Regards,
> >> > Vinayakumar B
> >>
> >
> >
>
>


Re: Jenkins running?

2014-05-12 Thread Ted Yu
https://builds.apache.org/job/Hadoop-hdfs-trunk/ was not responding when I
checked a moment ago.

https://builds.apache.org/job/PreCommit-HDFS-Build
wasn't responding either.

On Mon, May 12, 2014 at 5:55 AM, Uma Maheswara Rao G
wrote:

>  I am not seeing pre-commit builds running on uploaded patches.
>
> Regards,
> Uma
>


Re: hedged read bug

2014-06-09 Thread Ted Yu
Lei:
If you can attach the test code from your first email to HDFS-6494, that
would help us know the scenario you were referring to.

Cheers


On Mon, Jun 9, 2014 at 12:06 PM, Chris Nauroth 
wrote:

> Hi Lei,
>
> I just reviewed this code path on trunk again, and I couldn't find a
> problem.  It appears to me that if one future fails, then the exception
> handling logic will allow the other future to proceed without canceling.
>  Also, I haven't been able to reproduce the infinite loop that you reported
> with the test case that you gave.
>
> However, if you're still seeing a bug on your side, then I recommend filing
> a new jira issue with a full description.  We can continue troubleshooting
> there.
>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
>
> On Sun, Jun 8, 2014 at 8:16 PM, lei liu  wrote:
>
> > Hi Chris,
> >
> > I reviewed the patch, and I think there is a problem in it.
> >
> > For example, if there are two futures and the first future to return fails,
> > then the second future will be cancelled.
> >
> >
> > 2014-06-07 3:44 GMT+08:00 Chris Nauroth :
> >
> > > Hello Lei,
> > >
> > > There is a known bug in 2.4.0 that can cause hedged reads to hang.  I
> > fixed
> > > it in HDFS-6231:
> > >
> > > https://issues.apache.org/jira/browse/HDFS-6231
> > >
> > > This patch will be included in the forthcoming 2.4.1 release.  I'm
> > curious
> > > to see if applying this patch fixes the problem for you.  Can you try
> it
> > > and let us know?  Thank you!
> > >
> > > Chris Nauroth
> > > Hortonworks
> > > http://hortonworks.com/
> > >
> > >
> > >
> > > On Thu, Jun 5, 2014 at 8:34 PM, lei liu  wrote:
> > >
> > > > I use hadoop2.4.
> > > >
> > > > When I use "hedged read", if there is only one live datanode and the
> > > > read from that datanode throws TimeoutException and ChecksumException,
> > > > the client will wait forever.
> > > >
> > > > Example below test case:
> > > >   @Test
> > > >   public void testException() throws IOException,
> InterruptedException,
> > > > ExecutionException {
> > > > Configuration conf = new Configuration();
> > > > int numHedgedReadPoolThreads = 5;
> > > > final int hedgedReadTimeoutMillis = 50;
> > > >
> > conf.setInt(DFSConfigKeys.DFS_DFSCLIENT_HEDGED_READ_THREADPOOL_SIZE,
> > > > numHedgedReadPoolThreads);
> > > >
> > > conf.setLong(DFSConfigKeys.DFS_DFSCLIENT_HEDGED_READ_THRESHOLD_MILLIS,
> > > >   hedgedReadTimeoutMillis);
> > > > // Set up the InjectionHandler
> > > > DFSClientFaultInjector.instance =
> > > > Mockito.mock(DFSClientFaultInjector.class);
> > > > DFSClientFaultInjector injector =
> DFSClientFaultInjector.instance;
> > > > // make preads ChecksumException
> > > > Mockito.doAnswer(new Answer() {
> > > >   @Override
> > > >   public Void answer(InvocationOnMock invocation) throws
> Throwable
> > {
> > > > if(true) {
> > > >   Thread.sleep(hedgedReadTimeoutMillis + 10);
> > > >   throw new ChecksumException("test", 100);
> > > > }
> > > > return null;
> > > >   }
> > > > }*).when(injector).fetchFromDatanodeException();*
> > > >
> > > > MiniDFSCluster cluster = new
> > > > MiniDFSCluster.Builder(conf).numDataNodes(3).format(true).build();
> > > > DistributedFileSystem fileSys = cluster.getFileSystem();
> > > > DFSClient dfsClient = fileSys.getClient();
> > > > DFSHedgedReadMetrics metrics = dfsClient.getHedgedReadMetrics();
> > > >
> > > > try {
> > > >   Path file = new Path("/hedgedReadException.dat");
> > > >   FSDataOutputStream  output = fileSys.create(file,(short)1);
> > > >   byte[] data = new byte[64 * 1024];
> > > >   output.write(data);
> > > >   output.flush();
> > > >   output.write(data);
> > > >   output.flush();
> > > >   output.write(data);
> > > >   output.flush();
> > > >   output.close();
> > > >   byte[] buffer = new byte[64 * 1024];
> > > >   FSDataInputStream  input = fileSys.open(file);
> > > >   input.read(0, buffer, 0, 1024);
> > > >   input.close();
> > > >   assertTrue(metrics.getHedgedReadOps() == 1);
> > > >   assertTrue(metrics.getHedgedReadWins() == 1);
> > > > } finally {
> > > >   fileSys.close();
> > > >   cluster.shutdown();
> > > >   Mockito.reset(injector);
> > > > }
> > > >   }
> > > >
> > > >
> > > > *The code of actualGetFromOneDataNode() method call
> > > > **fetchFromDatanodeException()
> > > > method as below:*
> > > >   try {
> > > > *DFSClientFaultInjector.get().fetchFromDatanodeException();*
> > > > Token blockToken =
> block.getBlockToken();
> > > > int len = (int) (end - start + 1);
> > > > reader = new BlockReaderFactory(dfsClient.getConf()).
> > > > setInetSocketAddress(targetAddr).
> > > > setRemotePeerFactory(dfsClient).
> > > > setDatanodeInfo(chosenNode).
> > > > setFileName(src).
> > > > setBloc

Re: I need to debug namenode shut-down

2014-07-07 Thread Ted Yu
Adding or utilizing logs would be better than setting breakpoints.
There are many moving parts in the NameNode.

Cheers
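
For example, a sketch of turning up NameNode logging (the log4j category follows the usual package layout; the host and port below are placeholders):

  # in log4j.properties on the NameNode host
  log4j.logger.org.apache.hadoop.hdfs.server.namenode=DEBUG

  # or, at runtime, without a restart
  hadoop daemonlog -setlevel <nn-host>:50070 org.apache.hadoop.hdfs.server.namenode.NameNode DEBUG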


On Sun, Jul 6, 2014 at 10:15 AM, MrAsanjar .  wrote:

> Any advice on which HDFS class/method I should set breakpoint(s) in?
> Thanks in advance
>


hdfs Jenkins build fails due to findbugs not finishing

2014-07-29 Thread Ted Yu
Hi,
You may have noticed the following in recent hdfs Jenkins builds:

[INFO] 

[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-antrun-plugin:1.7:run
(site) on project hadoop-hdfs: An Ant BuildException has occured: input
file <
https://builds.apache.org/job/Hadoop-Hdfs-trunk/ws/trunk/hadoop-hdfs-project/hadoop-hdfs/target/findbugsXml.xml>
does not exist
[ERROR] around Ant part ...https://builds.apache.org/job/Hadoop-Hdfs-trunk/ws/trunk/hadoop-hdfs-project/hadoop-hdfs/target/findbugsXml.xml";>
out="<
https://builds.apache.org/job/Hadoop-Hdfs-trunk/ws/trunk/hadoop-hdfs-project/hadoop-hdfs/target/site/findbugs.html"/>...>
@ 44:322 in <
https://builds.apache.org/job/Hadoop-Hdfs-trunk/ws/trunk/hadoop-hdfs-project/hadoop-hdfs/target/antrun/build-main.xml
>
[ERROR] -> [Help 1]

Looking at https://builds.apache.org/job/Hadoop-Hdfs-trunk/1819/consoleFull
:

 [java] Exception in thread "main" java.lang.OutOfMemoryError: GC
overhead limit exceeded
 [java] at
edu.umd.cs.findbugs.ba.npe.IsNullValueAnalysis.createFact(IsNullValueAnalysis.java:208)
 [java] at
edu.umd.cs.findbugs.ba.npe.IsNullValueAnalysis.getFactAtMidEdge(IsNullValueAnalysis.java:849)
 [java] at
edu.umd.cs.findbugs.ba.npe.IsNullValueDataflow.getFactAtMidEdge(IsNullValueDataflow.java:33)
 [java] at
edu.umd.cs.findbugs.ba.npe.NullDerefAndRedundantComparisonFinder.examineNullValues(NullDerefAndRedundantComparisonFinder.java:289)
 [java] at
edu.umd.cs.findbugs.ba.npe.NullDerefAndRedundantComparisonFinder.execute(NullDerefAndRedundantComparisonFinder.java:150)
 [java] at
edu.umd.cs.findbugs.detect.FindNullDeref.analyzeMethod(FindNullDeref.java:278)
 [java] at
edu.umd.cs.findbugs.detect.FindNullDeref.visitClassContext(FindNullDeref.java:205)
 [java] at
edu.umd.cs.findbugs.DetectorToDetector2Adapter.visitClass(DetectorToDetector2Adapter.java:68)
 [java] at
edu.umd.cs.findbugs.FindBugs2.analyzeApplication(FindBugs2.java:979)
 [java] at edu.umd.cs.findbugs.FindBugs2.execute(FindBugs2.java:230)
 [java] at edu.umd.cs.findbugs.FindBugs.runMain(FindBugs.java:348)
 [java] at edu.umd.cs.findbugs.FindBugs2.main(FindBugs2.java:1057)
 [java] Java Result: 1


Currently we have:

export MAVEN_OPTS=-Xmx2048m


Should the value of 2048m be increased ?
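
For example, something along these lines on the build machines (the exact value is a guess at what the findbugs run on hadoop-hdfs might need):

  export MAVEN_OPTS=-Xmx3072m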

Cheers


Re: Branching 2.5

2014-07-30 Thread Ted Yu
Adding bui...@apache.org

Cheers
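
(For reference, a sketch of the FINDBUGS_HOME workaround Andrew describes below; the install path here is hypothetical:)

  export FINDBUGS_HOME=$HOME/tools/findbugs-1.3.9
  # then re-run the failing build so the antrun findbugs step can locate it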

On Jul 30, 2014, at 12:52 AM, Andrew Wang  wrote:

> Alright, dug around some more and I think it's that FINDBUGS_HOME is not
> being set correctly. I downloaded and extracted Findbugs 1.3.9, pointed
> FINDBUGS_HOME at it, and the build worked after that. I don't know what's
> up with the default maven build, it'd be great if someone could check.
> 
> Can someone with access to the build machines check this?
> 
> As a side note, I think 1.3.9 was released in 2009. It'd be nice to catch
> up with the last 5 years of static analysis :)
> 
> 
> On Tue, Jul 29, 2014 at 11:36 PM, Andrew Wang 
> wrote:
> 
>> I looked in the log, it also looks like findbugs is OOMing:
>> 
>> [java] Exception in thread "main" java.lang.OutOfMemoryError: GC 
>> overhead limit exceeded
>> [java]at edu.umd.cs.findbugs.ba.Path.grow(Path.java:263)
>> [java]at edu.umd.cs.findbugs.ba.Path.copyFrom(Path.java:113)
>> [java]at edu.umd.cs.findbugs.ba.Path.duplicate(Path.java:103)
>> [java]at edu.umd.cs.findbugs.ba.obl.State.duplicate(State.java:65)
>> 
>> 
>> This is quite possibly related, since there's an error at the end like
>> this:
>> 
>> [ERROR] Failed to execute goal 
>> org.apache.maven.plugins:maven-antrun-plugin:1.7:run (site) on project 
>> hadoop-hdfs: An Ant BuildException has occured: input file 
>> /home/jenkins/jenkins-slave/workspace/HADOOP2_Release_Artifacts_Builder/branch-2.5.0/hadoop-hdfs-project/hadoop-hdfs/target/findbugsXml.xml
>>  does not exist
>> 
>> [ERROR] around Ant part ...> style="/home/jenkins/tools/findbugs/latest/src/xsl/default.xsl"
>> in="/home/jenkins/jenkins-slave/workspace/HADOOP2_Release_Artifacts_Builder/branch-2.5.0/hadoop-hdfs-project/hadoop-hdfs/target/findbugsXml.xml"
>> out="/home/jenkins/jenkins-slave/workspace/HADOOP2_Release_Artifacts_Builder/branch-2.5.0/hadoop-hdfs-project/hadoop-hdfs/target/site/findbugs.html"/>...
>> @ 44:368 in
>> /home/jenkins/jenkins-slave/workspace/HADOOP2_Release_Artifacts_Builder/branch-2.5.0/hadoop-hdfs-project/hadoop-hdfs/target/antrun/build-main.xml
>> 
>> I'll try to figure out how to increase this, but if anyone else knows,
>> feel free to chime in.
>> 
>> 
>> On Tue, Jul 29, 2014 at 5:41 PM, Karthik Kambatla 
>> wrote:
>> 
>>> Devs,
>>> 
>>> I created branch-2.5.0 and was trying to cut an RC, but ran into issues
>>> with creating one. If anyone knows what is going on, please help me out. I
>>> ll continue looking into it otherwise.
>>> 
>>> https://builds.apache.org/job/HADOOP2_Release_Artifacts_Builder/24/console
>>> is the build that failed. It appears the issue is because it can't find
>>> Null.java. I run into the same issue locally as well, even with
>>> branch-2.4.1. So, I wonder if I should be doing anything else to create
>>> the
>>> RC instead?
>>> 
>>> Thanks
>>> Karthik
>>> 
>>> 
>>> On Sun, Jul 27, 2014 at 11:09 AM, Zhijie Shen 
>>> wrote:
>>> 
 I've just committed YARN-2247, which is the last 2.5 blocker from YARN.
 
 
 On Sat, Jul 26, 2014 at 5:02 AM, Karthik Kambatla 
 wrote:
 
> A quick update:
> 
> All remaining blockers are on the verge of getting committed. Once
>>> that
 is
> done, I plan to cut a branch for 2.5.0 and get an RC out hopefully
>>> this
> coming Monday.
> 
> 
> On Fri, Jul 25, 2014 at 12:32 PM, Andrew Wang <
>>> andrew.w...@cloudera.com>
> wrote:
> 
>> One thing I forgot, the release note activities are happening at
>> HADOOP-10821. If you have other things you'd like to see mentioned,
 feel
>> free to leave a comment on the JIRA and I'll try to include it.
>> 
>> Thanks,
>> Andrew
>> 
>> 
>> On Fri, Jul 25, 2014 at 12:28 PM, Andrew Wang <
 andrew.w...@cloudera.com>
>> wrote:
>> 
>>> I just went through and fixed up the HDFS and Common CHANGES.txt
>>> for
>> 2.5.0.
>>> 
>>> As a friendly reminder, please try to put things under the correct
>> section
>>> :) We have subsections for the xattr changes in HDFS-2006 and
>> HADOOP-10514,
>>> and there were some unrelated JIRAs appended to the end.
>>> 
>>> I'd also encourage committers to be more liberal with their use of
 the
>> NEW
>>> FEATURES section. I'm helping Karthik write up the 2.5 release
>>> notes,
> and
>>> I'm using NEW FEATURES to fill it out. When looking through the
>>> JIRA
> list
>>> though, I decided to promote things like the SNN/DN/JN webUI
>> improvements,
>>> the HCFS specification work, and OIV read-only WebHDFS access to
>>> new
>>> features. One rule-of-thumb, if a feature required an umbrella
>>> JIRA,
> put
>>> the umbrella under NEW FEATURES when it's resolved.
>>> 
>>> Thanks,
>>> Andrew
>>> 
>>> 
>>> On Wed, Jul 16, 2014 at 7:59 PM, Wangda Tan 
> wrote:
>>> 
 Thanks Tsuyoshi for pointing me this,
 
 Wangda
 
>>>

Re: HDFS-6902 FileWriter should be closed in finally block in BlockReceiver#receiveBlock()

2014-08-21 Thread Ted Yu
bq. else there is a memory leak

Moving the call to close() into a finally block would prevent the leak.

bq. but then this code snippet could be java and can be messy

The code is in Java.

Cheers
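
For reference, a minimal sketch of the fix being discussed (try-with-resources, assuming Java 7+; a finally block achieves the same thing on older JVMs):

  try (FileWriter out = new FileWriter(restartMeta)) {
    // write out the current time; out is closed automatically,
    // even if write() or flush() throws an IOException
    out.write(Long.toString(Time.now() + restartBudget));
    out.flush();
  } catch (IOException ioe) {
    // existing handling unchanged
  }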

On Wed, Aug 20, 2014 at 10:00 PM, vlab  wrote:

> Unless you need 'out' later, have this statement.
> FileWriter out(restartMeta);
> then when exiting the try block, 'out' will go out of scope
>
> I assume this FileWriter that is created is deleted elsewhere
> (else there is a memory leak).   {But then this code snippet could be Java,
> and that can be messy.}
>
>
> On 8/20/2014 8:50 PM, Ted Yu (JIRA) wrote:
>
>> Ted Yu created HDFS-6902:
>> 
>>
>>   Summary: FileWriter should be closed in finally block in
>> BlockReceiver#receiveBlock()
>>   Key: HDFS-6902
>>   URL: https://issues.apache.org/jira/browse/HDFS-6902
>>   Project: Hadoop HDFS
>>Issue Type: Bug
>>  Reporter: Ted Yu
>>  Priority: Minor
>>
>>
>> Here is code starting from line 828:
>> {code}
>>  try {
>>FileWriter out = new FileWriter(restartMeta);
>>// write out the current time.
>>out.write(Long.toString(Time.now() + restartBudget));
>>out.flush();
>>out.close();
>>  } catch (IOException ioe) {
>> {code}
>> If write() or flush() call throws IOException, out wouldn't be closed.
>>
>>
>>
>> --
>> This message was sent by Atlassian JIRA
>> (v6.2#6252)
>>
>>
>


Re: transfer large file onto hdfs

2012-07-06 Thread Ted Yu
Thanks for the reply, Harsh.

BTW, does anyone deploy cdh3u4 on Solaris?

On Thu, Jul 5, 2012 at 11:47 PM, Harsh J  wrote:

> Should we assume that the non-hadoop system has no way to get on the
> network of the hadoop cluster and its clients? Otherwise, you can
> grant it temporary access and do a write from it itself. If not:
>
> Run a micro FTP server pointed to that file, and then do a 'hadoop fs
> -cp ftp://location hdfs://location', since FTPFileSystem is present in
> Hadoop? Or if NFS/etc. mounted, file:/// will work (or via
> copyFromLocal/put). Essentially you're bringing in the file remotely
> but performing the copy via CLI.
>
> Or you can copy them in chunks, either keeping the destination file
> writer open, if possible, or appending (depending on what version of
> Hadoop you're using).
>
> On Thu, Jul 5, 2012 at 11:54 PM, Ted Yu  wrote:
> > Hi,
> > One of the customers wants to transfer a dump file (size ~ 2TB) from
> > outside the hadoop cluster onto hdfs.
> > The size exceeds the free space on the CLI machine.
> >
> > I want to poll for best practices in this scenario.
> >
> > Thanks
>
>
>
> --
> Harsh J
>
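
(For reference, a sketch of the FTP-based copy Harsh describes above; the host names, credentials and paths are made up:)

  hadoop fs -cp ftp://user:password@ftphost/dumps/big.dump hdfs://namenode:8020/data/big.dump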


Re: Cannot create a new Jira issue for MapReduce

2012-08-11 Thread Ted Yu
I made some suggestions to the hbase dev mailing list a few weeks ago. The
following suggestions are about HBase development but can be extrapolated
to other Apache projects.


People can continue discussion through dev mailing list when JIRA is down.
When JIRA comes back up, transcript of such discussion can be posted back
on related issues.
Use of https://reviews.apache.org is encouraged. The review board wasn't
affected by JIRA downtime.
Running test suite by contributors and committers is encouraged which
alleviates the burden on Hadoop QA.

Goal for the above suggestions is for alleviating the impact of JIRA down
time.

BTW, I have kept notifications from iss...@hbase.apache.org in my Inbox.
This proves useful when JIRA is down.

Cheers

On Sat, Aug 11, 2012 at 7:14 PM, Jun Ping Du  wrote:

> Yes. I saw JIRA is in maintenance now and the schedule is as below:
>
> Host Name:   ull.zones.apache.org
> Service:     Issues - JIRA - General
> Entry Time:  2012-08-11 19:06:08
> Author:      danielsh
> Comment:     Migrating to a different physical host
> Start Time:  2012-08-11 19:06:08
> End Time:    2012-08-13 19:06:08
> Type:        Fixed
> Duration:    2d 0h 0m 0s
> Downtime ID: 1663
> Trigger ID:  N/A
>
> Looks like it will take 2 days to migrate to a different host. As JIRA is
> a key component of the dev process in the community, should we think of ways
> to lower the maintenance overhead?
>
>
> Thanks,
>
> Junping
>
> - Original Message -
> From: "Steve Loughran" 
> To: mapreduce-...@hadoop.apache.org
> Sent: Friday, August 10, 2012 7:33:04 AM
> Subject: Re: Cannot create a new Jira issue for MapReduce
>
> There have been disk problems w/ Jira recently. GitHub's been playing up
> this morning too. Time to put away the dev tools and get PowerPoint out
> instead.
>
> On 9 August 2012 13:38, Robert Evans  wrote:
> > It is a bit worse than that though.  I found that it did create the JIRA,
> > but it is in a bad state where you cannot put it in patch available or
> > close it. So we may need to do some cleanup of these JIRAs later.
> >
> > --Bobby
> >
> > On 8/9/12 3:19 PM, "Ted Yu"  wrote:
> >
> >>This has been reported by HBase developers as well.
> >>
> >>See https://issues.apache.org/jira/browse/INFRA-5131
> >>
> >>On Thu, Aug 9, 2012 at 1:10 PM, Benoy Antony  wrote:
> >>
> >>> Hi,
> >>>
> >>> I am getting the following error when I try to create a Jira issue.
> >>>
> >>> Error creating issue: com.atlassian.jira.util.RuntimeIOException:
> >>> java.io.IOException: read past EOF
> >>>
> >>> Anyone else face the same problem ?
> >>>
> >>> Thanks ,
> >>> Benoy
> >>>
> >
>


Re: current direction in namenode HA

2012-09-04 Thread Ted Yu
Uma:
Attachments are stripped on this mailing list.
I guess you were trying to attach this file:
https://issues.apache.org/jira/secure/attachment/12538911/BKTestDoc.pdf

On Tue, Sep 4, 2012 at 4:09 PM, Uma Maheswara Rao G wrote:

> Hi Sujee,
>
> Thanks a lot for your interest in HA.
>
> for #1
> If you can invest in NFS filers, that is another option.  If you want to try
> this, you can use the released Hadoop-2 version.
>   But options #2 and #3 avoid this external hardware dependency.
>
> for #2 you can take a look at HDFS-3399
>   We have been testing with BookKeeper for the last 2-3 months and it is
> going well. BK is progressing on the autorecovery and security parts. Auto
> recovery is almost done (BOOKKEEPER-237) and will be released in BK 4.2 very
> soon. BK has already started work on the security part as well. This
> integration will also come out with the next hadoop-2 release. I attached the
> tested scenarios to HDFS-3399 for your reference if you want to take a look.
> There is also a subtask in that umbrella JIRA for user manual information.
>
>
> for #3 you can take a look at HDFS-3077
>   Work is going on actively in this umbrella JIRA.
>
>
> for #4
> I am not sure anyone is working on it.
>
> The advantage here is that you can plug in whichever shared storage you
> want.
>
> Regards,
> Uma
>
> On Wed, Sep 5, 2012 at 4:07 AM, Sujee Maniyam  wrote:
>
> > Hello devs,
> >
> > I am trying to understand the current state / direction of  namenode
> > HA implementation.
> >
> > For using shared directory, I see the following options
> > (from
> >
> http://www.cloudera.com/blog/2012/03/high-availability-for-the-hadoop-distributed-file-system-hdfs/
> >   and  https://issues.apache.org/jira/browse/HDFS-3278)
> >
> > 1) rely on external HA filer
> > 2) multiple edit directories
> > 3) book keeper
> > 4) keep edits in HDFS / quorum based
> >
> > is there going to be an 'official / supported' method, or is it going
> > to be a configurable choice when setting up a cluster?
> >
> > thanks
> > Sujee
> > http://sujee.net
> >
>


Re: [VOTE] Release Apache Hadoop 2.0.4-alpha

2013-04-09 Thread Ted Yu
HADOOP-9467 has a patch available. 

It would be nice to include that as well. 

Thanks

On Apr 9, 2013, at 11:14 PM, Siddharth Seth  wrote:

> Arun, MAPREDUCE-5094 would be a useful jira to include in the 2.0.4-alpha
> release. It's not an absolute blocker since the values can be controlled
> explicitly by changing tests which use the cluster.
> 
> Thanks
> - Sid
> 
> 
> On Tue, Apr 9, 2013 at 8:39 PM, Arun C Murthy  wrote:
> 
>> Folks,
>> 
>> I've created a release candidate (rc0) for hadoop-2.0.4-alpha that I would
>> like to release.
>> 
>> This is a bug-fix release which solves a number of issues discovered
>> during integration testing of the full-stack.
>> 
>> The RC is available at:
>> http://people.apache.org/~acmurthy/hadoop-2.0.4-alpha-rc0/
>> The RC tag in svn is here:
>> http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.0.4-alpha-rc0
>> 
>> The maven artifacts are available via repository.apache.org.
>> 
>> Please try the release and vote; the vote will run for the usual 7 days.
>> 
>> thanks,
>> Arun
>> 
>> P.S. Many thanks are in order - Roman/Cos and rest of BigTop community for
>> helping to find a number of integration issues, Ted Yu for co-ordinating on
>> HBase, Alejandro for co-ordinating on Oozie, Vinod/Sid/Alejandro/Xuan/Daryn
>> and rest of devs for quickly jumping and fixing these.
>> 
>> 
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>> 
>> 
>> 


collision in the naming of '.snapshot' directory between hdfs snapshot and hbase snapshot

2013-04-15 Thread Ted Yu
Hi,
This afternoon Huned and I discovered an issue while playing with HBase
Snapshots on top of Hadoop's Snapshot branch (
http://svn.apache.org/viewvc/hadoop/common/branches/HDFS-2802/).

HDFS (built from HDFS-2802 branch) doesn't allow paths with .snapshot as a
component while HBase tries to create paths with .snapshot as a component.
This leads to issues in HBase, and one of HDFS or HBase needs to give up
the .snapshot reserved keyword. HBase released Snapshots feature in 0.94.6
(quite recently) and it may not be too late to change HBase to use a
different path component in an upcoming new release.

In HBase these path names are not user visible. If there is a deployment of
0.94.6, one could provide a migration tool that renames .snapshot to
.hbase-snapshot or something to be able to move to the Snapshot release of
Hadoop. On the other hand, .snapshot in HDFS is a user-visible name and a
convention used by many file systems. Users' familiarity with such path names
would help them adopt HDFS snapshots.

I am including the hdfs-dev in this email. Would appreciate if we could
work together and come up with a solution.

You can find sample output from hdfs command here:
http://pastebin.com/bBqR4Fvr

Cheers


Re: collision in the naming of '.snapshot' directory between hdfs snapshot and hbase snapshot

2013-04-15 Thread Ted Yu
Lars:
I will go ahead and log an HBase JIRA, tomorrow morning.

This hopefully would give people enough time to respond.

Cheers

On Mon, Apr 15, 2013 at 8:00 PM, lars hofhansl  wrote:

> OK. Let's try to fix that quickly, so that I can release HBase 0.94.7.
>
> -- Lars
>
>
>
> ________
>  From: Ted Yu 
> To: d...@hbase.apache.org; hdfs-dev@hadoop.apache.org
> Sent: Monday, April 15, 2013 7:13 PM
> Subject: collision in the naming of '.snapshot' directory between hdfs
> snapshot and hbase snapshot
>
>
> Hi,
> This afternoon Huned ad I discovered an issue while playing with HBase
> Snapshots on top of Hadoop's Snapshot branch (
> http://svn.apache.org/viewvc/hadoop/common/branches/HDFS-2802/).
>
> HDFS (built from HDFS-2802 branch) doesn't allow paths with .snapshot as a
> component while HBase tries to create paths with .snapshot as a component.
> This leads to issues in HBase, and one of HDFS or HBase needs to give up
> the .snapshot reserved keyword. HBase released Snapshots feature in 0.94.6
> (quite recently) and it may not be too late to change HBase to use a
> different path component in an upcoming new release.
>
> In HBase these path names are not user visible. If there is a deployment of
> 0.94.6, one could provide a migration tool that renames .snapshot to
> .hbase-snapshot or something to be able to move to the Snapshot release of
> Hadoop. On the other hand, .snapshot in HDFS is a user visible name and is
> a convention that is used by many file systems. It's a matter of
> familiarity with such path names that would help users in using HDFS
> snapshots.
>
> I am including the hdfs-dev in this email. Would appreciate if we could
> work together and come up with a solution.
>
> You can find sample output from hdfs command here:
> http://pastebin.com/bBqR4Fvr
>
> Cheers
>


Re: collision in the naming of '.snapshot' directory between hdfs snapshot and hbase snapshot

2013-04-15 Thread Ted Yu
I plan to rename ".snapshot" in HBase to ".hbase-snapshot".

Please suggest a better name in future correspondence if you have one.

Thanks

On Mon, Apr 15, 2013 at 8:12 PM, Ted Yu  wrote:

> Lars:
> I will go ahead and log an HBase JIRA, tomorrow morning.
>
> This hopefully would give people enough time to respond.
>
> Cheers
>
>
> On Mon, Apr 15, 2013 at 8:00 PM, lars hofhansl  wrote:
>
>> OK. Let's try to fix that quickly, so that I can release HBase 0.94.7.
>>
>> -- Lars
>>
>>
>>
>> 
>>  From: Ted Yu 
>> To: d...@hbase.apache.org; hdfs-dev@hadoop.apache.org
>> Sent: Monday, April 15, 2013 7:13 PM
>> Subject: collision in the naming of '.snapshot' directory between hdfs
>> snapshot and hbase snapshot
>>
>>
>> Hi,
>> This afternoon Huned ad I discovered an issue while playing with HBase
>> Snapshots on top of Hadoop's Snapshot branch (
>> http://svn.apache.org/viewvc/hadoop/common/branches/HDFS-2802/).
>>
>> HDFS (built from HDFS-2802 branch) doesn't allow paths with .snapshot as a
>> component while HBase tries to create paths with .snapshot as a component.
>> This leads to issues in HBase, and one of HDFS or HBase needs to give up
>> the .snapshot reserved keyword. HBase released Snapshots feature in 0.94.6
>> (quite recently) and it may not be too late to change HBase to use a
>> different path component in an upcoming new release.
>>
>> In HBase these path names are not user visible. If there is a deployment
>> of
>> 0.94.6, one could provide a migration tool that renames .snapshot to
>> .hbase-snapshot or something to be able to move to the Snapshot release of
>> Hadoop. On the other hand, .snapshot in HDFS is a user visible name and is
>> a convention that is used by many file systems. It's a matter of
>> familiarity with such path names that would help users in using HDFS
>> snapshots.
>>
>> I am including the hdfs-dev in this email. Would appreciate if we could
>> work together and come up with a solution.
>>
>> You can find sample output from hdfs command here:
>> http://pastebin.com/bBqR4Fvr
>>
>> Cheers
>>
>
>


Re: collision in the naming of '.snapshot' directory between hdfs snapshot and hbase snapshot

2013-04-15 Thread Ted Yu
Putting back dev@hbase.

".hbase-sanpshot" would be created at cluster startup. After that user
wouldn't be able to use the same directory name.

On Mon, Apr 15, 2013 at 8:23 PM, Azuryy Yu  wrote:

> I think ".hbase-sanpshot" is good, but we should also disallow user to
> create ".hbase-sanpshot" under hbase.root sub directories.
>
>
> On Tue, Apr 16, 2013 at 11:18 AM, Ted Yu  wrote:
>
> > I plan to rename ".snapshot" in HBase to ".hbase-snapshot".
> >
> > Please suggest a better name in future correspondence if you have one.
> >
> > Thanks
> >
> > On Mon, Apr 15, 2013 at 8:12 PM, Ted Yu  wrote:
> >
> > > Lars:
> > > I will go ahead and log an HBase JIRA, tomorrow morning.
> > >
> > > This hopefully would give people enough time to respond.
> > >
> > > Cheers
> > >
> > >
> > > On Mon, Apr 15, 2013 at 8:00 PM, lars hofhansl 
> wrote:
> > >
> > >> OK. Let's try to fix that quickly, so that I can release HBase 0.94.7.
> > >>
> > >> -- Lars
> > >>
> > >>
> > >>
> > >> 
> > >>  From: Ted Yu 
> > >> To: d...@hbase.apache.org; hdfs-dev@hadoop.apache.org
> > >> Sent: Monday, April 15, 2013 7:13 PM
> > >> Subject: collision in the naming of '.snapshot' directory between hdfs
> > >> snapshot and hbase snapshot
> > >>
> > >>
> > >> Hi,
> > >> This afternoon Huned ad I discovered an issue while playing with HBase
> > >> Snapshots on top of Hadoop's Snapshot branch (
> > >> http://svn.apache.org/viewvc/hadoop/common/branches/HDFS-2802/).
> > >>
> > >> HDFS (built from HDFS-2802 branch) doesn't allow paths with .snapshot
> > as a
> > >> component while HBase tries to create paths with .snapshot as a
> > component.
> > >> This leads to issues in HBase, and one of HDFS or HBase needs to give
> up
> > >> the .snapshot reserved keyword. HBase released Snapshots feature in
> > 0.94.6
> > >> (quite recently) and it may not be too late to change HBase to use a
> > >> different path component in an upcoming new release.
> > >>
> > >> In HBase these path names are not user visible. If there is a
> deployment
> > >> of
> > >> 0.94.6, one could provide a migration tool that renames .snapshot to
> > >> .hbase-snapshot or something to be able to move to the Snapshot
> release
> > of
> > >> Hadoop. On the other hand, .snapshot in HDFS is a user visible name
> and
> > is
> > >> a convention that is used by many file systems. It's a matter of
> > >> familiarity with such path names that would help users in using HDFS
> > >> snapshots.
> > >>
> > >> I am including the hdfs-dev in this email. Would appreciate if we
> could
> > >> work together and come up with a solution.
> > >>
> > >> You can find sample output from hdfs command here:
> > >> http://pastebin.com/bBqR4Fvr
> > >>
> > >> Cheers
> > >>
> > >
> > >
> >
>


Re: collision in the naming of '.snapshot' directory between hdfs snapshot and hbase snapshot

2013-04-15 Thread Ted Yu
bq. let's make the hbase snapshot for a conf variable.

Once we decide on the new name of the snapshot directory, we should still use
a hardcoded value. This aligns with the current code base.
See this snippet from HConstants:

  public static final List<String> HBASE_NON_TABLE_DIRS =
    Collections.unmodifiableList(Arrays.asList(new String[] {
      HREGION_LOGDIR_NAME, HREGION_OLDLOGDIR_NAME, CORRUPT_DIR_NAME,
      SPLIT_LOGDIR_NAME, HBCK_SIDELINEDIR_NAME, HFILE_ARCHIVE_DIRECTORY,
      SNAPSHOT_DIR_NAME, HBASE_TEMP_DIRECTORY }));
Cheers
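
(A sketch of the kind of change being discussed, using the name proposed in this thread; the final constant value would be whatever the follow-up HBase JIRA settles on:)

  public static final String SNAPSHOT_DIR_NAME = ".hbase-snapshot";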

On Mon, Apr 15, 2013 at 8:24 PM, Jonathan Hsieh  wrote:

> constraints:
>
> 1) hbase 0.94.6 is released and .snapshot is hardcoded in there.
> 2) hdfs snapshots is a Hadoop 2.1 or 3.0 feature. I doubt that it will ever
> make it to 1.x.  Ideally this hdfs feature shouldn't affect current
> Apache HBase 0.94.x releases.
> 3) hbase 95/96 may default to Hadoop1 or Hadoop 2. these versions should
> pick a different table snapshot name to respect fs conventions.
>
> proposed actions:
>
> 1) let's make the hbase snapshot dir a conf variable (hbase.
> snapshots.dir) and change the default for hbase 95+ (maybe
> .hbase-snapshots). We'll also port this patch to 0.94.x.
> 2) let's publish instructions on how to update the hbase snapshot dir:
> shutdown hbase, config update, rename dir, restart hbase.
> 3) I lean towards leaving the current default hbase snapshot dir in 94
> since it shouldn't be affected.  upgrading hbase to 95/96 will require
> shutdown and update scripts so it seems like the ideal time to autoforce
> this default change.
>
> Thoughts?
>
>
> On Monday, April 15, 2013, lars hofhansl wrote:
>
> > OK. Let's try to fix that quickly, so that I can release HBase 0.94.7.
> >
> > -- Lars
> >
> >
> >
> > 
> >  From: Ted Yu 
> > To: d...@hbase.apache.org; hdfs-dev@hadoop.apache.org
> > Sent: Monday, April 15, 2013 7:13 PM
> > Subject: collision in the naming of '.snapshot' directory between hdfs
> > snapshot and hbase snapshot
> >
> >
> > Hi,
> > This afternoon Huned ad I discovered an issue while playing with HBase
> > Snapshots on top of Hadoop's Snapshot branch (
> > http://svn.apache.org/viewvc/hadoop/common/branches/HDFS-2802/).
> >
> > HDFS (built from HDFS-2802 branch) doesn't allow paths with .snapshot as
> a
> > component while HBase tries to create paths with .snapshot as a
> component.
> > This leads to issues in HBase, and one of HDFS or HBase needs to give up
> > the .snapshot reserved keyword. HBase released Snapshots feature in
> 0.94.6
> > (quite recently) and it may not be too late to change HBase to use a
> > different path component in an upcoming new release.
> >
> > In HBase these path names are not user visible. If there is a deployment
> of
> > 0.94.6, one could provide a migration tool that renames .snapshot to
> > .hbase-snapshot or something to be able to move to the Snapshot release
> of
> > Hadoop. On the other hand, .snapshot in HDFS is a user visible name and
> is
> > a convention that is used by many file systems. It's a matter of
> > familiarity with such path names that would help users in using HDFS
> > snapshots.
> >
> > I am including the hdfs-dev in this email. Would appreciate if we could
> > work together and come up with a solution.
> >
> > You can find sample output from hdfs command here:
> > http://pastebin.com/bBqR4Fvr
> >
> > Cheers
>
>
>
> --
> // Jonathan Hsieh (shay)
> // Software Engineer, Cloudera
> // j...@cloudera.com
>


Re: collision in the naming of '.snapshot' directory between hdfs snapshot and hbase snapshot

2013-04-15 Thread Ted Yu
bq. Alternatively, we can detect the underlying Hadoop version, and use
either .snapshot or .hbase_snapshot in 0.94 depending on h1 & h2.

I think this would introduce more confusion, especially for operations.

Cheers

On Mon, Apr 15, 2013 at 8:52 PM, Enis Söztutar  wrote:

> Because HDFS exposes the snapshots so that the normal file system
> operations are mapped inside snapshot dirs, I think HDFS reserving the
> .snapshot name makes sense. OTOH, nothing is specific about the dir name
> that is chosen by HBase.
>
> I would prefer to change the dir name in 0.94 as well, since 0.94 is also
> being run on top of hadoop 2. Alternatively, we can detect the underlying
> Hadoop version, and use either .snapshot or .hbase_snapshot in 0.94
> depending on h1 & h2.
>
> Enis
>
>
> On Mon, Apr 15, 2013 at 8:31 PM, Ted Yu  wrote:
>
> > bq. let's make the hbase snapshot for a conf variable.
> >
> > Once we decide on the new name of snapshot directory, we should still use
> > hardcoded value. This aligns with current code base:
> > See this snippet from HConstants:
> >
> >   public static final List HBASE_NON_TABLE_DIRS =
> >
> > Collections.unmodifiableList(Arrays.asList(new String[] {
> > HREGION_LOGDIR_NAME,
> >
> >   HREGION_OLDLOGDIR_NAME, CORRUPT_DIR_NAME, SPLIT_LOGDIR_NAME,
> >
> >   HBCK_SIDELINEDIR_NAME, HFILE_ARCHIVE_DIRECTORY, SNAPSHOT_DIR_NAME,
> > HBASE_TEMP_DIRECTORY }));
> > Cheers
> >
> > On Mon, Apr 15, 2013 at 8:24 PM, Jonathan Hsieh 
> wrote:
> >
> > > constraints:
> > >
> > > 1) hbase 0.94.6 is released and .snapshot is hardcoded in there.
> > > 2) hdfs snapshots is a Hadoop 2.1 or 3.0 feature. I doubt that it will
> > ever
> > > make it to 1.x.  This hdfs feature ideally this shouldn't affect
> current
> > A
> > > pache Hbase 0.94.x's.
> > > 3) hbase 95/96 may default to Hadoop1 or Hadoop 2. these versions
> should
> > > pick a different table snapshot name to respect fs conventions.
> > >
> > > proposed actions:
> > >
> > > 1) let's make the hbase snapshot for a conf variable. (hbase.
> > > snapshots.dir)  let's change the default for hbase 95+. (maybe
> > > .hbase-snapshots). we'll also port this patch to 0.94.x
> > > 2) let's publish instructions on how to update the hbase snapshot dir:
> > > shutdown hbase, config update, rename dir, restart hbase.
> > > 3) I lean towards leaving the current default hbase snapshot dir in 94
> > > since it shouldn't be affected.  upgrading hbase to 95/96 will require
> > > shutdown and update scripts so it seems like the ideal time to
> autoforce
> > > this default change.
> > >
> > > Thoughts?
> > >
> > >
> > > On Monday, April 15, 2013, lars hofhansl wrote:
> > >
> > > > OK. Let's try to fix that quickly, so that I can release HBase
> 0.94.7.
> > > >
> > > > -- Lars
> > > >
> > > >
> > > >
> > > > 
> > > >  From: Ted Yu 
> > > > To: d...@hbase.apache.org; hdfs-dev@hadoop.apache.org
> > > > Sent: Monday, April 15, 2013 7:13 PM
> > > > Subject: collision in the naming of '.snapshot' directory between
> hdfs
> > > > snapshot and hbase snapshot
> > > >
> > > >
> > > > Hi,
> > > > This afternoon Huned ad I discovered an issue while playing with
> HBase
> > > > Snapshots on top of Hadoop's Snapshot branch (
> > > > http://svn.apache.org/viewvc/hadoop/common/branches/HDFS-2802/).
> > > >
> > > > HDFS (built from HDFS-2802 branch) doesn't allow paths with .snapshot
> > as
> > > a
> > > > component while HBase tries to create paths with .snapshot as a
> > > component.
> > > > This leads to issues in HBase, and one of HDFS or HBase needs to give
> > up
> > > > the .snapshot reserved keyword. HBase released Snapshots feature in
> > > 0.94.6
> > > > (quite recently) and it may not be too late to change HBase to use a
> > > > different path component in an upcoming new release.
> > > >
> > > > In HBase these path names are not user visible. If there is a
> > deployment
> > > of
> > > > 0.94.6, one could provide a migration tool that renames .snapshot to
> > > > .hbase-snapshot or something to be able to move to the Snapshot
> release
> > > of
> > > > Hadoop. On the other hand, .snapshot in HDFS is a user visible name
> and
> > > is
> > > > a convention that is used by many file systems. It's a matter of
> > > > familiarity with such path names that would help users in using HDFS
> > > > snapshots.
> > > >
> > > > I am including the hdfs-dev in this email. Would appreciate if we
> could
> > > > work together and come up with a solution.
> > > >
> > > > You can find sample output from hdfs command here:
> > > > http://pastebin.com/bBqR4Fvr
> > > >
> > > > Cheers
> > >
> > >
> > >
> > > --
> > > // Jonathan Hsieh (shay)
> > > // Software Engineer, Cloudera
> > > // j...@cloudera.com
> > >
> >
>


Re: collision in the naming of '.snapshot' directory between hdfs snapshot and hbase snapshot

2013-04-15 Thread Ted Yu
I have a patch which touched these tests:

http://pastebin.com/P4p8LEAZ

I am running 0.94 test suite now - will publish patch and test result in
the morning.

Cheers

On Mon, Apr 15, 2013 at 9:00 PM, Ted Yu  wrote:

> bq. Alternatively, we can detect the underlying Hadoop version, and use
> either .snapshot or .hbase_snapshot in 0.94 depending on h1 & h2.
>
> I think this would introduce more confusion, especially for operations.
>
> Cheers
>
> On Mon, Apr 15, 2013 at 8:52 PM, Enis Söztutar  wrote:
>
>> Because HDFS exposes the snapshots so that the normal file system
>> operations are mapped inside snapshot dirs, I think HDFS reserving the
>> .snapshot name makes sense. OTOH, nothing is specific about the dir name
>> that is chosen by HBase.
>>
>> I would prefer to change the dir name in 0.94 as well, since 0.94 is also
>> being run on top of hadoop 2. Alternatively, we can detect the underlying
>> Hadoop version, and use either .snapshot or .hbase_snapshot in 0.94
>> depending on h1 & h2.
>>
>> Enis
>>
>>
>> On Mon, Apr 15, 2013 at 8:31 PM, Ted Yu  wrote:
>>
>> > bq. let's make the hbase snapshot for a conf variable.
>> >
>> > Once we decide on the new name of snapshot directory, we should still
>> use
>> > hardcoded value. This aligns with current code base:
>> > See this snippet from HConstants:
>> >
>> >   public static final List HBASE_NON_TABLE_DIRS =
>> >
>> > Collections.unmodifiableList(Arrays.asList(new String[] {
>> > HREGION_LOGDIR_NAME,
>> >
>> >   HREGION_OLDLOGDIR_NAME, CORRUPT_DIR_NAME, SPLIT_LOGDIR_NAME,
>> >
>> >   HBCK_SIDELINEDIR_NAME, HFILE_ARCHIVE_DIRECTORY, SNAPSHOT_DIR_NAME,
>> > HBASE_TEMP_DIRECTORY }));
>> > Cheers
>> >
>> > On Mon, Apr 15, 2013 at 8:24 PM, Jonathan Hsieh 
>> wrote:
>> >
>> > > constraints:
>> > >
>> > > 1) hbase 0.94.6 is released and .snapshot is hardcoded in there.
>> > > 2) hdfs snapshots is a Hadoop 2.1 or 3.0 feature. I doubt that it will
>> > ever
>> > > make it to 1.x.  This hdfs feature ideally this shouldn't affect
>> current
>> > A
>> > > pache Hbase 0.94.x's.
>> > > 3) hbase 95/96 may default to Hadoop1 or Hadoop 2. these versions
>> should
>> > > pick a different table snapshot name to respect fs conventions.
>> > >
>> > > proposed actions:
>> > >
>> > > 1) let's make the hbase snapshot for a conf variable. (hbase.
>> > > snapshots.dir)  let's change the default for hbase 95+. (maybe
>> > > .hbase-snapshots). we'll also port this patch to 0.94.x
>> > > 2) let's publish instructions on how to update the hbase snapshot dir:
>> > > shutdown hbase, config update, rename dir, restart hbase.
>> > > 3) I lean towards leaving the current default hbase snapshot dir in 94
>> > > since it shouldn't be affected.  upgrading hbase to 95/96 will require
>> > > shutdown and update scripts so it seems like the ideal time to
>> autoforce
>> > > this default change.
>> > >
>> > > Thoughts?
>> > >
>> > >
>> > > On Monday, April 15, 2013, lars hofhansl wrote:
>> > >
>> > > > OK. Let's try to fix that quickly, so that I can release HBase
>> 0.94.7.
>> > > >
>> > > > -- Lars
>> > > >
>> > > >
>> > > >
>> > > > 
>> > > >  From: Ted Yu 
>> > > > To: d...@hbase.apache.org; hdfs-dev@hadoop.apache.org
>> > > > Sent: Monday, April 15, 2013 7:13 PM
>> > > > Subject: collision in the naming of '.snapshot' directory between
>> hdfs
>> > > > snapshot and hbase snapshot
>> > > >
>> > > >
>> > > > Hi,
>> > > > This afternoon Huned ad I discovered an issue while playing with
>> HBase
>> > > > Snapshots on top of Hadoop's Snapshot branch (
>> > > > http://svn.apache.org/viewvc/hadoop/common/branches/HDFS-2802/).
>> > > >
>> > > > HDFS (built from HDFS-2802 branch) doesn't allow paths with
>> .snapshot
>> > as
>> > > a
>> > > > component while HBase tries to create paths with .snapshot as a
>> > > component.
>> > > > This leads to issues in 

Re: collision in the naming of '.snapshot' directory between hdfs snapshot and hbase snapshot

2013-04-15 Thread Ted Yu
I have logged HBASE-8352

Cheers

On Mon, Apr 15, 2013 at 9:17 PM, Ted Yu  wrote:

> I have a patch which touched these tests:
>
> http://pastebin.com/P4p8LEAZ
>
> I am running 0.94 test suite now - will publish patch and test result in
> the morning.
>
> Cheers
>
>
> On Mon, Apr 15, 2013 at 9:00 PM, Ted Yu  wrote:
>
>> bq. Alternatively, we can detect the underlying Hadoop version, and use
>> either .snapshot or .hbase_snapshot in 0.94 depending on h1 & h2.
>>
>> I think this would introduce more confusion, especially for operations.
>>
>> Cheers
>>
>> On Mon, Apr 15, 2013 at 8:52 PM, Enis Söztutar wrote:
>>
>>> Because HDFS exposes the snapshots so that the normal file system
>>> operations are mapped inside snapshot dirs, I think HDFS reserving the
>>> .snapshot name makes sense. OTOH, nothing is specific about the dir name
>>> that is chosen by HBase.
>>>
>>> I would prefer to change the dir name in 0.94 as well, since 0.94 is also
>>> being run on top of hadoop 2. Alternatively, we can detect the underlying
>>> Hadoop version, and use either .snapshot or .hbase_snapshot in 0.94
>>> depending on h1 & h2.
>>>
>>> Enis
>>>
>>>
>>> On Mon, Apr 15, 2013 at 8:31 PM, Ted Yu  wrote:
>>>
>>> > bq. let's make the hbase snapshot for a conf variable.
>>> >
>>> > Once we decide on the new name of snapshot directory, we should still
>>> use
>>> > hardcoded value. This aligns with current code base:
>>> > See this snippet from HConstants:
>>> >
>>> >   public static final List HBASE_NON_TABLE_DIRS =
>>> >
>>> > Collections.unmodifiableList(Arrays.asList(new String[] {
>>> > HREGION_LOGDIR_NAME,
>>> >
>>> >   HREGION_OLDLOGDIR_NAME, CORRUPT_DIR_NAME, SPLIT_LOGDIR_NAME,
>>> >
>>> >   HBCK_SIDELINEDIR_NAME, HFILE_ARCHIVE_DIRECTORY,
>>> SNAPSHOT_DIR_NAME,
>>> > HBASE_TEMP_DIRECTORY }));
>>> > Cheers
>>> >
>>> > On Mon, Apr 15, 2013 at 8:24 PM, Jonathan Hsieh 
>>> wrote:
>>> >
>>> > > constraints:
>>> > >
>>> > > 1) hbase 0.94.6 is released and .snapshot is hardcoded in there.
>>> > > 2) hdfs snapshots is a Hadoop 2.1 or 3.0 feature. I doubt that it
>>> will
>>> > ever
>>> > > make it to 1.x.  This hdfs feature ideally this shouldn't affect
>>> current
>>> > A
>>> > > pache Hbase 0.94.x's.
>>> > > 3) hbase 95/96 may default to Hadoop1 or Hadoop 2. these versions
>>> should
>>> > > pick a different table snapshot name to respect fs conventions.
>>> > >
>>> > > proposed actions:
>>> > >
>>> > > 1) let's make the hbase snapshot for a conf variable. (hbase.
>>> > > snapshots.dir)  let's change the default for hbase 95+. (maybe
>>> > > .hbase-snapshots). we'll also port this patch to 0.94.x
>>> > > 2) let's publish instructions on how to update the hbase snapshot
>>> dir:
>>> > > shutdown hbase, config update, rename dir, restart hbase.
>>> > > 3) I lean towards leaving the current default hbase snapshot dir in
>>> 94
>>> > > since it shouldn't be affected.  upgrading hbase to 95/96 will
>>> require
>>> > > shutdown and update scripts so it seems like the ideal time to
>>> autoforce
>>> > > this default change.
>>> > >
>>> > > Thoughts?
>>> > >
>>> > >
>>> > > On Monday, April 15, 2013, lars hofhansl wrote:
>>> > >
>>> > > > OK. Let's try to fix that quickly, so that I can release HBase
>>> 0.94.7.
>>> > > >
>>> > > > -- Lars
>>> > > >
>>> > > >
>>> > > >
>>> > > > 
>>> > > >  From: Ted Yu 
>>> > > > To: d...@hbase.apache.org; hdfs-dev@hadoop.apache.org
>>> > > > Sent: Monday, April 15, 2013 7:13 PM
>>> > > > Subject: collision in the naming of '.snapshot' directory between
>>> hdfs
>>> > > > snapshot and hbase snapshot
>>> > > >
>>> > > >
>>> > > > Hi,
>>> > > > This afte

Re: collision in the naming of '.snapshot' directory between hdfs snapshot and hbase snapshot

2013-04-16 Thread Ted Yu
Let's get proper release notes for HBASE-8352 .

Either Lars or I can send out notification to user mailing list so that
there is enough preparation for this change.

Cheers

On Tue, Apr 16, 2013 at 8:46 AM, Jonathan Hsieh  wrote:

> I was away from keyboard when I asserted that hdfs snapshot was a hadoop
> 2.1 or 3.0 feature.  Apparently it is targeted as a hadoop 2.0.5 feature.
>  (I'm a little surprised -- expected this to be a hadoop2 compat breaking
> feature) -- so I agree that this is a bit more urgent.
>
> Anyway, I agree that the fs .snapshot naming convention is long standing
> and should win.
>
> My concern is with breaking compatibility in 0.94 again -- if we don't go
> down the conf variable route,  I consider having docs to properly document
> how to do the upgrade and caveats of doing the upgrade in the docs/release
> notes blocker to hbase 0.94.7.  (specifically mentioning from 0.94.6 to
> 0.94.7, and to possibly to 0.95).
>
> Jon.
>
> On Mon, Apr 15, 2013 at 9:00 PM, Ted Yu  wrote:
>
> > bq. Alternatively, we can detect the underlying Hadoop version, and use
> > either .snapshot or .hbase_snapshot in 0.94 depending on h1 & h2.
> >
> > I think this would introduce more confusion, especially for operations.
> >
> > Cheers
> >
> > On Mon, Apr 15, 2013 at 8:52 PM, Enis Söztutar 
> wrote:
> >
> > > Because HDFS exposes the snapshots so that the normal file system
> > > operations are mapped inside snapshot dirs, I think HDFS reserving the
> > > .snapshot name makes sense. OTOH, nothing is specific about the dir
> name
> > > that is chosen by HBase.
> > >
> > > I would prefer to change the dir name in 0.94 as well, since 0.94 is
> also
> > > being run on top of hadoop 2. Alternatively, we can detect the
> underlying
> > > Hadoop version, and use either .snapshot or .hbase_snapshot in 0.94
> > > depending on h1 & h2.
> > >
> > > Enis
> > >
> > >
> > > On Mon, Apr 15, 2013 at 8:31 PM, Ted Yu  wrote:
> > >
> > > > bq. let's make the hbase snapshot for a conf variable.
> > > >
> > > > Once we decide on the new name of snapshot directory, we should still
> > use
> > > > hardcoded value. This aligns with current code base:
> > > > See this snippet from HConstants:
> > > >
> > > >   public static final List HBASE_NON_TABLE_DIRS =
> > > >
> > > > Collections.unmodifiableList(Arrays.asList(new String[] {
> > > > HREGION_LOGDIR_NAME,
> > > >
> > > >   HREGION_OLDLOGDIR_NAME, CORRUPT_DIR_NAME, SPLIT_LOGDIR_NAME,
> > > >
> > > >   HBCK_SIDELINEDIR_NAME, HFILE_ARCHIVE_DIRECTORY,
> > SNAPSHOT_DIR_NAME,
> > > > HBASE_TEMP_DIRECTORY }));
> > > > Cheers
> > > >
> > > > On Mon, Apr 15, 2013 at 8:24 PM, Jonathan Hsieh 
> > > wrote:
> > > >
> > > > > constraints:
> > > > >
> > > > > 1) hbase 0.94.6 is released and .snapshot is hardcoded in there.
> > > > > 2) hdfs snapshots is a Hadoop 2.1 or 3.0 feature. I doubt that it
> > will
> > > > ever
> > > > > make it to 1.x.  This hdfs feature ideally this shouldn't affect
> > > current
> > > > A
> > > > > pache Hbase 0.94.x's.
> > > > > 3) hbase 95/96 may default to Hadoop1 or Hadoop 2. these versions
> > > should
> > > > > pick a different table snapshot name to respect fs conventions.
> > > > >
> > > > > proposed actions:
> > > > >
> > > > > 1) let's make the hbase snapshot for a conf variable. (hbase.
> > > > > snapshots.dir)  let's change the default for hbase 95+. (maybe
> > > > > .hbase-snapshots). we'll also port this patch to 0.94.x
> > > > > 2) let's publish instructions on how to update the hbase snapshot
> > dir:
> > > > > shutdown hbase, config update, rename dir, restart hbase.
> > > > > 3) I lean towards leaving the current default hbase snapshot dir in
> > 94
> > > > > since it shouldn't be affected.  upgrading hbase to 95/96 will
> > require
> > > > > shutdown and update scripts so it seems like the ideal time to
> > > autoforce
> > > > > this default change.
> > > > >
> > > > > Thoughts?
> > > > >
> > > > >
> > > > > On Monday, April 15, 2

Re: collision in the naming of '.snapshot' directory between hdfs snapshot and hbase snapshot

2013-04-16 Thread Ted Yu
Hi,
Please take a look at patch v5 attached to HBASE-8352.

It would be nice to resolve this blocker today so that 0.94.7 RC can be cut.

Thanks

On Tue, Apr 16, 2013 at 10:12 AM, lars hofhansl  wrote:

> Please see my last comment on the jira. We can make this work without
> breaking users who are using HDFS snapshots.
>
>   --
>  *From:* Ted Yu 
> *To:* d...@hbase.apache.org
> *Cc:* hdfs-dev@hadoop.apache.org; lars hofhansl 
> *Sent:* Tuesday, April 16, 2013 10:00 AM
> *Subject:* Re: collision in the naming of '.snapshot' directory between
> hdfs snapshot and hbase snapshot
>
> Let's get proper release notes for HBASE-8352 .
>
> Either Lars or I can send out notification to user mailing list so that
> there is enough preparation for this change.
>
> Cheers
>
> On Tue, Apr 16, 2013 at 8:46 AM, Jonathan Hsieh  wrote:
>
> I was away from keyboard when I asserted that hdfs snapshot was a hadoop
> 2.1 or 3.0 feature.  Apparently it is targeted as a hadoop 2.0.5 feature.
>  (I'm a little surprised -- expected this to be a hadoop2 compat breaking
> feature) -- so I agree that this is a bit more urgent.
>
> Anyway, I agree that the fs .snapshot naming convention is long standing
> and should win.
>
> My concern is with breaking compatibility in 0.94 again -- if we don't go
> down the conf variable route,  I consider having docs to properly document
> how to do the upgrade and caveats of doing the upgrade in the docs/release
> notes blocker to hbase 0.94.7.  (specifically mentioning from 0.94.6 to
> 0.94.7, and to possibly to 0.95).
>
> Jon.
>
> On Mon, Apr 15, 2013 at 9:00 PM, Ted Yu  wrote:
>
> > bq. Alternatively, we can detect the underlying Hadoop version, and use
> > either .snapshot or .hbase_snapshot in 0.94 depending on h1 & h2.
> >
> > I think this would introduce more confusion, especially for operations.
> >
> > Cheers
> >
> > On Mon, Apr 15, 2013 at 8:52 PM, Enis Söztutar 
> wrote:
> >
> > > Because HDFS exposes the snapshots so that the normal file system
> > > operations are mapped inside snapshot dirs, I think HDFS reserving the
> > > .snapshot name makes sense. OTOH, nothing is specific about the dir
> name
> > > that is chosen by HBase.
> > >
> > > I would prefer to change the dir name in 0.94 as well, since 0.94 is
> also
> > > being run on top of hadoop 2. Alternatively, we can detect the
> underlying
> > > Hadoop version, and use either .snapshot or .hbase_snapshot in 0.94
> > > depending on h1 & h2.
> > >
> > > Enis
> > >
> > >
> > > On Mon, Apr 15, 2013 at 8:31 PM, Ted Yu  wrote:
> > >
> > > > bq. let's make the hbase snapshot for a conf variable.
> > > >
> > > > Once we decide on the new name of snapshot directory, we should still
> > use
> > > > hardcoded value. This aligns with current code base:
> > > > See this snippet from HConstants:
> > > >
> > > >   public static final List HBASE_NON_TABLE_DIRS =
> > > >
> > > > Collections.unmodifiableList(Arrays.asList(new String[] {
> > > > HREGION_LOGDIR_NAME,
> > > >
> > > >   HREGION_OLDLOGDIR_NAME, CORRUPT_DIR_NAME, SPLIT_LOGDIR_NAME,
> > > >
> > > >   HBCK_SIDELINEDIR_NAME, HFILE_ARCHIVE_DIRECTORY,
> > SNAPSHOT_DIR_NAME,
> > > > HBASE_TEMP_DIRECTORY }));
> > > > Cheers
> > > >
> > > > On Mon, Apr 15, 2013 at 8:24 PM, Jonathan Hsieh 
> > > wrote:
> > > >
> > > > > constraints:
> > > > >
> > > > > 1) hbase 0.94.6 is released and .snapshot is hardcoded in there.
> > > > > 2) hdfs snapshots is a Hadoop 2.1 or 3.0 feature. I doubt that it
> > will
> > > > ever
> > > > > make it to 1.x.  This hdfs feature ideally this shouldn't affect
> > > current
> > > > A
> > > > > pache Hbase 0.94.x's.
> > > > > 3) hbase 95/96 may default to Hadoop1 or Hadoop 2. these versions
> > > should
> > > > > pick a different table snapshot name to respect fs conventions.
> > > > >
> > > > > proposed actions:
> > > > >
> > > > > 1) let's make the hbase snapshot for a conf variable. (hbase.
> > > > > snapshots.dir)  let's change the default for hbase 95+. (maybe
> > > > > .hbase-snapshots). we'll also port this patch to 0.94.x
> > > >

Slow region server recoveries due to lease recovery going to stale data node

2013-04-19 Thread Ted Yu
I think the issue would be more appropriate for hdfs-dev@ mailing list.

Putting use@hbase as Bcc.

-- Forwarded message --
From: Varun Sharma 
Date: Fri, Apr 19, 2013 at 1:10 PM
Subject: Re: Slow region server recoveries
To: u...@hbase.apache.org


This is 0.94.3 hbase...


On Fri, Apr 19, 2013 at 1:09 PM, Varun Sharma  wrote:

> Hi Ted,
>
> I had a long offline discussion with nicholas on this. Looks like the last
> block which was still being written too, took an enormous time to recover.
> Here's what happened.
> a) Master split tasks and region servers process them
> b) Region server tries to recover lease for each WAL log - most cases are
> noop since they are already rolled over/finalized
> c) The last file lease recovery takes some time since the crashing server
> was writing to it and had a lease on it - but basically we have the lease
> 1 minute after the server was lost
> d) Now we start the recovery for this but we end up hitting the stale data
> node which is puzzling.
>
> It seems that we did not hit the stale datanode when we were trying to
> recover the finalized WAL blocks with trivial lease recovery. However, for
> the final block, we hit the stale datanode. Any clue why this might be
> happening ?
>
> Varun
>
>
> On Fri, Apr 19, 2013 at 10:40 AM, Ted Yu  wrote:
>
>> Can you show snippet from DN log which mentioned UNDER_RECOVERY ?
>>
>> Here is the criteria for stale node checking to kick in (from
>>
>>
https://issues.apache.org/jira/secure/attachment/12544897/HDFS-3703-trunk-read-only.patch
>> ):
>>
>> +   * Check if the datanode is in stale state. Here if
>> +   * the namenode has not received heartbeat msg from a
>> +   * datanode for more than staleInterval (default value is
>> +   * {@link
>> DFSConfigKeys#DFS_NAMENODE_STALE_DATANODE_INTERVAL_MILLI_DEFAULT}),
>> +   * the datanode will be treated as stale node.
>>
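The stale-node check quoted above is driven by a couple of hdfs-site.xml keys, roughly as follows (a sketch; the values shown are illustrative, check the defaults of your release):

  <property>
    <name>dfs.namenode.avoid.read.stale.datanode</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.namenode.stale.datanode.interval</name>
    <!-- milliseconds without a heartbeat before a DN is considered stale -->
    <value>30000</value>
  </property>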
>>
>> On Fri, Apr 19, 2013 at 10:28 AM, Varun Sharma 
>> wrote:
>>
>> > Is there a place to upload these logs ?
>> >
>> >
>> > On Fri, Apr 19, 2013 at 10:25 AM, Varun Sharma 
>> > wrote:
>> >
>> > > Hi Nicholas,
>> > >
>> > > Attached are the namenode, dn logs (of one of the healthy replicas of
>> the
>> > > WAL block) and the rs logs which got stuck doing the log split. Action
>> > > begins at 2013-04-19 00:27*.
>> > >
>> > > Also, the rogue block is 5723958680970112840_174056. Its very
>> interesting
>> > > to trace this guy through the HDFS logs (dn and nn).
>> > >
>> > > Btw, do you know what the UNDER_RECOVERY stage is for, in HDFS ? Also
>> > does
>> > > the stale node stuff kick in for that state ?
>> > >
>> > > Thanks
>> > > Varun
>> > >
>> > >
>> > > On Fri, Apr 19, 2013 at 4:00 AM, Nicolas Liochon > > >wrote:
>> > >
>> > >> Thanks for the detailed scenario and analysis. I'm going to have a
>> look.
>> > >> I can't access the logs (ec2-107-20-237-30.compute-1.amazonaws.com
>> > >> timeouts), could you please send them directly to me?
>> > >>
>> > >> Thanks,
>> > >>
>> > >> Nicolas
>> > >>
>> > >>
>> > >> On Fri, Apr 19, 2013 at 12:46 PM, Varun Sharma 
>> > >> wrote:
>> > >>
>> > >> > Hi Nicholas,
>> > >> >
>> > >> > Here is the failure scenario, I have dug up the logs.
>> > >> >
>> > >> > A machine fails and stops accepting/transmitting traffic. The
>> HMaster
>> > >> > starts the distributed split for 13 tasks. There are 12 region
>> > servers.
>> > >> 12
>> > >> > tasks succeed but the 13th one takes a looong time.
>> > >> >
>> > >> > Zookeeper timeout is set to 30 seconds. Stale node timeout is 20
>> > >> seconds.
>> > >> > Both patches are there.
>> > >> >
>> > >> > a) Machine fails around 27:30
>> > >> > b) Master starts the split around 27:40 and submits the tasks. The
>> one
>> > >> task
>> > >> > which fails seems to be the one which contains the WAL being
>> currently
>> > >> > written to:
>> > >> >
>> > >> > 2013-04-19 00:27:44,325 INFO
>> > >> > org.apache.hadoop.hbase.regionse

Re: Cannot communicate

2013-04-22 Thread Ted Yu
The exception was due to incompatible RPC versions between the Apache Maven
artifacts and CDH4.

I suggest you build the project with the same Hadoop version as in your cluster.
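For a CDH4 cluster that usually means pulling the CDH artifacts instead of the stock Apache ones, along these lines in the pom (a sketch; 2.0.0-cdh4.2.0 is a placeholder version, use whatever your cluster actually runs):

  <repositories>
    <repository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
  </repositories>

  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.0.0-cdh4.2.0</version>
  </dependency>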

On Mon, Apr 22, 2013 at 7:50 AM, Kevin Burton wrote:

> I am relatively new to Hadoop and am working through a Manning publication
> "Hadoop in Action". One of the first program in the book (page 44) gives me
> a Java exception: org.apache.hadoop.ipc.RemoteException: Server IPC version
> 7 cannot communicate with client version 3.
>
> My Hadoop distribution is CDH4. The Java Maven project takes its
> dependency from Apache. The exception comes from a line involving the
> "Configuration" class.
>
> Any idea on how to avoid this exception?


Re: Testing online one class

2013-04-22 Thread Ted Yu
You can use the following command:

mvn test -Dtest=TestReplicationPolicy
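
If you only care about a single test method, recent Surefire versions also accept the class#method form, for example (the method name here is only illustrative):

  mvn test -Dtest=TestReplicationPolicy#testChooseTarget1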

Cheers

On Mon, Apr 22, 2013 at 10:47 AM, Mohammad Mustaqeem <3m.mustaq...@gmail.com
> wrote:

> I have seen the test folder in trunk. How to use these test code.
> Like I want to test only TestReplicationPolicy. How to run this code?
> --
> *With regards ---*
> *Mohammad Mustaqeem*,
> M.Tech (CSE)
> MNNIT Allahabad
> 9026604270
>


Re: Heads up: moving from 2.0.4.1-alpha to 2.0.5-alpha

2013-05-31 Thread Ted Yu
I am currently testing HBase 0.95 using 2.0.5-SNAPSHOT artifacts.

Would 2.1.0-SNAPSHOT maven artifacts be available after tomorrow's change ?
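
For anyone building downstream against those SNAPSHOT artifacts, the Apache snapshots repository typically needs to be enabled in the pom, roughly (a sketch):

  <repository>
    <id>apache.snapshots</id>
    <url>https://repository.apache.org/content/repositories/snapshots/</url>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>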

Thanks

On Fri, May 31, 2013 at 12:45 PM, Konstantin Boudnik  wrote:

> Guys,
>
> I will be performing some changes wrt to moving 2.0.4.1 release candidate
> to
> 2.0.5 space. As outline below by Alejandro:
>
> 1. I will create new 2.0.5-alpha branch from the current head of
> 2.0.4-alpha
> that contains 2.0.4.1 changes
> 2. consequently, set the artifacts version on the new branch to be
> 2.0.5-alpha
> 3. the CHANGES.txt will be updated accordingly on the new 2.0.5 branch
> 4. At this point I can cut an RC and put it out for re-vote. The staging
> can
> be done after the next two steps.
>
> I will be doing all these modifications in the next hour or so.
>
> Tomorrow at 1 pm PDT I would like to:
> 1. update the version of the artifacts on branch-2 to become 2.1.0-SNAPSHOT
> 2. update the CHANGES.txt in the trunk and branch-2 to reflect new version
> names
> 3. at this point it should safe to do the staging for 2.0.5-alpha RC
>
> To avoid any collisions during the last two steps - especially 2. - I would
> ask everyone to hold off the modifications of the CHANGES.txt files on
> trunk
> and branch-2 between 1 pm and 2 pm PDT.
>
> Please let me know if you see any flaw above, questions.
>   Cos
>
> > As we change from 2.0.4.1 to 2.0.5 you'll need to do the following
> > housekeeping as you work the new RC.
> >
> > * rename the svn branch
> > * update the versions in the POMs
> > * update the CHANGES.txt in trunk, branch-2 and the release branch
> > * change the current 2.0.5 version in JIRA to 2.1.0, create a new 2.0.5
> > version, change the fix version of the 2 JIRAs that make the RC
>
> > I renamed 2.0.5-beta to 2.1.0-beta and 2.0.4.1-alpha to 2.0.5-alpha
> versions
> > in jira for HADOOP, HDFS, YARN & MAPREDUCE.
>
> > Please take care of the rest.
>
> > Also, in branch-2, the version should be 2.1.0-SNAPSHOT.
>


Re: Restarting HDFS datanode process

2014-09-18 Thread Ted Yu
How long would your datanode be out of service ?

If the duration is long (say 1 day), you'd better decommission it first.

Otherwise, datanode restart can be done without going through
decommissioning.
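
For reference, decommissioning is roughly the following (a sketch; the excludes file is whatever dfs.hosts.exclude points to in your hdfs-site.xml, and dn1.example.com is a placeholder hostname):

  # on the NameNode, add the datanode to the excludes file
  echo dn1.example.com >> /etc/hadoop/conf/dfs.exclude
  # ask the NameNode to re-read the include/exclude lists
  hdfs dfsadmin -refreshNodes
  # wait until the node shows as "Decommissioned" in the report, then stop it
  hdfs dfsadmin -report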

On Thu, Sep 18, 2014 at 8:25 AM, Biju N  wrote:

> Hello There,
>  Can individual datanodes restarted (through service command) or do we
> need to add it to the exclude file in NN, restart DN it, remove it from the
> exclude file in NN as if the data node is being decommissioned and put
> back? Any feedback on this is much appreciated.
>
> Thanks,
> Biju
>


Re: Restarting HDFS datanode process

2014-09-18 Thread Ted Yu
DFSClient would resume reading from where it stopped, using the next datanode.

Cheers

On Thu, Sep 18, 2014 at 11:53 AM, Biju N  wrote:

> It is for the duration of the restart. The concern is what will happen to
> any reads which is inflight that is using the data node. Will HDFS client
> know the offset to start reading from where the read failed using another
> DN?
>
> On Thu, Sep 18, 2014 at 1:57 PM, Ted Yu  wrote:
>
> > How long would your datanode be out of service ?
> >
> > If the duration is long (say 1 day), you'd better decommission it first.
> >
> > Otherwise, datanode restart can be done without going through
> > decommissioning.
> >
> > On Thu, Sep 18, 2014 at 8:25 AM, Biju N  wrote:
> >
> > > Hello There,
> > >  Can individual datanodes restarted (through service command) or do
> > we
> > > need to add it to the exclude file in NN, restart DN it, remove it from
> > the
> > > exclude file in NN as if the data node is being decommissioned and put
> > > back? Any feedback on this is much appreciated.
> > >
> > > Thanks,
> > > Biju
> > >
> >
>


Re: builds failing on H9 with "cannot access java.lang.Runnable"

2014-10-03 Thread Ted Yu
Adding builds@

On Fri, Oct 3, 2014 at 1:07 PM, Colin McCabe  wrote:

> It looks like builds are failing on the H9 host with "cannot access
> java.lang.Runnable"
>
> Example from
> https://builds.apache.org/job/PreCommit-HDFS-Build/8313/artifact/patchprocess/trunkJavacWarnings.txt
> :
>
> [INFO]
> 
> [INFO] BUILD FAILURE
> [INFO]
> 
> [INFO] Total time: 03:13 min
> [INFO] Finished at: 2014-10-03T18:04:35+00:00
> [INFO] Final Memory: 57M/839M
> [INFO]
> 
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile
> (default-testCompile) on project hadoop-mapreduce-client-app:
> Compilation failure
> [ERROR]
> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/commit/TestCommitterEventHandler.java:[189,-1]
> cannot access java.lang.Runnable
> [ERROR] bad class file: java/lang/Runnable.class(java/lang:Runnable.class)
>
> I don't have shell access to this, does anyone know what's going on on H9?
>
> best,
> Colin
>


Fwd: Slow waitForAckedSeqno took too long time

2014-12-06 Thread Ted Yu
eNode at l-hbase2.dba.dev.cn0/10.86.36.218:8020 active...
2014-12-05 11:48:47,284 INFO org.apache.hadoop.ha.ZKFailoverController:
Successfully transitioned NameNode at l-hbase2.dba.dev.cn0/10.86.36.218:8020
to active state
2014-12-05 12:29:56,906 INFO
org.apache.hadoop.hdfs.tools.DFSZKFailoverController: Allowed RPC access
from hadoop (auth:SIMPLE) at 10.86.36.217
2014-12-05 12:29:56,907 INFO org.apache.hadoop.ha.ZKFailoverController:
Requested by hadoop (auth:SIMPLE) at 10.86.36.217 to cede active role.
2014-12-05 12:29:56,943 INFO org.apache.hadoop.ha.ZKFailoverController:
Successfully ensured local node is in standby mode
2014-12-05 12:29:56,943 INFO org.apache.hadoop.ha.ActiveStandbyElector:
Yielding from election
2014-12-05 12:29:56,943 INFO org.apache.hadoop.ha.ActiveStandbyElector:
Deleting bread-crumb of active node…

{log}


the name node log is a bit large, so attach it.


On Dec 5, 2014, at 23:01, Ted Yu  wrote:

The warning was logged by DFSOutputStream.

What was the load on hdfs around 2014-12-05 12:03 ?
Have you checked namenode log ?

Cheers

On Thu, Dec 4, 2014 at 9:01 PM, mail list  wrote:

Hi ,all

I deploy Hbase0.98.6-cdh5.2.0 on 3 machine:

l-hbase1.dev.dba.cn0(hadoop namenode active, HMaster active)
l-hbase2.dev.dba.cn0(hadoop namenode standby, HMaster standby, hadoop
datanode)
l-hbase3.dev.dba.cn0(regionserver, hadoop datanode)

Then I shutdown the l-hbase1.dev.dba.cn0,  But HBase can not work until
about 15mins later.
I check the log and find the following log in the region server’s log:

2014-12-05 12:03:19,169 WARN  [regionserver60020-WAL.AsyncSyncer0]
hdfs.DFSClient: Slow waitForAckedSeqno took 927762ms (threshold=3ms)
2014-12-05 12:03:19,186 INFO  [regionserver60020-WAL.AsyncSyncer0]
wal.FSHLog: Slow sync cost: 927779 ms, current pipeline: [
10.86.36.219:50010]
2014-12-05 12:03:19,186 DEBUG [regionserver60020.logRoller]
regionserver.LogRoller: HLog roll requested
2014-12-05 12:03:19,236 WARN  [regionserver60020-WAL.AsyncSyncer1]
hdfs.DFSClient: Slow waitForAckedSeqno took 867706ms (threshold=3ms)

It seems the WAL Asysnc took too long time for region server recovery? I
don’t know if the log matters ?
Can any body explain the reason? and how to reduce the time for recovery?


Re: Switching to Java 7

2014-12-07 Thread Ted Yu
Looking at the test failures of
https://builds.apache.org/job/Hadoop-Hdfs-trunk/1963/ which uses jdk 1.7:

e.g.
https://builds.apache.org/job/Hadoop-Hdfs-trunk/1963/testReport/junit/org.apache.hadoop.hdfs.server.namenode.snapshot/TestRenameWithSnapshots/testRenameFileAndDeleteSnapshot/

java.lang.OutOfMemoryError: Java heap space
at sun.nio.ch.EPollArrayWrapper.(EPollArrayWrapper.java:120)
at sun.nio.ch.EPollSelectorImpl.(EPollSelectorImpl.java:68)
at 
sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:36)
at io.netty.channel.nio.NioEventLoop.openSelector(NioEventLoop.java:126)
at io.netty.channel.nio.NioEventLoop.(NioEventLoop.java:120)
at 
io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:87)
at 
io.netty.util.concurrent.MultithreadEventExecutorGroup.(MultithreadEventExecutorGroup.java:64)


Should more heap be given to the tests ?
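
For the forked test JVMs specifically, the heap is controlled by Surefire's argLine rather than MAVEN_OPTS; a plain configuration would look roughly like this (a sketch only, since Hadoop's own poms route this through shared properties):

  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-surefire-plugin</artifactId>
    <configuration>
      <!-- heap for the forked JVMs that actually run the tests -->
      <argLine>-Xmx2048m -XX:MaxPermSize=768m</argLine>
    </configuration>
  </plugin>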


Cheers


On Sun, Dec 7, 2014 at 2:09 PM, Steve Loughran 
wrote:

> The latest migration status:
>
>   if the jenkins builds are happy then the patch will go in -I do that
> monday morning 10:00 UTC
>
> https://builds.apache.org/view/H-L/view/Hadoop/
>
> Getting jenkins to work has been "surprisingly difficult"...it turns out
> that those builds which we thought were java7 or java8 weren't, as setting
>   export JAVA_HOME=${TOOLS_HOME}/java/latest
>
> meant that they picked up a java 6 machine
>
> Now the trunk precommit/postcommit and scheduled branches should have
> export JAVA_HOME=${TOOLS_HOME}/java/jdk1.7.0_55
>
> the Java 8 builds have more changes
>
> export JAVA_HOME=${TOOLS_HOME}/java/jdk1.8.0
> export MAVEN_OPTS="-Xmx3072m -XX:MaxPermSize=768m"
> and  -Dmaven.javadoc.skip=true  on the mvn builds
>
> without these javadocs fails and test runs OOM.
>
> We need to have something resembling the nightly build env setup again,
> git/Svn stored file with something for java8 alongside the normal env vars.
>
> --
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to
> which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.
>


Re: Switching to Java 7

2014-12-08 Thread Ted Yu
Looks like there was still OutOfMemoryError :

https://builds.apache.org/job/Hadoop-Hdfs-trunk/1964/testReport/junit/org.apache.hadoop.hdfs.server.namenode.snapshot/TestRenameWithSnapshots/testRenameDirAcrossSnapshottableDirs/

FYI

On Mon, Dec 8, 2014 at 2:42 AM, Steve Loughran 
wrote:

> yes, bumped them up to
>
> export MAVEN_OPTS="-Xmx3072m -XX:MaxPermSize=768m"
> export ANT_OPTS=$MAVEN_OPTS
>
> also extended test runs times.
>
>
>
> On 8 December 2014 at 00:58, Ted Yu  wrote:
>
> > Looking at the test failures of
> > https://builds.apache.org/job/Hadoop-Hdfs-trunk/1963/ which uses jdk
> 1.7:
> >
> > e.g.
> >
> >
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/1963/testReport/junit/org.apache.hadoop.hdfs.server.namenode.snapshot/TestRenameWithSnapshots/testRenameFileAndDeleteSnapshot/
> >
> > java.lang.OutOfMemoryError: Java heap space
> > at
> sun.nio.ch.EPollArrayWrapper.(EPollArrayWrapper.java:120)
> > at sun.nio.ch.EPollSelectorImpl.(EPollSelectorImpl.java:68)
> > at
> >
> sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:36)
> > at
> > io.netty.channel.nio.NioEventLoop.openSelector(NioEventLoop.java:126)
> > at
> io.netty.channel.nio.NioEventLoop.(NioEventLoop.java:120)
> > at
> >
> io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:87)
> > at
> >
> io.netty.util.concurrent.MultithreadEventExecutorGroup.(MultithreadEventExecutorGroup.java:64)
> >
> >
> > Should more heap be given to the tests ?
> >
> >
> > Cheers
> >
> >
> > On Sun, Dec 7, 2014 at 2:09 PM, Steve Loughran 
> > wrote:
> >
> > > The latest migration status:
> > >
> > >   if the jenkins builds are happy then the patch will go in -I do that
> > > monday morning 10:00 UTC
> > >
> > > https://builds.apache.org/view/H-L/view/Hadoop/
> > >
> > > Getting jenkins to work has been "surprisingly difficult"...it turns
> out
> > > that those builds which we thought were java7 or java8 weren't, as
> > setting
> > >   export JAVA_HOME=${TOOLS_HOME}/java/latest
> > >
> > > meant that they picked up a java 6 machine
> > >
> > > Now the trunk precommit/postcommit and scheduled branches should have
> > > export JAVA_HOME=${TOOLS_HOME}/java/jdk1.7.0_55
> > >
> > > the Java 8 builds have more changes
> > >
> > > export JAVA_HOME=${TOOLS_HOME}/java/jdk1.8.0
> > > export MAVEN_OPTS="-Xmx3072m -XX:MaxPermSize=768m"
> > > and  -Dmaven.javadoc.skip=true  on the mvn builds
> > >
> > > without these javadocs fails and test runs OOM.
> > >
> > > We need to have something resembling the nightly build env setup again,
> > > git/Svn stored file with something for java8 alongside the normal env
> > vars.
> > >
> > > --
> > > CONFIDENTIALITY NOTICE
> > > NOTICE: This message is intended for the use of the individual or
> entity
> > to
> > > which it is addressed and may contain information that is confidential,
> > > privileged and exempt from disclosure under applicable law. If the
> reader
> > > of this message is not the intended recipient, you are hereby notified
> > that
> > > any printing, copying, dissemination, distribution, disclosure or
> > > forwarding of this communication is strictly prohibited. If you have
> > > received this communication in error, please contact the sender
> > immediately
> > > and delete it from your system. Thank You.
> > >
> >
>
> --
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to
> which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.
>


Re: Controlling the block placement and the file placement in HDFS writes

2014-12-19 Thread Ted Yu
Interesting - HDFS-6133 would directly help the HBase data locality use case.
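
On the tiering question in the quoted thread below, the storage policy can also be set from the shell on 2.6+ clients (a sketch; the path and policy name are just examples):

  hdfs storagepolicies -setStoragePolicy -path /data/recent -policy ALL_SSD
  hdfs storagepolicies -getStoragePolicy -path /data/recent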

On Fri, Dec 19, 2014 at 2:20 PM, Yongjun Zhang  wrote:

> Hi,
>
> FYI,
>
> A relevant jira HDFS-6133 tries to tell Balancer not to move around the
> blocks stored at the favored nodes that application selected. I reviewed
> the patch, and the latest on looks good to me. Hope some committers can
> pick it up and push it forward.
>
> Thanks.
>
> --Yongjun
>
>
> On Fri, Dec 19, 2014 at 1:52 PM, Ananth Gundabattula <
> agundabatt...@gmail.com> wrote:
> >
> > Hello Zhe,
> >
> > Thanks a lot for the inputs. Storage policies is really what I was
> looking
> > for one of the problems.
> >
> > @Nick: I agree that it would be a nice feature to have. Thanks for the
> > info.
> >
> > Regards,
> > Ananth
> >
> > On Fri, Dec 19, 2014 at 10:49 AM, Nick Dimiduk 
> wrote:
> >
> > > HBase would enjoy a similar functionality. In our case, we'd like all
> > > replicas for all files in a given HDFS path to land on the same set of
> > > machines. That way, in the event of a failover, regions can be assigned
> > to
> > > one of these other machines that has local access to all blocks for all
> > > region files.
> > >
> > > On Thu, Dec 18, 2014 at 3:36 PM, Zhe Zhang <
> zhe.zhang.resea...@gmail.com
> > >
> > > wrote:
> > > >
> > > > > The second aspect is that our queries are time based and this time
> > > window
> > > > > follows a familiar pattern of old data not being queried much.
> Hence
> > we
> > > > > would like to preserve the most recent data in the HDFS cache (
> > impala
> > > is
> > > > > helping us manage this aspect via their command set ) but we would
> > like
> > > > the
> > > > > next recent amount of data chunks to land on an SSD that is present
> > on
> > > > > every datanode. The remaining set of blocks which are "very old but
> > in
> > > > > large quantities" would land on spinning disks. The decision to
> > choose
> > > a
> > > > > given volume is based on the file name as we can control the
> filename
> > > > that
> > > > > is being used to generate the file.
> > > > >
> > > >
> > > > Have you tried the 'setStoragePolicy' command? It's part of the HDFS
> > > > "Heterogeneous Storage Tiers" work and seems to address your
> scenario.
> > > >
> > > > > 1. Is there a way to control that all file blocks belonging to a
> > > > particular
> > > > > hdfs directory & file go to the same physical datanode ( and their
> > > > > corresponding replicas as well ? )
> > > >
> > > > This seems inherently hard: the file/dir could have more data than a
> > > > single DataNode can host. Implementation wise, it requires some sort
> > > > of a map in BlockPlacementPolicy from inode or file path to DataNode
> > > > address.
> > > >
> > > > My 2 cents..
> > > >
> > > > --
> > > > Zhe Zhang
> > > > Software Engineer, Cloudera
> > > > https://sites.google.com/site/zhezhangresearch/
> > > >
> > >
> >
>


Re: How to use hadoop-1.0.3-core-SNAPSHOT.jar

2015-02-02 Thread Ted Yu
Locate the existing hadoop-core jar (assuming it is compatible with 1.0.3),
back it up, and then move it aside or rename it to an extension other than
.jar so it is no longer picked up. Then copy hadoop-1.0.3-core-SNAPSHOT.jar
into the same location.

The above should be done for all nodes.
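
For example, something along these lines on each node (a sketch; the install location and exact jar name depend on how the cluster was laid out):

  cd $HADOOP_HOME
  mv hadoop-core-1.0.3.jar hadoop-core-1.0.3.jar.orig     # keep the original around
  cp /path/to/build/hadoop-1.0.3-core-SNAPSHOT.jar .
  # restart the daemons so the new jar is picked up
  bin/stop-dfs.sh && bin/start-dfs.sh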

Cheers

On Mon, Feb 2, 2015 at 6:54 AM, Bo Fu  wrote:

> Hi all,
>
> I have a stupid question. I svn checked out the hadoop-1.0.3 and I use ant
> command to get the hadoop-1.0.3-core-SNAPSHOT.jar. But I just don’t know
> how to apply new compiled file to the hadoop system? It seems that I can
> run bin/start-dfs.sh without this jar.
>
> Can anyone tell me? Thank you!
>
> Bo Fu
> Master’s Program of Computer Science, University of Chicago
> Chicago, IL 60615
> Phone: 630-400-8561
>
>
>


Re: subscribe the dev mailing list

2015-03-12 Thread Ted Yu
Please send email to hdfs-dev-subscr...@hadoop.apache.org

On Thu, Mar 12, 2015 at 6:56 AM, 张铎  wrote:

> Thanks.
>


Re: [VOTE] Release Apache Hadoop 2.7.1 RC0

2015-06-29 Thread Ted Yu
+1 (non-binding)

Compiled hbase branch-1 with Java 1.8.0_45
Ran unit test suite which passed.
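
Roughly, the steps were: point the build at the staging repository from the vote mail and override the Hadoop version (a sketch; hadoop-two.version is the property hbase branch-1 uses):

  # add https://repository.apache.org/content/repositories/orgapachehadoop-1019/
  # as a <repository> in hbase's pom.xml, then:
  mvn clean install -DskipTests -Dhadoop-two.version=2.7.1
  mvn test -Dhadoop-two.version=2.7.1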

On Mon, Jun 29, 2015 at 7:22 AM, Steve Loughran 
wrote:

>
> +1 binding from me.
>
> Tests:
>
> Rebuild slider with Hadoop.version=2.7.1; ran all the tests including
> against a secure cluster.
> Repeated for windows running Java 8.
>
> All tests passed
>
>
> > On 29 Jun 2015, at 09:45, Vinod Kumar Vavilapalli 
> wrote:
> >
> > Hi all,
> >
> > I've created a release candidate RC0 for Apache Hadoop 2.7.1.
> >
> > As discussed before, this is the next stable release to follow up 2.6.0,
> > and the first stable one in the 2.7.x line.
> >
> > The RC is available for validation at:
> > *http://people.apache.org/~vinodkv/hadoop-2.7.1-RC0/
> > *
> >
> > The RC tag in git is: release-2.7.1-RC0
> >
> > The maven artifacts are available via repository.apache.org at
> > *
> https://repository.apache.org/content/repositories/orgapachehadoop-1019/
> > <
> https://repository.apache.org/content/repositories/orgapachehadoop-1019/>*
> >
> > Please try the release and vote; the vote will run for the usual 5 days.
> >
> > Thanks,
> > Vinod
> >
> > PS: It took 2 months instead of the planned [1] 2 weeks in getting this
> > release out: post-mortem in a separate thread.
> >
> > [1]: A 2.7.1 release to follow up 2.7.0
> > http://markmail.org/thread/zwzze6cqqgwq4rmw
>
>


Re: [VOTE] Release Apache Hadoop 2.7.1 RC0

2015-07-03 Thread Ted Yu
Tsuyoshi:
I tried just now with the following:

tar --version
tar (GNU tar) 1.23

uname -a
Linux a.com 2.6.32-504.el6.x86_64 #1 SMP Wed Oct 15 04:27:16 UTC 2014
x86_64 x86_64 x86_64 GNU/Linux

I was able to expand the tarball.

Can you use another machine ?

Cheers


On Fri, Jul 3, 2015 at 9:53 AM, Tsuyoshi Ozawa  wrote:

> Thank you for starting voting, Vinod.
> I tried to untar the tarball, but the command exited with an error. Is
> binary tarball broken?
>
> $ tar xzvf hadoop-2.7.1-RC0.tar.gz
> ...
>
> hadoop-2.7.1/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/hadoop-common-2.7.1.jar
>
> gzip: stdin: unexpected end of file
> tar: Unexpected EOF in archive
> tar: Unexpected EOF in archive
> tar: Error is not recoverable: exiting now
>
> $ tar --version
> tar (GNU tar) 1.27.1
>
> $ uname -a
> Linux ip-172-31-4-8 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12
> 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>
> Can anyone reproduce this problem? If this is my environment-depend
> problem, please ignore it.
>
> Thanks,
> - Tsuyoshi
>
> On Thu, Jul 2, 2015 at 11:01 PM, Masatake Iwasaki
>  wrote:
> > +1 (non-binding)
> >
> > + verified mds of source and binary tarball
> > + built from source tarball
> > + deployed binary tarball to 4 nodes cluster and run some
> > hadoop-mapreduce-examples jobs
> >
> > Thanks,
> > Masatake Iwasaki
> >
> >
> >
> > On 6/29/15 17:45, Vinod Kumar Vavilapalli wrote:
> >>
> >> Hi all,
> >>
> >> I've created a release candidate RC0 for Apache Hadoop 2.7.1.
> >>
> >> As discussed before, this is the next stable release to follow up 2.6.0,
> >> and the first stable one in the 2.7.x line.
> >>
> >> The RC is available for validation at:
> >> *http://people.apache.org/~vinodkv/hadoop-2.7.1-RC0/
> >> *
> >>
> >> The RC tag in git is: release-2.7.1-RC0
> >>
> >> The maven artifacts are available via repository.apache.org at
> >> *
> https://repository.apache.org/content/repositories/orgapachehadoop-1019/
> >>
> >> <
> https://repository.apache.org/content/repositories/orgapachehadoop-1019/>*
> >>
> >> Please try the release and vote; the vote will run for the usual 5 days.
> >>
> >> Thanks,
> >> Vinod
> >>
> >> PS: It took 2 months instead of the planned [1] 2 weeks in getting this
> >> release out: post-mortem in a separate thread.
> >>
> >> [1]: A 2.7.1 release to follow up 2.7.0
> >> http://markmail.org/thread/zwzze6cqqgwq4rmw
> >>
> >
>


Re: Jenkins : Unable to create new native thread

2015-08-07 Thread Ted Yu
I observed the same behavior in hbase QA run as well:
https://builds.apache.org/job/PreCommit-HBASE-Build/15000/console

This was on ubuntu-2.
Looks like certain machines may have an environment issue.
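
A few quick things worth comparing across the slaves, for anyone with shell access (just a suggestion; limits vary per host):

  ulimit -u                        # max user processes for the jenkins user
  cat /proc/sys/kernel/threads-max # system-wide thread limit
  ps -eLf | wc -l                  # threads currently running on the box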

FYI

On Wed, Aug 5, 2015 at 12:59 AM, Brahma Reddy Battula <
brahmareddy.batt...@huawei.com> wrote:

> Dear All
>
> had seen following error (OOM) in HDFS-1148 and Hadoop-12302..jenkin
> machine have some problem..?
>
>
>
> [
> https://builds.apache.org/static/ea60962f/images/16x16/document_delete.png]
> Error Details
>
> unable to create new native thread
>
> [
> https://builds.apache.org/static/ea60962f/images/16x16/document_delete.png]
> Stack Trace
>
> java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method)
> at java.lang.Thread.start(Thread.java:714)
>
>
>
> Thanks & Regards
>
>  Brahma Reddy Battula
>
>
>
>


Re: FW: Handling read failures during recovery

2011-08-09 Thread Ted Yu
>> If read request comes before completing client recovery process, do we
need to make the read operation wait until recovery completes successfully?
That would be desirable.

On Tue, Aug 9, 2011 at 7:08 AM, Uma Maheswara Rao G 72686 <
mahesw...@huawei.com> wrote:

>
>
> Hi All,
>
> Any thoughts?
>
>  Looks Hbase is going to address this issue.
> https://issues.apache.org/jira/browse/HBASE-4177.
>
> Do we need to address from HDFS as well?
>
> If read request comes before completing client recovery process, do we need
> to make the read operation wait until recovery completes successfully?
>
>
> Regards,
> Uma
> > -Original Message-
> > From: Ramkrishna S Vasudevan [mailto:ramakrish...@huawei.com]
> > Sent: Friday, August 05, 2011 9:52 AM
> > To: hdfs-dev@hadoop.apache.org; d...@hbase.apache.org
> > Subject: RE: Handling read failures during recovery
> >
> > Hi
> >
> > As Laxman pointed out, there is a potential problem here.  We
> > expect the
> > Namenode recovery to happen within a specified time and we tend to
> > sleep for
> > one second in the splitLogs logic.  But we carry on with reading
> > the HLog
> > file which will result in failure.  So if the logs are not split
> > properly there could be a data loss.
> >
> >
> >
> > Regards
> > Ram
> >
> >
> >
> > -Original Message-
> > From: Laxman [mailto:lakshman...@huawei.com]
> > Sent: Tuesday, August 02, 2011 10:47 AM
> > To: hdfs-dev@hadoop.apache.org; d...@hbase.apache.org
> > Subject: FW: Handling read failures during recovery
> >
> > Partial mail was sent accidentally. Sorry for that.
> > Resending with complete details, analysis and logs.
> >
> > 20-append version we are using.
> >
> > To summarize there are two problems [One each from HDFS and HBase] we
> > noticed in this flow.
> >
> >
> > 1) From HDFS
> > Even though client is getting the updated block info from Namenode
> > on first
> > read failure, client is discarding the new info and using the old
> > info only
> > to retrieve the data from datanode. So, all the read
> > retries are failing. [Method parameter reassignment - Not
> > reflected in
> > caller]
> >
> >
> > HDFS Code snippet
> > org.apache.hadoop.hdfs.DFSClient.DFSInputStream.chooseDataNode
> >
> > private DNAddrPair chooseDataNode(LocatedBlock block)
> >  throws IOException {
> > ...
> > ...
> > block = getBlockAt(block.getStartOffset(), false);
> > ...
> > ...
> > }
> >
> > Here method parameter "block" is assigned with the new block info
> > which is
> > not reflected in the caller "blockSeekTo(long target)".
> >
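A minimal, self-contained illustration of why the reassignment is lost (hypothetical classes, not the real DFSClient types):

  // Java passes object references by value, so reassigning a parameter
  // inside a method never changes the variable the caller passed in.
  class LocatedBlockInfo {
    final long generation;
    LocatedBlockInfo(long generation) { this.generation = generation; }
  }

  class Example {
    static LocatedBlockInfo refresh(LocatedBlockInfo block) {
      block = new LocatedBlockInfo(block.generation + 1); // visible only inside this method
      return block;                                       // the caller must use the return value
    }

    public static void main(String[] args) {
      LocatedBlockInfo stale = new LocatedBlockInfo(1);
      refresh(stale);                          // bug pattern: refreshed info silently dropped
      LocatedBlockInfo fresh = refresh(stale); // fix pattern: capture the refreshed info
      System.out.println(stale.generation + " vs " + fresh.generation); // prints "1 vs 2"
    }
  }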
> > 2) From HBase
> >
> > Excerpt from my previous mail.
> >
> > > As the recovery is an asynchronous operation recoverLease call
> > will return
> > > immediately and may end up with read failure as the recovery is in
> > progress.
> > >
> > > This may lead to some regions to be in offline state only
> >
> > > One approach is to introduce a delay in between recovery and
> > read. But,
> > this
> > > may not be a fool proof way to address this.
> >
> > I've noticed the delay is already present in HBase code. But as I
> > mentioned this may not be a fool proof mechanism to handle this
> > scenario.
> > HBase Code snippet
> > In the class HLogSplitter the splitLog() calls recoverFileLease().
> >
> > In recoverFileLease()
> >
> >  try {
> >Thread.sleep(1000);
> >  } catch (InterruptedException ex) {
> >new InterruptedIOException().initCause(ex);
> >  }
> >
> > Once the recover call is made we sleep for one sec and proceed with
> > parseHLog().
> >
> >
> > Here is the log
> > 2011-07-21 17:01:19,642 INFO org.apache.hadoop.hdfs.DFSClient:
> > Could not
> > obtain block blk_1311262402613_3094 from any node:
> > java.io.IOException: No
> > live nodes contain current block. Will get new block locations
> > from namenode
> > and retry...
> > 2011-07-21 17:01:20,650 INFO org.apache.hadoop.hdfs.DFSClient:
> > Could not
> > obtain block blk_1311262402613_3094 from any node:
> > java.io.IOException: No
> > live nodes contain current block. Will get new block locations
> > from namenode
> > and retry...
> > 2011-07-21 17:01:21,669 INFO org.apache.hadoop.hdfs.DFSClient:
> > Could not
> > obtain block blk_1311262402613_3094 from any node:
> > java.io.IOException: No
> > live nodes contain current block. Will get new block locations
> > from namenode
> > and retry...
> > 2011-07-21 17:01:22,677 WARN org.apache.hadoop.hdfs.DFSClient: DFS Read:
> > java.io.IOException: Could not obtain block: blk_1311262402613_3318
> > file=/hbase/.logs/158-1-101-222,20020,1311260346420/158-1-101-222%3A20020.1311265398432
> > at
> >
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.jav
> > a:2491)
> > at
> >
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:2
> > 256)
> > at
> >
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2441)
> at java.io.DataInputStream.read(DataInputStream.java:132)
> > at java.io.DataInputStream.readF

Fwd: strange SIGPIPE from "bin/ls -ld" in Shell.runCommand seen when running HBase tests

2011-12-07 Thread Ted Yu
Hi,
Seeking advice from experts.

Thanks

-- Forwarded message --
From: Mikhail Bautin 
Date: Wed, Dec 7, 2011 at 11:52 AM
Subject: Re: strange SIGPIPE from "bin/ls -ld" in Shell.runCommand seen
when running HBase tests
To: d...@hbase.apache.org, common-...@hadoop.apache.org


I am already using umask 022.  Permissions on all components of the path
are also OK.  Also, "ls -ld" succeeds sometimes, but other times it fails
with a SIGPIPE and no error message. Additionally, I saw cases where it
SIGPIPE'd but produced correct output (a "drwxr-xr-x ..." line).  Here is
my patch for Hadoop to work around the ls -ld SIGPIPE issue (I just
overrode the hadoop-0.20.205.0 jar in my local maven repository to run unit
tests).

Index: src/core/org/apache/hadoop/fs/RawLocalFileSystem.java
===
--- src/core/org/apache/hadoop/fs/RawLocalFileSystem.java (revision 1198126)
+++ src/core/org/apache/hadoop/fs/RawLocalFileSystem.java (working copy)
@@ -416,7 +416,7 @@
  IOException e = null;
  try {
StringTokenizer t = new StringTokenizer(
-FileUtil.execCommand(new File(getPath().toUri()),
+FileUtil.execCommandWithRetries(new File(getPath().toUri()),
 Shell.getGET_PERMISSION_COMMAND()));
//expected format
//-rw---1 username groupname ...
Index: src/core/org/apache/hadoop/fs/FileUtil.java
===
--- src/core/org/apache/hadoop/fs/FileUtil.java (revision 1198126)
+++ src/core/org/apache/hadoop/fs/FileUtil.java (working copy)
@@ -19,6 +19,7 @@
 package org.apache.hadoop.fs;

 import java.io.*;
+import java.util.Arrays;
 import java.util.Enumeration;
 import java.util.zip.ZipEntry;
 import java.util.zip.ZipFile;
@@ -703,6 +704,20 @@
String output = Shell.execCommand(args);
return output;
  }
+
+  static String execCommandWithRetries(File f, String... cmd)
+  throws IOException {
+for (int attempt = 0; attempt < 10; ++attempt) {
+  try {
+return execCommand(f, cmd);
+  } catch (IOException ex) {
+LOG.error("Failed to execute command: f=" + f + " cmd=" +
+Arrays.toString(cmd) + " (attempt " + attempt + ")",
+ex);
+  }
+}
+return execCommand(f, cmd);
+  }

  /**
   * Create a tmp file for a base file.
Index: src/core/org/apache/hadoop/util/Shell.java
===
--- src/core/org/apache/hadoop/util/Shell.java (revision 1198126)
+++ src/core/org/apache/hadoop/util/Shell.java (working copy)
@@ -239,6 +239,7 @@
  String line = inReader.readLine();
  while(line != null) {
line = inReader.readLine();
+LOG.error("Additional line from output: " + line);
  }
  // wait for the process to finish and check the exit code
  exitCode  = process.waitFor();
@@ -251,6 +252,25 @@
  completed.set(true);
  //the timeout thread handling
  //taken care in finally block
+  LOG.error("exitCode=" + exitCode);
+  if (exitCode == 141 && this instanceof ShellCommandExecutor) {
+String[] execStr = getExecString();
+String outStr = ((ShellCommandExecutor) this).getOutput();
+LOG.error("execStr=" + java.util.Arrays.toString(execStr) +
+", outStr=" + outStr);
+if (execStr.length >= 2 &&
+execStr[0].equals("/bin/ls") &&
+execStr[1].equals("-ld") &&
+outStr.startsWith("d") &&
+outStr.length() >= 11 &&
+outStr.charAt(10) == ' ') {
+  // A work-around for a weird SIGPIPE bug on ls -ld.
+  LOG.error("Ignoring exit code " + exitCode + " for /bin/ls -ld:
" +
+      "got output " + outStr);
+  exitCode = 0;
+}
+  }
+
  if (exitCode != 0) {
throw new ExitCodeException(exitCode, errMsg.toString());
  }

Thanks,
--Mikhail

On Wed, Dec 7, 2011 at 11:31 AM, Ted Yu  wrote:

> A tip from Jonathan Hsieh is related to the problem Mikhail was
> experiencing:
> 
> Run:
> umask 022
>
> before running the test on whatever machine you are testing on.
>
> 
>
> Cheers
>
> On Wed, Dec 7, 2011 at 10:29 AM, Ted Yu  wrote:
>
> > Mikhail:
> > Your patch was stripped by email server.
> >
> > I assume you have verified permission for all components of path:
> >
> >
>
/data/users/mbautin/workdirs/hb-os/target/test-data/37d6e996-cba6-4a12-85bc-dbcf2e91d297
> >
> > Cheers
> >
> >
> > On Tue, Dec 6, 2011 at 5:07 PM, Mikhail Bautin <
> > bautin.mailing.li

Re: [VOTE] Release Apache Hadoop 2.7.2 RC1

2015-12-17 Thread Ted Yu
Hi,
I have run test suite for tip of hbase 0.98 branch against this RC.

All tests passed.

+1

On Wed, Dec 16, 2015 at 6:49 PM, Vinod Kumar Vavilapalli  wrote:

> Hi all,
>
> I've created a release candidate RC1 for Apache Hadoop 2.7.2.
>
> As discussed before, this is the next maintenance release to follow up
> 2.7.1.
>
> The RC is available for validation at:
> http://people.apache.org/~vinodkv/hadoop-2.7.2-RC1/ <
> http://people.apache.org/~vinodkv/hadoop-2.7.2-RC1/>
>
> The RC tag in git is: release-2.7.2-RC1
>
> The maven artifacts are available via repository.apache.org <
> http://repository.apache.org/> at
> https://repository.apache.org/content/repositories/orgapachehadoop-1026/ <
> https://repository.apache.org/content/repositories/orgapachehadoop-1026/>
>
> The release-notes are inside the tar-balls at location
> hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html. I
> hosted this at
> http://people.apache.org/~vinodkv/hadoop-2.7.2-RC1/releasenotes.html for
> quick perusal.
>
> As you may have noted,
>  - The RC0 related voting thread got halted due to some critical issues.
> It took a while again for getting all those blockers out of the way. See
> the previous voting thread [3] for details.
>  - Before RC0, an unusually long 2.6.3 release caused 2.7.2 to slip by
> quite a bit. This release's related discussion threads are linked below:
> [1] and [2].
>
> Please try the release and vote; the vote will run for the usual 5 days.
>
> Thanks,
> Vinod
>
> [1]: 2.7.2 release plan: http://markmail.org/message/oozq3gvd4nhzsaes <
> http://markmail.org/message/oozq3gvd4nhzsaes>
> [2]: Planning Apache Hadoop 2.7.2
> http://markmail.org/message/iktqss2qdeykgpqk <
> http://markmail.org/message/iktqss2qdeykgpqk>
> [3]: [VOTE] Release Apache Hadoop 2.7.2 RC0:
> http://markmail.org/message/5txhvr2qdiqglrwc
>
>


Re: [VOTE] Release Apache Hadoop 2.6.4 RC0

2016-02-03 Thread Ted Yu
I modified hbase pom.xml (0.98 branch) to point to staged maven artifacts.

All unit tests passed.

Cheers

On Tue, Feb 2, 2016 at 11:01 PM, Junping Du  wrote:

> Hi community folks,
>I've created a release candidate RC0 for Apache Hadoop 2.6.4 (the next
> maintenance release to follow up 2.6.3.) according to email thread of
> release plan 2.6.4 [1]. Below is details of this release candidate:
>
> The RC is available for validation at:
> *http://people.apache.org/~junping_du/hadoop-2.6.4-RC0/
> *
>
> The RC tag in git is: release-2.6.4-RC0
>
> The maven artifacts are staged via repository.apache.org at:
> *https://repository.apache.org/content/repositories/orgapachehadoop-1028/?
>  >*
>
> You can find my public key at:
> http://svn.apache.org/repos/asf/hadoop/common/dist/KEYS
>
> Please try the release and vote. The vote will run for the usual 5 days.
>
> Thanks!
>
>
> Cheers,
>
> Junping
>
>
> [1]: 2.6.4 release plan: http://markmail.org/message/fk3ud3c665lscvx5?
>
>


Re: Hadoop JIRA web access is down?

2016-06-13 Thread Ted Yu
Please see:
https://status.apache.org/

Access to JIRA has been flaky today.

FYI

On Mon, Jun 13, 2016 at 1:18 PM, Xiaoyu Yao  wrote:

> Both hadoop hdfs and common JIRA sites are down. Any known issue?
> https://issues.apache.org/jira/browse/hdfs
> https://issues.apache.org/jira/browse/HADOOP
>
>
> issues.apache.org is currently unable to handle this request.
>
>


Re: Is anyone seeing this during trunk build?

2016-09-28 Thread Ted Yu
I used the same command but didn't see the error you saw.

Here is my environment:

Java HotSpot(TM) 64-Bit Server VM warning: ignoring option
MaxPermSize=512M; support was removed in 8.0
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5;
2015-11-10T08:41:47-08:00)
Maven home: /Users/tyu/apache-maven-3.3.9
Java version: 1.8.0_91, vendor: Oracle Corporation
Java home:
/Library/Java/JavaVirtualMachines/jdk1.8.0_91.jdk/Contents/Home/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "mac os x", version: "10.11.3", arch: "x86_64", family: "mac"

FYI

On Wed, Sep 28, 2016 at 3:54 PM, Kihwal Lee 
wrote:

> I just noticed this during a trunk build. I was doing "mvn clean install
> -DskipTests".  The build succeeds.
> Is anyone seeing this?  I am using openjdk8u102.
>
>
>
> ===
> [WARNING] Unable to process class org/apache/hadoop/hdfs/StripeReader.class
> in JarAnalyzer File /home1/kihwal/devel/apache/hadoop/hadoop-hdfs-project/
> hadoop-hdfs-client/target/hadoop-hdfs-client-3.0.0-alpha2-SNAPSHOT.jar
> org.apache.bcel.classfile.ClassFormatException: Invalid byte tag in
> constant pool: 18
> at org.apache.bcel.classfile.Constant.readConstant(Constant.java:146)
> at org.apache.bcel.classfile.ConstantPool.(ConstantPool.java:67)
> at org.apache.bcel.classfile.ClassParser.readConstantPool(
> ClassParser.java:222)
> at org.apache.bcel.classfile.ClassParser.parse(ClassParser.java:136)
> at org.apache.maven.shared.jar.classes.JarClassesAnalysis.
> analyze(JarClassesAnalysis.java:92)
> at org.apache.maven.report.projectinfo.dependencies.Dependencies.
> getJarDependencyDetails(Dependencies.java:255)
> at org.apache.maven.report.projectinfo.dependencies.
> renderer.DependenciesRenderer.hasSealed(DependenciesRenderer.java:1454)
> at org.apache.maven.report.projectinfo.dependencies.
> renderer.DependenciesRenderer.renderSectionDependencyFileDet
> ails(DependenciesRenderer.java:536)
> at org.apache.maven.report.projectinfo.dependencies.
> renderer.DependenciesRenderer.renderBody(DependenciesRenderer.java:263)
> at org.apache.maven.reporting.AbstractMavenReportRenderer.render(
> AbstractMavenReportRenderer.java:79)
> at org.apache.maven.report.projectinfo.DependenciesReport.
> executeReport(DependenciesReport.java:186)
> at org.apache.maven.reporting.AbstractMavenReport.generate(
> AbstractMavenReport.java:190)
> at org.apache.maven.report.projectinfo.AbstractProjectInfoReport.
> execute(AbstractProjectInfoReport.java:202)
> at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(
> DefaultBuildPluginManager.java:101)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute(
> MojoExecutor.java:209)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute(
> MojoExecutor.java:153)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute(
> MojoExecutor.java:145)
> at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.
> buildProject(LifecycleModuleBuilder.java:84)
> at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.
> buildProject(LifecycleModuleBuilder.java:59)
> at org.apache.maven.lifecycle.internal.LifecycleStarter.
> singleThreadedBuild(LifecycleStarter.java:183)
> at org.apache.maven.lifecycle.internal.LifecycleStarter.
> execute(LifecycleStarter.java:161)
> at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:320)
> at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:156)
> at org.apache.maven.cli.MavenCli.execute(MavenCli.java:537)
> at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196)
> at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.codehaus.plexus.classworlds.launcher.Launcher.
> launchEnhanced(Launcher.java:290)
> at org.codehaus.plexus.classworlds.launcher.Launcher.
> launch(Launcher.java:230)
> at org.codehaus.plexus.classworlds.launcher.Launcher.
> mainWithExitCode(Launcher.java:414)
> at org.codehaus.plexus.classworlds.launcher.Launcher.
> main(Launcher.java:357)
> ===
>


Re: YARN Jenkins Build get consistent failed.

2016-12-21 Thread Ted Yu
Precommit build #14423 has completed.

The exclusion (H5 and H6) has been done.

See the other thread started by Sangjin.

On Wed, Dec 21, 2016 at 11:59 AM, Junping Du  wrote:

> Hi hadoop folks,
>
>I noticed that our recent YARN jenkins tests are consistently failed (
> https://builds.apache.org/job/PreCommit-YARN-Build) due to test
> environment issues below.
>
>I already filed blocker issue https://issues.apache.org/
> jira/browse/INFRA-13141 to our INFRA team yesterday but haven't get any
> response yet. All commit work on YARN project are fully blocked. Anyone
> have ideas on how to move things forward?
>
> btw, Jenkins tests for hadoop/hdfs/mapreduce seems to be OK.
>
>
> FATAL: Command "git clean -fdx" returned status code 1:
> stdout:
> stderr: warning: failed to remove hadoop-common-project/hadoop-
> common/target/test/data/3
>
> hudson.plugins.git.GitException search?query=hudson.plugins.git.GitException>: Command "git clean -fdx"
> returned status code 1:
> stdout:
> stderr: warning: failed to remove hadoop-common-project/hadoop-
> common/target/test/data/3
>
> at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.
> launchCommandIn(CliGitAPIImpl.java:1723)
> at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.
> launchCommandIn(CliGitAPIImpl.java:1699)
> at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.
> launchCommandIn(CliGitAPIImpl.java:1695)
> at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.
> launchCommand(CliGitAPIImpl.java:1317)
>
>
>
>
> Thanks,
>
>
> Junping
>
>


Re: Pre-commit Build is failing

2017-04-25 Thread Ted Yu
Please see:
INFRA-13985

> On Apr 25, 2017, at 5:18 AM, Brahma Reddy Battula 
>  wrote:
> 
> Hi All
> 
> 
> Pre-commit build for all the project is failing with following error, any 
> idea on this..?
> 
> 
> 
> 
> HEAD is now at 2ba21d6 YARN-6392. Add submit time to Application Summary log. 
> (Zhihai Xu via wangda)
> 
> Already on 'trunk'
> 
> Your branch is up-to-date with 'origin/trunk'.
> 
> fatal: unable to access 
> 'https://git-wip-us.apache.org/repos/asf/hadoop.git/': server certificate 
> verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none
> 
> ERROR: git pull is failing
> 
> 
> 
> 
> 
> References:
> 
> https://builds.apache.org/view/PreCommit%20Builds/job/PreCommit-HADOOP-Build/12178/console
> 
> https://builds.apache.org/view/PreCommit%20Builds/job/PreCommit-HDFS-Build/19194/console
> 
> https://builds.apache.org/view/PreCommit%20Builds/job/PreCommit-YARN-Build/15733/console
> 
> 
> 
> 
> Regards
> Brahma Reddy Battula
> 

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: Jenkins precommit build for HDFS failing

2018-09-27 Thread Ted Yu
Over in hbase precommit, I saw this:
https://builds.apache.org/job/PreCommit-HBASE-Build/14514/console

Resolving deltas:  86% (114758/133146), completed with 87 local objects.
21:06:47 fatal: pack has 18388 unresolved deltas
21:06:47 fatal: index-pack failed
21:06:47   at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2002)
21:06:47   at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:1721)
21:06:47   at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.access$300(CliGitAPIImpl.java:72)


It seems QA machine(s) may have trouble accessing git.


I wonder if the 'index-pack failed' error would lead to the patch not
being recognized.


FYI


On Thu, Sep 27, 2018 at 3:02 PM Ajay Kumar 
wrote:

> Hi,
>
> The Jenkins precommit build for HDFS is failing with an error that the patch
> doesn't apply to trunk, even when the patch does apply.
> I see other build failures with the same error for other patches as well.
> Wanted to reach out to ask whether this is a known issue.
>
>
> Vote | Subsystem | Runtime | Comment
>   0  | reexec    | 0m 0s   | Docker mode activated.
>  -1  | patch     | 0m 5s   | HDFS-13941 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.
>
> Subsystem      | Report/Notes
> JIRA Issue     | HDFS-13941
> Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/25154/console
> Powered by     | Apache Yetus 0.8.0 http://yetus.apache.org
>
>
> -1 overall
>
> Vote | Subsystem | Runtime | Comment
>   0  | reexec    | 0m 0s   | Docker mode activated.
>  -1  | patch     | 0m 4s   | HDFS-13877 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.
>
> Subsystem      | Report/Notes
> JIRA Issue     | HDFS-13877
> Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/25155/console
> Powered by     | Apache Yetus 0.8.0 http://yetus.apache.org
>
>
>
>
>
> Thanks,
> Ajay Kumar
>


Re: [ANNOUNCE] Apache Hadoop Ozone 0.2.1-alpha release

2018-10-01 Thread Ted Yu
Are the artifacts published on Maven?

I did a quick search but didn't find anything.

Cheers

On Mon, Oct 1, 2018 at 5:24 PM Elek, Marton  wrote:

>
> It gives me great pleasure to announce that the Apache Hadoop community
> has voted to release Apache Hadoop Ozone 0.2.1-alpha.
>
> Apache Hadoop Ozone is an object store for Hadoop built using Hadoop
> Distributed Data Store.
>
> For more information and to download, please check
>
> https://hadoop.apache.org/ozone
>
> Note: This release is alpha quality; it's not recommended for use in
> production.
>
> Many thanks to everyone who contributed to the release, and everyone in
> the Apache Hadoop community! The release is the result of work from many
> contributors. Thank you to all of them.
>
> On behalf of the Hadoop community,
> Márton Elek
>
>
> ps: Hadoop Ozone and HDDS are released separately from the main Hadoop
> releases; this release doesn't include new Hadoop Yarn/Mapreduce/Hdfs
> versions.
>
> -
> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
>
>


Re: [VOTE] Release Apache Hadoop 2.6.1 RC0

2015-09-10 Thread Ted Yu
I pointed the master branch of HBase to 2.6.1 RC0.
Ran the unit test suite and the results are good.

Cheers

On Thu, Sep 10, 2015 at 5:16 PM, Sangjin Lee  wrote:

> I verified the signatures for both source and the binary tarballs. I
> started up a pseudo-distributed cluster, and tested simple apps such as
> sleep and terasort.
>
> I do see one issue with the RM UI where the sorting by id is broken. The
> table is not rendered in the expected id-descending order, and when I click
> the sort control, nothing happens. Sorting by other columns works fine.
>
> Is anyone else able to reproduce the issue? I checked 2.6.0, and it works
> fine on 2.6.0.
>
> On Wed, Sep 9, 2015 at 6:00 PM, Vinod Kumar Vavilapalli <
> vino...@apache.org>
> wrote:
>
> > Hi all,
> >
> > After a nearly month long [1] toil, with loads of help from Sangjin Lee
> > and Akira Ajisaka, and 153 commits later, I've created a release
> candidate
> > RC0 for hadoop-2.6.1.
> >
> > The RC is available at:
> > http://people.apache.org/~vinodkv/hadoop-2.6.1-RC0/
> >
> > The RC tag in git is: release-2.6.1-RC0
> >
> > The maven artifacts are available via repository.apache.org at
> > https://repository.apache.org/content/repositories/orgapachehadoop-1020
> >
> > Some notes from our release process
>  - Sangjin and I moved out a bunch of items pending from 2.6.1 [2] -
> non-committed but desired patches. 2.6.1 is already big as is and is late
> by any standard; we can definitely include them in the next release.
> >  - The 2.6.1 wiki page [3] captures some (but not all) of the context of
> > the patches that we pushed in.
> >  - Given the number of fixes pushed [4] in, we had to make a bunch of
> > changes to our original plan - we added a few improvements that helped us
> > backport patches easier (or in many cases made backports possible), and
> we
> > dropped a few that didn't make sense (HDFS-7831, HDFS-7926, HDFS-7676,
> > HDFS-7611, HDFS-7843, HDFS-8850).
> >  - I ran all the unit tests which (surprisingly?) passed. (Except for
> one,
> > which pointed out a missing fix HDFS-7552).
> >
> > As discussed before [5]
> >  - This release is the first point release after 2.6.0
> >  - I’d like to use this as a starting release for 2.6.2 in a few weeks
> > and then follow up with more of these.
> >
> > Please try the release and vote; the vote will run for the usual 5 days.
> >
> > Thanks,
> > Vinod
> >
> > [1] Hadoop 2.6.1 Release process thread:
> > http://markmail.org/thread/wkbgkxkhntx5tlux
> > [2] 2.6.1 Pending tickets:
> > https://issues.apache.org/jira/issues/?filter=12331711
> > [3] 2.6.1 Wiki page:
> > https://wiki.apache.org/hadoop/Release-2.6.1-Working-Notes
> > [4] List of 2.6.1 patches pushed:
> >
> https://issues.apache.org/jira/issues/?jql=fixVersion%20%3D%202.6.1%20and%20labels%20%3D%20%222.6.1-candidate%22
> > [5] Planning Hadoop 2.6.1 release:
> > http://markmail.org/thread/sbykjn5xgnksh6wg
> >
> > PS:
> >  - Note that branch-2.6 which will be the base for 2.6.2 doesn't have
> > these fixes yet. Once 2.6.1 goes through, I plan to rebase branch-2.6
> based
> > off 2.6.1.
> >  - Patches that got into 2.6.1 all the way from 2.8 are NOT in 2.7.2 yet,
> > this will be done as a followup.
> >
> >
>


[jira] [Created] (HDFS-5012) replica.getGenerationStamp() may be >= recoveryId

2013-07-18 Thread Ted Yu (JIRA)
Ted Yu created HDFS-5012:


 Summary: replica.getGenerationStamp() may be >= recoveryId
 Key: HDFS-5012
 URL: https://issues.apache.org/jira/browse/HDFS-5012
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.5-alpha
Reporter: Ted Yu
 Attachments: testReplicationQueueFailover.txt

We saw the following in TestReplicationQueueFailover running against 
2.0.5-alpha:
{code}
2013-07-16 17:14:33,340 ERROR [IPC Server handler 7 on 35081] 
security.UserGroupInformation(1481): PriviledgedActionException as:ec2-user 
(auth:SIMPLE) cause:java.io.IOException: THIS IS NOT SUPPOSED TO HAPPEN: 
replica.getGenerationStamp() >= recoveryId = 1041, 
block=blk_4297992342878601848_1041, replica=FinalizedReplica, 
blk_4297992342878601848_1041, FINALIZED
  getNumBytes() = 794
  getBytesOnDisk()  = 794
  getVisibleLength()= 794
  getVolume()   = 
/home/ec2-user/jenkins/workspace/HBase-0.95-Hadoop-2/hbase-server/target/test-data/f2763e32-fe49-4988-ac94-eeca82431821/dfscluster_643a635e-4e39-4aa5-974c-25e01db16ff7/dfs/data/data3/current
  getBlockFile()= 
/home/ec2-user/jenkins/workspace/HBase-0.95-Hadoop-2/hbase-server/target/test-data/f2763e32-fe49-4988-ac94-eeca82431821/dfscluster_643a635e-4e39-4aa5-974c-25e01db16ff7/dfs/data/data3/current/BP-1477359609-10.197.55.49-1373994849464/current/finalized/blk_4297992342878601848
  unlinked  =false
2013-07-16 17:14:33,341 WARN  
[org.apache.hadoop.hdfs.server.datanode.DataNode$2@64a1fcba] 
datanode.DataNode(1894): Failed to obtain replica info for block 
(=BP-1477359609-10.197.55.49-1373994849464:blk_4297992342878601848_1041) from 
datanode (=127.0.0.1:47006)
java.io.IOException: THIS IS NOT SUPPOSED TO HAPPEN: 
replica.getGenerationStamp() >= recoveryId = 1041, 
block=blk_4297992342878601848_1041, replica=FinalizedReplica, 
blk_4297992342878601848_1041, FINALIZED
  getNumBytes() = 794
  getBytesOnDisk()  = 794
  getVisibleLength()= 794
  getVolume()   = 
/home/ec2-user/jenkins/workspace/HBase-0.95-Hadoop-2/hbase-server/target/test-data/f2763e32-fe49-4988-ac94-eeca82431821/dfscluster_643a635e-4e39-4aa5-974c-25e01db16ff7/dfs/data/data3/current
  getBlockFile()= 
/home/ec2-user/jenkins/workspace/HBase-0.95-Hadoop-2/hbase-server/target/test-data/f2763e32-fe49-4988-ac94-eeca82431821/dfscluster_643a635e-4e39-4aa5-974c-25e01db16ff7/dfs/data/data3/current/BP-1477359609-10.197.55.49-1373994849464/current/finalized/blk_4297992342878601848
  unlinked  =false
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-5018) Misspelled DFSConfigKeys#DFS_NAMENODE_STALE_DATANODE_INTERVAL_DEFAULT in javadoc of DatanodeInfo#isStale()

2013-07-21 Thread Ted Yu (JIRA)
Ted Yu created HDFS-5018:


 Summary: Misspelled 
DFSConfigKeys#DFS_NAMENODE_STALE_DATANODE_INTERVAL_DEFAULT in javadoc of 
DatanodeInfo#isStale()
 Key: HDFS-5018
 URL: https://issues.apache.org/jira/browse/HDFS-5018
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu


DFSConfigKeys#DFS_NAMENODE_STALE_DATANODE_INTERVAL_DEFAULT was misspelled in 
javadoc of DatanodeInfo#isStale()

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-5041) Add the time of last heartbeat to dead server Web UI

2013-07-29 Thread Ted Yu (JIRA)
Ted Yu created HDFS-5041:


 Summary: Add the time of last heartbeat to dead server Web UI
 Key: HDFS-5041
 URL: https://issues.apache.org/jira/browse/HDFS-5041
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ted Yu
Priority: Minor


On the Live Servers page, there is a column 'Last Contact'.

On the Dead Servers page, a similar column can be added showing when the last 
heartbeat came from the respective dead node.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-5081) DistributedFileSystem#listStatus() throws FileNotFoundException when target of symlink doesn't exist

2013-08-08 Thread Ted Yu (JIRA)
Ted Yu created HDFS-5081:


 Summary: DistributedFileSystem#listStatus() throws 
FileNotFoundException when target of symlink doesn't exist
 Key: HDFS-5081
 URL: https://issues.apache.org/jira/browse/HDFS-5081
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu


I was running HBase trunk test suite against hadoop 2.1.1-SNAPSHOT.
One test failed due to:
{code}
org.apache.hadoop.hbase.catalog.TestMetaMigrationConvertingToPB  Time elapsed: 
1,594,938.629 sec  <<< ERROR!
java.io.FileNotFoundException: File 
hdfs://localhost:61300/user/tyu/hbase/.archive does not exist.
  at 
org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:656)
  at 
org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:92)
  at 
org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:714)
  at 
org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:710)
  at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:78)
  at 
org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:710)
  at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1478)
  at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1518)
  at org.apache.hadoop.hbase.util.FSUtils.getLocalTableDirs(FSUtils.java:1317)
  at 
org.apache.hadoop.hbase.migration.NamespaceUpgrade.migrateTables(NamespaceUpgrade.java:114)
  at 
org.apache.hadoop.hbase.migration.NamespaceUpgrade.upgradeTableDirs(NamespaceUpgrade.java:87)
  at 
org.apache.hadoop.hbase.migration.NamespaceUpgrade.run(NamespaceUpgrade.java:206)
{code}
TestMetaMigrationConvertToPB.tgz was generated from a previous release of HBase.
TestMetaMigrationConvertToPB upgrades it to the current release of HBase.

The test is at 
hbase-server/src/test/java/org/apache/hadoop/hbase/catalog/TestMetaMigrationConvertingToPB.java
 under HBase trunk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-5352) Server#initLog() doesn't close InputStream

2013-10-11 Thread Ted Yu (JIRA)
Ted Yu created HDFS-5352:


 Summary: Server#initLog() doesn't close InputStream
 Key: HDFS-5352
 URL: https://issues.apache.org/jira/browse/HDFS-5352
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: hdfs-5352.patch

Here is related code snippet in 
hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/lib/server/Server.java:
{code}
  Properties props = new Properties();
  try {
InputStream is = getResource(DEFAULT_LOG4J_PROPERTIES);
props.load(is);
  } catch (IOException ex) {
throw new ServerException(ServerException.ERROR.S03, 
DEFAULT_LOG4J_PROPERTIES, ex.getMessage(), ex);
  }
{code}
The InputStream is should be closed after the properties are loaded.
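A minimal sketch of the kind of fix implied, assuming the surrounding ServerException handling stays as shown above (the try-with-resources form is used here purely for illustration, not as the committed patch):
{code}
Properties props = new Properties();
try (InputStream is = getResource(DEFAULT_LOG4J_PROPERTIES)) {
  // The stream is closed automatically, even if load() throws.
  props.load(is);
} catch (IOException ex) {
  throw new ServerException(ServerException.ERROR.S03,
      DEFAULT_LOG4J_PROPERTIES, ex.getMessage(), ex);
}
{code}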



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (HDFS-4642) Allow lease recovery for multiple paths to be issued in one request

2013-11-11 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HDFS-4642.
--

Resolution: Later

> Allow lease recovery for multiple paths to be issued in one request
> ---
>
> Key: HDFS-4642
> URL: https://issues.apache.org/jira/browse/HDFS-4642
> Project: Hadoop HDFS
>  Issue Type: Improvement
>    Reporter: Ted Yu
>
> Currently client can only request lease recovery for one Path:
> {code}
>   public boolean recoverLease(Path f) throws IOException {
> {code}
> For HBase distributed log splitting, Nicolas made a suggestion here:
> https://issues.apache.org/jira/browse/HBASE-7878?focusedCommentId=13615364&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13615364
> The HBase master collects the files that should be split, issues lease 
> recovery for the files (in one request), then distributes log splitting.
> This would help shorten MTTR.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HDFS-5576) RPC#stopProxy() should log the class of proxy when IllegalArgumentException is encountered

2013-11-27 Thread Ted Yu (JIRA)
Ted Yu created HDFS-5576:


 Summary: RPC#stopProxy() should log the class of proxy when 
IllegalArgumentException is encountered
 Key: HDFS-5576
 URL: https://issues.apache.org/jira/browse/HDFS-5576
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ted Yu
Priority: Minor


When investigating HBASE-10029, [~szetszwo] made the suggestion of logging the 
class of proxy when IllegalArgumentException is thrown.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HDFS-5672) TestHASafeMode#testSafeBlockTracking fails in trunk

2013-12-16 Thread Ted Yu (JIRA)
Ted Yu created HDFS-5672:


 Summary: TestHASafeMode#testSafeBlockTracking fails in trunk
 Key: HDFS-5672
 URL: https://issues.apache.org/jira/browse/HDFS-5672
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Ted Yu


From build #1614:
{code}
 TestHASafeMode.testSafeBlockTracking:623->assertSafeMode:488 Bad safemode 
status: 'Safe mode is ON. The reported blocks 3 needs additional 7 blocks to 
reach the threshold 0.9990 of total blocks 10.
Safe mode will be turned off automatically'
{code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (HDFS-5679) TestCacheDirectives should handle the case where native code is not available

2013-12-18 Thread Ted Yu (JIRA)
Ted Yu created HDFS-5679:


 Summary: TestCacheDirectives should handle the case where native 
code is not available
 Key: HDFS-5679
 URL: https://issues.apache.org/jira/browse/HDFS-5679
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Ted Yu


TestCacheDirectives fails on trunk due to:
{code}
testBasicPoolOperations(org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives)
  Time elapsed: 1.618 sec  <<< ERROR!
java.lang.RuntimeException: Cannot start datanode because the configured max 
locked memory size (dfs.datanode.max.locked.memory) is greater than zero and 
native code is not available.
{code}
Configuration of max locked memory size should be dependent on whether native 
code is available.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Resolved] (HDFS-5672) TestHASafeMode#testSafeBlockTracking fails in trunk

2013-12-26 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HDFS-5672.
--

Resolution: Cannot Reproduce

> TestHASafeMode#testSafeBlockTracking fails in trunk
> ---
>
> Key: HDFS-5672
> URL: https://issues.apache.org/jira/browse/HDFS-5672
> Project: Hadoop HDFS
>  Issue Type: Test
>    Reporter: Ted Yu
>
> From build #1614:
> {code}
>  TestHASafeMode.testSafeBlockTracking:623->assertSafeMode:488 Bad safemode 
> status: 'Safe mode is ON. The reported blocks 3 needs additional 7 blocks to 
> reach the threshold 0.9990 of total blocks 10.
> Safe mode will be turned off automatically'
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5705) TestSecondaryNameNodeUpgrade#testChangeNsIDFails may fail due to ConcurrentModificationException

2013-12-29 Thread Ted Yu (JIRA)
Ted Yu created HDFS-5705:


 Summary: TestSecondaryNameNodeUpgrade#testChangeNsIDFails may fail 
due to ConcurrentModificationException
 Key: HDFS-5705
 URL: https://issues.apache.org/jira/browse/HDFS-5705
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu


From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1626/testReport/org.apache.hadoop.hdfs.server.namenode/TestSecondaryNameNodeUpgrade/testChangeNsIDFails/ :
{code}
java.util.ConcurrentModificationException: null
at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
at java.util.HashMap$EntryIterator.next(HashMap.java:834)
at java.util.HashMap$EntryIterator.next(HashMap.java:832)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:251)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.shutdown(FsVolumeList.java:218)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:1414)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1309)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1464)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1439)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1423)
at 
org.apache.hadoop.hdfs.server.namenode.TestSecondaryNameNodeUpgrade.doIt(TestSecondaryNameNodeUpgrade.java:97)
at 
org.apache.hadoop.hdfs.server.namenode.TestSecondaryNameNodeUpgrade.testChangeNsIDFails(TestSecondaryNameNodeUpgrade.java:116)
{code}
The above happens when shutdown() is called in parallel to addBlockPool() or 
shutdownBlockPool().



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5710) FSDirectory#getFullPathName should check for inodes against null

2013-12-31 Thread Ted Yu (JIRA)
Ted Yu created HDFS-5710:


 Summary: FSDirectory#getFullPathName should check for inodes 
against null
 Key: HDFS-5710
 URL: https://issues.apache.org/jira/browse/HDFS-5710
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Ted Yu


From https://builds.apache.org/job/hbase-0.96-hadoop2/166/testReport/junit/org.apache.hadoop.hbase.mapreduce/TestTableInputFormatScan1/org_apache_hadoop_hbase_mapreduce_TestTableInputFormatScan1/ :
{code}
2014-01-01 00:10:15,571 INFO  [IPC Server handler 2 on 50198] 
blockmanagement.BlockManager(1009): BLOCK* addToInvalidates: 
blk_1073741967_1143 127.0.0.1:40188 127.0.0.1:46149 127.0.0.1:41496 
2014-01-01 00:10:16,559 WARN  
[org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@93935b]
 namenode.FSDirectory(1854): Could not get full path. Corresponding file might 
have deleted already.
2014-01-01 00:10:16,560 FATAL 
[org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@93935b]
 blockmanagement.BlockManager$ReplicationMonitor(3127): ReplicationMonitor 
thread received Runtime exception. 
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFullPathName(FSDirectory.java:1871)
at 
org.apache.hadoop.hdfs.server.namenode.INode.getFullPathName(INode.java:482)
at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.getName(INodeFile.java:316)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.chooseTarget(BlockPlacementPolicy.java:118)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1259)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1167)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3158)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3112)
at java.lang.Thread.run(Thread.java:724)
{code}
Looks like getRelativePathINodes() returned null but getFullPathName() didn't 
check inodes against null, leading to NPE.
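A minimal sketch of the null guard being suggested; the getRelativePathINodes() arguments and the empty-string fallback are assumptions inferred from the stack trace, not code copied from FSDirectory:
{code}
INode[] inodes = getRelativePathINodes(inode, ancestor);
if (inodes == null) {
  // The file may have been deleted by the time the ReplicationMonitor
  // asks for its full path; return an empty name instead of hitting an NPE.
  return "";
}
// ... existing logic that walks the inodes array ...
{code}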



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5718) TestHttpsFileSystem intermittently fails with Port in use error

2014-01-04 Thread Ted Yu (JIRA)
Ted Yu created HDFS-5718:


 Summary: TestHttpsFileSystem intermittently fails with Port in use 
error
 Key: HDFS-5718
 URL: https://issues.apache.org/jira/browse/HDFS-5718
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor


From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1634/testReport/junit/org.apache.hadoop.hdfs.web/TestHttpsFileSystem/org_apache_hadoop_hdfs_web_TestHttpsFileSystem/ :
{code}
java.net.BindException: Port in use: localhost:50475
at java.net.PlainSocketImpl.socketBind(Native Method)
at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:383)
at java.net.ServerSocket.bind(ServerSocket.java:328)
at java.net.ServerSocket.<init>(ServerSocket.java:194)
at javax.net.ssl.SSLServerSocket.<init>(SSLServerSocket.java:106)
at 
com.sun.net.ssl.internal.ssl.SSLServerSocketImpl.<init>(SSLServerSocketImpl.java:108)
at 
com.sun.net.ssl.internal.ssl.SSLServerSocketFactoryImpl.createServerSocket(SSLServerSocketFactoryImpl.java:72)
at 
org.mortbay.jetty.security.SslSocketConnector.newServerSocket(SslSocketConnector.java:478)
at org.mortbay.jetty.bio.SocketConnector.open(SocketConnector.java:73)
at org.apache.hadoop.http.HttpServer.openListeners(HttpServer.java:973)
at org.apache.hadoop.http.HttpServer.start(HttpServer.java:914)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.startInfoServer(DataNode.java:412)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:769)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:315)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1846)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1746)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1203)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:673)
at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:342)
at 
org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:323)
at 
org.apache.hadoop.hdfs.web.TestHttpsFileSystem.setUp(TestHttpsFileSystem.java:64)
{code}
This could have been caused by concurrent test(s).



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5719) FSImage#doRollback() should close prevState before return

2014-01-05 Thread Ted Yu (JIRA)
Ted Yu created HDFS-5719:


 Summary: FSImage#doRollback() should close prevState before return
 Key: HDFS-5719
 URL: https://issues.apache.org/jira/browse/HDFS-5719
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


{code}
FSImage prevState = new FSImage(conf);
{code}
prevState should be closed before returning from doRollback().
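A sketch of the shape of the fix, assuming FSImage implements Closeable and that the enclosing method already declares IOException (the rollback body itself is elided):
{code}
FSImage prevState = new FSImage(conf);
try {
  // ... existing rollback logic that reads prevState ...
} finally {
  // Close on every exit path, including early returns and exceptions.
  prevState.close();
}
{code}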



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns

2014-01-06 Thread Ted Yu (JIRA)
Ted Yu created HDFS-5721:


 Summary: sharedEditsImage in Namenode#initializeSharedEdits() 
should be closed before method returns
 Key: HDFS-5721
 URL: https://issues.apache.org/jira/browse/HDFS-5721
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


At line 901:
{code}
  FSImage sharedEditsImage = new FSImage(conf,
  Lists.newArrayList(),
  sharedEditsDirs);
{code}
sharedEditsImage is not closed before the method returns.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5750) JHLogAnalyzer#parseLogFile() should close stm upon return

2014-01-09 Thread Ted Yu (JIRA)
Ted Yu created HDFS-5750:


 Summary: JHLogAnalyzer#parseLogFile() should close stm upon return
 Key: HDFS-5750
 URL: https://issues.apache.org/jira/browse/HDFS-5750
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


stm is assigned to in.
But in may point to another InputStream:
{code}
if(compressionClass != null) {
  CompressionCodec codec = (CompressionCodec)
ReflectionUtils.newInstance(compressionClass, new Configuration());
  in = codec.createInputStream(stm);
{code}
stm should be closed in the finally block.
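A hedged sketch of the close-in-finally pattern being asked for; fs and logFile are stand-ins for however the stream is actually opened, and the rest of parseLogFile() is omitted:
{code}
InputStream stm = fs.open(logFile);
InputStream in = stm;
try {
  if (compressionClass != null) {
    CompressionCodec codec = (CompressionCodec)
        ReflectionUtils.newInstance(compressionClass, new Configuration());
    in = codec.createInputStream(stm);
  }
  // ... read and parse the log records from in ...
} finally {
  IOUtils.closeStream(in);
  IOUtils.closeStream(stm);  // safe even when in wraps stm
}
{code}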



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5834) TestCheckpoint#testCheckpoint may fail due to Bad value assertion

2014-01-26 Thread Ted Yu (JIRA)
Ted Yu created HDFS-5834:


 Summary: TestCheckpoint#testCheckpoint may fail due to Bad value 
assertion
 Key: HDFS-5834
 URL: https://issues.apache.org/jira/browse/HDFS-5834
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor


I saw the following when running test suite on Linux:
{code}
testCheckpoint(org.apache.hadoop.hdfs.server.namenode.TestCheckpoint)  Time 
elapsed: 3.058 sec  <<< FAILURE!
java.lang.AssertionError: Bad value for metric GetImageNumOps
Expected: gt(0)
 got: <0L>

at org.junit.Assert.assertThat(Assert.java:780)
at 
org.apache.hadoop.test.MetricsAsserts.assertCounterGt(MetricsAsserts.java:318)
at 
org.apache.hadoop.hdfs.server.namenode.TestCheckpoint.testCheckpoint(TestCheckpoint.java:1058)
{code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5839) TestWebHDFS#testNamenodeRestart fails with NullPointerException in trunk

2014-01-27 Thread Ted Yu (JIRA)
Ted Yu created HDFS-5839:


 Summary: TestWebHDFS#testNamenodeRestart fails with 
NullPointerException in trunk
 Key: HDFS-5839
 URL: https://issues.apache.org/jira/browse/HDFS-5839
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5886) Potential null pointer deference in RpcProgramNfs3#readlink()

2014-02-04 Thread Ted Yu (JIRA)
Ted Yu created HDFS-5886:


 Summary: Potential null pointer deference in 
RpcProgramNfs3#readlink()
 Key: HDFS-5886
 URL: https://issues.apache.org/jira/browse/HDFS-5886
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu


Here is related code:
{code}
  if (MAX_READ_TRANSFER_SIZE < target.getBytes().length) {
return new READLINK3Response(Nfs3Status.NFS3ERR_IO, postOpAttr, null);
  }
{code}
READLINK3Response constructor would dereference the third parameter:
{code}
this.path = new byte[path.length];
{code}
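One possible shape of a fix, sketched under the assumption that the check above stays where it is: hand the constructor an empty path instead of null (the empty array is purely illustrative; the actual patch may instead make the constructor null-safe):
{code}
if (MAX_READ_TRANSFER_SIZE < target.getBytes().length) {
  // Avoid passing null as the path, which the constructor dereferences.
  return new READLINK3Response(Nfs3Status.NFS3ERR_IO, postOpAttr,
      new byte[0]);
}
{code}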



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5892) TestDeleteBlockPool fails in branch-2

2014-02-05 Thread Ted Yu (JIRA)
Ted Yu created HDFS-5892:


 Summary: TestDeleteBlockPool fails in branch-2
 Key: HDFS-5892
 URL: https://issues.apache.org/jira/browse/HDFS-5892
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor


Running test suite on Linux, I got:
{code}
testDeleteBlockPool(org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool) 
 Time elapsed: 8.143 sec  <<< ERROR!
java.io.IOException: All datanodes 127.0.0.1:43721 are bad. Aborting...
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023)
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:838)
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:483)
{code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5897) TestNNWithQJM#testNewNamenodeTakesOverWriter occasionally fails in trunk

2014-02-06 Thread Ted Yu (JIRA)
Ted Yu created HDFS-5897:


 Summary: TestNNWithQJM#testNewNamenodeTakesOverWriter occasionally 
fails in trunk
 Key: HDFS-5897
 URL: https://issues.apache.org/jira/browse/HDFS-5897
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Ted Yu


From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1665/testReport/junit/org.apache.hadoop.hdfs.qjournal/TestNNWithQJM/testNewNamenodeTakesOverWriter/ :
{code}
java.lang.Exception: test timed out after 3 milliseconds
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1195)
at 
java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379)
at 
org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream$URLLog$1.run(EditLogFileInputStream.java:412)
at 
org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream$URLLog$1.run(EditLogFileInputStream.java:401)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
{code}
I saw:
{code}
2014-02-06 11:38:37,970 ERROR namenode.EditLogInputStream 
(RedundantEditLogInputStream.java:nextOp(221)) - Got error reading edit log 
input stream 
http://localhost:40509/getJournal?jid=myjournal&segmentTxId=3&storageInfo=-51%3A1571339494%3A0%3AtestClusterID;
 failing over to edit log 
http://localhost:56244/getJournal?jid=myjournal&segmentTxId=3&storageInfo=-51%3A1571339494%3A0%3AtestClusterID
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream$PrematureEOFException:
 got premature end-of-file at txid 0; expected file to go up to 4
at 
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:194)
at 
org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:83)
at 
org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.skipUntil(EditLogInputStream.java:140)
at 
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:178)
at 
org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:83)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:167)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:120)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:708)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:606)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:263)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:874)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:634)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:446)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:502)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:658)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:643)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1291)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:939)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:824)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:678)
at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:359)
at 
org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:340)
at 
org.apache.hadoop.hdfs.qjournal.TestNNWithQJM.testNewNamenodeTakesOverWriter(TestNNWithQJM.java:145)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.j

[jira] [Created] (HDFS-5913) Nfs3Utils#getWccAttr() should check attr parameter against null

2014-02-07 Thread Ted Yu (JIRA)
Ted Yu created HDFS-5913:


 Summary: Nfs3Utils#getWccAttr() should check attr parameter 
against null
 Key: HDFS-5913
 URL: https://issues.apache.org/jira/browse/HDFS-5913
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


In RpcProgramNfs3#commit() :
{code}
  Nfs3FileAttributes postOpAttr = null;
  try {
postOpAttr = writeManager.getFileAttr(dfsClient, handle, iug);
  } catch (IOException e1) {
LOG.info("Can't get postOpAttr for fileId: " + handle.getFileId());
  }
  WccData fileWcc = new WccData(Nfs3Utils.getWccAttr(preOpAttr), 
postOpAttr);
{code}
If there is an exception, postOpAttr will be null.
However, Nfs3Utils#getWccAttr() dereferences the attr parameter directly.
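A minimal sketch of the guard being requested inside Nfs3Utils#getWccAttr(); the accessor and WccAttr constructor signatures used here are assumptions for illustration:
{code}
public static WccAttr getWccAttr(Nfs3FileAttributes attr) {
  if (attr == null) {
    // The caller could not fetch post-op attributes (see the commit()
    // snippet above), so fall back to an empty WccAttr instead of an NPE.
    return new WccAttr();
  }
  return new WccAttr(attr.getSize(), attr.getMtime(), attr.getCtime());
}
{code}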



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5953) TestBlockReaderFactory fails in trunk

2014-02-14 Thread Ted Yu (JIRA)
Ted Yu created HDFS-5953:


 Summary: TestBlockReaderFactory fails in trunk
 Key: HDFS-5953
 URL: https://issues.apache.org/jira/browse/HDFS-5953
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Ted Yu


From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1673/testReport/junit/org.apache.hadoop.hdfs/TestBlockReaderFactory/testFallbackFromShortCircuitToUnixDomainTraffic/ :
{code}
java.lang.RuntimeException: Although a UNIX domain socket path is configured as 
/tmp/socks.1392383436573.1418778351/testFallbackFromShortCircuitToUnixDomainTraffic._PORT,
 we cannot start a localDataXceiverServer because libhadoop cannot be loaded.
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.getDomainPeerServer(DataNode.java:601)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initDataXceiver(DataNode.java:573)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:769)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:315)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1864)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1764)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1243)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:699)
at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:359)
at 
org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:340)
at 
org.apache.hadoop.hdfs.TestBlockReaderFactory.testFallbackFromShortCircuitToUnixDomainTraffic(TestBlockReaderFactory.java:99)
{code}
This test failure can be reproduced locally (on Mac).



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5997) TestHASafeMode#testBlocksAddedWhileStandbyIsDown fails in trunk

2014-02-22 Thread Ted Yu (JIRA)
Ted Yu created HDFS-5997:


 Summary: TestHASafeMode#testBlocksAddedWhileStandbyIsDown fails in 
trunk
 Key: HDFS-5997
 URL: https://issues.apache.org/jira/browse/HDFS-5997
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu


From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1681/ :

REGRESSION:  
org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode.testBlocksAddedWhileStandbyIsDown

Error Message:
{code}
Bad safemode status: 'Safe mode is ON. The reported blocks 7 has reached the 
threshold 0.9990 of total blocks 6. The number of live datanodes 3 has reached 
the minimum number 0. Safe mode will be turned off automatically in 28 seconds.'
{code}

Stack Trace:
{code}
java.lang.AssertionError: Bad safemode status: 'Safe mode is ON. The reported 
blocks 7 has reached the threshold 0.9990 of total blocks 6. The number of live 
datanodes 3 has reached the minimum number 0. Safe mode will be turned off 
automatically in 28 seconds.'
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at 
org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode.assertSafeMode(TestHASafeMode.java:493)
at 
org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode.testBlocksAddedWhileStandbyIsDown(TestHASafeMode.java:660)
{code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-6037) TestIncrementalBlockReports#testReplaceReceivedBlock fails occasionally in trunk

2014-03-01 Thread Ted Yu (JIRA)
Ted Yu created HDFS-6037:


 Summary: TestIncrementalBlockReports#testReplaceReceivedBlock 
fails occasionally in trunk
 Key: HDFS-6037
 URL: https://issues.apache.org/jira/browse/HDFS-6037
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Ted Yu


From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1688/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestIncrementalBlockReports/testReplaceReceivedBlock/ :
{code}
datanodeProtocolClientSideTranslatorPB.blockReceivedAndDeleted(
,
,

);
Wanted 1 time:
-> at 
org.apache.hadoop.hdfs.server.datanode.TestIncrementalBlockReports.testReplaceReceivedBlock(TestIncrementalBlockReports.java:198)
But was 2 times. Undesired invocation:
-> at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reportReceivedDeletedBlocks(BPServiceActor.java:303)
{code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-6081) TestRetryCacheWithHA#testCreateSymlink occasionally fails in trunk

2014-03-09 Thread Ted Yu (JIRA)
Ted Yu created HDFS-6081:


 Summary: TestRetryCacheWithHA#testCreateSymlink occasionally fails 
in trunk
 Key: HDFS-6081
 URL: https://issues.apache.org/jira/browse/HDFS-6081
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Ted Yu


From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1696/testReport/junit/org.apache.hadoop.hdfs.server.namenode.ha/TestRetryCacheWithHA/testCreateSymlink/ :
{code}
2014-03-09 13:18:47,515 WARN  security.UserGroupInformation 
(UserGroupInformation.java:doAs(1600)) - PriviledgedActionException as:jenkins 
(auth:SIMPLE) cause:java.io.IOException: failed to create link /testlink either 
because the filename is invalid or the file exists
2014-03-09 13:18:47,515 INFO  ipc.Server (Server.java:run(2093)) - IPC Server 
handler 0 on 39303, call 
org.apache.hadoop.hdfs.protocol.ClientProtocol.createSymlink from 
127.0.0.1:32909 Call#682 Retry#1: error: java.io.IOException: failed to create 
link /testlink either because the filename is invalid or the file exists
java.io.IOException: failed to create link /testlink either because the 
filename is invalid or the file exists
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.createSymlinkInt(FSNamesystem.java:2053)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.createSymlink(FSNamesystem.java:2023)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.createSymlink(NameNodeRpcServer.java:965)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.createSymlink(ClientNamenodeProtocolServerSideTranslatorPB.java:844)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2071)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2067)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1597)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2065)
2014-03-09 13:18:47,522 INFO  blockmanagement.BlockManager 
(BlockManager.java:processMisReplicatesAsync(2475)) - Total number of blocks
= 1
2014-03-09 13:18:47,523 INFO  blockmanagement.BlockManager 
(BlockManager.java:processMisReplicatesAsync(2476)) - Number of invalid blocks  
= 0
2014-03-09 13:18:47,523 INFO  blockmanagement.BlockManager 
(BlockManager.java:processMisReplicatesAsync(2477)) - Number of 
under-replicated blocks = 0
2014-03-09 13:18:47,523 INFO  ha.TestRetryCacheWithHA 
(TestRetryCacheWithHA.java:run(1162)) - Got Exception while calling 
createSymlink
org.apache.hadoop.ipc.RemoteException(java.io.IOException): failed to create 
link /testlink either because the filename is invalid or the file exists
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.createSymlinkInt(FSNamesystem.java:2053)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.createSymlink(FSNamesystem.java:2023)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.createSymlink(NameNodeRpcServer.java:965)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.createSymlink(ClientNamenodeProtocolServerSideTranslatorPB.java:844)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2071)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2067)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1597)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2065)

at org.apache.hadoop.ipc.Client.call(Client.java:1409)
at org.apache.hadoop.ipc.Client.call(Client.java:1362)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at $Proxy17.createSymlink(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.createSymlink(ClientNamenodeProtocolTranslatorPB.java:794)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 

[jira] [Created] (HDFS-6083) TestQuorumJournalManager#testChangeWritersLogsOutOfSync2 occasionally fails

2014-03-09 Thread Ted Yu (JIRA)
Ted Yu created HDFS-6083:


 Summary: TestQuorumJournalManager#testChangeWritersLogsOutOfSync2 
occasionally fails
 Key: HDFS-6083
 URL: https://issues.apache.org/jira/browse/HDFS-6083
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor


From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1695/testReport/junit/org.apache.hadoop.hdfs.qjournal.client/TestQuorumJournalManager/testChangeWritersLogsOutOfSync2/ :
{code}
Leaked thread: "IPC Client (26533782) connection to /127.0.0.1:57898 from 
jenkins" Id=590 RUNNABLE
 at java.lang.System.arraycopy(Native Method)
 at java.lang.ThreadGroup.remove(ThreadGroup.java:885)
 at java.lang.Thread.exit(Thread.java:672)
{code}
The following check should give more time for the threads to shutdown:
{code}
// Should not leak clients between tests -- this can cause flaky tests.
// (See HDFS-4643)
GenericTestUtils.assertNoThreadsMatching(".*IPC Client.*");
{code}
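A hedged sketch of what "more time" could look like before the assertion; Time.monotonicNow() is the Hadoop util clock, while ipcClientThreadExists() is a hypothetical helper, not an existing test utility:
{code}
// Give lingering IPC client threads a grace period to exit before
// asserting that none are left.
long deadline = Time.monotonicNow() + 10000;
while (Time.monotonicNow() < deadline && ipcClientThreadExists()) {
  Thread.sleep(100);
}
GenericTestUtils.assertNoThreadsMatching(".*IPC Client.*");
{code}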



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6092) DistributedFileSystem#getCanonicalServiceName() and DistributedFileSystem#getUri() may return inconsistent results w.r.t. port

2014-03-11 Thread Ted Yu (JIRA)
Ted Yu created HDFS-6092:


 Summary: DistributedFileSystem#getCanonicalServiceName() and 
DistributedFileSystem#getUri() may return inconsistent results w.r.t. port
 Key: HDFS-6092
 URL: https://issues.apache.org/jira/browse/HDFS-6092
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Ted Yu


I discovered this when working on HBASE-10717
Here is sample code to reproduce the problem:
{code}
Path desPath = new Path("hdfs://127.0.0.1/");
FileSystem desFs = desPath.getFileSystem(conf);

String s = desFs.getCanonicalServiceName();
URI uri = desFs.getUri();
{code}
The canonical name string contains the default port (8020),
but the uri doesn't contain a port.
This results in the following exception:
{code}
testIsSameHdfs(org.apache.hadoop.hbase.util.TestFSHDFSUtils)  Time elapsed: 
0.001 sec  <<< ERROR!
java.lang.IllegalArgumentException: port out of range:-1
at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
at java.net.InetSocketAddress.<init>(InetSocketAddress.java:224)
at 
org.apache.hadoop.hbase.util.FSHDFSUtils.getNNAddresses(FSHDFSUtils.java:88)
{code}
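A hedged client-side workaround sketch; the 8020 fallback mirrors the default port observed above, and a more robust caller would derive it from the canonical service name rather than hard-coding it:
{code}
URI uri = desFs.getUri();
int port = uri.getPort();
if (port == -1) {
  // getUri() dropped the port even though getCanonicalServiceName()
  // reported the default one; fall back explicitly.
  port = 8020;
}
InetSocketAddress nnAddr = new InetSocketAddress(uri.getHost(), port);
{code}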



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Reopened] (HDFS-5672) TestHASafeMode#testSafeBlockTracking fails in trunk

2014-03-25 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reopened HDFS-5672:
--


> TestHASafeMode#testSafeBlockTracking fails in trunk
> ---
>
> Key: HDFS-5672
> URL: https://issues.apache.org/jira/browse/HDFS-5672
> Project: Hadoop HDFS
>  Issue Type: Test
>    Reporter: Ted Yu
>
> From build #1614:
> {code}
>  TestHASafeMode.testSafeBlockTracking:623->assertSafeMode:488 Bad safemode 
> status: 'Safe mode is ON. The reported blocks 3 needs additional 7 blocks to 
> reach the threshold 0.9990 of total blocks 10.
> Safe mode will be turned off automatically'
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6160) TestSafeMode occasionally fails

2014-03-26 Thread Ted Yu (JIRA)
Ted Yu created HDFS-6160:


 Summary: TestSafeMode occasionally fails
 Key: HDFS-6160
 URL: https://issues.apache.org/jira/browse/HDFS-6160
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor


From https://builds.apache.org/job/PreCommit-HDFS-Build/6511//testReport/org.apache.hadoop.hdfs/TestSafeMode/testInitializeReplQueuesEarly/ :
{code}
java.lang.AssertionError: expected:<13> but was:<0>
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.junit.Assert.assertEquals(Assert.java:456)
at 
org.apache.hadoop.hdfs.TestSafeMode.testInitializeReplQueuesEarly(TestSafeMode.java:212)
{code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6177) TestHttpFSServer fails occasionally in trunk

2014-03-30 Thread Ted Yu (JIRA)
Ted Yu created HDFS-6177:


 Summary: TestHttpFSServer fails occasionally in trunk
 Key: HDFS-6177
 URL: https://issues.apache.org/jira/browse/HDFS-6177
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor


From https://builds.apache.org/job/Hadoop-hdfs-trunk/1716/consoleFull :
{code}
Running org.apache.hadoop.fs.http.server.TestHttpFSServer
Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.424 sec <<< 
FAILURE! - in org.apache.hadoop.fs.http.server.TestHttpFSServer
testDelegationTokenOperations(org.apache.hadoop.fs.http.server.TestHttpFSServer)
  Time elapsed: 0.559 sec  <<< FAILURE!
java.lang.AssertionError: expected:<401> but was:<403>
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.junit.Assert.assertEquals(Assert.java:456)
at 
org.apache.hadoop.fs.http.server.TestHttpFSServer.testDelegationTokenOperations(TestHttpFSServer.java:352)
{code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6257) TestCacheDirectives#testExceedsCapacity fails occasionally in trunk

2014-04-18 Thread Ted Yu (JIRA)
Ted Yu created HDFS-6257:


 Summary: TestCacheDirectives#testExceedsCapacity fails 
occasionally in trunk
 Key: HDFS-6257
 URL: https://issues.apache.org/jira/browse/HDFS-6257
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor


From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1736/ :

REGRESSION:  
org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity
{code}
Error Message:
Namenode should not send extra CACHE commands expected:<0> but was:<2>

Stack Trace:
java.lang.AssertionError: Namenode should not send extra CACHE commands 
expected:<0> but was:<2>
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at 
org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1419)
{code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6264) Provide FileSystem#create() variant which throws exception if parent directory doesn't exist

2014-04-21 Thread Ted Yu (JIRA)
Ted Yu created HDFS-6264:


 Summary: Provide FileSystem#create() variant which throws 
exception if parent directory doesn't exist
 Key: HDFS-6264
 URL: https://issues.apache.org/jira/browse/HDFS-6264
 Project: Hadoop HDFS
  Issue Type: Task
Reporter: Ted Yu
Priority: Minor


FileSystem#createNonRecursive() is deprecated.

However, there is no DistributedFileSystem#create() implementation which throws 
an exception if the parent directory doesn't exist.
This limits clients' migration away from the deprecated method.

A variant of the create() method should be added which throws an exception if 
the parent directory doesn't exist.
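A sketch of the kind of variant being proposed; the method name and its behavior are hypothetical, not an existing DistributedFileSystem API:
{code}
// Hypothetical variant: refuse to create missing parent directories.
public FSDataOutputStream createStrict(Path f, boolean overwrite)
    throws IOException {
  Path parent = f.getParent();
  if (parent != null && !exists(parent)) {
    throw new FileNotFoundException("Parent directory does not exist: "
        + parent);
  }
  return create(f, overwrite);
}
{code}
Note that the exists() check above is inherently racy; the point of the request is a non-deprecated, server-side equivalent of createNonRecursive() that enforces this atomically.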



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6290) File is not closed in OfflineImageViewerPB#run()

2014-04-26 Thread Ted Yu (JIRA)
Ted Yu created HDFS-6290:


 Summary: File is not closed in OfflineImageViewerPB#run()
 Key: HDFS-6290
 URL: https://issues.apache.org/jira/browse/HDFS-6290
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


{code}
  } else if (processor.equals("XML")) {
new PBImageXmlWriter(conf, out).visit(new RandomAccessFile(inputFile,
"r"));
{code}
The RandomAccessFile instance should be closed before the method returns.
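A minimal sketch of the fix for the XML branch shown above (try-with-resources here is illustrative and mirrors the existing branch rather than the committed patch):
{code}
} else if (processor.equals("XML")) {
  try (RandomAccessFile raf = new RandomAccessFile(inputFile, "r")) {
    // The file is closed even if visit() throws.
    new PBImageXmlWriter(conf, out).visit(raf);
  }
}
{code}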



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6291) FSImage may be left unclosed in BootstrapStandby#doRun()

2014-04-26 Thread Ted Yu (JIRA)
Ted Yu created HDFS-6291:


 Summary: FSImage may be left unclosed in BootstrapStandby#doRun()
 Key: HDFS-6291
 URL: https://issues.apache.org/jira/browse/HDFS-6291
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


At around line 203:
{code}
  if (!checkLogsAvailableForRead(image, imageTxId, curTxId)) {
return ERR_CODE_LOGS_UNAVAILABLE;
  }
{code}
If we return following the above check, image is not closed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6369) RemoteBlockReader#available() should call FSInputChecker.available()

2014-05-10 Thread Ted Yu (JIRA)
Ted Yu created HDFS-6369:


 Summary: RemoteBlockReader#available() should call 
FSInputChecker.available()
 Key: HDFS-6369
 URL: https://issues.apache.org/jira/browse/HDFS-6369
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu
Priority: Trivial


Currently DFSClient.TCP_WINDOW_SIZE is returned directly.
However, FSInputChecker.available(), in the superclass, may return a value lower 
than the constant.
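A sketch of the suggested change, assuming RemoteBlockReader extends FSInputChecker as the description implies:
{code}
@Override
public synchronized int available() throws IOException {
  // Never report more than the checker itself says is available.
  return Math.min(super.available(), DFSClient.TCP_WINDOW_SIZE);
}
{code}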



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-6083) TestQuorumJournalManager#testChangeWritersLogsOutOfSync2 occasionally fails

2014-05-11 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HDFS-6083.
--

Resolution: Cannot Reproduce

> TestQuorumJournalManager#testChangeWritersLogsOutOfSync2 occasionally fails
> ---
>
> Key: HDFS-6083
> URL: https://issues.apache.org/jira/browse/HDFS-6083
> Project: Hadoop HDFS
>  Issue Type: Test
>    Reporter: Ted Yu
>Priority: Minor
>
> From 
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/1695/testReport/junit/org.apache.hadoop.hdfs.qjournal.client/TestQuorumJournalManager/testChangeWritersLogsOutOfSync2/
>  :
> {code}
> Leaked thread: "IPC Client (26533782) connection to /127.0.0.1:57898 from 
> jenkins" Id=590 RUNNABLE
>  at java.lang.System.arraycopy(Native Method)
>  at java.lang.ThreadGroup.remove(ThreadGroup.java:885)
>  at java.lang.Thread.exit(Thread.java:672)
> {code}
> The following check should give more time for the threads to shutdown:
> {code}
> // Should not leak clients between tests -- this can cause flaky tests.
> // (See HDFS-4643)
> GenericTestUtils.assertNoThreadsMatching(".*IPC Client.*");
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6368) TransferFsImage#receiveFile() should perform validation on fsImageName parameter

2014-05-15 Thread Ted Yu (JIRA)
Ted Yu created HDFS-6368:


 Summary: TransferFsImage#receiveFile() should perform validation 
on fsImageName parameter
 Key: HDFS-6368
 URL: https://issues.apache.org/jira/browse/HDFS-6368
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


Currently only a null check is performed:
{code}
  if (fsImageName == null) {
throw new IOException("No filename header provided by server");
  }
  newLocalPaths.add(new File(localPath, fsImageName));
{code}
The value of fsImageName, obtained from an HttpURLConnection header, may be tainted.
This may allow an attacker to access, modify, or test the existence of critical 
or sensitive files.
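A hedged sketch of the kind of validation implied; the exact rules a real fix would enforce are an assumption:
{code}
if (fsImageName == null) {
  throw new IOException("No filename header provided by server");
}
if (fsImageName.contains("/") || fsImageName.contains("\\")
    || fsImageName.contains("..")) {
  // Reject anything that could escape the intended local directory.
  throw new IOException("Invalid filename header provided by server: "
      + fsImageName);
}
newLocalPaths.add(new File(localPath, fsImageName));
{code}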



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6415) Missing null check in FSImageSerialization#writePermissionStatus()

2014-05-16 Thread Ted Yu (JIRA)
Ted Yu created HDFS-6415:


 Summary: Missing null check in 
FSImageSerialization#writePermissionStatus()
 Key: HDFS-6415
 URL: https://issues.apache.org/jira/browse/HDFS-6415
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


{code}
PermissionStatus.write(out, inode.getUserName(), inode.getGroupName(), p);
{code}
getUserName() / getGroupName() may return null.
A null check should be added for these two calls.
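A minimal sketch of the requested guard; falling back to the empty string is an assumption for illustration, not the value chosen by any actual patch:
{code}
String user = inode.getUserName();
String group = inode.getGroupName();
PermissionStatus.write(out,
    user == null ? "" : user,
    group == null ? "" : group,
    p);
{code}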



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6437) TestBookKeeperHACheckpoints#TestStandbyCheckpoints fails in trunk

2014-05-20 Thread Ted Yu (JIRA)
Ted Yu created HDFS-6437:


 Summary: TestBookKeeperHACheckpoints#TestStandbyCheckpoints fails 
in trunk
 Key: HDFS-6437
 URL: https://issues.apache.org/jira/browse/HDFS-6437
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


The following test failure can be reproduced locally:
{code}
testSBNCheckpoints(org.apache.hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints)
  Time elapsed: 2.79 sec  <<< ERROR!
java.lang.NullPointerException: null
at 
org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints.testSBNCheckpoints(TestStandbyCheckpoints.java:138)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
{code}





[jira] [Created] (HDFS-6481) DatanodeManager#getDatanodeStorageInfos() should check the length of storageIDs

2014-06-03 Thread Ted Yu (JIRA)
Ted Yu created HDFS-6481:


 Summary: DatanodeManager#getDatanodeStorageInfos() should check 
the length of storageIDs
 Key: HDFS-6481
 URL: https://issues.apache.org/jira/browse/HDFS-6481
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Ted Yu


Ian Brooks reported the following stack trace:
{code}
2014-06-03 13:05:03,915 WARN  [DataStreamer for file 
/user/hbase/WALs/,16020,1401716790638/%2C16020%2C1401716790638.1401796562200
 block BP-2121456822-10.143.38.149-1396953188241:blk_1074073683_332932] 
hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBoundsException):
 0
at 
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanodeStorageInfos(DatanodeManager.java:467)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:2779)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:594)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:430)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)

at org.apache.hadoop.ipc.Client.call(Client.java:1347)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy13.getAdditionalDatanode(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolTranslatorPB.java:352)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy14.getAdditionalDatanode(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:266)
at com.sun.proxy.$Proxy15.getAdditionalDatanode(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:919)
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1031)
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:823)
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:475)
2014-06-03 13:05:48,489 ERROR [RpcServer.handler=22,port=16020] wal.FSHLog: 
syncer encountered error, will retry. txid=211
org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBoundsException):
 0
at 
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanodeStorageInfos(DatanodeManager.java:467)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:2779)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:594)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:430)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2

[jira] [Created] (HDFS-6501) TestCrcCorruption#testCorruptionDuringWrt sometimes fails in trunk

2014-06-07 Thread Ted Yu (JIRA)
Ted Yu created HDFS-6501:


 Summary: TestCrcCorruption#testCorruptionDuringWrt sometimes fails 
in trunk
 Key: HDFS-6501
 URL: https://issues.apache.org/jira/browse/HDFS-6501
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor


From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1767/ :
{code}
REGRESSION:  org.apache.hadoop.hdfs.TestCrcCorruption.testCorruptionDuringWrt

Error Message:
test timed out after 5 milliseconds

Stack Trace:
java.lang.Exception: test timed out after 5 milliseconds
at java.lang.Object.wait(Native Method)
at 
org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:2024)
at 
org.apache.hadoop.hdfs.DFSOutputStream.flushInternal(DFSOutputStream.java:2008)
at 
org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2107)
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
at 
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:98)
at 
org.apache.hadoop.hdfs.TestCrcCorruption.testCorruptionDuringWrt(TestCrcCorruption.java:133)
{code}





[jira] [Created] (HDFS-6582) Missing null check in RpcProgramNfs3#read(XDR, SecurityHandler)

2014-06-20 Thread Ted Yu (JIRA)
Ted Yu created HDFS-6582:


 Summary: Missing null check in RpcProgramNfs3#read(XDR, 
SecurityHandler)
 Key: HDFS-6582
 URL: https://issues.apache.org/jira/browse/HDFS-6582
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


Around line 691:
{code}
FSDataInputStream fis = clientCache.getDfsInputStream(userName,
Nfs3Utils.getFileIdPath(handle));

try {
  readCount = fis.read(offset, readbuffer, 0, count);
{code}
fis may be null, leading to a NullPointerException.
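
A minimal sketch of the missing guard at that call site; returning an NFS3ERR_IO-style reply is an assumption here, and the actual fix may report the failure differently:

{code}
    FSDataInputStream fis = clientCache.getDfsInputStream(userName,
        Nfs3Utils.getFileIdPath(handle));
    if (fis == null) {
      // Sketch only: log and return an I/O error to the NFS client rather than
      // dereferencing a null stream below. The exact response type is assumed.
      LOG.error("Could not obtain DFS input stream for "
          + Nfs3Utils.getFileIdPath(handle));
      return new READ3Response(Nfs3Status.NFS3ERR_IO);
    }

    try {
      readCount = fis.read(offset, readbuffer, 0, count);
{code}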





[jira] [Created] (HDFS-6586) TestBalancer#testExitZeroOnSuccess sometimes fails in trunk

2014-06-22 Thread Ted Yu (JIRA)
Ted Yu created HDFS-6586:


 Summary: TestBalancer#testExitZeroOnSuccess sometimes fails in 
trunk
 Key: HDFS-6586
 URL: https://issues.apache.org/jira/browse/HDFS-6586
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor


From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1782/testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancer/testExitZeroOnSuccess/ :
{code}
Stacktrace
java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to 
become 0.2, but on datanode 127.0.0.1:49048 it remains at 0.08 after more than 
4 msec.
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForBalancer(TestBalancer.java:284)
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancerCli(TestBalancer.java:392)
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:357)
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancer.oneNodeTest(TestBalancer.java:398)
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancer.testExitZeroOnSuccess(TestBalancer.java:550)
{code}






[jira] [Created] (HDFS-6726) TestNamenodeCapacityReport fails intermittently

2014-07-22 Thread Ted Yu (JIRA)
Ted Yu created HDFS-6726:


 Summary: TestNamenodeCapacityReport fails intermittently
 Key: HDFS-6726
 URL: https://issues.apache.org/jira/browse/HDFS-6726
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor


From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1812/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestNamenodeCapacityReport/testXceiverCount/ :
{code}
java.io.IOException: Unable to close file because the last block does not have 
enough number of replicas.
at 
org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2141)
at 
org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2109)
at 
org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281)
{code}
There were multiple occurrences of 'Broken pipe', 'Connection reset by peer', and
'Premature EOF from inputStream' exceptions in the test output.





[jira] [Created] (HDFS-6810) storageMap is accessed without proper synchronization in DatanodeDescriptor#getStorageReports

2014-08-02 Thread Ted Yu (JIRA)
Ted Yu created HDFS-6810:


 Summary: storageMap is accessed without proper synchronization in 
DatanodeDescriptor#getStorageReports
 Key: HDFS-6810
 URL: https://issues.apache.org/jira/browse/HDFS-6810
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


Here is the related code:
{code}
  public StorageReport[] getStorageReports() {
final StorageReport[] reports = new StorageReport[storageMap.size()];
{code}
Other methods use the following construct:
{code}
synchronized (storageMap) {
{code}





[jira] [Created] (HDFS-6842) TestHttpFSFWithWebhdfsFileSystem fails in trunk

2014-08-12 Thread Ted Yu (JIRA)
Ted Yu created HDFS-6842:


 Summary: TestHttpFSFWithWebhdfsFileSystem fails in trunk
 Key: HDFS-6842
 URL: https://issues.apache.org/jira/browse/HDFS-6842
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Ted Yu


This can be reproduced locally:
{code}
testOperationDoAs[21](org.apache.hadoop.fs.http.client.TestHttpFSFWithWebhdfsFileSystem)
  Time elapsed: 0.315 sec  <<< ERROR!
org.apache.hadoop.ipc.RemoteException: User: zy is not allowed to impersonate 
user1
at org.apache.hadoop.ipc.Client.call(Client.java:1411)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy24.mkdirs(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:512)
at sun.reflect.GeneratedMethodAccessor73.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
at com.sun.proxy.$Proxy25.mkdirs(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2546)
at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2517)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:821)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:817)
{code}





[jira] [Resolved] (HDFS-6842) TestHttpFSFWithWebhdfsFileSystem fails in trunk

2014-08-12 Thread Ted Yu (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-6842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu resolved HDFS-6842.
--

Resolution: Duplicate

Covered by HADOOP-10836

> TestHttpFSFWithWebhdfsFileSystem fails in trunk
> ---
>
> Key: HDFS-6842
> URL: https://issues.apache.org/jira/browse/HDFS-6842
> Project: Hadoop HDFS
>  Issue Type: Test
>    Reporter: Ted Yu
>
> This can be reproduced locally:
> {code}
> testOperationDoAs[21](org.apache.hadoop.fs.http.client.TestHttpFSFWithWebhdfsFileSystem)
>   Time elapsed: 0.315 sec  <<< ERROR!
> org.apache.hadoop.ipc.RemoteException: User: zy is not allowed to impersonate 
> user1
>   at org.apache.hadoop.ipc.Client.call(Client.java:1411)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1364)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy24.mkdirs(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:512)
>   at sun.reflect.GeneratedMethodAccessor73.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
>   at com.sun.proxy.$Proxy25.mkdirs(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2546)
>   at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2517)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:821)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:817)
> {code}




