[jira] [Created] (HADOOP-10858) Specify the charset explicitly rather than rely on the default

2014-07-18 Thread Liang Xie (JIRA)
Liang Xie created HADOOP-10858:
--

 Summary: Specify the charset explicitly rather than rely on the 
default
 Key: HADOOP-10858
 URL: https://issues.apache.org/jira/browse/HADOOP-10858
 Project: Hadoop Common
  Issue Type: Improvement
  Components: metrics
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie


Findbugs 2 warns about relying on the default Java charset instead of 
specifying it explicitly. Given that we're porting Hadoop to different 
platforms, it's better to be explicit.
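As an illustration only (not the eventual patch), here is a minimal sketch of the kind of change Findbugs asks for, assuming UTF-8 is the charset to standardize on:
{code}
// Minimal sketch, not the HADOOP-10858 patch itself: pass an explicit charset
// instead of relying on the platform default.
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetExample {
  public static void main(String[] args) throws Exception {
    // Explicit UTF-8 instead of the default-charset getBytes()/new String(byte[]) overloads
    byte[] raw = "metrics record".getBytes(StandardCharsets.UTF_8);
    InputStreamReader in =
        new InputStreamReader(new ByteArrayInputStream(raw), StandardCharsets.UTF_8);
    int first = in.read();
    System.out.println((char) first);
    in.close();
  }
}
{code}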



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Build failed in Jenkins: Hadoop-Common-0.23-Build #1014

2014-07-18 Thread Apache Jenkins Server
See 

--
[...truncated 11334 lines...]
Running org.apache.hadoop.fs.TestGlobPattern
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.089 sec
Running org.apache.hadoop.fs.TestFcLocalFsUtil
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.581 sec
Running org.apache.hadoop.fs.TestLocalFileSystemPermission
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.528 sec
Running org.apache.hadoop.fs.TestDFVariations
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.129 sec
Running org.apache.hadoop.fs.permission.TestFsPermission
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.47 sec
Running org.apache.hadoop.fs.TestTrash
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.106 sec
Running org.apache.hadoop.fs.TestFileStatus
Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.199 sec
Running org.apache.hadoop.fs.TestChecksumFileSystem
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.848 sec
Running org.apache.hadoop.fs.shell.TestPathExceptions
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.092 sec
Running org.apache.hadoop.fs.shell.TestCommandFactory
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.179 sec
Running org.apache.hadoop.fs.shell.TestPathData
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.772 sec
Running org.apache.hadoop.fs.shell.TestCopy
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.725 sec
Running org.apache.hadoop.fs.TestLocalFileSystem
Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.947 sec
Running org.apache.hadoop.fs.TestFileContextDeleteOnExit
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.54 sec
Running org.apache.hadoop.fs.TestPath
Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.803 sec
Running org.apache.hadoop.fs.TestLocalDirAllocator
Tests run: 30, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.077 sec
Running org.apache.hadoop.fs.TestFileSystemTokens
Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.495 sec
Running org.apache.hadoop.fs.TestFileSystemCaching
Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.88 sec
Running org.apache.hadoop.fs.TestFsShellCopy
Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.364 sec
Running org.apache.hadoop.fs.TestListFiles
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.542 sec
Running org.apache.hadoop.fs.TestHardLink
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.257 sec
Running org.apache.hadoop.fs.TestLocalFSFileContextMainOperations
Tests run: 56, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.242 sec
Running org.apache.hadoop.fs.TestLocal_S3FileContextURI
Tests run: 17, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.357 sec
Running org.apache.hadoop.fs.TestFsShellReturnCode
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.617 sec
Running org.apache.hadoop.fs.TestS3_LocalFileContextURI
Tests run: 17, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.351 sec
Running org.apache.hadoop.fs.TestFileUtil
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.693 sec
Running org.apache.hadoop.fs.s3native.TestInMemoryNativeS3FileSystemContract
Tests run: 36, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.179 sec
Running org.apache.hadoop.fs.TestLocalFSFileContextSymlink
Tests run: 61, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 2.611 sec
Running org.apache.hadoop.fs.TestFsShell
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.281 sec
Running org.apache.hadoop.fs.TestBlockLocation
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.055 sec
Running org.apache.hadoop.fs.TestTruncatedInputBug
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.47 sec
Running org.apache.hadoop.fs.TestCommandFormat
Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.109 sec
Running org.apache.hadoop.fs.TestLocalFSFileContextCreateMkdir
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.772 sec
Running org.apache.hadoop.fs.TestFSMainOperationsLocalFileSystem
Tests run: 49, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.677 sec
Running org.apache.hadoop.fs.s3.TestINode
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.055 sec
Running org.apache.hadoop.fs.s3.TestS3Credentials
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.128 sec
Running org.apache.hadoop.fs.s3.TestS3FileSystem
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.153 sec
Running org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract
Tests run: 29, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.977 sec
Running org.apache.hadoop.fs.TestHarFileSystem
Tests run: 3, Failures: 0, Errors: 0, Skippe

[jira] [Resolved] (HADOOP-3494) Improve S3FileSystem data integrity using MD5 checksums

2014-07-18 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White resolved HADOOP-3494.
---

Resolution: Invalid

> Improve S3FileSystem data integrity using MD5 checksums
> ---
>
> Key: HADOOP-3494
> URL: https://issues.apache.org/jira/browse/HADOOP-3494
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Tom White
>
> Make use of S3 MD5 checksums to verify writes and reads.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-3495) Support legacy S3 buckets containing underscores

2014-07-18 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White resolved HADOOP-3495.
---

Resolution: Won't Fix

Resolving this since, as Ken points out, we should let S3 report the problem.

> Support legacy S3 buckets containing underscores
> 
>
> Key: HADOOP-3495
> URL: https://issues.apache.org/jira/browse/HADOOP-3495
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Tom White
>Priority: Minor
>
> For bucket names containing an underscore we fail with an exception; however, 
> it should be possible to support them. See the proposal in 
> https://issues.apache.org/jira/browse/HADOOP-930?focusedCommentId=12601991#action_12601991
>  by Chris K Wensel.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10859) Native implementation of java Checksum interface

2014-07-18 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-10859:


 Summary: Native implementation of java Checksum interface
 Key: HADOOP-10859
 URL: https://issues.apache.org/jira/browse/HADOOP-10859
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor


Some parts of our code such as IFileInputStream/IFileOutputStream use the java 
Checksum interface to calculate/verify checksums. Currently we don't have a 
native implementation of these. For CRC32C in particular, we can get a very big 
speedup with a native implementation.
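For context, a hedged sketch of the java.util.zip.Checksum usage pattern referred to above; a native CRC32C class would simply implement the same interface (the NativeCrc32C name below is hypothetical):
{code}
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class ChecksumInterfaceExample {
  // Works with any Checksum implementation, native or pure-Java.
  static long checksumOf(Checksum sum, byte[] data) {
    sum.reset();
    sum.update(data, 0, data.length);
    return sum.getValue();
  }

  public static void main(String[] args) {
    byte[] payload = "ifile record".getBytes();
    System.out.println(checksumOf(new CRC32(), payload));
    // System.out.println(checksumOf(new NativeCrc32C(), payload)); // hypothetical native impl
  }
}
{code}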



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-8568) DNS#reverseDns fails on IPv6 addresses

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-8568.
--

Resolution: Duplicate

I'm closing this JIRA out in favor of HADOOP-3619 since it has a more modern 
patch associated with it.

> DNS#reverseDns fails on IPv6 addresses
> --
>
> Key: HADOOP-8568
> URL: https://issues.apache.org/jira/browse/HADOOP-8568
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Tony Kew
>  Labels: newbie
> Attachments: HADOOP-8568.patch
>
>
> DNS#reverseDns assumes hostIp is a v4 address (4 parts separated by dots), and 
> blows up if given a v6 address:
> {noformat}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
> at org.apache.hadoop.net.DNS.reverseDns(DNS.java:79)
> at org.apache.hadoop.net.DNS.getHosts(DNS.java:237)
> at org.apache.hadoop.net.DNS.getDefaultHost(DNS.java:340)
> at org.apache.hadoop.net.DNS.getDefaultHost(DNS.java:358)
> at org.apache.hadoop.net.DNS.getDefaultHost(DNS.java:337)
> at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:235)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at 
> org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:1649)
> {noformat}
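For illustration, a hedged sketch (not the HADOOP-3619 patch) of the guard the fix needs: check the address family before assuming the 4-part dotted-quad layout that reverseDns splits on.
{code}
import java.net.Inet4Address;
import java.net.InetAddress;

public class ReverseDnsGuard {
  static String reverseDnsDomain(InetAddress hostIp) {
    if (!(hostIp instanceof Inet4Address)) {
      // IPv6 addresses need the ip6.arpa nibble format instead of in-addr.arpa
      throw new UnsupportedOperationException("Only IPv4 handled in this sketch");
    }
    String[] parts = hostIp.getHostAddress().split("\\.");
    return parts[3] + "." + parts[2] + "." + parts[1] + "." + parts[0] + ".in-addr.arpa";
  }

  public static void main(String[] args) throws Exception {
    System.out.println(reverseDnsDomain(InetAddress.getByName("10.1.2.3")));
  }
}
{code}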



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-3632) Fix speculative execution or allow premature stop

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-3632.
--

Resolution: Incomplete

Spec exec has changed a lot. Closing as stale.

> Fix speculative execution or allow premature stop
> -
>
> Key: HADOOP-3632
> URL: https://issues.apache.org/jira/browse/HADOOP-3632
> Project: Hadoop Common
>  Issue Type: New Feature
>Affects Versions: 0.16.3
>Reporter: Severin Hacker
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> I run 50 iterations of a program with 500 maps and no reduces. I have noticed 
> the following behaviour:
> In 50% of the iterations:
> 499 maps finish in 50 seconds
> 1 map finishes after 4 minutes
> Total time is 4 minutes.
> In 50% of the iterations:
> 500 maps finish in 50 seconds
> Total time is 50 seconds.
> It would be nice if I could tell hadoop to stop after 99% of the maps have 
> finished (and not wait for that last straggler). In my application it's 
> perfectly fine if I only get 99% of the results, as long as the straggler is 
> not always using the same data.
> Please fix!



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-3634) Create tests for Hadoop metrics

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-3634.
--

Resolution: Fixed

> Create tests for Hadoop metrics
> ---
>
> Key: HADOOP-3634
> URL: https://issues.apache.org/jira/browse/HADOOP-3634
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: metrics
>Affects Versions: 0.19.0
>Reporter: Lohit Vijayarenu
>
> It would be good to have a test case for hadoop metrics. We could use 
> FileContext or derive something out of NullContext to check that the values 
> returned via metrics are correct. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-3629) Document the metrics produced by hadoop

2014-07-18 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA resolved HADOOP-3629.
---

Resolution: Duplicate

Already documented by HADOOP-6350. Closing this issue.

> Document the metrics produced by hadoop
> ---
>
> Key: HADOOP-3629
> URL: https://issues.apache.org/jira/browse/HADOOP-3629
> Project: Hadoop Common
>  Issue Type: Task
>  Components: documentation, metrics
>Reporter: Rob Weltman
>  Labels: newbie
>
> This information is needed in order to collect, monitor, and report on hadoop 
> metrics.
> Subject: Re: [Fwd: Specification of hadoop metrics?]
> Date: Mon, 23 Jun 2008 14:36:04 -0700
> From: "Owen O'Malley" 
> To: "Rob Weltman" 
> On Jun 23, 2008, at 12:40 PM, Rob Weltman wrote:
> >   Is there a JIRA, forrest, or wiki document that defines all the
> > metrics produced by Hadoop (DFS and MR) for each release? If not,
> > should there be?
> I don't know of any documentation of the exported metrics. There
> probably should be forrest documentation of the metrics, but it
> probably makes sense to do.
> -- Owen



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-3890) distcp: PathFilter for source files

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-3890.
--

Resolution: Incomplete

Closing since distcpv1 has been replaced!

> distcp: PathFilter for source files
> ---
>
> Key: HADOOP-3890
> URL: https://issues.apache.org/jira/browse/HADOOP-3890
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: util
>Reporter: Koji Noguchi
>Priority: Minor
> Attachments: H-3890-1.patch
>
>
> I'd like distcp to be able to skip _logs/_temporary directories and files.
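A hedged sketch of the kind of filter being asked for, using Hadoop's PathFilter interface; how it would be wired into distcp's source listing is left out:
{code}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

// Rejects _logs and _temporary entries; everything else gets copied.
public class SkipLogsAndTempFilter implements PathFilter {
  @Override
  public boolean accept(Path path) {
    String name = path.getName();
    return !"_logs".equals(name) && !"_temporary".equals(name);
  }
}
{code}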



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-3753) metrics: FileContext support overwrite mode

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-3753.
--

Resolution: Incomplete

Closing as stale.

> metrics: FileContext support overwrite mode
> ---
>
> Key: HADOOP-3753
> URL: https://issues.apache.org/jira/browse/HADOOP-3753
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Craig Macdonald
>Priority: Minor
> Attachments: HADOOP-3753.v1.patch
>
>
> FileContext currently continually appends to the metrics log file(s), 
> generating an ever lengthening file.
> In some scenarios, it would be useful to simply write the current statistics 
> to the file once every period, then overwrite the file for the next period.
> For instance, this could be useful if an external application parsed the 
> metrics output - e.g. Cacti to create realtime graphs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-3834) Checkin the design document for HDFS appends into source control repository

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-3834.
--

Resolution: Won't Fix

> Checkin the design document for HDFS appends into source control repository
> ---
>
> Key: HADOOP-3834
> URL: https://issues.apache.org/jira/browse/HADOOP-3834
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: documentation
>Reporter: dhruba borthakur
>
> The design document for HDFS needs to be converted into forrest and checked 
> into repository.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-3838) Add hdfs random-reading tests to gridmix

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-3838.
--

Resolution: Incomplete

gridmix? who uses that anymore?

> Add hdfs random-reading tests to gridmix
> 
>
> Key: HADOOP-3838
> URL: https://issues.apache.org/jira/browse/HADOOP-3838
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: stack
> Attachments: gridmix-randread_v1.patch
>
>
> Gridmix needs a nice-little random read test so we can track how hdfs is 
> improving over time.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-3839) hadoop should handle no cygwin on windows more gracefully

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-3839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-3839.
--

Resolution: Won't Fix

With Microsoft's help, Hadoop now uses batch files on Windows. Won't fix!

> hadoop should handle no cygwin on windows more gracefully
> -
>
> Key: HADOOP-3839
> URL: https://issues.apache.org/jira/browse/HADOOP-3839
> Project: Hadoop Common
>  Issue Type: Improvement
> Environment: Windows XP, cygwin not installed
>Reporter: Steve Loughran
>Priority: Minor
>
> There have been a couple of postings to hadoop core-user in which people 
> can't get hdfs to come up on windows because whoami isn't on the path, which 
> fails with an IOException error code 2. 
> To people not experienced in DOS error codes, this is a fairly meaningless 
> number which invariably leads to time wasted and support emails.
> 1. the error could be caught and handled by printing some better hints (point 
> to a wiki page?)
> 2. is whoami really needed on DOS-based filesystems?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-3862) Create network topology plugin that uses a configured rack ip mask

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-3862.
--

Resolution: Duplicate

Closing this as a dupe of HADOOP-3625, which actually has a patch on it!

> Create network topology plugin that uses a configured rack ip mask
> --
>
> Key: HADOOP-3862
> URL: https://issues.apache.org/jira/browse/HADOOP-3862
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: net
>Reporter: Owen O'Malley
>
> We should have a Java class that answers network topology questions by 
> implementing DNSToSwitchMapping by using the host's IP address and a mask 
> that defines the addresses that are on the same rack. Therefore, if your 
> racks are defined by using the top 3 octets, you'd configure ff.ff.ff.00.
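To make the mask idea concrete, a hedged, self-contained sketch: derive a rack name from the host's IPv4 address and a configured mask (ff.ff.ff.00 keeps the top three octets). A real plugin would expose this through Hadoop's DNSToSwitchMapping resolve() call; the rack-name format below is an assumption.
{code}
import java.net.InetAddress;

public class RackFromIpMask {
  static String rackFor(String host, int mask) throws Exception {
    byte[] a = InetAddress.getByName(host).getAddress();
    int ip = ((a[0] & 0xFF) << 24) | ((a[1] & 0xFF) << 16) | ((a[2] & 0xFF) << 8) | (a[3] & 0xFF);
    int rack = ip & mask;  // hosts sharing the masked prefix land on the same rack
    return String.format("/%d.%d.%d.%d", (rack >>> 24) & 0xFF, (rack >>> 16) & 0xFF,
        (rack >>> 8) & 0xFF, rack & 0xFF);
  }

  public static void main(String[] args) throws Exception {
    System.out.println(rackFor("10.20.30.40", 0xFFFFFF00)); // -> /10.20.30.0
  }
}
{code}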



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-3876) Hadoop Core should support source files for multiple schedulers

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-3876.
--

Resolution: Fixed

Closing this, as Hadoop supports multiple schedulers now.

> Hadoop Core should support source files for multiple schedulers
> --
>
> Key: HADOOP-3876
> URL: https://issues.apache.org/jira/browse/HADOOP-3876
> Project: Hadoop Common
>  Issue Type: New Feature
>Reporter: Vivek Ratan
>
> Besides the default JT scheduling algorithm, there is work going on with at 
> least two more schedulers (HADOOP-3445, HADOOP-3746). HADOOP-3412 makes it 
> easier to plug new schedulers into the JT. Where do we place the source 
> files for the various schedulers so that it's easy for users to choose their 
> scheduler of choice during deployment, and easy for developers to add more 
> schedulers into the framework (without inundating it)? 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-3881) IPC client doesn't time out if far end handler hangs

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-3881.
--

Resolution: Incomplete

Closing this as stale.

> IPC client doesn't time out if far end handler hangs
> ---
>
> Key: HADOOP-3881
> URL: https://issues.apache.org/jira/browse/HADOOP-3881
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 1.0.0
>Reporter: Steve Loughran
>Priority: Minor
>
> This is what appears to be happening in some changes of mine that 
> (inadvertently) blocked JobTracker: if the client can connect to the far end 
> and invoke an operation, the far end has forever to deal with the request: 
> the client blocks too.
> Clearly the far end shouldn't do this; it's a serious problem to address. But 
> should the client hang? Should it not time out after some specifiable time 
> and signal that the far end isn't processing requests in a timely manner? 
> (Marked as minor as this shouldn't arise in day-to-day operation, but it 
> should be easy to create a mock object to simulate this, and timeouts are 
> considered useful in an IPC.)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-3902) Ordering of the output statistics in the report page (jobtracker-details for a job)

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-3902.
--

Resolution: Incomplete

Closing this as stale.

> Ordering of the output statistics in the report page (jobtracker-details for 
> a job)
> ---
>
> Key: HADOOP-3902
> URL: https://issues.apache.org/jira/browse/HADOOP-3902
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: util
>Affects Versions: 0.18.0
> Environment: Fedora Core X86 (amazon ec2), JDK 1.6
>Reporter: Flo Leibert
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The ordering of the job statistics in the jobdetails.jsp seems very 
> unintuitive - and not in sync with previous versions. 
> It seems as if the rows should be ordered by their respective function (maps, 
> combines, reduces). 
> Example (columns: Counter | Map | Reduce | Total): 
> Map-Reduce Framework  
> Reduce input groups   0   1,936   1,936
> Combine output records   0   0   0
> Map input records 41,580,847  0   41,580,847
> Reduce output records 0   664,803,173 664,803,173
> Map output bytes  988,918,560 0   988,918,560
> Map input bytes   1,100,931,203   0   1,100,931,203
> Map output records   41,580,847  0   41,580,847
> Combine input records 0   0   0
> Reduce input records  0   41,580,847  41,580,847



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-3915) reducers hang, jobtracker completely losing track of them.

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-3915.
--

Resolution: Incomplete

Closing this as stale.

> reducers hang, jobtracker completely losing track of them.
> ---
>
> Key: HADOOP-3915
> URL: https://issues.apache.org/jira/browse/HADOOP-3915
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 0.17.1
> Environment: EC2, Debian Etch  (but not the ec2-contrib stuff)
> streaming.jar
>Reporter: Andreas Kostyrka
> Attachments: 
> hadoop-hadoop-jobtracker-ec2-67-202-58-97.compute-1.amazonaws.com.log
>
>
> I just noticed the following curious situation:
> -) 18 of 22 reducers are waiting for 3 hours or so with 0.01MB/s and no 
> progress.
> -) hadoop job -kill-task does not work on the ids shown
> -) killing all reduce work tasks (the spawned Python processes, not java 
> TaskTracker$Child) gets completely ignored by the JobTracker, the jobtracker 
> shows them still as running.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-3969) Provide Mechanism to optionally expose public org.apache.hadoop.util.Services APIs

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-3969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-3969.
--

Resolution: Incomplete

Having REST interfaces for everything would be great, but this JIRA isn't the 
place to do it at this point.

> Provide Mechanism to optionally expose public org.apache.hadoop.util.Services 
> APIs 
> ---
>
> Key: HADOOP-3969
> URL: https://issues.apache.org/jira/browse/HADOOP-3969
> Project: Hadoop Common
>  Issue Type: New Feature
>Reporter: Pete Wyckoff
>
> Enhance manageability of Hadoop Services by providing Jute, Thrift, REST, ... 
> APIs to select methods in the Service class (or potentially others).
> This will allow external tools written in a myriad of languages to query the 
> state of Hadoop Servers and/or interact with them.
> This can be encapsulated in the Service class by defining a very simple 
> interface and then optionally instantiating such an implementation provided 
> in the Configuration.
> Some methods to be implemented include all the public methods in Service:
> {code}
> ping()
> isRunning()
> terminate()
> getServiceState()
> verifyServiceState()
> isTerminated(),
> {code}
> INTERFACE:
> {code}
> package org.apache.hadoop.util;
> public interface ExposeServiceAPIs {
>   /**
>    * @param service - the service whose APIs are to be exposed
>    * @param serviceName - a symbolic name for the service
>    * @param conf - the hadoop configuration object
>    **/
>   public void initialize(Service service, String serviceName, Configuration 
> conf) throws IOException;
>   public boolean start();
>   public boolean stop();
> }
> {code}
> Two straightforward implementations of this would be:
> 1. Servlet that exposes the APIs via REST
> 2. Thrift DDL of the service APIs and an implementation in Java + bindings in 
> C++, Java, Perl, Python, Ruby, PHP



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-3999) Dynamic host configuration system (via node side plugins)

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-3999.
--

Resolution: Incomplete

Closing this as stale.  Much of this functionality has since been added to YARN 
and HDFS.  Holes are slowly being closed! 

> Dynamic host configuration system (via node side plugins)
> -
>
> Key: HADOOP-3999
> URL: https://issues.apache.org/jira/browse/HADOOP-3999
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: benchmarks, conf, metrics
> Environment: Any
>Reporter: Kai Mosebach
> Attachments: cloud_divide.jpg
>
>
> The MapReduce paradigm is limited to running MapReduce jobs with the lowest 
> common factor of all nodes in the cluster.
> On the one hand this is wanted (cloud computing, throw simple jobs in, 
> never mind who does it).
> On the other hand this is limiting the possibilities quite a lot, for 
> instance if you had data which could/needs to be fed to a 3rd party interface 
> like Mathlab, R, BioConductor you could solve a lot more jobs via hadoop.
> Furthermore it could be interesting to know about the OS, the architecture, 
> the performance of the node in relation to the rest of the cluster. 
> (Performance ranking)
> i.e. if i'd know about a sub cluster of very computing performant nodes or a 
> sub cluster of very fast disk-io nodes, the job tracker could select these 
> nodes regarding a so called job profile (i.e. my job is a heavy computing job 
> / heavy disk-io job), which can usually be estimated by a developer before.
> To achieve this, node capabilities could be introduced and stored in the DFS, 
> giving you
> a1.) basic information about each node (OS, ARCH)
> a2.) more sophisticated infos (additional software, path to software, 
> version). 
> a3.) PKI collected about the node (disc-io, cpu power, memory)
> a4.) network throughput to neighbor hosts, which might allow generating a 
> network performance map over the cluster
> This would allow you to
> b1.) generate jobs that have a profile (computing intensive, disk io 
> intensive, net io intensive)
> b2.) generate jobs that have software dependencies (run on Linux only, run on 
> nodes with MathLab only)
> b3.) generate a performance map of the cluster (sub clusters of fast disk 
> nodes, sub clusters of fast CPU nodes, network-speed-relation-map between 
> nodes)
> From step b3) you could then even acquire statistical information which could 
> again be fed into the DFS Namenode to see if we could store data on fast disk 
> subclusters only (that might need to be a tool outside of hadoop core though)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10862) Miscellaneous trivial corrections to KMS classes

2014-07-18 Thread Alejandro Abdelnur (JIRA)
Alejandro Abdelnur created HADOOP-10862:
---

 Summary: Miscellaneous trivial corrections to KMS classes
 Key: HADOOP-10862
 URL: https://issues.apache.org/jira/browse/HADOOP-10862
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Affects Versions: 3.0.0
Reporter: Alejandro Abdelnur
Assignee: Arun Suresh


In {{KMSRESTConstants.java}}, {{KEY_OP}} should be {{KEYS}} and its value should be 
{{keys}}.

{{KMS.java}} should be annotated with Jersey {{@Singleton}} to avoid creating 
an instance on every request; it is already thread-safe.
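For the second item, a hedged sketch of marking a Jersey resource class as a singleton so one instance serves all requests. The annotation shown is Jersey 1.x's com.sun.jersey.spi.resource.Singleton; which annotation the actual KMS patch uses, and the resource path, are assumptions here.
{code}
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import com.sun.jersey.spi.resource.Singleton;

@Singleton                  // one instance for the whole container, not one per request
@Path("kms-demo")           // hypothetical path, not the real KMS resource
public class SingletonResourceExample {
  @GET
  public String ping() {
    return "ok";            // safe to share: no per-request mutable state
  }
}
{code}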



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10863) KMS should have a blacklist for decrypting EEKs

2014-07-18 Thread Alejandro Abdelnur (JIRA)
Alejandro Abdelnur created HADOOP-10863:
---

 Summary: KMS should have a blacklist for decrypting EEKs
 Key: HADOOP-10863
 URL: https://issues.apache.org/jira/browse/HADOOP-10863
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 3.0.0
Reporter: Alejandro Abdelnur
Assignee: Arun Suresh


In particular, we'll need to put the HDFS admin user there by default to prevent an 
HDFS admin from getting file encryption keys.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-4047) Metrics for connection cleanup by the ipc/Server: cleanupConnections()

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-4047.
--

Resolution: Fixed

I think this has been implemented.  I'm sure Koji will re-open if not. :D

> Metrics for connection cleanup by the ipc/Server: cleanupConnections()
> --
>
> Key: HADOOP-4047
> URL: https://issues.apache.org/jira/browse/HADOOP-4047
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: ipc, metrics
>Reporter: Koji Noguchi
>Priority: Minor
>
> Request for metrics that shows the number of idle ipc connections closed from 
> the Server side.
> This metrics would have helped when debugging HADOOP-4040.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-4034) Redundant deprecation warnings in hadoop logs

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-4034.
--

Resolution: Fixed

> Redundant deprecation warnings in hadoop logs
> -
>
> Key: HADOOP-4034
> URL: https://issues.apache.org/jira/browse/HADOOP-4034
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 0.18.0
>Reporter: Yoram Kulbak
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Warnings in the form of 
> "org.apache.hadoop.fs.FileSystem - "localhost:57367" is a deprecated 
> filesystem name. Use "hdfs://localhost:57367/" instead."
> are frequently emitted into the hadoop log. The problem is that the frequency 
> of these warnings floods the logs and makes it difficult to discover real 
> issues.
> A short investigation reveals that while FileSystem.getFileSysName(URI) 
> returns the file system name without the hdfs:// scheme part, the method 
> FileSystem.fixName(String) complains about it and appends the hdfs:// back. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-4048) ipc.Client: Log when Server side closes the socket while request is still pending

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-4048.
--

Resolution: Fixed

> ipc.Client:  Log when Server side closes the socket while request is still 
> pending
> --
>
> Key: HADOOP-4048
> URL: https://issues.apache.org/jira/browse/HADOOP-4048
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: ipc
>Reporter: Koji Noguchi
>Priority: Minor
>
> ipc/Client.java
> {noformat}
> 316   } catch (EOFException eof) {
> 317 // This is what happens when the remote side goes down
> 318   } 
> {noformat}
> Request to log  when Server side closes the socket while some requests are 
> still pending.
> This would have helped when debugging HADOOP-4040.
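A hedged, self-contained sketch of the request above: surface a log line when the far end closes the socket while a request is outstanding, instead of swallowing the EOF. The logger and message wording are illustrative, not the committed ipc.Client change.
{code}
import java.io.EOFException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class EofLoggingExample {
  private static final Log LOG = LogFactory.getLog(EofLoggingExample.class);

  static void readResponse(boolean serverClosed, int pendingCalls) {
    try {
      if (serverClosed) {
        throw new EOFException("connection reset by peer");
      }
    } catch (EOFException eof) {
      // Previously this was silently ignored; logging it makes debugging sessions
      // like HADOOP-4040 much easier.
      LOG.warn("Server closed connection with " + pendingCalls + " call(s) pending", eof);
    }
  }

  public static void main(String[] args) {
    readResponse(true, 3);
  }
}
{code}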



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-4055) clean up UGI to support savetoConf

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-4055.
--

Resolution: Incomplete

We should really burn UGI to the ground.  But that's a different jira.

Closing this as stale.

> clean up UGI to support savetoConf
> --
>
> Key: HADOOP-4055
> URL: https://issues.apache.org/jira/browse/HADOOP-4055
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: util
>Reporter: Owen O'Malley
>
> Currently clients have to use UnixUserGroupInfo instead of UserGroupInfo 
> because saveToConf is only defined in UnixUGI. We should add the abstract 
> saveToConf in UGI.
> {code}
> UserGroupInfo:
>   public abstract void saveToConf(Configuration conf);
> {code}
> and the matching body in UnixUGI.
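For illustration, a hedged sketch of what "the matching body in UnixUGI" might look like; the "hadoop.job.ugi" key and the comma-joined user,group1,group2 layout are assumptions, not the real UnixUserGroupInformation code.
{code}
import org.apache.hadoop.conf.Configuration;

public class SaveToConfSketch {
  // Illustrative only: serialize the user and groups into a single configuration value.
  public static void saveToConf(Configuration conf, String user, String[] groups) {
    StringBuilder value = new StringBuilder(user);
    for (String group : groups) {
      value.append(',').append(group);
    }
    conf.set("hadoop.job.ugi", value.toString());
  }
}
{code}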



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-4059) Add tests that try starting the various hadoop command line scripts

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-4059.
--

Resolution: Duplicate

marking this as a duplicate of one of the followups to HADOOP-9902!

> Add tests that try starting the various hadoop command line scripts
> ---
>
> Key: HADOOP-4059
> URL: https://issues.apache.org/jira/browse/HADOOP-4059
> Project: Hadoop Common
>  Issue Type: Test
>  Components: test
>Affects Versions: 0.19.0
>Reporter: Steve Loughran
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Hadoop has lots of tests that start clusters, but I don't see any that test 
> the command line scripts working. Which means that changes to the scripts and 
> the code behind them may not get picked up rapidly, and regressions won't get 
> picked up by Hudson and apportioned to the specific patches.
> Propose:
> * an abstract test case that can exec scripts on startup; wait for them to 
> finish, kill them if they take too long
> * test cases for every service (namenode, tasktracker, datanode, etc)
> * tests that try invalid commands, -help options
> * tests that start the services, assert that they are live, and shut them down



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-3983) compile-c++ should honor the jvm size in compiling the c++ code

2014-07-18 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe resolved HADOOP-3983.
--

   Resolution: Fixed
Fix Version/s: 2.0.0-alpha

In the CMake build, we now use JVM_ARCH_DATA_MODEL to determine whether to 
build 32-bit or 64-bit libraries, so no, this isn't an issue any more.

> compile-c++ should honor the jvm size in compiling the c++ code
> ---
>
> Key: HADOOP-3983
> URL: https://issues.apache.org/jira/browse/HADOOP-3983
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Reporter: Owen O'Malley
>  Labels: newbie
> Fix For: 2.0.0-alpha
>
>
> The build scripts for compile-c++ and compile-c++ -examples should honor the 
> word size of the jvm, since it is in the platform name. Currently, the 
> platform names are "Linux-amd64-64" or "Linux-i386-32", but the C++ is always 
> compiled in the platform default size.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-4165) test-patch script is showing questionable number of Javac warnings

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-4165.
--

Resolution: Fixed

> test-patch script is showing questionable number of Javac warnings
> --
>
> Key: HADOOP-4165
> URL: https://issues.apache.org/jira/browse/HADOOP-4165
> Project: Hadoop Common
>  Issue Type: Test
>  Components: test
>Affects Versions: 0.18.0
>Reporter: Ramya Sunil
>Priority: Minor
>
> test-patch is recording 881 javac warnings when run on trunk and 
> 216 javac warnings when run on "ANY" patch.
> This behavior was observed even on an empty patch.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-4196) Possible performance enhancement in Hadoop compress module

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-4196.
--

Resolution: Incomplete

I believe the compression code has changed quite a bit since this was filed.  
Closing as stale.

> Possible performance enhancement in Hadoop compress module
> --
>
> Key: HADOOP-4196
> URL: https://issues.apache.org/jira/browse/HADOOP-4196
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: io
>Affects Versions: 0.18.0
>Reporter: Hong Tang
>
> There are several implementation issues in the current Hadoop compression 
> module that hurt performance. Generally, the opportunities all come from the 
> fact that the granularities of I/O operations from the CompressionStream and 
> DecompressionStream are not controllable by the users, and thus users are 
> forced to attach BufferedInputStream or BufferedOutputStream to both ends of 
> the CompressionStream and DecompressionStream:
> - ZlibCompressor: always returns false from needInput() after setInput(), and 
> thus leads to a native call deflateBytesDirect() for almost every write() 
> operation from CompressorStream. This becomes problematic when applications 
> call write() on the CompressorStream with small write sizes (e.g. one byte at 
> a time). It is better to follow a similar code path to LzoCompressor and append 
> to an internal uncompressed data buffer.
> - CompressorStream: whenever the compressor produces some compressed data, it 
> will directly issue write() calls to the down stream. Could be improved by 
> continuing to append to the byte[] until it is full (or half full) before writing 
> to the down stream. Otherwise, applications have to use a 
> BufferedOutputStream as the down stream in case the output sizes from 
> CompressorStream are too small. This generally causes double buffering.
> - BlockCompressorStream: similar issue as described above.
> - BlockDecompressorStream: getCompressedData() reads only one compressed 
> chunk at a time. It would be better to read a full buffer, and then obtain each 
> compressed chunk from that buffer (similar to what DecompressorStream is doing, but 
> admittedly a bit more complicated).
> In general, the following could be guidelines for 
> Compressor/Decompressor and CompressorStream/DecompressorStream 
> design/implementation that can give users some performance guarantee:
> - Compressor and Decompressor keep two DirectByteBuffers, the size of which 
> should be tuned to be optimal with regard to the specific 
> compression/decompression algorithm. Ensure Compressor.compress() is always 
> called with a full (or nearly full) uncompressed-data DirectBuffer.
> - CompressorStream and DecompressorStream maintain a byte[] to read data 
> from the down stream. The size of the byte[] should be user customizable (add 
> a bufferSize parameter to CompressionCodec's createInputStream and 
> createOutputStream interface). Ensure that I/O to and from the down stream happens at or 
> near the granularity of the size of the byte[], so applications can simply 
> rely on the buffering inside CompressorStream and DecompressorStream (for the 
> case of LZO: BlockCompressorStream and BlockDecompressorStream).
> A more radical change would be to let the downward InputStream directly 
> deposit data into a ByteBuffer, or the downward OutputStream accept input data 
> from a ByteBuffer. We may call these ByteBufferInputStream and 
> ByteBufferOutputStream. The CompressorStream and DecompressorStream may 
> simply test whether the down stream indeed implements such interfaces and 
> bypass their own byte[] buffer if true.
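To make the double-buffering point concrete, here is a hedged illustration; java.util.zip's GZIPOutputStream stands in for Hadoop's CompressorStream (the Hadoop codec classes need native setup), and the buffer size is arbitrary.
{code}
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

public class DoubleBufferingExample {
  public static void main(String[] args) throws Exception {
    // Buffers on both sides of the compression stream -- exactly the double
    // buffering the report says callers are currently forced into.
    OutputStream out = new BufferedOutputStream(
        new GZIPOutputStream(
            new BufferedOutputStream(new FileOutputStream("out.gz"), 64 * 1024)),
        64 * 1024);
    for (int i = 0; i < 1000; i++) {
      out.write(i & 0xFF);  // one-byte writes: the pathological small-write case above
    }
    out.close();
  }
}
{code}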



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-4201) [patch] native library build script uses unportable sed(1) regexp's

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-4201.
--

Resolution: Fixed

> [patch] native library build script uses unportable sed(1) regexp's
> ---
>
> Key: HADOOP-4201
> URL: https://issues.apache.org/jira/browse/HADOOP-4201
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 0.18.0
> Environment: FreeBSD 7.0-STABLE amd64
>Reporter: Ruslan Ermilov
> Attachments: HADOOP-4201.patch, hadoop-0.18.0.patch
>
>
> The native library build script uses unportable sed(1) regular expressions, 
> making it impossible to compile a library using non-GNU sed(1).
> In particular, any POSIX-conformant sed(1) implementation will fail. The 
> following patch has been tested to work on both Linux (Ubuntu) and FreeBSD.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-4208) Shouldn't delete the system dir after a JobTracker recovery

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-4208.
--

Resolution: Won't Fix

JT recovery is just plain broken and at this point won't get fixed.

> Shouldn't delete the system dir after a JobTracker recovery
> ---
>
> Key: HADOOP-4208
> URL: https://issues.apache.org/jira/browse/HADOOP-4208
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Amar Kamat
>
> Debugging JobTracker crashes will be easier if the files are preserved rather 
> than deleted on recovery, probably in a sub-directory.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-4215) test-patch should have a mode where developers can turn on running core and contrib tests.

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-4215.
--

Resolution: Not a Problem

> test-patch should have a mode where developers can turn on running core and 
> contrib tests.
> --
>
> Key: HADOOP-4215
> URL: https://issues.apache.org/jira/browse/HADOOP-4215
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: test
>Reporter: Hemanth Yamijala
>
> There are scenarios, such as on a feature freeze date, *smile*, when 
> developers rely on test-patch, rather than hudson to ensure their patches are 
> not causing regressions. For tests though, developers still have to run core 
> and contrib tests and grep on the output to detect failures, which can lead 
> to human errors. Having a mode where test-patch can optionally be asked to 
> run the core and contrib tests also and report on the consolidated status 
> would greatly help.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-4243) Serialization framework use SequenceFile/TFile/Other metadata to instantiate deserializer

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-4243.
--

Resolution: Duplicate

> Serialization framework use SequenceFile/TFile/Other metadata to instantiate 
> deserializer
> -
>
> Key: HADOOP-4243
> URL: https://issues.apache.org/jira/browse/HADOOP-4243
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: contrib/serialization
>Reporter: Pete Wyckoff
>
> SequenceFile metadata is useful for storing additional information about the 
> serialized data, for example, for RecordIO, whether the data is CSV or 
> Binary.  For thrift, the same thing - Binary, JSON, ...
> For Hive, this may be especially important, because it has a Dynamic generic 
> serializer/deserializer that takes its DDL at runtime (as opposed to RecordIO 
> and Thrift which require pre-compilation into a specific class whose name can 
> be stored in the sequence file key or value class).   In this case, the class 
> name is like Record.java in RecordIO - it doesn't tell you anything without 
> the DDL.
> One way to address this could be adding the sequence file metadata to the 
> getDeserializer call in Serialization interface.  The api would then be 
> something like getDeserializer(Class, Map metadata) or 
> Properties metadata.
> But, I am open to proposals.
> This also means that saying a class implements Writable is not enough to 
> necessarily deserialize it since it may do specific actions based on the 
> metadata - e.g., RecordIO might determine whether to use CSV rather than the 
> default Binary deserialization.
> There's the other issue of the getSerializer returning the metadata to be 
> written to the Sequence/T File.
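A hedged sketch of the API shape floated above ("something like getDeserializer(Class, Map metadata)"); the names, generics, and nested helper interfaces are illustrative only, not an agreed Hadoop interface.
{code}
import java.util.Map;

public interface MetadataAwareSerialization<T> {
  boolean accept(Class<?> c);

  // Metadata read from the SequenceFile/TFile header, e.g. "format" -> "CSV" for RecordIO
  Deserializer<T> getDeserializer(Class<T> c, Map<String, String> metadata);

  // The serializer side would also report the metadata to be written into the file header
  Serializer<T> getSerializer(Class<T> c, Map<String, String> metadata);

  interface Deserializer<T> { T deserialize(byte[] bytes) throws java.io.IOException; }
  interface Serializer<T>   { byte[] serialize(T obj) throws java.io.IOException; }
}
{code}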



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-4270) hive cli does not pass hadoop location correctly and does not work in correctly in local mode

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-4270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-4270.
--

Resolution: Not a Problem

Remember when Hive was a sub-project? Yeah, me neither.

> hive cli does not pass hadoop location correctly and does not work in 
> correctly in local mode
> -
>
> Key: HADOOP-4270
> URL: https://issues.apache.org/jira/browse/HADOOP-4270
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Joydeep Sen Sarma
>
> The hadoop shell script location has to be set in the conf variable 
> hadoop.bin.path. This is being passed in by the hive shell script while invoking 
> the hive cli - but an incorrect switch is being used, as a result of which it is 
> not available when a subprocess is spawned for map-reduce.
> This only affects local mode, since for normal mode we don't use sub-processes while 
> spawning jobs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-4264) DFSIO is failing on 500 nodes cluster

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-4264.
--

Resolution: Fixed

> DFSIO is failing on 500 nodes cluster
> -
>
> Key: HADOOP-4264
> URL: https://issues.apache.org/jira/browse/HADOOP-4264
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: io, test
>Affects Versions: 0.19.0
>Reporter: Suman Sehgal
>
> On executing following command : 
> bin/hadoop jar ~/hadoop/hadoop-0.19.0-test.jar TestDFSIO -write -nrFiles 990 
> -fileSize 320 
> This error occurs:
> 08/09/24 06:15:03 INFO mapred.JobClient:  map 98% reduce 32%
> java.io.IOException: Job failed!
>   at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1201)
>   at org.apache.hadoop.fs.TestDFSIO.runIOTest(TestDFSIO.java:236)
>   at org.apache.hadoop.fs.TestDFSIO.writeTest(TestDFSIO.java:218)
>   at org.apache.hadoop.fs.TestDFSIO.main(TestDFSIO.java:354)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>   at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>   at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:77)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
>   at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>   at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
> On looking at the hadoop logs, it seems that file names are clashing:
> 2008-09-24 06:21:41,618 INFO org.apache.hadoop.mapred.JobTracker: Removed 
> completed task 'attempt_200809240600_0005_m_000802_2_136048515' from 
> 'tracker_/client x.x.x.x:x'
> 2008-09-24 06:21:41,627 INFO org.apache.hadoop.mapred.JobTracker: Adding task 
> 'attempt_200809240600_0005_m_000802_4_136048515' to tip 
> task_200809240600_0005_m_000802, for tracker 'tracker_/client 
> x.x.x.x:x'
> 2008-09-24 06:21:41,627 INFO org.apache.hadoop.mapred.JobInProgress: Choosing 
> rack-local task task_200809240600_0005_m_000802
> 2008-09-24 06:21:41,724 INFO org.apache.hadoop.mapred.TaskInProgress: Error 
> from attempt_200809240600_0005_m_000900_2_136048515: 
> org.apache.hadoop.ipc.RemoteException: 
> org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to 
> create file /benchmarks/TestDFSIO/io_data/test_io_20 for 
> DFSClient_attempt_200809240600_0005_m_000900_2_136048515 on client client 
> x.x.x.x, because this file is already being created by 
> DFSClient_attempt_200809240600_0005_m_000900_0_136048515 on client x.x.x.x



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-4312) Checksum error during execution of unit tests on linux environment.

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-4312.
--

Resolution: Fixed

> Checksum error during execution of  unit tests on linux environment.
> 
>
> Key: HADOOP-4312
> URL: https://issues.apache.org/jira/browse/HADOOP-4312
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.18.1
>Reporter: Suman Sehgal
>
> The following unit tests are failing for 0.18.1_H4277_H4271:
> org.apache.hadoop.fs.TestLocalFileSystem.testAppend
> Error Message
> Checksum error: 
> /xx/workspace/hadoop-0.18.1_H4277_H4271/build/test/data/append/f at 0
> Stacktrace
> org.apache.hadoop.fs.ChecksumException: Checksum error: 
> /xx/workspace/hadoop-0.18.1_H4277_H4271/build/test/data/append/f at 0
>   at 
> org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:277)
>   at 
> org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:242)
>   at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:190)
>   at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:159)
>   at java.io.DataInputStream.read(DataInputStream.java:132)
>   at 
> org.apache.hadoop.fs.TestLocalFileSystem.readFile(TestLocalFileSystem.java:43)
>   at 
> org.apache.hadoop.fs.TestLocalFileSystem.testAppend(TestLocalFileSystem.java:173)
> org.apache.hadoop.dfs.TestDFSShell.testPut
> Error Message
> Checksum error: /xx/workspace/hadoop-0.18.1_H4277_H4271/build/test/data/f2 at 
> 0 
> Stacktrace
> java.io.IOException: Checksum error: 
> /xx/workspace/hadoop-0.18.1_H4277_H4271/build/test/data/f2 at 0
>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:187)
>   at 
> org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1173)
>   at org.apache.hadoop.dfs.TestDFSShell.testPut(TestDFSShell.java:254)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-4334) ObjectFile on top of TFile

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-4334.
--

Resolution: Fixed

I think TFile is mostly deprecated at this point. Closing as won't fix.

> ObjectFile on top of TFile
> --
>
> Key: HADOOP-4334
> URL: https://issues.apache.org/jira/browse/HADOOP-4334
> Project: Hadoop Common
>  Issue Type: New Feature
>Reporter: Amir Youssefi
>Assignee: Amir Youssefi
> Attachments: ObjectFile_1.patch
>
>
> Problem: 
> We need to have Object (Serialization/Deserialization)  support for TFile. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-4322) Input/Output Format for TFile

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-4322.
--

Resolution: Won't Fix

I think TFile is mostly deprecated at this point. Closing as won't fix.

> Input/Output Format for TFile
> -
>
> Key: HADOOP-4322
> URL: https://issues.apache.org/jira/browse/HADOOP-4322
> Project: Hadoop Common
>  Issue Type: New Feature
>Reporter: Amir Youssefi
>Assignee: Amir Youssefi
> Attachments: ObjectFileInputOutputFormat_1.patch
>
>
> Input/Output Format for TFile



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-4352) a job stays in running state forever, even though all the tasks completed a long time ago

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-4352.
--

Resolution: Cannot Reproduce

> a job stays in running state forever, even though all the tasks completed a 
> long time ago
> -
>
> Key: HADOOP-4352
> URL: https://issues.apache.org/jira/browse/HADOOP-4352
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 0.17.2
>Reporter: Runping Qi
> Attachments: jobtracker_jstatck_trace.out
>
>
> I encountered a job  that stays in running state forever, even though all the 
> tasks completed a long time ago.
> The last lines in the job tracker log complain that it cannot connect to the 
> namenode of the dfs, although the dfs namenode works fine at present time.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-4357) Rename all the FSXxx classes to FsXxx

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-4357.
--

Resolution: Incomplete

> Rename all the FSXxx classes to FsXxx
> -
>
> Key: HADOOP-4357
> URL: https://issues.apache.org/jira/browse/HADOOP-4357
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Tsz Wo Nicholas Sze
>
> There are two naming conventions in Hadoop, FSXxx and FsXxx.  We should 
> rename all the FSXxx classes to FsXxx.  See also
> http://issues.apache.org/jira/browse/HADOOP-4044?focusedCommentId=12637296#action_12637296



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-4605) should run old version of unit tests to check back-compatibility

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-4605.
--

Resolution: Not a Problem

I'm going to close this as Not a Problem.  With API classification, new 
behavior is old behavior for Stable and unit tests shouldn't need to change.

> should run old version of unit tests to check back-compatibility
> 
>
> Key: HADOOP-4605
> URL: https://issues.apache.org/jira/browse/HADOOP-4605
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: test
>Reporter: Doug Cutting
>
> We should test back-compatibility by running unit tests from a prior release.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-4611) Documentation for Tool interface is a bit busted

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-4611.
--

Resolution: Not a Problem

Tool is still broken, but for different reasons. I'll file a different jira for 
that.

> Documentation for Tool interface is a bit busted
> 
>
> Key: HADOOP-4611
> URL: https://issues.apache.org/jira/browse/HADOOP-4611
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: documentation, util
>Reporter: Jeff Hammerbacher
>Priority: Minor
>  Labels: newbie
>
> The documentation for the Tool interface will not work out of the box. It 
> seems to have taken the Sort() implementation in examples, but has ripped out 
> some important information.
> 1) args[1] and args[2] should probably be args[0] and args[1], as most 
> MapReduce tasks don't take the first argument that examples.jar takes
> 2) int run() needs to actually return an int
> 3) JobConf.setInputPath() and JobConf.setOutputPath() are deprecated.
> 4) the call to ToolRunner.run() in main() should take "new MyApp()" instead 
> of "Sort()" as an argument
> More generally, a working implementation of Tool in the docs would be handy.
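Along those lines, a hedged sketch of the "working implementation of Tool" the report asks for; it only echoes its arguments and a configuration value, it is not the documentation's Sort example.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyApp extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {          // run() really returns an int
    Configuration conf = getConf();
    System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
    for (int i = 0; i < args.length; i++) {
      System.out.println("args[" + i + "] = " + args[i]);   // application args start at args[0]
    }
    return 0;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new MyApp(), args));          // pass "new MyApp()", not Sort()
  }
}
{code}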



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10864) Tool documentation is broken

2014-07-18 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created HADOOP-10864:
-

 Summary: Tool documentation is broken
 Key: HADOOP-10864
 URL: https://issues.apache.org/jira/browse/HADOOP-10864
 Project: Hadoop Common
  Issue Type: Bug
  Components: documentation
Reporter: Allen Wittenauer
Priority: Minor


Looking at 
http://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/util/Tool.html, at 
least one of the links is non-existent.  There are likely other bugs in this 
documentation too.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-4637) Unhandled failures starting jobs with S3 as backing store

2014-07-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-4637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-4637.
--

Resolution: Incomplete

Closing this as a stale issue.

> Unhandled failures starting jobs with S3 as backing store
> -
>
> Key: HADOOP-4637
> URL: https://issues.apache.org/jira/browse/HADOOP-4637
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 0.18.1
>Reporter: Robert
>
> I run Hadoop 0.18.1 on Amazon EC2, with S3 as the backing store.
> When starting jobs, I sometimes get the following failure, which causes the 
> job to be abandoned:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveBlock(Jets3tFileSystemStore.java:222)
>   at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>   at $Proxy4.retrieveBlock(Unknown Source)
>   at 
> org.apache.hadoop.fs.s3.S3InputStream.blockSeekTo(S3InputStream.java:160)
>   at org.apache.hadoop.fs.s3.S3InputStream.read(S3InputStream.java:119)
>   at java.io.DataInputStream.read(DataInputStream.java:83)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:214)
>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:150)
>   at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1212)
>   at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1193)
>   at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:177)
>   at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1783)
>   at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
>   at org.apache.hadoop.ipc.Client.call(Client.java:715)
>   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
>   at org.apache.hadoop.mapred.$Proxy5.submitJob(Unknown Source)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:788)
>   at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026)
> The stack trace suggests that copying the job file fails, because the HDFS S3 
> filesystem can't find all of the expected block objects when it needs them.
> Since S3 is an "eventually consistent" kind of a filesystem, and does not 
> always provide an up-to-date view of the stored data, this execution path 
> probably should be strengthened - at least to retry these failed operations, 
> or wait for the expected block file if it hasn't shown up yet. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)