[jira] [Commented] (HADOOP-12975) Add jitter to CachingGetSpaceUsed's thread

2016-06-24 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348567#comment-15348567
 ] 

Colin Patrick McCabe commented on HADOOP-12975:
---

Thanks for the heads up, [~vinayrpet].  I fixed the accidental revert of 
HADOOP-13072.

> Add jitter to CachingGetSpaceUsed's thread
> --
>
> Key: HADOOP-12975
> URL: https://issues.apache.org/jira/browse/HADOOP-12975
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 2.8.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Fix For: 2.8.0
>
> Attachments: HADOOP-12975v0.patch, HADOOP-12975v1.patch, 
> HADOOP-12975v2.patch, HADOOP-12975v3.patch, HADOOP-12975v4.patch, 
> HADOOP-12975v5.patch, HADOOP-12975v6.patch
>
>
> Running DU across lots of disks is very expensive and running all of the 
> processes at the same time creates a noticeable IO spike. We should add some 
> jitter.
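As a minimal illustration of the proposal (names and numbers invented for illustration; this is not the actual CachingGetSpaceUsed code), each per-disk refresh thread can sleep for the base interval plus a random offset so the DU scans stop firing in lockstep:

{code}
import java.util.concurrent.ThreadLocalRandom;

// Hedged sketch, not the actual CachingGetSpaceUsed implementation.
public class JitterSketch {
  /** Base refresh interval plus a uniform random offset in [-jitterMs, +jitterMs]. */
  static long jitteredIntervalMs(long baseMs, long jitterMs) {
    return baseMs + ThreadLocalRandom.current().nextLong(-jitterMs, jitterMs + 1);
  }

  public static void main(String[] args) throws InterruptedException {
    // e.g. a 10-minute base interval with up to one minute of jitter,
    // after which the caller would re-run its DU scan
    Thread.sleep(jitteredIntervalMs(600_000L, 60_000L));
  }
}
{code}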






[jira] [Commented] (HADOOP-13305) Define common statistics names across schemes

2016-06-22 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345784#comment-15345784
 ] 

Colin Patrick McCabe commented on HADOOP-13305:
---

Great idea, [~liuml07].

Should some of these variables be {{final}}?

> Define common statistics names across schemes
> -
>
> Key: HADOOP-13305
> URL: https://issues.apache.org/jira/browse/HADOOP-13305
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HADOOP-13305.000.patch
>
>
> The {{StorageStatistics}} class provides a pretty general interface, i.e. 
> {{getLong(name)}} and {{getLongStatistics()}}. There are no shared or standard 
> names for the storage statistics, so the keys accepted by {{getLong(name)}} are 
> left up to each storage statistics implementation. The problems:
> # For the common statistics, downstream applications expect the same 
> statistics name across different storage statistics and/or file system 
> schemes. As it stands, they may have to use 
> {{DFSOpsCountStorageStatistics#getLong("getStatus")}} and 
> {{S3A.Statistics#getLong("get_status")}} to retrieve the same getStatus 
> operation stat.
> # Moreover, probing per-operation stats is hard if there are no 
> standard/shared common names.
> It makes a lot of sense for different schemes to issue per-operation 
> stats under the same names. Meanwhile, every FS will have its own internal 
> things to count, which can't be centrally defined or managed. But there are 
> some common statistics which would be easier to manage if they all had the 
> same name.
> Another motivation is that having a common set of names here will encourage 
> uniform instrumentation of all filesystems; it will also make it easier to 
> analyze the output of runs, were the stats to be published to a "performance 
> log" similar to the audit log. See Steve's work for S3 (e.g. [HADOOP-13171]).
> This jira is to track the effort of defining common StorageStatistics entry 
> names. Thanks to [~cmccabe], [~ste...@apache.org], [~hitesh] and [~jnp] for 
> offline discussion.
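One way the shared names could look, sketched below (constant names and string values are invented for illustration; the committed patch may differ):

{code}
// Illustrative sketch only: a single class of well-known statistic names that
// every scheme's StorageStatistics implementation reports under.
public final class CommonStatisticNames {
  public static final String OP_GET_FILE_STATUS = "op_get_file_status";
  public static final String OP_RENAME = "op_rename";
  public static final String OP_DELETE = "op_delete";

  private CommonStatisticNames() {}  // constants only, no instances
}
{code}

With such constants, downstream code can call {{getLong(CommonStatisticNames.OP_GET_FILE_STATUS)}} and retrieve the same stat whether the underlying scheme is hdfs or s3a.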






[jira] [Commented] (HADOOP-12949) Add HTrace to the s3a connector

2016-06-20 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340829#comment-15340829
 ] 

Colin Patrick McCabe commented on HADOOP-12949:
---

Yeah, we certainly could use the UA header for this.  That assumes that 
Amazon's s3 implementation will start looking for this (which maybe they 
will?).  In the short term, the big win will be just connecting up the job 
being run with the operations being done at the s3a level.
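A hedged sketch of what connecting the two levels could look like (htrace-core4 API; the scope name and tracer wiring are assumptions, not actual S3AFileSystem code):

{code}
import org.apache.htrace.core.TraceScope;
import org.apache.htrace.core.Tracer;

// Sketch only: wrap an s3a operation in a trace scope so the S3 request
// shows up as a span under the running job's trace.
class S3ATracingSketch {
  private final Tracer tracer;

  S3ATracingSketch(Tracer tracer) {
    this.tracer = tracer;
  }

  void getObject(String key) {
    try (TraceScope scope = tracer.newScope("S3AFileSystem#getObject")) {
      // perform the actual S3 GET here; the span ties it to the caller's job
    }
  }
}
{code}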

> Add HTrace to the s3a connector
> ---
>
> Key: HADOOP-12949
> URL: https://issues.apache.org/jira/browse/HADOOP-12949
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Madhawa Gunasekara
>Assignee: Madhawa Gunasekara
>
> Hi All, 
> s3, GCS, WASB, and other cloud blob stores are becoming increasingly 
> important in Hadoop. But we don't have distributed tracing for these yet. It 
> would be interesting to add distributed tracing here. It would enable 
> collecting really interesting data like probability distributions of PUT and 
> GET requests to s3 and their impact on MR jobs, etc.
> I would like to implement this feature.  Please shed some light on this.
> Thanks,
> Madhawa






[jira] [Updated] (HADOOP-13288) Guard null stats key in FileSystemStorageStatistics

2016-06-20 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HADOOP-13288:
--
      Resolution: Fixed
   Fix Version/s: 2.8.0
Target Version/s: 2.8.0
          Status: Resolved  (was: Patch Available)

> Guard null stats key in FileSystemStorageStatistics
> ---
>
> Key: HADOOP-13288
> URL: https://issues.apache.org/jira/browse/HADOOP-13288
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs
>Affects Versions: 2.8.0, 3.0.0-alpha1
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HADOOP-13288.000.patch, HADOOP-13288.001.patch
>
>
> Currently {{FileSystemStorageStatistics}} simply returns data from 
> {{FileSystem#Statistics}}. However, there is no null key check, which leads to 
> NPE problems in downstream applications. For example, we got an NPE when 
> passing a null key to {{FileSystemStorageStatistics#getLong()}}; the exception 
> stack is as follows:
> {quote}
> NullPointerException
> at 
> org.apache.hadoop.fs.FileSystemStorageStatistics.fetch(FileSystemStorageStatistics.java:80)
> at 
> org.apache.hadoop.fs.FileSystemStorageStatistics.getLong(FileSystemStorageStatistics.java:108)
> at 
> org.apache.tez.runtime.metrics.FileSystemStatisticsUpdater2.updateCounters(FileSystemStatisticsUpdater2.java:60)
> at 
> org.apache.tez.runtime.metrics.TaskCounterUpdater.updateCounters(TaskCounterUpdater.java:118)
> at 
> org.apache.tez.runtime.RuntimeTask.setFrameworkCounters(RuntimeTask.java:172)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:100)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {quote}
> This jira is to add null stat key check to {{FileSystemStorageStatistics}}.
> Thanks [~hitesh] for trying in Tez and reporting this.
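A minimal sketch of such a guard (the method shape is inferred from the stack trace above, not necessarily the committed patch):

{code}
// Sketch only: a null (or unknown) key yields null instead of propagating
// into fetch(), which is where the reported NPE originates.
@Override
public Long getLong(String key) {
  return (key == null) ? null : fetch(data, key);
}
{code}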






[jira] [Updated] (HADOOP-13280) FileSystemStorageStatistics#getLong("readOps") should return readOps + largeReadOps

2016-06-20 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HADOOP-13280:
--
Resolution: Fixed
    Status: Resolved  (was: Patch Available)

> FileSystemStorageStatistics#getLong("readOps") should return readOps + 
> largeReadOps
> ---
>
> Key: HADOOP-13280
> URL: https://issues.apache.org/jira/browse/HADOOP-13280
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HADOOP-13280-branch-2.8.000.patch, 
> HADOOP-13280.000.patch, HADOOP-13280.001.patch
>
>
> Currently a {{FileSystemStorageStatistics}} instance simply returns data from 
> {{FileSystem$Statistics}}. As for {{readOps}}, 
> {{FileSystem$Statistics#getReadOps()}} returns {{readOps + largeReadOps}}. We 
> should make {{FileSystemStorageStatistics#getLong("readOps")}} return the 
> sum as well.
> Moreover, there are no unit tests for {{FileSystemStorageStatistics}}, and this 
> JIRA will also address that.
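A hedged sketch of the intended mapping (illustrative, not the exact patch):

{code}
// Inside FileSystemStorageStatistics, "readOps" should report the combined
// count, matching what FileSystem.Statistics#getReadOps() returns.
private static Long fetch(FileSystem.Statistics.StatisticsData data, String key) {
  if ("readOps".equals(key)) {
    // StatisticsData tracks the two counters separately; report their sum.
    long sum = data.getReadOps() + data.getLargeReadOps();
    return sum;
  }
  return null;  // other keys elided in this sketch
}
{code}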






[jira] [Commented] (HADOOP-13288) Guard null stats key in FileSystemStorageStatistics

2016-06-20 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340428#comment-15340428
 ] 

Colin Patrick McCabe commented on HADOOP-13288:
---

+1.  Thanks, [~liuml07].

> Guard null stats key in FileSystemStorageStatistics
> ---
>
> Key: HADOOP-13288
> URL: https://issues.apache.org/jira/browse/HADOOP-13288
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs
>Affects Versions: 2.8.0, 3.0.0-alpha1
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HADOOP-13288.000.patch, HADOOP-13288.001.patch
>
>
> Currently {{FileSystemStorageStatistics}} simply returns data from 
> {{FileSystem#Statistics}}. However, there is no null key check, which leads to 
> NPE problems in downstream applications. For example, we got an NPE when 
> passing a null key to {{FileSystemStorageStatistics#getLong()}}; the exception 
> stack is as follows:
> {quote}
> NullPointerException
> at 
> org.apache.hadoop.fs.FileSystemStorageStatistics.fetch(FileSystemStorageStatistics.java:80)
> at 
> org.apache.hadoop.fs.FileSystemStorageStatistics.getLong(FileSystemStorageStatistics.java:108)
> at 
> org.apache.tez.runtime.metrics.FileSystemStatisticsUpdater2.updateCounters(FileSystemStatisticsUpdater2.java:60)
> at 
> org.apache.tez.runtime.metrics.TaskCounterUpdater.updateCounters(TaskCounterUpdater.java:118)
> at 
> org.apache.tez.runtime.RuntimeTask.setFrameworkCounters(RuntimeTask.java:172)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:100)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {quote}
> This jira is to add null stat key check to {{FileSystemStorageStatistics}}.
> Thanks [~hitesh] for trying in Tez and reporting this.






[jira] [Commented] (HADOOP-13280) FileSystemStorageStatistics#getLong("readOps") should return readOps + largeReadOps

2016-06-20 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340017#comment-15340017
 ] 

Colin Patrick McCabe commented on HADOOP-13280:
---

Java should be able to widen from int to long without a typecast.  However, 
let's get this important fix in, and then worry about making it prettier.

Thanks, [~liuml07].  +1.

> FileSystemStorageStatistics#getLong("readOps") should return readOps + 
> largeReadOps
> ---
>
> Key: HADOOP-13280
> URL: https://issues.apache.org/jira/browse/HADOOP-13280
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HADOOP-13280-branch-2.8.000.patch, 
> HADOOP-13280.000.patch, HADOOP-13280.001.patch
>
>
> Currently a {{FileSystemStorageStatistics}} instance simply returns data from 
> {{FileSystem$Statistics}}. As for {{readOps}}, 
> {{FileSystem$Statistics#getReadOps()}} returns {{readOps + largeReadOps}}. We 
> should make {{FileSystemStorageStatistics#getLong("readOps")}} return the 
> sum as well.
> Moreover, there are no unit tests for {{FileSystemStorageStatistics}}, and this 
> JIRA will also address that.






[jira] [Commented] (HADOOP-13280) FileSystemStorageStatistics#getLong("readOps") should return readOps + largeReadOps

2016-06-17 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336647#comment-15336647
 ] 

Colin Patrick McCabe commented on HADOOP-13280:
---

Thanks, [~liuml07].  Does {{Long.valueOf(...)}} work?  It would be nice to 
avoid the typecast if possible.

> FileSystemStorageStatistics#getLong("readOps") should return readOps + 
> largeReadOps
> ---
>
> Key: HADOOP-13280
> URL: https://issues.apache.org/jira/browse/HADOOP-13280
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HADOOP-13280-branch-2.8.000.patch, 
> HADOOP-13280.000.patch, HADOOP-13280.001.patch
>
>
> Currently a {{FileSystemStorageStatistics}} instance simply returns data from 
> {{FileSystem$Statistics}}. As for {{readOps}}, 
> {{FileSystem$Statistics#getReadOps()}} returns {{readOps + largeReadOps}}. We 
> should make {{FileSystemStorageStatistics#getLong("readOps")}} return the 
> sum as well.
> Moreover, there are no unit tests for {{FileSystemStorageStatistics}}, and this 
> JIRA will also address that.






[jira] [Updated] (HADOOP-12975) Add jitter to CachingGetSpaceUsed's thread

2016-06-16 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HADOOP-12975:
--
Affects Version/s: 2.8.0  (was: 2.9.0)

> Add jitter to CachingGetSpaceUsed's thread
> --
>
> Key: HADOOP-12975
> URL: https://issues.apache.org/jira/browse/HADOOP-12975
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 2.8.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Fix For: 2.8.0
>
> Attachments: HADOOP-12975v0.patch, HADOOP-12975v1.patch, 
> HADOOP-12975v2.patch, HADOOP-12975v3.patch, HADOOP-12975v4.patch, 
> HADOOP-12975v5.patch, HADOOP-12975v6.patch
>
>
> Running DU across lots of disks is very expensive and running all of the 
> processes at the same time creates a noticeable IO spike. We should add some 
> jitter.






[jira] [Updated] (HADOOP-12975) Add jitter to CachingGetSpaceUsed's thread

2016-06-16 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HADOOP-12975:
--
      Resolution: Fixed
   Fix Version/s: 2.8.0
Target Version/s: 2.8.0
          Status: Resolved  (was: Patch Available)

> Add jitter to CachingGetSpaceUsed's thread
> --
>
> Key: HADOOP-12975
> URL: https://issues.apache.org/jira/browse/HADOOP-12975
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 2.8.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Fix For: 2.8.0
>
> Attachments: HADOOP-12975v0.patch, HADOOP-12975v1.patch, 
> HADOOP-12975v2.patch, HADOOP-12975v3.patch, HADOOP-12975v4.patch, 
> HADOOP-12975v5.patch, HADOOP-12975v6.patch
>
>
> Running DU across lots of disks is very expensive and running all of the 
> processes at the same time creates a noticeable IO spike. We should add some 
> jitter.






[jira] [Commented] (HADOOP-12975) Add jitter to CachingGetSpaceUsed's thread

2016-06-16 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335017#comment-15335017
 ] 

Colin Patrick McCabe commented on HADOOP-12975:
---

I was just adding jitter to the commit date.

+1.  Thanks, [~eclark].

> Add jitter to CachingGetSpaceUsed's thread
> --
>
> Key: HADOOP-12975
> URL: https://issues.apache.org/jira/browse/HADOOP-12975
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 2.9.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HADOOP-12975v0.patch, HADOOP-12975v1.patch, 
> HADOOP-12975v2.patch, HADOOP-12975v3.patch, HADOOP-12975v4.patch, 
> HADOOP-12975v5.patch, HADOOP-12975v6.patch
>
>
> Running DU across lots of disks is very expensive and running all of the 
> processes at the same time creates a noticeable IO spike. We should add some 
> jitter.






[jira] [Updated] (HADOOP-13284) FileSystemStorageStatistics must not attempt to read non-existent rack-aware read stats in branch-2.8

2016-06-16 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HADOOP-13284:
--
   Resolution: Fixed
Fix Version/s: 2.8.0
       Status: Resolved  (was: Patch Available)

> FileSystemStorageStatistics must not attempt to read non-existent rack-aware 
> read stats in branch-2.8
> -
>
> Key: HADOOP-13284
> URL: https://issues.apache.org/jira/browse/HADOOP-13284
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HADOOP-13284-branch-2.8.000.patch
>
>
> As [HDFS-9579] was not committed to {{branch-2.8}}, 
> {{FileSystemStorageStatistics#KEYS}} should not include the rack-aware read 
> stats introduced by [HDFS-9579], namely {{bytesReadLocalHost, 
> bytesReadDistanceOfOneOrTwo, bytesReadDistanceOfThreeOrFour, 
> bytesReadDistanceOfFiveOrLarger}}. Otherwise, the long-statistic iterator will 
> throw an NPE while traversing. See the detailed exception stack below (it 
> happens when Tez uses the new FileSystemStorageStatistics).
> {code}
> 2016-06-15 15:56:59,242 [DEBUG] [TezChild] |impl.TezProcessorContextImpl|: 
> Cleared TezProcessorContextImpl related information
> 2016-06-15 15:56:59,243 [WARN] [main] |task.TezTaskRunner2|: Exception from 
> RunnerCallable
> java.lang.NullPointerException
> at 
> org.apache.hadoop.fs.FileSystemStorageStatistics$LongStatisticIterator.next(FileSystemStorageStatistics.java:74)
> at 
> org.apache.hadoop.fs.FileSystemStorageStatistics$LongStatisticIterator.next(FileSystemStorageStatistics.java:51)
> at 
> org.apache.tez.runtime.metrics.FileSystemStatisticsUpdater2.updateCounters(FileSystemStatisticsUpdater2.java:51)
> at 
> org.apache.tez.runtime.metrics.TaskCounterUpdater.updateCounters(TaskCounterUpdater.java:118)
> at 
> org.apache.tez.runtime.RuntimeTask.setFrameworkCounters(RuntimeTask.java:172)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:100)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 2016-06-15 15:56:59,245 [DEBUG] [main] |task.TaskReporter|: Sending heartbeat 
> to AM, request={  containerId=container_1466028486194_0005_01_02, 
> requestId=10, startIndex=0, preRoutedStartIndex=1, maxEventsToGet=500, 
> taskAttemptId=attempt_1466028486194_0005_1_00_00_0, eventCount=4 }
> {code}
> Thanks [~hitesh] for reporting this.
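A hedged sketch of the branch-2.8 shape of the key list (key names taken from the description above; the exact array in FileSystemStorageStatistics may differ):

{code}
// Sketch only: branch-2.8 must list just the stats that exist there.
private static final String[] KEYS = {
  "bytesRead", "bytesWritten", "readOps", "largeReadOps", "writeOps"
  // The rack-aware keys (bytesReadLocalHost, bytesReadDistanceOfOneOrTwo,
  // bytesReadDistanceOfThreeOrFour, bytesReadDistanceOfFiveOrLarger) are
  // omitted because HDFS-9579 is not in branch-2.8; listing them would make
  // the iterator look up statistics that do not exist and NPE while traversing.
};
{code}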






[jira] [Updated] (HADOOP-13284) FileSystemStorageStatistics must not attempt to read non-existent rack-aware read stats in branch-2.8

2016-06-16 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HADOOP-13284:
--
Summary: FileSystemStorageStatistics must not attempt to read non-existent 
rack-aware read stats in branch-2.8  (was: Remove the rack-aware read stats in 
FileSystemStorageStatistics from branch-2.8)

> FileSystemStorageStatistics must not attempt to read non-existent rack-aware 
> read stats in branch-2.8
> -
>
> Key: HADOOP-13284
> URL: https://issues.apache.org/jira/browse/HADOOP-13284
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HADOOP-13284-branch-2.8.000.patch
>
>
> As [HDFS-9579] was not committed to {{branch-2.8}}, 
> {{FileSystemStorageStatistics#KEYS}} should not include the rack-aware read 
> stats introduced by [HDFS-9579], namely {{bytesReadLocalHost, 
> bytesReadDistanceOfOneOrTwo, bytesReadDistanceOfThreeOrFour, 
> bytesReadDistanceOfFiveOrLarger}}. Otherwise, the long-statistic iterator will 
> throw an NPE while traversing. See the detailed exception stack below (it 
> happens when Tez uses the new FileSystemStorageStatistics).
> {code}
> 2016-06-15 15:56:59,242 [DEBUG] [TezChild] |impl.TezProcessorContextImpl|: 
> Cleared TezProcessorContextImpl related information
> 2016-06-15 15:56:59,243 [WARN] [main] |task.TezTaskRunner2|: Exception from 
> RunnerCallable
> java.lang.NullPointerException
> at 
> org.apache.hadoop.fs.FileSystemStorageStatistics$LongStatisticIterator.next(FileSystemStorageStatistics.java:74)
> at 
> org.apache.hadoop.fs.FileSystemStorageStatistics$LongStatisticIterator.next(FileSystemStorageStatistics.java:51)
> at 
> org.apache.tez.runtime.metrics.FileSystemStatisticsUpdater2.updateCounters(FileSystemStatisticsUpdater2.java:51)
> at 
> org.apache.tez.runtime.metrics.TaskCounterUpdater.updateCounters(TaskCounterUpdater.java:118)
> at 
> org.apache.tez.runtime.RuntimeTask.setFrameworkCounters(RuntimeTask.java:172)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:100)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 2016-06-15 15:56:59,245 [DEBUG] [main] |task.TaskReporter|: Sending heartbeat 
> to AM, request={  containerId=container_1466028486194_0005_01_02, 
> requestId=10, startIndex=0, preRoutedStartIndex=1, maxEventsToGet=500, 
> taskAttemptId=attempt_1466028486194_0005_1_00_00_0, eventCount=4 }
> {code}
> Thanks [~hitesh] for reporting this.






[jira] [Commented] (HADOOP-13284) Remove the rack-aware read stats in FileSystemStorageStatistics from branch-2.8

2016-06-16 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334989#comment-15334989
 ] 

Colin Patrick McCabe commented on HADOOP-13284:
---

Thanks for spotting this, [~liuml07].  Good find.

+1, will commit to 2.8 shortly.

> Remove the rack-aware read stats in FileSystemStorageStatistics from 
> branch-2.8
> ---
>
> Key: HADOOP-13284
> URL: https://issues.apache.org/jira/browse/HADOOP-13284
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HADOOP-13284-branch-2.8.000.patch
>
>
> As [HDFS-9579] was not committed to {{branch-2.8}}, 
> {{FileSystemStorageStatistics#KEYS}} should not include the rack-aware read 
> stats introduced by [HDFS-9579], namely {{bytesReadLocalHost, 
> bytesReadDistanceOfOneOrTwo, bytesReadDistanceOfThreeOrFour, 
> bytesReadDistanceOfFiveOrLarger}}. Otherwise, the long-statistic iterator will 
> throw an NPE while traversing. See the detailed exception stack below (it 
> happens when Tez uses the new FileSystemStorageStatistics).
> {code}
> 2016-06-15 15:56:59,242 [DEBUG] [TezChild] |impl.TezProcessorContextImpl|: 
> Cleared TezProcessorContextImpl related information
> 2016-06-15 15:56:59,243 [WARN] [main] |task.TezTaskRunner2|: Exception from 
> RunnerCallable
> java.lang.NullPointerException
> at 
> org.apache.hadoop.fs.FileSystemStorageStatistics$LongStatisticIterator.next(FileSystemStorageStatistics.java:74)
> at 
> org.apache.hadoop.fs.FileSystemStorageStatistics$LongStatisticIterator.next(FileSystemStorageStatistics.java:51)
> at 
> org.apache.tez.runtime.metrics.FileSystemStatisticsUpdater2.updateCounters(FileSystemStatisticsUpdater2.java:51)
> at 
> org.apache.tez.runtime.metrics.TaskCounterUpdater.updateCounters(TaskCounterUpdater.java:118)
> at 
> org.apache.tez.runtime.RuntimeTask.setFrameworkCounters(RuntimeTask.java:172)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:100)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 2016-06-15 15:56:59,245 [DEBUG] [main] |task.TaskReporter|: Sending heartbeat 
> to AM, request={  containerId=container_1466028486194_0005_01_02, 
> requestId=10, startIndex=0, preRoutedStartIndex=1, maxEventsToGet=500, 
> taskAttemptId=attempt_1466028486194_0005_1_00_00_0, eventCount=4 }
> {code}
> Thanks [~hitesh] for reporting this.






[jira] [Commented] (HADOOP-13280) FileSystemStorageStatistics#getLong("readOps") should return readOps + largeReadOps

2016-06-16 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334982#comment-15334982
 ] 

Colin Patrick McCabe commented on HADOOP-13280:
---

Thanks for the patch, [~liuml07].  You are right that it should be readOps + 
largeReadOps.  It's great to have a test as well.

{code}
  return (long) (data.getReadOps() + data.getLargeReadOps());
{code}
Do we need the typecast here?  Seems like it shouldn't be required since the 
int should be promoted to a long automatically.  +1 once that's addressed.
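For reference, a one-line sketch of the promotion being discussed (not the patch itself):

{code}
// int + int is computed in 32-bit arithmetic, then widened to long on
// assignment or return; no explicit (long) cast is needed.
long readOps = data.getReadOps() + data.getLargeReadOps();
{code}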

> FileSystemStorageStatistics#getLong("readOps") should return readOps + 
> largeReadOps
> ---
>
> Key: HADOOP-13280
> URL: https://issues.apache.org/jira/browse/HADOOP-13280
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HADOOP-13280.000.patch, HADOOP-13280.001.patch
>
>
> Currently a {{FileSystemStorageStatistics}} instance simply returns data from 
> {{FileSystem$Statistics}}. As for {{readOps}}, 
> {{FileSystem$Statistics#getReadOps()}} returns {{readOps + largeReadOps}}. We 
> should make {{FileSystemStorageStatistics#getLong("readOps")}} return the 
> sum as well.
> Moreover, there are no unit tests for {{FileSystemStorageStatistics}}, and this 
> JIRA will also address that.






[jira] [Commented] (HADOOP-13223) winutils.exe is a bug nexus and should be killed with an axe.

2016-06-08 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321328#comment-15321328
 ] 

Colin Patrick McCabe commented on HADOOP-13223:
---

Thanks for the explanation, [~cnauroth].  Migrating functionality to the DLL 
seems like a good idea long-term for a lot of reasons.

> winutils.exe is a bug nexus and should be killed with an axe.
> -
>
> Key: HADOOP-13223
> URL: https://issues.apache.org/jira/browse/HADOOP-13223
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: bin
>Affects Versions: 2.6.0
> Environment: Microsoft Windows, all versions
>Reporter: john lilley
>
> winutils.exe was apparently created as a stopgap measure to allow Hadoop to 
> "work" on Windows platforms, because the NativeIO libraries aren't 
> implemented there (edit: even NativeIO probably doesn't cover the operations 
> that winutils.exe is used for).  Rather than building a DLL that makes native 
> OS calls, the creators of winutils.exe must have decided that it would be 
> more expedient to create an EXE to carry out file system operations in a 
> linux-like fashion.  Unfortunately, like many stopgap measures in software, 
> this one has persisted well beyond its expected lifetime and usefulness.  My 
> team creates software that runs on Windows and Linux, and winutils.exe is 
> probably responsible for 20% of all issues we encounter, both during 
> development and in the field.
> Problem #1 with winutils.exe is that it is simply missing from many popular 
> distros, and/or the client-side software installation for said distros, when 
> supplied, fails to install winutils.exe.  Thus, as software developers, we 
> are forced to pick one version and distribute and install it with our 
> software.
> Which leads to problem #2: different builds of winutils.exe are not always 
> compatible.  In particular, MapR MUST have its winutils.exe in the system 
> path, but doing so breaks the Hadoop distro for every other Hadoop vendor.  
> This makes creating and maintaining test environments that work with all of 
> the Hadoop distros we want to test unnecessarily tedious and error-prone.
> Problem #3 is that the mechanism by which you inform the Hadoop client 
> software where to find winutils.exe is poorly documented and fragile.  First, 
> it can be in the PATH.  If it is in the PATH, that is where it is found.  
> However, the documentation, such as it is, makes no mention of this, and 
> instead says that you should set the HADOOP_HOME environment variable, which 
> does NOT override the winutils.exe found in your system PATH.
> Which leads to problem #4: There is no logging that says where winutils.exe 
> was actually found and loaded.  Because of this, fixing problems of finding 
> the wrong winutils.exe is extremely difficult.
> Problem #5 is that most of the time, such as when accessing straight up HDFS 
> and YARN, one does not *need* winutils.exe.  But if it is missing, the log 
> messages complain about its absence.  When we are trying to diagnose an 
> obscure issue in Hadoop (of which there are many), the presence of this red 
> herring leads to all sorts of time wasted until someone on the team points 
> out that winutils.exe is not the problem, at least not this time.
> Problem #6 is that errors and stack traces from issues involving winutils.exe 
> are not helpful.  The Java stack trace ends at the ProcessBuilder call.  Only 
> through bitter experience is one able to connect the dots from 
> "ProcessBuilder is the last thing on the stack" to "something is wrong with 
> winutils.exe".
> Note that none of these involve running Hadoop on Windows.  They are only 
> encountered when using Hadoop client libraries to access a cluster from 
> Windows.






[jira] [Commented] (HADOOP-13223) winutils.exe is a bug nexus and should be killed with an axe.

2016-06-07 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319318#comment-15319318
 ] 

Colin Patrick McCabe commented on HADOOP-13223:
---

Hmm.  It's not clear to me why a DLL would be less prone to path problems than 
an EXE.  It seems like we should just be putting a version number on the EXE, 
so that we avoid these conflicts.  We have the same problem with libhadoop-- 
see HADOOP-11127.

> winutils.exe is a bug nexus and should be killed with an axe.
> -
>
> Key: HADOOP-13223
> URL: https://issues.apache.org/jira/browse/HADOOP-13223
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: bin
>Affects Versions: 2.6.0
> Environment: Microsoft Windows, all versions
>Reporter: john lilley
>
> winutils.exe was apparently created as a stopgap measure to allow Hadoop to 
> "work" on Windows platforms, because the NativeIO libraries aren't 
> implemented there (edit: even NativeIO probably doesn't cover the operations 
> that winutils.exe is used for).  Rather than building a DLL that makes native 
> OS calls, the creators of winutils.exe must have decided that it would be 
> more expedient to create an EXE to carry out file system operations in a 
> linux-like fashion.  Unfortunately, like many stopgap measures in software, 
> this one has persisted well beyond its expected lifetime and usefulness.  My 
> team creates software that runs on Windows and Linux, and winutils.exe is 
> probably responsible for 20% of all issues we encounter, both during 
> development and in the field.
> Problem #1 with winutils.exe is that it is simply missing from many popular 
> distros, and/or the client-side software installation for said distros, when 
> supplied, fails to install winutils.exe.  Thus, as software developers, we 
> are forced to pick one version and distribute and install it with our 
> software.
> Which leads to problem #2: different builds of winutils.exe are not always 
> compatible.  In particular, MapR MUST have its winutils.exe in the system 
> path, but doing so breaks the Hadoop distro for every other Hadoop vendor.  
> This makes creating and maintaining test environments that work with all of 
> the Hadoop distros we want to test unnecessarily tedious and error-prone.
> Problem #3 is that the mechanism by which you inform the Hadoop client 
> software where to find winutils.exe is poorly documented and fragile.  First, 
> it can be in the PATH.  If it is in the PATH, that is where it is found.  
> However, the documentation, such as it is, makes no mention of this, and 
> instead says that you should set the HADOOP_HOME environment variable, which 
> does NOT override the winutils.exe found in your system PATH.
> Which leads to problem #4: There is no logging that says where winutils.exe 
> was actually found and loaded.  Because of this, fixing problems of finding 
> the wrong winutils.exe is extremely difficult.
> Problem #5 is that most of the time, such as when accessing straight up HDFS 
> and YARN, one does not *need* winutils.exe.  But if it is missing, the log 
> messages complain about its absence.  When we are trying to diagnose an 
> obscure issue in Hadoop (of which there are many), the presence of this red 
> herring leads to all sorts of time wasted until someone on the team points 
> out that winutils.exe is not the problem, at least not this time.
> Problem #6 is that errors and stack traces from issues involving winutils.exe 
> are not helpful.  The Java stack trace ends at the ProcessBuilder call.  Only 
> through bitter experience is one able to connect the dots from 
> "ProcessBuilder is the last thing on the stack" to "something is wrong with 
> winutils.exe".
> Note that none of these involve running Hadoop on Windows.  They are only 
> encountered when using Hadoop client libraries to access a cluster from 
> Windows.






[jira] [Updated] (HADOOP-13137) TraceAdmin should support Kerberized cluster

2016-05-31 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HADOOP-13137:
--
      Resolution: Fixed
   Fix Version/s: 2.8.0
Target Version/s: 2.8.0
          Status: Resolved  (was: Patch Available)

> TraceAdmin should support Kerberized cluster
> 
>
> Key: HADOOP-13137
> URL: https://issues.apache.org/jira/browse/HADOOP-13137
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tracing
>Affects Versions: 2.6.0, 3.0.0-alpha1
> Environment: CDH5.5.1 cluster with Kerberos
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>  Labels: Kerberos
> Fix For: 2.8.0
>
> Attachments: HADOOP-13137.001.patch, HADOOP-13137.002.patch, 
> HADOOP-13137.003.patch, HADOOP-13137.004.patch, HADOOP-13137.005.patch
>
>
> When I run {{hadoop trace}} command for a Kerberized NameNode, it failed with 
> the following error:
> [hdfs@weichiu-encryption-1 root]$ hadoop trace -list  -host 
> weichiu-encryption-1.vpc.cloudera.com:8022
> 16/05/12 00:02:13 WARN ipc.Client: 
> Exception encountered while connecting to the server : 
> java.lang.IllegalArgumentException: Failed to specify server's Kerberos 
> principal name
> 16/05/12 00:02:13 WARN security.UserGroupInformation: 
> PriviledgedActionException as:h...@vpc.cloudera.com (auth:KERBEROS) 
> cause:java.io.IOException: java.lang.IllegalArgumentException: Failed to 
> specify server's Kerberos principal name
> Exception in thread "main" java.io.IOException: Failed on local exception: 
> java.io.IOException: java.lang.IllegalArgumentException: Failed to specify 
> server's Kerberos principal name; Host Details : local host is: 
> "weichiu-encryption-1.vpc.cloudera.com/172.26.8.185"; destination host is: 
> "weichiu-encryption-1.vpc.cloudera.com":8022;
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1470)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1403)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>   at com.sun.proxy.$Proxy11.listSpanReceivers(Unknown Source)
>   at 
> org.apache.hadoop.tracing.TraceAdminProtocolTranslatorPB.listSpanReceivers(TraceAdminProtocolTranslatorPB.java:58)
>   at 
> org.apache.hadoop.tracing.TraceAdmin.listSpanReceivers(TraceAdmin.java:68)
>   at org.apache.hadoop.tracing.TraceAdmin.run(TraceAdmin.java:177)
>   at org.apache.hadoop.tracing.TraceAdmin.main(TraceAdmin.java:195)
> Caused by: java.io.IOException: java.lang.IllegalArgumentException: Failed to 
> specify server's Kerberos principal name
>   at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:682)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>   at 
> org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:645)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:733)
>   at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1519)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1442)
>   ... 7 more
> Caused by: java.lang.IllegalArgumentException: Failed to specify server's 
> Kerberos principal name
>   at 
> org.apache.hadoop.security.SaslRpcClient.getServerPrincipal(SaslRpcClient.java:322)
>   at 
> org.apache.hadoop.security.SaslRpcClient.createSaslClient(SaslRpcClient.java:231)
>   at 
> org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:159)
>   at 
> org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:396)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:555)
>   at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:370)
>   at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:725)
>   at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:721)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:720)
>   ... 10 more
> It is failing because {{TraceAdmin}} does not set up the property 
> {{CommonConfigurationKeys.HADOOP_SECURITY_SERVICE_USER_NAME_KEY}}
> Fixing it may require some restructuring, as the NameNode principal 
> {{dfs.namenode.kerberos.principal

[jira] [Commented] (HADOOP-13137) TraceAdmin should support Kerberized cluster

2016-05-31 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15308999#comment-15308999
 ] 

Colin Patrick McCabe commented on HADOOP-13137:
---

bq. The test failures look unrelated.

I agree-- I ran them locally, and they passed.

Thanks, [~jojochuang] and [~steve_l].  +1.

> TraceAdmin should support Kerberized cluster
> 
>
> Key: HADOOP-13137
> URL: https://issues.apache.org/jira/browse/HADOOP-13137
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tracing
>Affects Versions: 2.6.0, 3.0.0-alpha1
> Environment: CDH5.5.1 cluster with Kerberos
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>  Labels: Kerberos
> Attachments: HADOOP-13137.001.patch, HADOOP-13137.002.patch, 
> HADOOP-13137.003.patch, HADOOP-13137.004.patch, HADOOP-13137.005.patch
>
>
> When I run {{hadoop trace}} command for a Kerberized NameNode, it failed with 
> the following error:
> [hdfs@weichiu-encryption-1 root]$ hadoop trace -list  -host 
> weichiu-encryption-1.vpc.cloudera.com:8022
> 16/05/12 00:02:13 WARN ipc.Client: 
> Exception encountered while connecting to the server : 
> java.lang.IllegalArgumentException: Failed to specify server's Kerberos 
> principal name
> 16/05/12 00:02:13 WARN security.UserGroupInformation: 
> PriviledgedActionException as:h...@vpc.cloudera.com (auth:KERBEROS) 
> cause:java.io.IOException: java.lang.IllegalArgumentException: Failed to 
> specify server's Kerberos principal name
> Exception in thread "main" java.io.IOException: Failed on local exception: 
> java.io.IOException: java.lang.IllegalArgumentException: Failed to specify 
> server's Kerberos principal name; Host Details : local host is: 
> "weichiu-encryption-1.vpc.cloudera.com/172.26.8.185"; destination host is: 
> "weichiu-encryption-1.vpc.cloudera.com":8022;
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1470)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1403)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>   at com.sun.proxy.$Proxy11.listSpanReceivers(Unknown Source)
>   at 
> org.apache.hadoop.tracing.TraceAdminProtocolTranslatorPB.listSpanReceivers(TraceAdminProtocolTranslatorPB.java:58)
>   at 
> org.apache.hadoop.tracing.TraceAdmin.listSpanReceivers(TraceAdmin.java:68)
>   at org.apache.hadoop.tracing.TraceAdmin.run(TraceAdmin.java:177)
>   at org.apache.hadoop.tracing.TraceAdmin.main(TraceAdmin.java:195)
> Caused by: java.io.IOException: java.lang.IllegalArgumentException: Failed to 
> specify server's Kerberos principal name
>   at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:682)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>   at 
> org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:645)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:733)
>   at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1519)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1442)
>   ... 7 more
> Caused by: java.lang.IllegalArgumentException: Failed to specify server's 
> Kerberos principal name
>   at 
> org.apache.hadoop.security.SaslRpcClient.getServerPrincipal(SaslRpcClient.java:322)
>   at 
> org.apache.hadoop.security.SaslRpcClient.createSaslClient(SaslRpcClient.java:231)
>   at 
> org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:159)
>   at 
> org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:396)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:555)
>   at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:370)
>   at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:725)
>   at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:721)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:720)
>   ... 10 more
> It is failing because {{TraceAdmin}} does not set up the property 
> {{CommonConfigurationKeys.HADOOP_SECURITY_SERVICE_USER_NAME_KEY}}
> Fixing it may require some restructuring, as the NameNode principal 
> {{dfs.namenode.

[jira] [Commented] (HADOOP-13010) Refactor raw erasure coders

2016-05-25 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301319#comment-15301319
 ] 

Colin Patrick McCabe commented on HADOOP-13010:
---

Thanks for your work on this, [~drankye].  +1.  Let's continue the discussion 
on the follow-on JIRAs.

> Refactor raw erasure coders
> ---
>
> Key: HADOOP-13010
> URL: https://issues.apache.org/jira/browse/HADOOP-13010
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha1
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Attachments: HADOOP-13010-v1.patch, HADOOP-13010-v2.patch, 
> HADOOP-13010-v3.patch, HADOOP-13010-v4.patch, HADOOP-13010-v5.patch, 
> HADOOP-13010-v6.patch, HADOOP-13010-v7.patch
>
>
> This will refactor raw erasure coders according to some comments received so 
> far.
> * As discussed in HADOOP-11540 and suggested by [~cmccabe], better not to 
> rely class inheritance to reuse the codes, instead they can be moved to some 
> utility.
> * Suggested by [~jingzhao] somewhere quite some time ago, better to have a 
> state holder to keep some checking results for later reuse during an 
> encode/decode call.
> This would not get rid of some inheritance levels as doing so isn't clear yet 
> for the moment and also incurs big impact. I do wish the end result by this 
> refactoring will make all the levels more clear and easier to follow.






[jira] [Commented] (HADOOP-13010) Refactor raw erasure coders

2016-05-24 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298600#comment-15298600
 ] 

Colin Patrick McCabe commented on HADOOP-13010:
---

{{TestCodecRawCoderMapping}} fails for me:

{code}
testRSDefaultRawCoder(org.apache.hadoop.io.erasurecode.TestCodecRawCoderMapping)
  Time elapsed: 0.015 sec  <<< FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.io.erasurecode.TestCodecRawCoderMapping.testRSDefaultRawCoder(TestCodecRawCoderMapping.java:54)
{code}

> Refactor raw erasure coders
> ---
>
> Key: HADOOP-13010
> URL: https://issues.apache.org/jira/browse/HADOOP-13010
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Attachments: HADOOP-13010-v1.patch, HADOOP-13010-v2.patch, 
> HADOOP-13010-v3.patch, HADOOP-13010-v4.patch, HADOOP-13010-v5.patch, 
> HADOOP-13010-v6.patch
>
>
> This will refactor raw erasure coders according to some comments received so 
> far.
> * As discussed in HADOOP-11540 and suggested by [~cmccabe], better not to 
> rely class inheritance to reuse the codes, instead they can be moved to some 
> utility.
> * Suggested by [~jingzhao] somewhere quite some time ago, better to have a 
> state holder to keep some checking results for later reuse during an 
> encode/decode call.
> This would not get rid of some inheritance levels as doing so isn't clear yet 
> for the moment and also incurs big impact. I do wish the end result by this 
> refactoring will make all the levels more clear and easier to follow.






[jira] [Commented] (HADOOP-13010) Refactor raw erasure coders

2016-05-24 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298587#comment-15298587
 ] 

Colin Patrick McCabe commented on HADOOP-13010:
---

It was nice talking to you, [~drankye].  It's too bad that we didn't have more 
time (it was a busy week because I was going out of town).

bq. As I explained as above, \[the configuration-based\] approach might not 
work in all cases, because: there are more than one codecs to be configured and 
for each of these codecs there may be more than one coder implementation to be 
configured, and it's not easy to flatten the two layers into one dimension 
(here you used algorithm).

I think these are really configuration questions, not questions about how the 
code should be structured.  What does the user actually need to configure?  If 
the user just configures a coder implementation, does that fully determine the 
codec which is being used?  If so, we should have only one configuration knob-- 
coder.  If a coder could be used for multiple codecs, then we need to have at 
least two knobs that the user can configure-- one for codec, and another for 
coder.  Once we know what the configuration knobs are, we probably only need 
one or two functions to create the objects we need based on a {{Configuration}} 
object, not a whole mess of factory objects.

Anyway, let's talk about refactoring codec configuration and factories in a 
follow-on JIRA.  I think we've made a lot of good progress here and it will be 
helpful to get this patch committed.
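A hypothetical sketch of the one-or-two-knobs idea (all configuration key names and the single factory entry point are invented for illustration):

{code}
import org.apache.hadoop.conf.Configuration;

// Sketch only: one knob picks the codec, a second picks the coder
// implementation for that codec; a single factory method would then replace
// the per-codec factory objects.
class CoderConfigSketch {
  static String[] resolve(Configuration conf) {
    String codec = conf.get("io.erasurecode.codec", "rs");
    String coder = conf.get("io.erasurecode.codec." + codec + ".coder", "pure-java");
    return new String[] { codec, coder };  // hand these to one factory function
  }
}
{code}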

> Refactor raw erasure coders
> ---
>
> Key: HADOOP-13010
> URL: https://issues.apache.org/jira/browse/HADOOP-13010
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Attachments: HADOOP-13010-v1.patch, HADOOP-13010-v2.patch, 
> HADOOP-13010-v3.patch, HADOOP-13010-v4.patch, HADOOP-13010-v5.patch, 
> HADOOP-13010-v6.patch
>
>
> This will refactor raw erasure coders according to some comments received so 
> far.
> * As discussed in HADOOP-11540 and suggested by [~cmccabe], better not to 
> rely class inheritance to reuse the codes, instead they can be moved to some 
> utility.
> * Suggested by [~jingzhao] somewhere quite some time ago, better to have a 
> state holder to keep some checking results for later reuse during an 
> encode/decode call.
> This would not get rid of some inheritance levels as doing so isn't clear yet 
> for the moment and also incurs big impact. I do wish the end result by this 
> refactoring will make all the levels more clear and easier to follow.






[jira] [Commented] (HADOOP-7352) Contracts of LocalFileSystem and DistributedFileSystem should require FileSystem::listStatus throw IOException not return null upon access error

2016-05-23 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-7352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296947#comment-15296947
 ] 

Colin Patrick McCabe commented on HADOOP-7352:
--

This should be easier with the new jdk7 changes.  We now have access to 
directory listing APIs like DirectoryStream that throw IOEs on problems instead 
of returning null.
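A minimal sketch of that jdk7 behavior (not RawLocalFileSystem code): an unreadable directory surfaces as an {{AccessDeniedException}} (an {{IOException}}) rather than the null that {{java.io.File#listFiles}} returns.

{code}
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ListDirSketch {
  // Throws AccessDeniedException (an IOException) on a permission problem,
  // instead of silently returning null like java.io.File#listFiles.
  static void list(String dir) throws IOException {
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(Paths.get(dir))) {
      for (Path entry : stream) {
        System.out.println(entry.getFileName());
      }
    }
  }
}
{code}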

> Contracts of LocalFileSystem and DistributedFileSystem should require 
> FileSystem::listStatus throw IOException not return null upon access error
> 
>
> Key: HADOOP-7352
> URL: https://issues.apache.org/jira/browse/HADOOP-7352
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, fs/s3
>Reporter: Matt Foley
>Assignee: Matt Foley
>
> In HADOOP-6201 and HDFS-538 it was agreed that FileSystem::listStatus should 
> throw FileNotFoundException instead of returning null, when the target 
> directory did not exist.
> However, in LocalFileSystem implementation today, FileSystem::listStatus 
> still may return null, when the target directory exists but does not grant 
> read permission.  This causes NPE in many callers, for all the reasons cited 
> in HADOOP-6201 and HDFS-538.  See HADOOP-7327 and its linked issues for 
> examples.






[jira] [Commented] (HADOOP-13137) TraceAdmin should support Kerberized cluster

2016-05-20 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293773#comment-15293773
 ] 

Colin Patrick McCabe commented on HADOOP-13137:
---

The patch looks good to me.  I think there are a few other commands that might 
need to get an argument like this, if it's necessary when communicating 
directly with a kerberized Hadoop server.  I do wonder why we need a new file, 
TestKerberizedTraceAdmin.java, when it could have been a test in 
TestTraceAdmin.java, but I don't feel that strongly about it.  Thanks, 
[~jojochuang] and [~steve_l].

> TraceAdmin should support Kerberized cluster
> 
>
> Key: HADOOP-13137
> URL: https://issues.apache.org/jira/browse/HADOOP-13137
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tracing
>Affects Versions: 2.6.0, 3.0.0-alpha1
> Environment: CDH5.5.1 cluster with Kerberos
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>  Labels: Kerberos
> Attachments: HADOOP-13137.001.patch, HADOOP-13137.002.patch
>
>
> When I ran the {{hadoop trace}} command against a Kerberized NameNode, it 
> failed with the following error:
> [hdfs@weichiu-encryption-1 root]$ hadoop trace -list  -host 
> weichiu-encryption-1.vpc.cloudera.com:8022
> 16/05/12 00:02:13 WARN ipc.Client: 
> Exception encountered while connecting to the server : 
> java.lang.IllegalArgumentException: Failed to specify server's Kerberos 
> principal name
> 16/05/12 00:02:13 WARN security.UserGroupInformation: 
> PriviledgedActionException as:h...@vpc.cloudera.com (auth:KERBEROS) 
> cause:java.io.IOException: java.lang.IllegalArgumentException: Failed to 
> specify server's Kerberos principal name
> Exception in thread "main" java.io.IOException: Failed on local exception: 
> java.io.IOException: java.lang.IllegalArgumentException: Failed to specify 
> server's Kerberos principal name; Host Details : local host is: 
> "weichiu-encryption-1.vpc.cloudera.com/172.26.8.185"; destination host is: 
> "weichiu-encryption-1.vpc.cloudera.com":8022;
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1470)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1403)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>   at com.sun.proxy.$Proxy11.listSpanReceivers(Unknown Source)
>   at 
> org.apache.hadoop.tracing.TraceAdminProtocolTranslatorPB.listSpanReceivers(TraceAdminProtocolTranslatorPB.java:58)
>   at 
> org.apache.hadoop.tracing.TraceAdmin.listSpanReceivers(TraceAdmin.java:68)
>   at org.apache.hadoop.tracing.TraceAdmin.run(TraceAdmin.java:177)
>   at org.apache.hadoop.tracing.TraceAdmin.main(TraceAdmin.java:195)
> Caused by: java.io.IOException: java.lang.IllegalArgumentException: Failed to 
> specify server's Kerberos principal name
>   at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:682)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>   at 
> org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:645)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:733)
>   at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1519)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1442)
>   ... 7 more
> Caused by: java.lang.IllegalArgumentException: Failed to specify server's 
> Kerberos principal name
>   at 
> org.apache.hadoop.security.SaslRpcClient.getServerPrincipal(SaslRpcClient.java:322)
>   at 
> org.apache.hadoop.security.SaslRpcClient.createSaslClient(SaslRpcClient.java:231)
>   at 
> org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:159)
>   at 
> org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:396)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:555)
>   at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:370)
>   at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:725)
>   at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:721)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:720)
>   ... 10 more
> It is failing because {{Tra

[jira] [Commented] (HADOOP-13010) Refactor raw erasure coders

2016-05-18 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15289361#comment-15289361
 ] 

Colin Patrick McCabe commented on HADOOP-13010:
---

bq. Not sure if it's good to have something like isXOR or isRS, because we'll 
have more coder algorithms than just the current two.

That's a fair point.  It seems unlikely that we need an isXOR or isRS method.

bq. OK, we can have \[createRawEncoder\]. Maybe it wouldn't be bad to 
additionally have two shortcut methods, createRSRawEncoder and 
createXORRawEncoder, because both are the primitive, essential, and most used 
ones in implementing advanced coders and on the HDFS side. I want both to be 
prominent and easy to use.

It seems better just to have one function, {{createRawEncoder}}, than to have 
lots of functions for every type of encoder.

bq. I discussed this with Uma Maheswara Rao G quite some time ago when 
introducing these factories. There isn't a clear way to compose or derive the 
full class name of a raw coder, because it should be pluggable and 
configurable. In the current approach, each codec can have several coder 
implementations, and for each of these the corresponding coder factory can be 
configured.

We discussed this offline and I think the conclusion is that we don't need the 
factories for anything.

We can just have a configuration key like {{erasure.coder.algorithm}} and then 
code like this:

{code}
RawErasureEncoder createRawEncoder(Configuration conf) throws IOException {
  String classPrefix = conf.get("erasure.coder.algorithm",
      DEFAULT_ERASURE_CODER_ALGORITHM);
  String name = classPrefix + "Encoder";
  try {
    // Load the configured coder class and call its Configuration constructor.
    Constructor<?> ctor = conf.getClassLoader().loadClass(name)
        .getConstructor(Configuration.class);
    return (RawErasureEncoder) ctor.newInstance(conf);
  } catch (ReflectiveOperationException e) {
    throw new IOException("Failed to instantiate " + name, e);
  }
}

RawErasureDecoder createRawDecoder(Configuration conf) throws IOException {
  String classPrefix = conf.get("erasure.coder.algorithm",
      DEFAULT_ERASURE_CODER_ALGORITHM);
  String name = classPrefix + "Decoder";
  try {
    Constructor<?> ctor = conf.getClassLoader().loadClass(name)
        .getConstructor(Configuration.class);
    return (RawErasureDecoder) ctor.newInstance(conf);
  } catch (ReflectiveOperationException e) {
    throw new IOException("Failed to instantiate " + name, e);
  }
}
{code}
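
Usage would then look something like this (the key name and class prefix here 
are illustrative; nothing is settled):

{code}
Configuration conf = new Configuration();
// A fully-qualified class-name prefix; "Encoder" / "Decoder" is appended.
conf.set("erasure.coder.algorithm",
    "org.apache.hadoop.io.erasurecode.rawcoder.RSRaw");
RawErasureEncoder encoder = createRawEncoder(conf);
RawErasureDecoder decoder = createRawDecoder(conf);
{code}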

bq. It seems this can simplify the related functions, but I am not sure it 
would make the code more readable. The mentioned variables are very specific 
to encode/decode calls that use on-heap ByteBuffers or byte-array buffers. 
Maybe DecodingState should be kept simple, without too many intermediate 
variables, because the code that uses them is not suitable to be moved into 
the class.

Reducing the number of function parameters from 8 or 9 to 1 or 2 seems like it 
would make the code much more readable.  I don't understand what the rationale 
is for keeping these parameters out of DecodingState.  Perhaps we could discuss 
this in a follow-on JIRA, though.
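
To make that concrete, here is a rough sketch of such a holder (the field 
names are mine, purely illustrative):

{code}
import java.nio.ByteBuffer;

// Illustrative sketch: bundle the validated per-call arguments in one
// holder, so the encode/decode helpers can take 1 or 2 parameters
// instead of 8 or 9.
class DecodingState {
  final ByteBuffer[] inputs;
  final ByteBuffer[] outputs;
  final int[] erasedIndexes;
  final int decodeLength;

  DecodingState(ByteBuffer[] inputs, int[] erasedIndexes,
                ByteBuffer[] outputs, int decodeLength) {
    this.inputs = inputs;
    this.erasedIndexes = erasedIndexes;
    this.outputs = outputs;
    this.decodeLength = decodeLength;
  }
}
{code}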

> Refactor raw erasure coders
> ---
>
> Key: HADOOP-13010
> URL: https://issues.apache.org/jira/browse/HADOOP-13010
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Attachments: HADOOP-13010-v1.patch, HADOOP-13010-v2.patch, 
> HADOOP-13010-v3.patch, HADOOP-13010-v4.patch, HADOOP-13010-v5.patch
>
>
> This will refactor raw erasure coders according to some comments received so 
> far.
> * As discussed in HADOOP-11540 and suggested by [~cmccabe], it is better not 
> to rely on class inheritance to reuse code; instead, shared code can be moved 
> to a utility class.
> * As suggested by [~jingzhao] quite some time ago, it is better to have a 
> state holder that keeps some checking results for later reuse during an 
> encode/decode call.
> This will not get rid of all inheritance levels, since doing so isn't clearly 
> feasible yet and would incur a big impact. I do hope the end result of this 
> refactoring makes all the levels clearer and easier to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13140) FileSystem#initialize must not attempt to create StorageStatistics objects with null or empty schemes

2016-05-18 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HADOOP-13140:
--
  Resolution: Fixed
   Fix Version/s: 2.8.0
Target Version/s: 2.8.0
  Status: Resolved  (was: Patch Available)

> FileSystem#initialize must not attempt to create StorageStatistics objects 
> with null or empty schemes
> -
>
> Key: HADOOP-13140
> URL: https://issues.apache.org/jira/browse/HADOOP-13140
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Brahma Reddy Battula
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HADOOP-13140.000.patch, HADOOP-13140.001.patch, 
> HADOOP-13140.002.patch
>
>
> {{org.apache.hadoop.fs.GlobalStorageStatistics#put}} is not checking the null 
> scheme, and the internal map will throw an NPE. This was reported by a flaky 
> test {{TestFileSystemApplicationHistoryStore}}. Thanks [~brahmareddy] for 
> reporting.
> To address this,
> # Fix the test by providing a valid URI, e.g. {{file:///}}
> # Guard the null scheme in {{GlobalStorageStatistics#put}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13140) FileSystem#initialize must not attempt to create StorageStatistics objects with null or empty schemes

2016-05-18 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HADOOP-13140:
--
Summary: FileSystem#initialize must not attempt to create StorageStatistics 
objects with null or empty schemes  (was: GlobalStorageStatistics should check 
null FileSystem scheme to avoid NPE)

> FileSystem#initialize must not attempt to create StorageStatistics objects 
> with null or empty schemes
> -
>
> Key: HADOOP-13140
> URL: https://issues.apache.org/jira/browse/HADOOP-13140
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Brahma Reddy Battula
>Assignee: Mingliang Liu
> Attachments: HADOOP-13140.000.patch, HADOOP-13140.001.patch, 
> HADOOP-13140.002.patch
>
>
> {{org.apache.hadoop.fs.GlobalStorageStatistics#put}} is not checking the null 
> scheme, and the internal map will throw an NPE. This was reported by a flaky 
> test {{TestFileSystemApplicationHistoryStore}}. Thanks [~brahmareddy] for 
> reporting.
> To address this,
> # Fix the test by providing a valid URI, e.g. {{file:///}}
> # Guard the null scheme in {{GlobalStorageStatistics#put}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13140) GlobalStorageStatistics should check null FileSystem scheme to avoid NPE

2016-05-18 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15289221#comment-15289221
 ] 

Colin Patrick McCabe commented on HADOOP-13140:
---

+1.  Thanks, [~liuml07].

> GlobalStorageStatistics should check null FileSystem scheme to avoid NPE
> 
>
> Key: HADOOP-13140
> URL: https://issues.apache.org/jira/browse/HADOOP-13140
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Brahma Reddy Battula
>Assignee: Mingliang Liu
> Attachments: HADOOP-13140.000.patch, HADOOP-13140.001.patch, 
> HADOOP-13140.002.patch
>
>
> {{org.apache.hadoop.fs.GlobalStorageStatistics#put}} is not checking the null 
> scheme, and the internal map will throw an NPE. This was reported by a flaky 
> test {{TestFileSystemApplicationHistoryStore}}. Thanks [~brahmareddy] for 
> reporting.
> To address this,
> # Fix the test by providing a valid URI, e.g. {{file:///}}
> # Guard the null scheme in {{GlobalStorageStatistics#put}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13140) GlobalStorageStatistics should check null FileSystem scheme to avoid NPE

2016-05-17 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15287297#comment-15287297
 ] 

Colin Patrick McCabe commented on HADOOP-13140:
---

{code}
  /** Called after a new FileSystem instance is constructed.
   * @param name a uri whose authority section names the host, port, etc.
   *   for this FileSystem
   * @param conf the configuration
   */
  public void initialize(URI name, Configuration conf) throws IOException {
statistics = getStatistics(name.getScheme(), getClass());
resolveSymlinks = conf.getBoolean(
CommonConfigurationKeys.FS_CLIENT_RESOLVE_REMOTE_SYMLINKS_KEY,
CommonConfigurationKeys.FS_CLIENT_RESOLVE_REMOTE_SYMLINKS_DEFAULT);
  }
{code}

If {{name#getScheme()}} is empty or null here, we can use 
{{FileSystem#getDefaultUri#getScheme}} to pass a non-null scheme.  That should 
cover almost all the cases where a null scheme would be passed.

If the user intentionally passes a null or empty scheme directly to 
{{FileSystem#getStatistics}}, we should throw an exception.
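
A sketch of that fallback (not the committed patch):

{code}
public void initialize(URI name, Configuration conf) throws IOException {
  String scheme = name.getScheme();
  if (scheme == null || scheme.isEmpty()) {
    // Fall back to the default filesystem's scheme, mirroring how
    // FileSystem#get resolves URIs that lack one.
    scheme = FileSystem.getDefaultUri(conf).getScheme();
  }
  statistics = getStatistics(scheme, getClass());
  resolveSymlinks = conf.getBoolean(
      CommonConfigurationKeys.FS_CLIENT_RESOLVE_REMOTE_SYMLINKS_KEY,
      CommonConfigurationKeys.FS_CLIENT_RESOLVE_REMOTE_SYMLINKS_DEFAULT);
}
{code}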

> GlobalStorageStatistics should check null FileSystem scheme to avoid NPE
> 
>
> Key: HADOOP-13140
> URL: https://issues.apache.org/jira/browse/HADOOP-13140
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Brahma Reddy Battula
>Assignee: Mingliang Liu
> Attachments: HADOOP-13140.000.patch
>
>
> {{org.apache.hadoop.fs.GlobalStorageStatistics#put}} is not checking the null 
> scheme, and the internal map will throw an NPE. This was reported by a flaky 
> test {{TestFileSystemApplicationHistoryStore}}. Thanks [~brahmareddy] for 
> reporting.
> To address this,
> # Fix the test by providing a valid URI, e.g. {{file:///}}
> # Guard the null scheme in {{GlobalStorageStatistics#put}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13065) Add a new interface for retrieving FS and FC Statistics

2016-05-16 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15285257#comment-15285257
 ] 

Colin Patrick McCabe commented on HADOOP-13065:
---

bq. I filed HADOOP-13140 to track the effort.

Thanks, [~liuml07].

bq. [~mingma] wrote: BTW, are the network distance metrics something general 
for all file systems, or are they more specific to HDFS? For example, the local 
file system doesn't need them. If they are more HDFS-specific, I wonder if we 
should move them to HDFS-specific metrics.

I agree that it's more conceptually consistent to put the distance-related 
metrics in HDFS-specific code.  However, we would have to develop an optimized 
thread-local mechanism to do this, to avoid regressing HDFS stream 
performance.  Perhaps it would be better to simply move this to 
HDFS's existing per-stream ReadStatistics for now.
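
(To illustrate the kind of mechanism I mean, a hypothetical sketch, assuming a 
Java 8 runtime:)

{code}
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: LongAdder keeps per-thread cells internally, so
// increments on the hot read path avoid contention on a shared atomic;
// the cost of sum() is only paid when the statistic is actually read.
class DistanceBytesCounter {
  private final LongAdder bytesReadByDistance = new LongAdder();

  void addBytesRead(long n) {
    bytesReadByDistance.add(n);        // hot path: cheap, rarely contended
  }

  long snapshot() {
    return bytesReadByDistance.sum();  // rare path: aggregate the cells
  }
}
{code}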

> Add a new interface for retrieving FS and FC Statistics
> ---
>
> Key: HADOOP-13065
> URL: https://issues.apache.org/jira/browse/HADOOP-13065
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HADOOP-13065-007.patch, HADOOP-13065.008.patch, 
> HADOOP-13065.009.patch, HADOOP-13065.010.patch, HADOOP-13065.011.patch, 
> HADOOP-13065.012.patch, HADOOP-13065.013.patch, HDFS-10175.000.patch, 
> HDFS-10175.001.patch, HDFS-10175.002.patch, HDFS-10175.003.patch, 
> HDFS-10175.004.patch, HDFS-10175.005.patch, HDFS-10175.006.patch, 
> TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13140) GlobalStorageStatistics should check null FileSystem scheme to avoid NPE

2016-05-16 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15285252#comment-15285252
 ] 

Colin Patrick McCabe commented on HADOOP-13140:
---

Thanks for looking at this.  I don't think it makes sense to use a null, empty 
string, or otherwise invalid string to identify a statistics object.  It has no 
meaning to the user.  I think if the user passes a URI with a null scheme, we 
should call {{FileSystem#getDefaultUri}} and use the default scheme, similar to 
how {{FileSystem#get}} functions.

> GlobalStorageStatistics should check null FileSystem scheme to avoid NPE
> 
>
> Key: HADOOP-13140
> URL: https://issues.apache.org/jira/browse/HADOOP-13140
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Brahma Reddy Battula
>Assignee: Mingliang Liu
> Attachments: HADOOP-13140.000.patch
>
>
> {{org.apache.hadoop.fs.GlobalStorageStatistics#put}} is not checking the null 
> scheme, and the internal map will throw an NPE. This was reported by a flaky 
> test {{TestFileSystemApplicationHistoryStore}}. Thanks [~brahmareddy] for 
> reporting.
> To address this,
> # Fix the test by providing a valid URI, e.g. {{file:///}}
> # Guard the null scheme in {{GlobalStorageStatistics#put}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-11505) Various native parts use bswap incorrectly and unportably

2016-05-16 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15285233#comment-15285233
 ] 

Colin Patrick McCabe commented on HADOOP-11505:
---

I think adding a PPC nightly build would be a step in the right direction.  Of 
course, people interested in making Hadoop work well on PPC would still have to 
fix occasional breakages and performance regressions.  Apache is a do-ocracy, so 
if people want to put in the work to do it, it will get done.

> Various native parts use bswap incorrectly and unportably
> -
>
> Key: HADOOP-11505
> URL: https://issues.apache.org/jira/browse/HADOOP-11505
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha1
>Reporter: Colin Patrick McCabe
>Assignee: Alan Burlison
>Priority: Critical
> Attachments: HADOOP-11505.001.patch, HADOOP-11505.003.patch, 
> HADOOP-11505.004.patch, HADOOP-11505.005.patch, HADOOP-11505.006.patch, 
> HADOOP-11505.007.patch, HADOOP-11505.008.patch
>
>
> hadoop-mapreduce-client-nativetask fails to use x86 optimizations in some 
> cases.  Also, on some alternate, non-x86, non-ARM architectures the generated 
> code is incorrect.  Thanks to Steve Loughran and Edward Nevill for finding 
> this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-10942) Globbing optimizations and regression fix

2016-05-13 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15283108#comment-15283108
 ] 

Colin Patrick McCabe edited comment on HADOOP-10942 at 5/13/16 8:31 PM:


The regression referred to here was fixed in HADOOP-10957.  The optimizations 
are already implemented (we don't perform an RPC on each path component, only 
when we need to do so to implement a wildcard.)


was (Author: cmccabe):
The regression fix referred to here was fixed in HADOOP-10957.  The 
optimizations are already implemented (we don't perform an RPC on each path 
component, only when we need to do so to implement a wildcard.)

> Globbing optimizations and regression fix
> -
>
> Key: HADOOP-10942
> URL: https://issues.apache.org/jira/browse/HADOOP-10942
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.1.0-beta, 3.0.0-alpha1
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
>  Labels: BB2015-05-TBR
> Attachments: HADOOP-10942.patch
>
>
> When globbing was commonized to support both filesystem and filecontext, it 
> regressed a fix that prevents an intermediate glob that matches a file from 
> throwing a confusing permissions exception.  The hdfs traverse check requires 
> the exec bit which a file does not have.
> Additional optimizations to reduce RPCs actually increase them if 
> directories contain 1 item.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-10942) Globbing optimizations and regression fix

2016-05-13 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HADOOP-10942:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

The regression referred to here was fixed in HADOOP-10957.  The 
optimizations are already implemented (we don't perform an RPC on each path 
component, only when we need to do so to implement a wildcard.)

> Globbing optimizations and regression fix
> -
>
> Key: HADOOP-10942
> URL: https://issues.apache.org/jira/browse/HADOOP-10942
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.1.0-beta, 3.0.0-alpha1
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
>  Labels: BB2015-05-TBR
> Attachments: HADOOP-10942.patch
>
>
> When globbing was commonized to support both filesystem and filecontext, it 
> regressed a fix that prevents an intermediate glob that matches a file from 
> throwing a confusing permissions exception.  The hdfs traverse check requires 
> the exec bit which a file does not have.
> Additional optimizations to reduce RPCs actually increase them if 
> directories contain 1 item.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13142) Change project version from 3.0.0 to 3.0.0-alpha1

2016-05-12 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15282323#comment-15282323
 ] 

Colin Patrick McCabe commented on HADOOP-13142:
---

+1.  Thanks, [~andrew.wang].

> Change project version from 3.0.0 to 3.0.0-alpha1
> -
>
> Key: HADOOP-13142
> URL: https://issues.apache.org/jira/browse/HADOOP-13142
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha1
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Blocker
> Attachments: hadoop-13142.001.patch
>
>
> We want to rename 3.0.0 to 3.0.0-alpha1 for the first alpha release. However, 
> the version number is also encoded outside of the pom.xml's, so we need to 
> update these too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-12975) Add jitter to CachingGetSpaceUsed's thread

2016-05-12 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281731#comment-15281731
 ] 

Colin Patrick McCabe commented on HADOOP-12975:
---

Thanks, [~eclark].  +1 pending jenkins

> Add jitter to CachingGetSpaceUsed's thread
> --
>
> Key: HADOOP-12975
> URL: https://issues.apache.org/jira/browse/HADOOP-12975
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 2.9.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HADOOP-12975v0.patch, HADOOP-12975v1.patch, 
> HADOOP-12975v2.patch, HADOOP-12975v3.patch, HADOOP-12975v4.patch, 
> HADOOP-12975v5.patch, HADOOP-12975v6.patch
>
>
> Running DU across lots of disks is very expensive and running all of the 
> processes at the same time creates a noticeable IO spike. We should add some 
> jitter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13028) add low level counter metrics for S3A; use in read performance tests

2016-05-12 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281704#comment-15281704
 ] 

Colin Patrick McCabe commented on HADOOP-13028:
---

bq. This is something to bring up on the dev list, as it is something we 
essentially missed. Colin Patrick McCabe: would you care for the honour?

Sure.  I started a thread on common-dev.

bq. Steve has \[added the stability comment\] in patch v011.

Great.

Here is my +1 as well.  Thanks again, guys.

> add low level counter metrics for S3A; use in read performance tests
> 
>
> Key: HADOOP-13028
> URL: https://issues.apache.org/jira/browse/HADOOP-13028
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, metrics
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-13028-001.patch, HADOOP-13028-002.patch, 
> HADOOP-13028-004.patch, HADOOP-13028-005.patch, HADOOP-13028-006.patch, 
> HADOOP-13028-007.patch, HADOOP-13028-008.patch, HADOOP-13028-009.patch, 
> HADOOP-13028-012.patch, HADOOP-13028-013.patch, 
> HADOOP-13028-branch-2-008.patch, HADOOP-13028-branch-2-009.patch, 
> HADOOP-13028-branch-2-010.patch, HADOOP-13028-branch-2-011.patch, 
> HADOOP-13028-branch-2-012.patch, HADOOP-13028-branch-2-013.patch, 
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt, 
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt
>
>
> Against S3 (and other object stores), opening connections can be expensive, 
> and closing connections may be expensive (a sign of a regression). 
> S3A FS and individual input streams should have counters of the # of 
> open/close/failure+reconnect operations, and timers of how long things take. This 
> can be used downstream to measure efficiency of the code (how often 
> connections are being made), connection reliability, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13028) add low level counter metrics for S3A; use in read performance tests

2016-05-11 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281102#comment-15281102
 ] 

Colin Patrick McCabe commented on HADOOP-13028:
---

That's a good point, [~cnauroth].  I guess as long as people don't start 
treating this output as a stable API, it's reasonable to have debugging 
information there.  Can we add a comment to toString stating that this output 
is not stable API and should not be parsed?  +1 once that is done.
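
For example, something along these lines (the wording and fields are 
illustrative):

{code}
/**
 * A string describing this stream, including its low-level statistics.
 * Note: this output is NOT a stable API and must not be parsed; the
 * format may change without notice between releases.
 */
@Override
public String toString() {
  // 'uri' and 'streamStatistics' are assumed fields of the stream.
  return "S3AInputStream{" + uri + ' ' + streamStatistics + '}';
}
{code}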

Thanks for working on this, [~steve_l]... it's going to be very helpful for 
running Hadoop on s3.

> add low level counter metrics for S3A; use in read performance tests
> 
>
> Key: HADOOP-13028
> URL: https://issues.apache.org/jira/browse/HADOOP-13028
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, metrics
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-13028-001.patch, HADOOP-13028-002.patch, 
> HADOOP-13028-004.patch, HADOOP-13028-005.patch, HADOOP-13028-006.patch, 
> HADOOP-13028-007.patch, HADOOP-13028-008.patch, HADOOP-13028-009.patch, 
> HADOOP-13028-branch-2-008.patch, HADOOP-13028-branch-2-009.patch, 
> HADOOP-13028-branch-2-010.patch, HADOOP-13028-branch-2-011.patch, 
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt, 
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt
>
>
> Against S3 (and other object stores), opening connections can be expensive, 
> and closing connections may be expensive (a sign of a regression). 
> S3A FS and individual input streams should have counters of the # of 
> open/close/failure+reconnect operations, and timers of how long things take. This 
> can be used downstream to measure efficiency of the code (how often 
> connections are being made), connection reliability, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-11505) Various native parts use bswap incorrectly and unportably

2016-05-11 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281046#comment-15281046
 ] 

Colin Patrick McCabe commented on HADOOP-11505:
---

I think it would be great to see build slaves with alternate architectures.  
Maybe a good place to start is by emailing the hadoop development list and 
talking to the infrastructure team.

> Various native parts use bswap incorrectly and unportably
> -
>
> Key: HADOOP-11505
> URL: https://issues.apache.org/jira/browse/HADOOP-11505
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Alan Burlison
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HADOOP-11505.001.patch, HADOOP-11505.003.patch, 
> HADOOP-11505.004.patch, HADOOP-11505.005.patch, HADOOP-11505.006.patch, 
> HADOOP-11505.007.patch, HADOOP-11505.008.patch
>
>
> hadoop-mapreduce-client-nativetask fails to use x86 optimizations in some 
> cases.  Also, on some alternate, non-x86, non-ARM architectures the generated 
> code is incorrect.  Thanks to Steve Loughran and Edward Nevill for finding 
> this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13065) Add a new interface for retrieving FS and FC Statistics

2016-05-11 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HADOOP-13065:
--
  Resolution: Fixed
   Fix Version/s: 2.8.0
Target Version/s: 2.8.0
  Status: Resolved  (was: Patch Available)

committed to 2.8.  Thanks, [~liuml07].

> Add a new interface for retrieving FS and FC Statistics
> ---
>
> Key: HADOOP-13065
> URL: https://issues.apache.org/jira/browse/HADOOP-13065
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HADOOP-13065-007.patch, HADOOP-13065.008.patch, 
> HADOOP-13065.009.patch, HADOOP-13065.010.patch, HADOOP-13065.011.patch, 
> HADOOP-13065.012.patch, HADOOP-13065.013.patch, HDFS-10175.000.patch, 
> HDFS-10175.001.patch, HDFS-10175.002.patch, HDFS-10175.003.patch, 
> HDFS-10175.004.patch, HDFS-10175.005.patch, HDFS-10175.006.patch, 
> TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-12749) Create a threadpoolexecutor that overrides afterExecute to log uncaught exceptions/errors

2016-05-11 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe resolved HADOOP-12749.
---
  Resolution: Fixed
   Fix Version/s: (was: 2.9.0)
  2.8.0
Target Version/s: 2.8.0

Backported to 2.8

> Create a threadpoolexecutor that overrides afterExecute to log uncaught 
> exceptions/errors
> -
>
> Key: HADOOP-12749
> URL: https://issues.apache.org/jira/browse/HADOOP-12749
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Fix For: 2.8.0
>
> Attachments: HADOOP-12749.001.patch, HADOOP-12749.002.patch, 
> HADOOP-12749.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Reopened] (HADOOP-12749) Create a threadpoolexecutor that overrides afterExecute to log uncaught exceptions/errors

2016-05-11 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe reopened HADOOP-12749:
---

> Create a threadpoolexecutor that overrides afterExecute to log uncaught 
> exceptions/errors
> -
>
> Key: HADOOP-12749
> URL: https://issues.apache.org/jira/browse/HADOOP-12749
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Fix For: 2.9.0
>
> Attachments: HADOOP-12749.001.patch, HADOOP-12749.002.patch, 
> HADOOP-12749.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-13028) add low level counter metrics for S3A; use in read performance tests

2016-05-11 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280782#comment-15280782
 ] 

Colin Patrick McCabe edited comment on HADOOP-13028 at 5/11/16 8:43 PM:


In the past I've written code for Spark that used reflection to make use of 
APIs that may or may not be present in Hadoop.  HBase often does this as well, 
so that it can use multiple versions of Hadoop.  It seems like this wouldn't be 
a lot of code.  Is that feasible in this case?
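
(A rough sketch of that pattern; the probed method name here is purely 
illustrative, not a confirmed API:)

{code}
import java.io.InputStream;
import java.lang.reflect.Method;

// Sketch: probe for an optional API via reflection, so the caller still
// compiles and runs against Hadoop versions that lack it.
static Object tryGetReadStatistics(InputStream in) {
  try {
    Method m = in.getClass().getMethod("getReadStatistics");
    return m.invoke(in);
  } catch (ReflectiveOperationException e) {
    return null; // API not present in this Hadoop version.
  }
}
{code}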

I just find the argument that we should overload an existing unrelated API to 
output statistics very off-putting.  I guess you could argue that the 
statistics is part of the stream state, and toString is intended to reflect 
stream state.  But it will result in very long output from toString which 
probably isn't what most existing callers want.  And it's not consistent with 
the way any other hadoop streams work, including other s3 ones like s3n.

[~andrew.wang], [~cnauroth], [~liuml07], what do you think about this?  Is it 
acceptable to overload {{toString}} in this way, to output statistics?  The 
argument seems to be that this easier than using reflection to get the actual 
stream statistics object.


was (Author: cmccabe):
In the past I've written code for Spark that used reflection to make use of 
APIs that may or may not be present in Hadoop.  HBase often does this as well, 
so that it can use multiple versions of Hadoop.  It seems like this wouldn't be 
a lot of code.  Is that feasible in this case?

I just find the argument that we should overload an existing unrelated API to 
output statistics very off-putting.  It's like saying we should override 
hashCode to output the number of times the user called {{seek()}} on the stream.

I guess you could argue that the statistics is part of the stream state, and 
toString is intended to reflect stream state.  But it will result in very long 
output from toString which probably isn't what most existing callers want.  And 
it's not consistent with the way any other hadoop streams work, including other 
s3 ones like s3n.

> add low level counter metrics for S3A; use in read performance tests
> 
>
> Key: HADOOP-13028
> URL: https://issues.apache.org/jira/browse/HADOOP-13028
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, metrics
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-13028-001.patch, HADOOP-13028-002.patch, 
> HADOOP-13028-004.patch, HADOOP-13028-005.patch, HADOOP-13028-006.patch, 
> HADOOP-13028-007.patch, HADOOP-13028-008.patch, HADOOP-13028-009.patch, 
> HADOOP-13028-branch-2-008.patch, HADOOP-13028-branch-2-009.patch, 
> HADOOP-13028-branch-2-010.patch, HADOOP-13028-branch-2-011.patch, 
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt, 
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt
>
>
> Against S3 (and other object stores), opening connections can be expensive, 
> and closing connections may be expensive (a sign of a regression). 
> S3A FS and individual input streams should have counters of the # of 
> open/close/failure+reconnect operations, and timers of how long things take. This 
> can be used downstream to measure efficiency of the code (how often 
> connections are being made), connection reliability, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13065) Add a new interface for retrieving FS and FC Statistics

2016-05-11 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280790#comment-15280790
 ] 

Colin Patrick McCabe commented on HADOOP-13065:
---

+1 for version 13.  Thanks, [~liuml07].

> Add a new interface for retrieving FS and FC Statistics
> ---
>
> Key: HADOOP-13065
> URL: https://issues.apache.org/jira/browse/HADOOP-13065
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HADOOP-13065-007.patch, HADOOP-13065.008.patch, 
> HADOOP-13065.009.patch, HADOOP-13065.010.patch, HADOOP-13065.011.patch, 
> HADOOP-13065.012.patch, HADOOP-13065.013.patch, HDFS-10175.000.patch, 
> HDFS-10175.001.patch, HDFS-10175.002.patch, HDFS-10175.003.patch, 
> HDFS-10175.004.patch, HDFS-10175.005.patch, HDFS-10175.006.patch, 
> TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-13028) add low level counter metrics for S3A; use in read performance tests

2016-05-11 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280782#comment-15280782
 ] 

Colin Patrick McCabe edited comment on HADOOP-13028 at 5/11/16 8:39 PM:


In the past I've written code for Spark that used reflection to make use of 
APIs that may or may not be present in Hadoop.  HBase often does this as well, 
so that it can use multiple versions of Hadoop.  It seems like this wouldn't be 
a lot of code.  Is that feasible in this case?

I just find the argument that we should overload an existing unrelated API to 
output statistics very off-putting.  It's like saying we should override 
hashCode to output the number of times the user called {{seek()}} on the stream.

I guess you could argue that the statistics is part of the stream state, and 
toString is intended to reflect stream state.  But it will result in very long 
output from toString which probably isn't what most existing callers want.  And 
it's not consistent with the way any other hadoop streams work, including other 
s3 ones like s3n.


was (Author: cmccabe):
In the past I've written code for Spark that used reflection to make use of 
APIs that may or may not be present in Hadoop.  HBase often does this as well, 
so that it can use multiple versions of Hadoop.  It seems like this wouldn't be 
a lot of code.  Is that feasible in this case?

I just find the argument that we should overload an existing unrelated API to 
output statistics very off-putting.  It's like saying we should override 
hashCode to output the number of times the user called {{seek()}} on the 
stream.  I also find it concerning that this would be something unique to s3a 
and not present in the toString methods of any other filesystem (including the 
other s3 ones).  It feels like a gross hack.

> add low level counter metrics for S3A; use in read performance tests
> 
>
> Key: HADOOP-13028
> URL: https://issues.apache.org/jira/browse/HADOOP-13028
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, metrics
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-13028-001.patch, HADOOP-13028-002.patch, 
> HADOOP-13028-004.patch, HADOOP-13028-005.patch, HADOOP-13028-006.patch, 
> HADOOP-13028-007.patch, HADOOP-13028-008.patch, HADOOP-13028-009.patch, 
> HADOOP-13028-branch-2-008.patch, HADOOP-13028-branch-2-009.patch, 
> HADOOP-13028-branch-2-010.patch, HADOOP-13028-branch-2-011.patch, 
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt, 
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt
>
>
> Against S3 (and other object stores), opening connections can be expensive, 
> and closing connections may be expensive (a sign of a regression). 
> S3A FS and individual input streams should have counters of the # of 
> open/close/failure+reconnect operations, and timers of how long things take. This 
> can be used downstream to measure efficiency of the code (how often 
> connections are being made), connection reliability, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13028) add low level counter metrics for S3A; use in read performance tests

2016-05-11 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280782#comment-15280782
 ] 

Colin Patrick McCabe commented on HADOOP-13028:
---

In the past I've written code for Spark that used reflection to make use of 
APIs that may or may not be present in Hadoop.  HBase often does this as well, 
so that it can use multiple versions of Hadoop.  It seems like this wouldn't be 
a lot of code.  Is that feasible in this case?

I just find the argument that we should overload an existing unrelated API to 
output statistics very off-putting.  It's like saying we should override 
hashCode to output the number of times the user called {{seek()}} on the 
stream.  I also find it concerning that this would be something unique to s3a 
and not present in the toString methods of any other filesystem (including the 
other s3 ones).  It feels like a gross hack.

> add low level counter metrics for S3A; use in read performance tests
> 
>
> Key: HADOOP-13028
> URL: https://issues.apache.org/jira/browse/HADOOP-13028
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, metrics
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-13028-001.patch, HADOOP-13028-002.patch, 
> HADOOP-13028-004.patch, HADOOP-13028-005.patch, HADOOP-13028-006.patch, 
> HADOOP-13028-007.patch, HADOOP-13028-008.patch, HADOOP-13028-009.patch, 
> HADOOP-13028-branch-2-008.patch, HADOOP-13028-branch-2-009.patch, 
> HADOOP-13028-branch-2-010.patch, HADOOP-13028-branch-2-011.patch, 
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt, 
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt
>
>
> Against S3 (and other object stores), opening connections can be expensive, 
> and closing connections may be expensive (a sign of a regression). 
> S3A FS and individual input streams should have counters of the # of 
> open/close/failure+reconnect operations, and timers of how long things take. This 
> can be used downstream to measure efficiency of the code (how often 
> connections are being made), connection reliability, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-11505) Various native parts use bswap incorrectly and unportably

2016-05-11 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280588#comment-15280588
 ] 

Colin Patrick McCabe commented on HADOOP-11505:
---

The problematic part of this change was making all the subprojects depend on 
hadoop-common.  It seems like you could avoid doing that by putting all the 
le32to_h, etc. definitions in a standalone header file and having the other 
projects include that file.

> Various native parts use bswap incorrectly and unportably
> -
>
> Key: HADOOP-11505
> URL: https://issues.apache.org/jira/browse/HADOOP-11505
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Alan Burlison
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HADOOP-11505.001.patch, HADOOP-11505.003.patch, 
> HADOOP-11505.004.patch, HADOOP-11505.005.patch, HADOOP-11505.006.patch, 
> HADOOP-11505.007.patch, HADOOP-11505.008.patch
>
>
> hadoop-mapreduce-client-nativetask fails to use x86 optimizations in some 
> cases.  Also, on some alternate, non-x86, non-ARM architectures the generated 
> code is incorrect.  Thanks to Steve Loughran and Edward Nevill for finding 
> this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-11505) Various native parts use bswap incorrectly and unportably

2016-05-10 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HADOOP-11505:
--
Priority: Major  (was: Blocker)

This isn't a blocker, because the affected architectures can fall back on the 
non-native code for accomplishing the same things.

> Various native parts use bswap incorrectly and unportably
> -
>
> Key: HADOOP-11505
> URL: https://issues.apache.org/jira/browse/HADOOP-11505
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Colin Patrick McCabe
>Assignee: Alan Burlison
> Fix For: 3.0.0
>
> Attachments: HADOOP-11505.001.patch, HADOOP-11505.003.patch, 
> HADOOP-11505.004.patch, HADOOP-11505.005.patch, HADOOP-11505.006.patch, 
> HADOOP-11505.007.patch, HADOOP-11505.008.patch
>
>
> hadoop-mapreduce-client-nativetask fails to use x86 optimizations in some 
> cases.  Also, on some alternate, non-x86, non-ARM architectures the generated 
> code is incorrect.  Thanks to Steve Loughran and Edward Nevill for finding 
> this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-12975) Add jitter to CachingGetSpaceUsed's thread

2016-05-10 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278854#comment-15278854
 ] 

Colin Patrick McCabe commented on HADOOP-12975:
---

Thanks, [~eclark].

{code}
169 // add/subtract the jitter.
170 refreshInterval +=
171 ThreadLocalRandom.current()
172  .nextLong(jitter, jitter);
{code}
Hmm, is this a typo?  Since the 'least' and 'bound' arguments are the same, 
this call can never produce any random spread (per the javadoc, nextLong 
requires least < bound).  That seems to defeat the point of randomization. 
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadLocalRandom.html#nextLong(long,%20long)

{code}
126 if (configuration == null) {
127   return DEFAULT_JITTER;
128 }
{code}
Can we throw an exception in {{GetSpaceUsed#build}} if {{conf == null}}?  It's 
a weird special case to have no {{Configuration}} object, and I'm not sure why 
we'd ever want to do that.  Then this function could just be {{return 
this.conf.getLong(JITTER_KEY, DEFAULT_JITTER);}}.
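
For reference, the presumed intent is something like this (a sketch only):

{code}
import java.util.concurrent.ThreadLocalRandom;

// Sketch: draw from [-jitter, +jitter) so the refresh interval is
// actually randomized. nextLong(least, bound) requires least < bound,
// which holds whenever jitter > 0.
static long addJitter(long refreshInterval, long jitter) {
  if (jitter > 0) {
    refreshInterval += ThreadLocalRandom.current().nextLong(-jitter, jitter);
  }
  return refreshInterval;
}
{code}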

> Add jitter to CachingGetSpaceUsed's thread
> --
>
> Key: HADOOP-12975
> URL: https://issues.apache.org/jira/browse/HADOOP-12975
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 2.9.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HADOOP-12975v0.patch, HADOOP-12975v1.patch, 
> HADOOP-12975v2.patch, HADOOP-12975v3.patch, HADOOP-12975v4.patch, 
> HADOOP-12975v5.patch
>
>
> Running DU across lots of disks is very expensive and running all of the 
> processes at the same time creates a noticeable IO spike. We should add some 
> jitter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-13065) Add a new interface for retrieving FS and FC Statistics

2016-05-10 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278833#comment-15278833
 ] 

Colin Patrick McCabe edited comment on HADOOP-13065 at 5/10/16 8:34 PM:


Thanks, [~liuml07].  {{DFSOpsCountStatistics}} is a nice implementation.  It's 
also nice to have this for webhdfs as well.

{code}
156  @Override
157   public Long getLong(String key) {
158 final OpType type = OpType.fromSymbol(key);
159 return type == null ? 0L : opsCount.get(type).get();
160   }
{code}
I think this should return null in the case where type == null, right?  
Indicating that there is no such statistic.
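
i.e., a minimal sketch of the change:

{code}
@Override
public Long getLong(String key) {
  final OpType type = OpType.fromSymbol(key);
  // Return null (rather than 0L) for unknown keys, signalling that no
  // statistic with this name exists.
  return type == null ? null : opsCount.get(type).get();
}
{code}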

{code}
159 storageStatistics = (DFSOpsCountStatistics) 
GlobalStorageStatistics.INSTANCE
160 .put(DFSOpsCountStatistics.NAME,
161   new StorageStatisticsProvider() {
162 @Override
163 public StorageStatistics provide() {
164   return new DFSOpsCountStatistics();
165 }
166   });
{code}
Hmm, I wonder if these StorageStatistics objects should be per-FS-instance 
rather than per-class?  I guess let's do that in a follow-on, though, after 
this gets committed.

+1 for HADOOP-13065.012.patch once the null thing is fixed


was (Author: cmccabe):
Thanks, [~liuml07].  {{DFSOpsCountStatistics}} is a nice implementation.  It's 
also nice to have this for webhdfs as well.

{code}
156  @Override
157   public Long getLong(String key) {
158 final OpType type = OpType.fromSymbol(key);
159 return type == null ? 0L : opsCount.get(type).get();
160   }
{code}
I think this should return null in the case where type == null, right?  
Indicating that there is no such statistic.

{code}
159 storageStatistics = (DFSOpsCountStatistics) 
GlobalStorageStatistics.INSTANCE
160 .put(DFSOpsCountStatistics.NAME,
161   new StorageStatisticsProvider() {
162 @Override
163 public StorageStatistics provide() {
164   return new DFSOpsCountStatistics();
165 }
166   });
{code}
Hmm, I wonder if these StorageStatistics objects should be per-FS-instance 
rather than per-class?  I guess let's do that in a follow-on, though, after 
this gets committed.

+1 once the null thing is fixed

> Add a new interface for retrieving FS and FC Statistics
> ---
>
> Key: HADOOP-13065
> URL: https://issues.apache.org/jira/browse/HADOOP-13065
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HADOOP-13065-007.patch, HADOOP-13065.008.patch, 
> HADOOP-13065.009.patch, HADOOP-13065.010.patch, HADOOP-13065.011.patch, 
> HADOOP-13065.012.patch, HDFS-10175.000.patch, HDFS-10175.001.patch, 
> HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, 
> HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in-turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.






[jira] [Commented] (HADOOP-13065) Add a new interface for retrieving FS and FC Statistics

2016-05-10 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278833#comment-15278833
 ] 

Colin Patrick McCabe commented on HADOOP-13065:
---

Thanks, [~liuml07].  {{DFSOpsCountStatistics}} is a nice implementation.  It's 
nice to have this for webhdfs as well.

{code}
@Override
public Long getLong(String key) {
  final OpType type = OpType.fromSymbol(key);
  return type == null ? 0L : opsCount.get(type).get();
}
{code}
I think this should return null in the case where type == null, indicating 
that there is no such statistic, right?

{code}
storageStatistics = (DFSOpsCountStatistics) GlobalStorageStatistics.INSTANCE
    .put(DFSOpsCountStatistics.NAME,
        new StorageStatisticsProvider() {
          @Override
          public StorageStatistics provide() {
            return new DFSOpsCountStatistics();
          }
        });
{code}
Hmm, I wonder if these StorageStatistics objects should be per-FS-instance 
rather than per-class?  I guess let's do that in a follow-on, though, after 
this gets committed.

+1 once the null thing is fixed

> Add a new interface for retrieving FS and FC Statistics
> ---
>
> Key: HADOOP-13065
> URL: https://issues.apache.org/jira/browse/HADOOP-13065
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HADOOP-13065-007.patch, HADOOP-13065.008.patch, 
> HADOOP-13065.009.patch, HADOOP-13065.010.patch, HADOOP-13065.011.patch, 
> HADOOP-13065.012.patch, HDFS-10175.000.patch, HDFS-10175.001.patch, 
> HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, 
> HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in-turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.






[jira] [Commented] (HADOOP-13028) add low level counter metrics for S3A; use in read performance tests

2016-05-10 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278800#comment-15278800
 ] 

Colin Patrick McCabe commented on HADOOP-13028:
---

bq. Patrick: regarding fs.s3a.readahead.range versus calling it 
fs.s3a.readahead.default, I think "default" could be a bit confusing too. How 
about I make it clear that if setReadahead() is set, then it supersedes any 
previous value?

Sure.

bq. I absolutely need that printing in there, otherwise the value of this patch 
is significantly reduced. If you want me to add a line like "WARNING: UNSTABLE" 
or something to that string value, I'm happy to do so. Or the output is 
published in a way that is deliberately hard to parse by machine but which we 
humans can read. But without that information, we can't so easily tell which

Perhaps I'm missing something, but why not just do this in 
{{S3AInstrumentation#InputStreamStatistics#toString}}?  I don't see why this is 
"absolutely needed" in {{S3AInputStream#toString}}.

> add low level counter metrics for S3A; use in read performance tests
> 
>
> Key: HADOOP-13028
> URL: https://issues.apache.org/jira/browse/HADOOP-13028
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, metrics
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-13028-001.patch, HADOOP-13028-002.patch, 
> HADOOP-13028-004.patch, HADOOP-13028-005.patch, HADOOP-13028-006.patch, 
> HADOOP-13028-007.patch, HADOOP-13028-008.patch, HADOOP-13028-009.patch, 
> HADOOP-13028-branch-2-008.patch, HADOOP-13028-branch-2-009.patch, 
> HADOOP-13028-branch-2-010.patch, HADOOP-13028-branch-2-011.patch, 
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt, 
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt
>
>
> against S3 (and other object stores), opening connections can be expensive, 
> closing connections may be expensive (a sign of a regression). 
> S3A FS and individual input streams should have counters of the # of 
> open/close/failure+reconnect operations, timers of how long things take. This 
> can be used downstream to measure efficiency of the code (how often 
> connections are being made), connection reliability, etc.






[jira] [Commented] (HADOOP-13028) add low level counter metrics for S3A; use in read performance tests

2016-05-09 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276778#comment-15276778
 ] 

Colin Patrick McCabe commented on HADOOP-13028:
---

bq. I think this is OK. The whole close method is synchronized, so we won't 
have two threads concurrently doing the actual close. Almost all other accesses 
of closed are within synchronized methods too. It's marked volatile to help 
with one unsynchronized access from readFully, calling into checkNotClosed. 
That's only a read, not an update, so volatile is sufficient.

Thanks for the explanation.  I missed the interaction between {{synchronized}} 
and the assignment.  Suggest adding a comment to the assignment in {{close()}} 
explaining why this is atomic, or simply using AtomicBoolean to future-proof 
this against later code changes.
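A sketch of the {{AtomicBoolean}} variant (illustrative only; the method names 
come from the quoted patch):
{code}
// Sketch of the AtomicBoolean alternative, not the committed code.
// Requires java.util.concurrent.atomic.AtomicBoolean.
private final AtomicBoolean closed = new AtomicBoolean(false);

@Override
public synchronized void close() throws IOException {
  // compareAndSet guarantees exactly one caller runs the close logic,
  // even if a later change drops the method-level synchronization.
  if (closed.compareAndSet(false, true)) {
    super.close();
    closeStream("close() operation", this.contentLength);
    streamStatistics.close();
  }
}
{code}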

bq. I'd like to keep \[the toString changes\]. It's very convenient for 
logging. TestS3AInputStreamPerformance uses it for both logging output and 
detailed assertion messages. It's poor practice to rely on a Java object's 
toString output as a stable, parseable format. This is something that I'd like 
to see clarified in our compatibility documentation.

The problem is, this is not consistent with how {{toString}} operates in other 
FS streams.  We also don't have anything in our compatibility documentation 
stating that the output of {{toString}} is not a stable, parseable format.  
We've had many, many JIRAs to "make toString act like some previous behavior" 
for various Hadoop classes.  I think we need to accept that currently the 
stream's {{toString}} method is viewed as a public, stable API whether we like 
it or not.

How about just adding this information to the {{toString}} method of the stream 
statistics object?  That makes more sense anyway.

> add low level counter metrics for S3A; use in read performance tests
> 
>
> Key: HADOOP-13028
> URL: https://issues.apache.org/jira/browse/HADOOP-13028
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, metrics
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-13028-001.patch, HADOOP-13028-002.patch, 
> HADOOP-13028-004.patch, HADOOP-13028-005.patch, HADOOP-13028-006.patch, 
> HADOOP-13028-007.patch, HADOOP-13028-008.patch, HADOOP-13028-009.patch, 
> HADOOP-13028-branch-2-008.patch, HADOOP-13028-branch-2-009.patch, 
> HADOOP-13028-branch-2-010.patch, 
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt, 
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt
>
>
> against S3 (and other object stores), opening connections can be expensive, 
> closing connections may be expensive (a sign of a regression). 
> S3A FS and individual input streams should have counters of the # of 
> open/close/failure+reconnect operations, timers of how long things take. This 
> can be used downstream to measure efficiency of the code (how often 
> connections are being made), connection reliability, etc.






[jira] [Commented] (HADOOP-13028) add low level counter metrics for S3A; use in read performance tests

2016-05-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274641#comment-15274641
 ] 

Colin Patrick McCabe commented on HADOOP-13028:
---

{code}
<property>
  <name>fs.s3a.readahead.range</name>
  <value>65536</value>
  <description>Bytes to read ahead during a seek() before closing and
    re-opening the S3 HTTP connection.</description>
</property>
{code}
Hmm, should this be {{fs.s3a.readahead.default}}?  It seems like this is the 
default if the user doesn't call {{FSDataInputStream#setReadahead}}.

{{S3AInputStream#closed}}: it seems like this should be an {{AtomicBoolean}}.  
Otherwise two threads could both enter this code block, right?
{code}
if (!closed) {
  closed = true;
  super.close();
  closeStream("close() operation", this.contentLength);
  streamStatistics.close();
}
{code}

{code}
  public S3AInstrumentation.InputStreamStatistics getStreamStatistics() {
{code}
Maybe should be called {{getS3StreamStatistics}}, reflecting the fact that this 
API is s3-specific?

Is it really necessary to put statistics information into the {{toString}} 
methods of the streams?  It seems like this could lead to compatibility woes, 
and we have the API described above to provide this information anyway.

> add low level counter metrics for S3A; use in read performance tests
> 
>
> Key: HADOOP-13028
> URL: https://issues.apache.org/jira/browse/HADOOP-13028
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, metrics
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-13028-001.patch, HADOOP-13028-002.patch, 
> HADOOP-13028-004.patch, HADOOP-13028-005.patch, HADOOP-13028-006.patch, 
> HADOOP-13028-007.patch, HADOOP-13028-008.patch, HADOOP-13028-009.patch, 
> HADOOP-13028-branch-2-008.patch, HADOOP-13028-branch-2-009.patch, 
> HADOOP-13028-branch-2-010.patch, 
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt, 
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt
>
>
> against S3 (and other object stores), opening connections can be expensive, 
> closing connections may be expensive (a sign of a regression). 
> S3A FS and individual input streams should have counters of the # of 
> open/close/failure+reconnect operations, timers of how long things take. This 
> can be used downstream to measure efficiency of the code (how often 
> connections are being made), connection reliability, etc.






[jira] [Commented] (HADOOP-12975) Add jitter to CachingGetSpaceUsed's thread

2016-05-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274625#comment-15274625
 ] 

Colin Patrick McCabe commented on HADOOP-12975:
---

Does this need a rebase?

> Add jitter to CachingGetSpaceUsed's thread
> --
>
> Key: HADOOP-12975
> URL: https://issues.apache.org/jira/browse/HADOOP-12975
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 2.9.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HADOOP-12975v0.patch, HADOOP-12975v1.patch, 
> HADOOP-12975v2.patch, HADOOP-12975v3.patch
>
>
> Running DU across lots of disks is very expensive and running all of the 
> processes at the same time creates a noticeable IO spike. We should add some 
> jitter.






[jira] [Commented] (HADOOP-13065) Add a new interface for retrieving FS and FC Statistics

2016-05-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274617#comment-15274617
 ] 

Colin Patrick McCabe commented on HADOOP-13065:
---

bq. One quick question: some of the storage statistics classes (e.g. 
GlobalStorageStatistics) are annotated as Stable; do we have to be a bit more 
conservative by making them Unstable before ultimately removing the Statistics?

Good question.  I think that what would happen is that the old API would become 
deprecated in branch-2, and removed in branch-3.  There isn't any need to 
change the annotation since we don't plan to modify the interface, just remove 
it.

bq. As follow-on work, 1. We can move the rack-awareness read bytes to a 
separate storage statistics as it's only used by HDFS, and 2. We can remove 
Statistics API, but keep the thread local implementation in 
FileSystemStorageStatistics class.

That makes sense.  One thing that we've talked about doing in the past is 
moving these statistics to a separate Java file, so that they could be used in 
both FileContext and FileSystem.  Maybe we could call the class something like 
ThreadLocalFsStatistics?

> Add a new interface for retrieving FS and FC Statistics
> ---
>
> Key: HADOOP-13065
> URL: https://issues.apache.org/jira/browse/HADOOP-13065
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HADOOP-13065-007.patch, HADOOP-13065.008.patch, 
> HADOOP-13065.009.patch, HADOOP-13065.010.patch, HDFS-10175.000.patch, 
> HDFS-10175.001.patch, HDFS-10175.002.patch, HDFS-10175.003.patch, 
> HDFS-10175.004.patch, HDFS-10175.005.patch, HDFS-10175.006.patch, 
> TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in-turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.






[jira] [Commented] (HADOOP-13065) Add a new interface for retrieving FS and FC Statistics

2016-05-05 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273605#comment-15273605
 ] 

Colin Patrick McCabe commented on HADOOP-13065:
---

Thanks for the reviews.

bq. in FileSystem.getStatistics(), for performance, you could try using a 
ConcurrentMap for the map, and only if the entry is not present create the 
objects and call putIfAbsent() (or use a synchronized block to create and 
update the maps, with a second lookup there to eliminate the small race 
condition). This will eliminate the sync point on a simple lookup when the 
entry exists.

Hmm.  I don't think that we really need to optimize this function.  When using 
the new API, the only time this function gets called is when a new FileSystem 
object is created, which should be very rare.
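For reference, the lock-free pattern the reviewer describes looks roughly like 
this (the names are illustrative, not the actual FileSystem internals; requires 
java.util.concurrent.{ConcurrentHashMap, ConcurrentMap}):
{code}
private static final ConcurrentMap<Class<? extends FileSystem>, Statistics>
    STATISTICS = new ConcurrentHashMap<>();

static Statistics getStatistics(String scheme,
    Class<? extends FileSystem> cls) {
  Statistics stats = STATISTICS.get(cls);      // lock-free fast path
  if (stats == null) {
    Statistics created = new Statistics(scheme);
    Statistics raced = STATISTICS.putIfAbsent(cls, created);
    stats = (raced == null) ? created : raced; // exactly one instance wins
  }
  return stats;
}
{code}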

bq. For testing, a way to reset/remove an entry could be handy.

We do have some tests that zero out the existing statistics objects.  I'm not 
sure if removing the entry really gets us more coverage than we have now, since 
we know that it was created by this code path (therefore the code path was 
tested).

bq. That's said, we can firstly deprecate the FileSystem#getStatistics()?

Agree.

> Add a new interface for retrieving FS and FC Statistics
> ---
>
> Key: HADOOP-13065
> URL: https://issues.apache.org/jira/browse/HADOOP-13065
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HADOOP-13065-007.patch, HADOOP-13065.008.patch, 
> HADOOP-13065.009.patch, HDFS-10175.000.patch, HDFS-10175.001.patch, 
> HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, 
> HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in-turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.






[jira] [Commented] (HADOOP-13010) Refactor raw erasure coders

2016-05-05 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15272936#comment-15272936
 ] 

Colin Patrick McCabe commented on HADOOP-13010:
---

Thanks, guys.  Sorry for the delay in reviewing.  We've been busy.

{{CodecUtil.java}}: there are a LOT of functions here for creating 
{{RawErasureEncoder}} objects.
We've got:
{code}
createRSRawEncoder(Configuration conf, int numDataUnits, int numParityUnits, String codec)
createRSRawEncoder(Configuration conf, int numDataUnit, int numParityUnit)
createRSRawEncoder(Configuration conf, String codec, ErasureCoderOptions coderOptions)
createRSRawEncoder(Configuration conf, ErasureCoderOptions coderOptions)
createXORRawEncoder(Configuration conf, ErasureCoderOptions coderOptions)
createXORRawEncoder(Configuration conf, int numDataUnits, int numParityUnits)
createRawEncoder(Configuration conf, String rawCoderFactoryKey, ErasureCoderOptions coderOptions)
{code}

Plus a similar number of functions for creating decoders.  Why do we need so 
many functions?  Surely the codec, numParityUnits, numDataUnits, whether it is 
XOR or not, and so on should just be included in ErasureCoderOptions.  Then we 
could just have one function:
{code}
createRawEncoder(Configuration conf, ErasureCoderOptions coderOptions)
{code}

On a related note, why does each particular type of encoder need its own 
factory?  It seems like we just need a static function for each encoder type 
that takes a Configuration and ErasureCoderOptions, and we're good to go.  We 
can locate these static functions via reflection.
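To make that concrete, here is a hedged sketch of the single entry point; the 
{{getCodecName()}} accessor and the constructor shapes are assumptions for 
illustration, not the real API:
{code}
// Sketch: codec name, numDataUnits, numParityUnits, etc. all travel in
// coderOptions, so one factory method suffices.
public static RawErasureEncoder createRawEncoder(
    Configuration conf, ErasureCoderOptions coderOptions) {
  switch (coderOptions.getCodecName()) {
  case "rs":
    return new RSRawEncoder(coderOptions);
  case "xor":
    return new XORRawEncoder(coderOptions);
  default:
    throw new IllegalArgumentException(
        "Unknown erasure codec: " + coderOptions.getCodecName());
  }
}
{code}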

{code}
  protected void doDecode(DecodingState decodingState, byte[][] inputs,
  int[] inputOffsets, int[] erasedIndexes,
  byte[][] outputs, int[] outputOffsets) {
{code}
Can we just include the inputs, inputOffsets, erasedIndexes, outputs, 
outputOffsets in {{DecodingState}}?
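If the arrays moved into {{DecodingState}}, the signature could collapse to 
something like this (field names hypothetical):
{code}
// Hypothetical shape if the per-call arrays lived in DecodingState.
class DecodingState {
  byte[][] inputs;
  int[] inputOffsets;
  int[] erasedIndexes;
  byte[][] outputs;
  int[] outputOffsets;
}

protected abstract void doDecode(DecodingState decodingState);
{code}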

> Refactor raw erasure coders
> ---
>
> Key: HADOOP-13010
> URL: https://issues.apache.org/jira/browse/HADOOP-13010
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Attachments: HADOOP-13010-v1.patch, HADOOP-13010-v2.patch, 
> HADOOP-13010-v3.patch, HADOOP-13010-v4.patch, HADOOP-13010-v5.patch
>
>
> This will refactor raw erasure coders according to some comments received so 
> far.
> * As discussed in HADOOP-11540 and suggested by [~cmccabe], better not to 
> rely class inheritance to reuse the codes, instead they can be moved to some 
> utility.
> * Suggested by [~jingzhao] somewhere quite some time ago, better to have a 
> state holder to keep some checking results for later reuse during an 
> encode/decode call.
> This would not get rid of some inheritance levels as doing so isn't clear yet 
> for the moment and also incurs big impact. I do wish the end result by this 
> refactoring will make all the levels more clear and easier to follow.






[jira] [Commented] (HADOOP-13079) dfs -ls -q prints non-printable characters

2016-05-05 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15272802#comment-15272802
 ] 

Colin Patrick McCabe commented on HADOOP-13079:
---

bq. It's not a security bug for the reasons you think it's a security bug. 
After all, wc, find, du, ... tons of other UNIX commands will happily print out 
terminal escape sequences with no option to turn them off. It is, however, 
problematic for traditional ftpd implementations since it's a great way to 
inject buffer overflows and then get root on a remote server.

This behavior is exploitable.  That makes it a security bug, even if lots of 
traditional UNIX commands have it.

Just because a behavior is traditional doesn't mean it's right.  There was a 
time when UNIX programs used {{gets()}} everywhere.  When the world became a 
less trusting place, they had to be fixed not to do that.  We should understand 
the motivations behind historical decisions before blindly copying them.

bq. ... and my answer is the same as it was almost a decade ago, in some HDFS 
JIRA somewhere, where a related topic came up before: HDFS would be better 
served by having a limit on what consists of a legal file and directory name. 
With an unlimited namespace, it's impossible to test against and impossible to 
protect every scenario in which oddball characters show up. What's legal in one 
locale may not be legal in another.

That's a very good suggestion.  I think we should tackle that for Hadoop 3.

bq. Also, are you prepared to file a CVE for every single time Hadoop prints 
out a directory or file name to the screen? There are probably hundreds if not 
thousands of places, obvious ones like 'fs -count' and less obvious ones like 
'yarn logs'. This is a 'tilting at windmills' problem. It is MUCH better to 
have ls blow up than be taken by surprise by something else later on.

The problem is, {{ls}} isn't necessarily going to "blow up," just display 
something odd, or even cause your xterm to run arbitrary code by abusing escape 
sequences.

> dfs -ls -q prints non-printable characters
> --
>
> Key: HADOOP-13079
> URL: https://issues.apache.org/jira/browse/HADOOP-13079
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>
> Add option {{-q}} to "hdfs dfs -ls" to print non-printable characters as "?". 
> Non-printable characters are defined by 
> [isprint(3)|http://linux.die.net/man/3/isprint] according to the current 
> locale.
> Default to {{-q}} behavior on terminal; otherwise, print raw characters. See 
> the difference in these 2 command lines:
> * {{hadoop fs -ls /dir}}
> * {{hadoop fs -ls /dir | od -c}}
> In C, {{isatty(STDOUT_FILENO)}} is used to find out whether the output is a 
> terminal. Since Java doesn't have {{isatty}}, I will use JNI to call C 
> {{isatty()}} because the closest test {{System.console() == null}} does not 
> work in some cases.






[jira] [Commented] (HADOOP-13079) dfs -ls -q prints non-printable characters

2016-05-04 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15271848#comment-15271848
 ] 

Colin Patrick McCabe commented on HADOOP-13079:
---

bq. a) It's not standardized behavior amongst all of the platforms that Apache 
Hadoop runs

Linux, OpenBSD, FreeBSD, and OS X pick the behavior of hiding control 
characters in {{ls}} by default.  That may not be "all the platforms that 
Apache Hadoop runs on," but it's certainly the vast majority of real-world 
deployments.  The remaining important platform, Windows, doesn't deal with 
terminals and control characters in quite the same way, so is probably not 
vulnerable in any case.

In any case, the fact that the behavior isn't standardized is not a valid 
argument either way.  Clearly Hadoop needs to pick one behavior or the other.  
Lack of standardization doesn't dictate that we have to pick one behavior or 
the other.  And certainly it doesn't dictate that we should pick an unpopular 
and surprising behavior that almost nobody has experience with.

bq. b) It's not expected behavior relative to the rest of Apache Hadoop

The fact that one component has a security bug doesn't dictate that the other 
components also need to have the same security bug.  This is like arguing that 
we can't fix a buffer overflow in one component because then it wouldn't match 
all the other buffer-overflowable components.

bq. c) It's not feasible to actually make it expected behavior compared to the 
rest of Apache Hadoop given the proliferation of places where raw file and 
directory names are printed to the console

The only places we've discussed here are ls and fsck.  Perhaps there are more, 
but it hardly seems infeasible to change them based on what we've talked about 
so far.  Perhaps log files are also an issue, but only for people who tail the 
log file of the server.  And to reiterate, a security flaw in X doesn't mean we 
should reproduce the same security flaw in Y.

At the end of the day, this is a security vulnerability and it needs to be 
fixed.  I asked you before: "Should the filename be able to use control 
characters to hijack the admin's GNU screen session and execute arbitrary 
code? I would say no, what do you say?"  I would ask the same question again.

I understand that you have a personal preference for running without {{\-q}}.  
However, it is not constructive to -1 a patch fixing a security vulnerability 
without suggesting an alternate way of fixing that vulnerability.  If this 
stays unfixed, it will probably get a CVE number.

> dfs -ls -q prints non-printable characters
> --
>
> Key: HADOOP-13079
> URL: https://issues.apache.org/jira/browse/HADOOP-13079
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>
> Add option {{-q}} to "hdfs dfs -ls" to print non-printable characters as "?". 
> Non-printable characters are defined by 
> [isprint(3)|http://linux.die.net/man/3/isprint] according to the current 
> locale.
> Default to {{-q}} behavior on terminal; otherwise, print raw characters. See 
> the difference in these 2 command lines:
> * {{hadoop fs -ls /dir}}
> * {{hadoop fs -ls /dir | od -c}}
> In C, {{isatty(STDOUT_FILENO)}} is used to find out whether the output is a 
> terminal. Since Java doesn't have {{isatty}}, I will use JNI to call C 
> {{isatty()}} because the closest test {{System.console() == null}} does not 
> work in some cases.






[jira] [Commented] (HADOOP-13079) Add -q to fs -ls to print non-printable characters

2016-05-03 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15269527#comment-15269527
 ] 

Colin Patrick McCabe commented on HADOOP-13079:
---

bq. Yup, I can't think of why -q should be the default either... but more 
importantly, neither could POSIX to the point that it demanded the standard 
have -q be the default.

Please do not misquote what I said.  I was arguing that echoing control 
characters to the terminal should not be the default behavior.  You are arguing 
the opposite.

bq. ... until such a point that they print the filename to the screen to show 
what files are being processed. At which point this change has accomplished 
absolutely nothing. Changing ls is security theater.

There are a lot of scripts that interact with HDFS via FsShell.  These scripts 
will never "print the filename to the screen" or if they do, it will be a 
filename that they got from {{ls}} itself which does not contain control 
characters.

I could come up with examples of how this is helpful all day if needed.  Here's 
another one: Some sysadmin logs in and does an {{hadoop fs -ls}} of a directory 
created by {{\$BADGUY}}. Should the filename be able to use control characters 
to hijack the admin's GNU screen session and execute arbitrary code?  I would 
say no, what do you say?

bq. Are we going to change cat too?

Most system administrators will not {{cat}} a file without checking what type 
it is.  It is well-known that catting an unknown file could mess up the 
terminal.  On the other hand, most system administrators do not think that 
running {{ls}} on a directory could be a security risk.  Linux and other 
well-known operating systems also do not protect users from this, so there are no 
pre-existing expectations of protection.

bq. Then stop bringing up (traditional) UNIX if you feel it isn't relevant and 
especially when you've used the term incorrectly.

There are a huge number of sysadmins who grew up with the GNU tools, which do 
have the behavior we're describing here.  It's a powerful argument for 
implementing that behavior.  When you add the fact that it fixes security 
vulnerabilities, it's an extremely compelling argument.

I think it's clear that this change does have a big positive effect in many 
scenarios, does fix real-world security flaws, and does accord with the 
expectations of most system administrators.  That's three powerful reasons to 
do it.  I can find no valid counter-argument for any of these reasons anywhere 
in these comments.

> Add -q to fs -ls to print non-printable characters
> --
>
> Key: HADOOP-13079
> URL: https://issues.apache.org/jira/browse/HADOOP-13079
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: John Zhuge
>Assignee: John Zhuge
>  Labels: supportability
>
> Add option {{-q}} to "hdfs dfs -ls" to print non-printable characters as "?". 
> Non-printable characters are defined by 
> [isprint(3)|http://linux.die.net/man/3/isprint] according to the current 
> locale.
> Default to {{-q}} behavior on terminal; otherwise, print raw characters. See 
> the difference in these 2 command lines:
> * {{hadoop fs -ls /dir}}
> * {{hadoop fs -ls /dir | od -c}}
> In C, {{isatty(STDOUT_FILENO)}} is used to find out whether the output is a 
> terminal. Since Java doesn't have {{isatty}}, I will use JNI to call C 
> {{isatty()}} because the closest test {{System.console() == null}} does not 
> work in some cases.






[jira] [Commented] (HADOOP-13079) Add -q to fs -ls to print non-printable characters

2016-05-03 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15269375#comment-15269375
 ] 

Colin Patrick McCabe commented on HADOOP-13079:
---

OK, so Linux is technically a UNIX-like system rather than a licensee of the 
UNIX trademark.  I don't feel that this is relevant to the discussion here.  I 
feel like you are just being pedantic.  Linux's behavior is still the one that 
most people compare our behavior to, whether we like it or not.  And Linux's 
behavior is to hide control characters by default in ls.

More importantly, Linux's behavior makes more sense than the other behavior you 
are suggesting.  Dumping control characters out on an interactive terminal is a 
security vulnerability as well as a giant annoyance.  I can't think of a single 
reason why we would want this to be the default.

> Add -q to fs -ls to print non-printable characters
> --
>
> Key: HADOOP-13079
> URL: https://issues.apache.org/jira/browse/HADOOP-13079
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: John Zhuge
>Assignee: John Zhuge
>  Labels: supportability
>
> Add option {{-q}} to "hdfs dfs -ls" to print non-printable characters as "?". 
> Non-printable characters are defined by 
> [isprint(3)|http://linux.die.net/man/3/isprint] according to the current 
> locale.
> Default to {{-q}} behavior on terminal; otherwise, print raw characters. See 
> the difference in these 2 command lines:
> * {{hadoop fs -ls /dir}}
> * {{hadoop fs -ls /dir | od -c}}
> In C, {{isatty(STDOUT_FILENO)}} is used to find out whether the output is a 
> terminal. Since Java doesn't have {{isatty}}, I will use JNI to call C 
> {{isatty()}} because the closest test {{System.console() == null}} does not 
> work in some cases.






[jira] [Commented] (HADOOP-13079) Add -q to fs -ls to print non-printable characters

2016-05-03 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15268981#comment-15268981
 ] 

Colin Patrick McCabe commented on HADOOP-13079:
---

Thank you for the background information.  I wasn't aware that the default of 
suppressing non-printing characters was "optional" according to POSIX.

I think the important thing is that we've established that:
* Suppressing non-printing characters by default fixes several serious security 
vulnerabilities, including some that have CVEs,
* This suppression behavior is explicitly allowed by POSIX,
* The most popular UNIX system on Earth, Linux, implements this behavior, so 
nobody will be surprised by it.

bq. Essentially interactive sessions with stdin redirected \[falsely show up as 
non-interactive from Java\]

I guess my concern about adding a JNI dependency here is that it will make 
things too nondeterministic.  I've seen too many clusters where JNI was 
improperly configured.

> Add -q to fs -ls to print non-printable characters
> --
>
> Key: HADOOP-13079
> URL: https://issues.apache.org/jira/browse/HADOOP-13079
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: John Zhuge
>Assignee: John Zhuge
>  Labels: supportability
>
> Add option {{-q}} to "hdfs dfs -ls" to print non-printable characters as "?". 
> Non-printable characters are defined by 
> [isprint(3)|http://linux.die.net/man/3/isprint] according to the current 
> locale.
> Default to {{-q}} behavior on terminal; otherwise, print raw characters. See 
> the difference in these 2 command lines:
> * {{hadoop fs -ls /dir}}
> * {{hadoop fs -ls /dir | od -c}}
> In C, {{isatty(STDOUT_FILENO)}} is used to find out whether the output is a 
> terminal. Since Java doesn't have {{isatty}}, I will use JNI to call C 
> {{isatty()}} because the closest test {{System.console() == null}} does not 
> work in some cases.






[jira] [Commented] (HADOOP-13065) Add a new interface for retrieving FS and FC Statistics

2016-05-03 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15268961#comment-15268961
 ] 

Colin Patrick McCabe commented on HADOOP-13065:
---

Interesting post... I wasn't aware that AtomicLong etc. had performance issues.

However, I don't think we need an API for updating metrics.  We only need an 
API for _reading_ metrics.  The current read API in this patch supports reading 
primitive longs, which should work well with {{AtomicLongFieldUpdater}}, or 
whatever else we want to use.
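For illustration, the pattern {{AtomicLongFieldUpdater}} enables -- one shared 
static updater instead of an {{AtomicLong}} object per field (the names here 
are hypothetical):
{code}
// Sketch: a per-instance counter without per-field AtomicLong objects.
// Requires java.util.concurrent.atomic.AtomicLongFieldUpdater.
class OpCounters {
  private static final AtomicLongFieldUpdater<OpCounters> READ_OPS =
      AtomicLongFieldUpdater.newUpdater(OpCounters.class, "readOps");

  // the field must be a volatile, non-static long for the updater
  private volatile long readOps;

  void incrementReadOps() { READ_OPS.incrementAndGet(this); }
  long getReadOps() { return readOps; }  // plain volatile reads stay cheap
}
{code}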

> Add a new interface for retrieving FS and FC Statistics
> ---
>
> Key: HADOOP-13065
> URL: https://issues.apache.org/jira/browse/HADOOP-13065
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HADOOP-13065-007.patch, HDFS-10175.000.patch, 
> HDFS-10175.001.patch, HDFS-10175.002.patch, HDFS-10175.003.patch, 
> HDFS-10175.004.patch, HDFS-10175.005.patch, HDFS-10175.006.patch, 
> TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in-turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.






[jira] [Commented] (HADOOP-13079) Add -q to fs -ls to print non-printable characters

2016-05-02 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15266989#comment-15266989
 ] 

Colin Patrick McCabe commented on HADOOP-13079:
---

bq. No way should -q be the default under any circumstances. That is extremely 
surprising behavior that will definitely break stuff.

It's not surprising, because it matches the traditional UNIX / Linux behavior.  
In Linux, {{/bin/ls}} will not print control characters by default.  You must 
pass the {{--show-control-chars}} option in order to see them.  From the 
man page:

{code}
--show-control-chars
       show non graphic characters as-is (default unless program
       is 'ls' and output is a terminal)
{code}

{{ls}} blasting raw control characters into an interactive terminal is a very 
bad idea.  It leads to some very serious security vulnerabilities because 
commonly used software like {{xterm}}, {{GNU screen}}, {{tmux}} and so forth 
interpret control characters.  Using control characters, you can convince these 
pieces of software to execute arbitrary code.  See 
http://marc.info/?l=bugtraq&m=104612710031920&q=p3 and 
https://www.proteansec.com/linux/blast-past-executing-code-terminal-emulators-via-escape-sequences/
  There are even CVEs for some of these issues.

We should make the default opt-in for printing control characters in our next 
compatibility-breaking release (Hadoop 3.x).

bq. In C, isatty(STDOUT_FILENO) is used to find out whether the output is a 
terminal. Since Java doesn't have isatty, I will use JNI to call C isatty() 
because the closest test System.console() == null does not work in some cases.

It would really be nice if we could determine this without using JNI, because 
it's often not available.  Under what conditions does the {{System.console() == 
null}} check not work?  The only case I was able to find in a quick Google 
search was inside an eclipse console.  That seems like a case where the 
security issues would not be a concern, because it's a debugging environment.  
Are there other cases where the non-JNI check would fail?
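For reference, the non-JNI check being debated is a one-liner, which is what 
makes its failure modes matter:
{code}
// System.console() is non-null only when the JVM's stdin and stdout are
// both attached to a terminal, so an interactive shell with stdin
// redirected looks "non-interactive" -- the case a native
// isatty(STDOUT_FILENO) call would classify correctly.
boolean stdoutLooksLikeTerminal = (System.console() != null);
{code}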

> Add -q to fs -ls to print non-printable characters
> --
>
> Key: HADOOP-13079
> URL: https://issues.apache.org/jira/browse/HADOOP-13079
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: John Zhuge
>Assignee: John Zhuge
>  Labels: supportability
>
> Add option {{-q}} to "hdfs dfs -ls" to print non-printable characters as "?". 
> Non-printable characters are defined by 
> [isprint(3)|http://linux.die.net/man/3/isprint] according to the current 
> locale.
> Default to {{-q}} behavior on terminal; otherwise, print raw characters. See 
> the difference in these 2 command lines:
> * {{hadoop fs -ls /dir}}
> * {{hadoop fs -ls /dir | od -c}}
> In C, {{isatty(STDOUT_FILENO)}} is used to find out whether the output is a 
> terminal. Since Java doesn't have {{isatty}}, I will use JNI to call C 
> {{isatty()}} because the closest test {{System.console() == null}} does not 
> work in some cases.






[jira] [Updated] (HADOOP-13072) WindowsGetSpaceUsed constructor should be public

2016-05-02 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HADOOP-13072:
--
   Resolution: Fixed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

> WindowsGetSpaceUsed constructor should be public
> 
>
> Key: HADOOP-13072
> URL: https://issues.apache.org/jira/browse/HADOOP-13072
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
>  Labels: windows
> Fix For: 2.8.0
>
> Attachments: HADOOP-13072-01.patch, HADOOP-13072-02.patch
>
>
> WindowsGetSpaceUsed constructor should be made public.
> Otherwise, building it via the Builder will not work.
> {noformat}2016-04-29 12:49:37,455 [Thread-108] WARN  fs.GetSpaceUsed$Builder 
> (GetSpaceUsed.java:build(127)) - Doesn't look like the class class 
> org.apache.hadoop.fs.WindowsGetSpaceUsed have the needed constructor
> java.lang.NoSuchMethodException: 
> org.apache.hadoop.fs.WindowsGetSpaceUsed.(org.apache.hadoop.fs.GetSpaceUsed$Builder)
>   at java.lang.Class.getConstructor0(Unknown Source)
>   at java.lang.Class.getConstructor(Unknown Source)
>   at 
> org.apache.hadoop.fs.GetSpaceUsed$Builder.build(GetSpaceUsed.java:118)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.(BlockPoolSlice.java:165)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addBlockPool(FsVolumeImpl.java:915)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addBlockPool(FsVolumeImpl.java:907)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$2.run(FsVolumeList.java:413)
> {noformat}






[jira] [Commented] (HADOOP-13072) WindowsGetSpaceUsed constructor should be public

2016-05-02 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15266874#comment-15266874
 ] 

Colin Patrick McCabe commented on HADOOP-13072:
---

+1.  Thanks, [~vinayrpet].
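For context, the shape of the fix implied by the stack trace in the quoted 
description below is roughly this (a sketch only: the super-constructor shape 
and the {{refresh()}} body are assumptions):
{code}
// The Builder looks up a (GetSpaceUsed.Builder) constructor reflectively,
// so the constructor must be public.
public class WindowsGetSpaceUsed extends CachingGetSpaceUsed {
  public WindowsGetSpaceUsed(GetSpaceUsed.Builder builder)
      throws IOException {
    super(builder);
  }

  @Override
  protected void refresh() {
    // query free space via the Windows APIs (omitted in this sketch)
  }
}
{code}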

> WindowsGetSpaceUsed constructor should be public
> 
>
> Key: HADOOP-13072
> URL: https://issues.apache.org/jira/browse/HADOOP-13072
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
>  Labels: windows
> Attachments: HADOOP-13072-01.patch, HADOOP-13072-02.patch
>
>
> WindowsGetSpaceUsed constructor should be made public.
> Otherwise, building it via the Builder will not work.
> {noformat}2016-04-29 12:49:37,455 [Thread-108] WARN  fs.GetSpaceUsed$Builder 
> (GetSpaceUsed.java:build(127)) - Doesn't look like the class class 
> org.apache.hadoop.fs.WindowsGetSpaceUsed have the needed constructor
> java.lang.NoSuchMethodException: 
> org.apache.hadoop.fs.WindowsGetSpaceUsed.(org.apache.hadoop.fs.GetSpaceUsed$Builder)
>   at java.lang.Class.getConstructor0(Unknown Source)
>   at java.lang.Class.getConstructor(Unknown Source)
>   at 
> org.apache.hadoop.fs.GetSpaceUsed$Builder.build(GetSpaceUsed.java:118)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.(BlockPoolSlice.java:165)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addBlockPool(FsVolumeImpl.java:915)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addBlockPool(FsVolumeImpl.java:907)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$2.run(FsVolumeList.java:413)
> {noformat}






[jira] [Commented] (HADOOP-13072) WindowsGetSpaceUsed constructor should be public

2016-04-29 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15264909#comment-15264909
 ] 

Colin Patrick McCabe commented on HADOOP-13072:
---

Thanks, [~vinayrpet] and [~steve_l].

+1 once the line is trimmed to 80 characters and jenkins has run.

> WindowsGetSpaceUsed constructor should be public
> 
>
> Key: HADOOP-13072
> URL: https://issues.apache.org/jira/browse/HADOOP-13072
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
>  Labels: windows
> Attachments: HADOOP-13072-01.patch
>
>
> WindowsGetSpaceUsed constructor should be made public.
> Otherwise, building it via the Builder will not work.
> {noformat}2016-04-29 12:49:37,455 [Thread-108] WARN  fs.GetSpaceUsed$Builder 
> (GetSpaceUsed.java:build(127)) - Doesn't look like the class class 
> org.apache.hadoop.fs.WindowsGetSpaceUsed have the needed constructor
> java.lang.NoSuchMethodException: 
> org.apache.hadoop.fs.WindowsGetSpaceUsed.(org.apache.hadoop.fs.GetSpaceUsed$Builder)
>   at java.lang.Class.getConstructor0(Unknown Source)
>   at java.lang.Class.getConstructor(Unknown Source)
>   at 
> org.apache.hadoop.fs.GetSpaceUsed$Builder.build(GetSpaceUsed.java:118)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.(BlockPoolSlice.java:165)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addBlockPool(FsVolumeImpl.java:915)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addBlockPool(FsVolumeImpl.java:907)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$2.run(FsVolumeList.java:413)
> {noformat}






[jira] [Commented] (HADOOP-13028) add low level counter metrics for S3A; use in read performance tests

2016-04-27 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261354#comment-15261354
 ] 

Colin Patrick McCabe commented on HADOOP-13028:
---

Thanks, [~steve_l].  I withdraw my -1, provided we don't add any new public 
APIs in this patch.

I'm out tomorrow and Friday but hopefully I'll have a chance to review it next 
week (if someone doesn't review it first).

> add low level counter metrics for S3A; use in read performance tests
> 
>
> Key: HADOOP-13028
> URL: https://issues.apache.org/jira/browse/HADOOP-13028
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, metrics
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-13028-001.patch, HADOOP-13028-002.patch, 
> HADOOP-13028-004.patch, HADOOP-13028-005.patch, HADOOP-13028-006.patch, 
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt, 
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt
>
>
> against S3 (and other object stores), opening connections can be expensive, 
> closing connections may be expensive (a sign of a regression). 
> S3A FS and individual input streams should have counters of the # of 
> open/close/failure+reconnect operations, timers of how long things take. This 
> can be used downstream to measure efficiency of the code (how often 
> connections are being made), connection reliability, etc.





[jira] [Updated] (HADOOP-13065) Add a new interface for retrieving FS and FC Statistics

2016-04-27 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HADOOP-13065:
--
Attachment: HADOOP-13065-007.patch

> Add a new interface for retrieving FS and FC Statistics
> ---
>
> Key: HADOOP-13065
> URL: https://issues.apache.org/jira/browse/HADOOP-13065
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HADOOP-13065-007.patch, HDFS-10175.000.patch, 
> HDFS-10175.001.patch, HDFS-10175.002.patch, HDFS-10175.003.patch, 
> HDFS-10175.004.patch, HDFS-10175.005.patch, HDFS-10175.006.patch, 
> TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in-turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.





[jira] [Updated] (HADOOP-13065) Add a new interface for retrieving FS and FC Statistics

2016-04-27 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HADOOP-13065:
--
Summary: Add a new interface for retrieving FS and FC Statistics  (was: add 
per-operation stats to FileSystem.Statistics)

> Add a new interface for retrieving FS and FC Statistics
> ---
>
> Key: HADOOP-13065
> URL: https://issues.apache.org/jira/browse/HADOOP-13065
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, 
> HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, 
> HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in-turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.





[jira] [Commented] (HADOOP-13065) add per-operation stats to FileSystem.Statistics

2016-04-27 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261118#comment-15261118
 ] 

Colin Patrick McCabe commented on HADOOP-13065:
---

Thanks, [~liuml07].

Based on the discussion today, it sounds like we would like to have both global 
statistics per FS class, and per-instance statistics for an individual FS or FC 
instance.  The rationale for this is that in some cases we might want to 
differentiate between, say, the stats when talking to one s3 bucket, and 
another s3 bucket.  Or another example is the stats talking to one HDFS FS 
versus another HDFS FS (if we are using federation, or just multiple HDFS 
instances).

We talked a bit about metrics2, but there were several things that made it not 
a good fit for this statistics interface.  One issue is that metrics2 assumes 
that statistics are permanent once created.  Effectively, it keeps them around 
until the JVM terminates.  metrics2 also tends to use a fair amount of memory 
and require a fair amount of boilerplate code compared to other solutions.  
Finally, because it is global, it can't do per-instance stats very effectively.

It would be nice for the new statistics interface to provide the same stats 
which are currently provided by FileSystem#Statistics.  This would allow us to 
deprecate and eventually remove FileSystem#Statistics as a public interface 
(although we might keep the implementation).  This could be done only in a new 
release of Hadoop, of course.  We also talked about the benefits of providing 
an iterator over all statistics rather than a map of all statistics.  
Relatedly, we talked about the desire to have a new interface that was abstract 
enough to accommodate new, more efficient implementations in the future.

For now, the new interface will deal with per-FS stats, but not per-stream 
ones.  We should revisit per-stream statistics later.
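
To make the shape concrete, here is a rough sketch of what such an interface 
might look like (class and method names are illustrative, not a final 
proposal):
{code}
import java.util.Iterator;

// Rough sketch only; names are illustrative.
public abstract class StorageStatistics {
  /** A single named, long-valued statistic. */
  public static class LongStatistic {
    private final String name;
    private final long value;

    public LongStatistic(String name, long value) {
      this.name = name;
      this.value = value;
    }

    public String getName() { return name; }
    public long getValue() { return value; }
  }

  /** Iterate over all statistics without materializing a full map. */
  public abstract Iterator<LongStatistic> getLongStatistics();

  /** Look up one statistic by name, or return null if it is not tracked. */
  public abstract Long getLong(String name);
}
{code}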

> add per-operation stats to FileSystem.Statistics
> 
>
> Key: HADOOP-13065
> URL: https://issues.apache.org/jira/browse/HADOOP-13065
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, 
> HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, 
> HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-13028) add low level counter metrics for S3A; use in read performance tests

2016-04-26 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258713#comment-15258713
 ] 

Colin Patrick McCabe commented on HADOOP-13028:
---

It looks really good, [~steve_l].

Just to avoid misunderstandings, I'll drop a -1 here until we finish discussing 
what the interface should be... 
I look forward to giving this a review as soon as we figure that out.

> add low level counter metrics for S3A; use in read performance tests
> 
>
> Key: HADOOP-13028
> URL: https://issues.apache.org/jira/browse/HADOOP-13028
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, metrics
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-13028-001.patch, HADOOP-13028-002.patch, 
> HADOOP-13028-004.patch, HADOOP-13028-005.patch, 
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt, 
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt
>
>
> against S3 (and other object stores), opening connections can be expensive, 
> closing connections may be expensive (a sign of a regression). 
> S3A FS and individual input streams should have counters of the # of 
> open/close/failure+reconnect operations, timers of how long things take. This 
> can be used downstream to measure efficiency of the code (how often 
> connections are being made), connection reliability, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12949) Add metrics and HTrace to the s3a connector

2016-04-25 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257083#comment-15257083
 ] 

Colin Patrick McCabe commented on HADOOP-12949:
---

Since HADOOP-13028 is focusing on metrics for s3a, let's focus this JIRA on 
just HTrace integration.  It's a good idea to read up on HDFS-10175 as well, 
since we've been discussing what interface(s) we'd like the FS metrics to have 
in the future there.

> Add metrics and HTrace to the s3a connector
> ---
>
> Key: HADOOP-12949
> URL: https://issues.apache.org/jira/browse/HADOOP-12949
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Madhawa Gunasekara
>Assignee: Madhawa Gunasekara
>
> Hi All, 
> s3, GCS, WASB, and other cloud blob stores are becoming increasingly 
> important in Hadoop. But we don't have distributed tracing for these yet. It 
> would be interesting to add distributed tracing here. It would enable 
> collecting really interesting data like probability distributions of PUT and 
> GET requests to s3 and their impact on MR jobs, etc.
> I would like to implement this feature, Please shed some light on this 
> Thanks,
> Madhawa



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-12949) Add HTrace to the s3a connector

2016-04-25 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HADOOP-12949:
--
Summary: Add HTrace to the s3a connector  (was: Add metrics and HTrace to 
the s3a connector)

> Add HTrace to the s3a connector
> ---
>
> Key: HADOOP-12949
> URL: https://issues.apache.org/jira/browse/HADOOP-12949
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Madhawa Gunasekara
>Assignee: Madhawa Gunasekara
>
> Hi All, 
> s3, GCS, WASB, and other cloud blob stores are becoming increasingly 
> important in Hadoop. But we don't have distributed tracing for these yet. It 
> would be interesting to add distributed tracing here. It would enable 
> collecting really interesting data like probability distributions of PUT and 
> GET requests to s3 and their impact on MR jobs, etc.
> I would like to implement this feature, Please shed some light on this 
> Thanks,
> Madhawa



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-13028) add counter and timer metrics for S3A HTTP & low-level operations

2016-04-22 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254358#comment-15254358
 ] 

Colin Patrick McCabe commented on HADOOP-13028:
---

Hi [~steve_l],

This is a really interesting idea.  I think this ties in with some of the 
discussions we've been having on HDFS-10175 with adding a way to fetch 
arbitrary statistics from FileSystem (and FileContext) instances.

Basically, HDFS-10175 provides a way for MR to enumerate all the statistics and 
their values.  It also provides interfaces for finding just one statistic, of 
course.  This would also enable the use of those statistics in unit tests, 
since the stats could be per-FS rather than global per type.
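
For instance, a test might do something like this (the accessor name is a 
sketch; the exact interface is still under discussion on HDFS-10175):
{code}
// Sketch only: per-instance stats consumed in a unit test.
// getStorageStatistics() is illustrative, not a settled API.
FileSystem fs = FileSystem.newInstance(uri, conf);
StorageStatistics stats = fs.getStorageStatistics();
Iterator<StorageStatistics.LongStatistic> it = stats.getLongStatistics();
while (it.hasNext()) {
  StorageStatistics.LongStatistic s = it.next();
  System.out.println(s.getName() + " = " + s.getValue());
}
{code}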

> add counter and timer metrics for S3A HTTP & low-level operations
> -
>
> Key: HADOOP-13028
> URL: https://issues.apache.org/jira/browse/HADOOP-13028
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, metrics
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>
> against S3 (and other object stores), opening connections can be expensive, 
> closing connections may be expensive (a sign of a regression). 
> S3A FS and individual input streams should have counters of the # of 
> open/close/failure+reconnect operations, timers of how long things take. This 
> can be used downstream to measure efficiency of the code (how often 
> connections are being made), connection reliability, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-13010) Refactor raw erasure coders

2016-04-20 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250781#comment-15250781
 ] 

Colin Patrick McCabe commented on HADOOP-13010:
---

bq. Thanks for the discussions Colin / Kai. The "dummy" coders are for tests 
only. Either name sounds fine to me. No-op is a more accurate technical 
description, while dummy better states the purpose (and therefore prevent users 
from actually using it). Maybe we can leave that one open and move this 
refactor forward.

Yeah, I don't have a strong opinion on "Dummy" versus "NoOp."  Either name 
could work.  It also seems reasonable to let users configure this to diagnose 
issues in the field.  So it makes sense to keep it in src/ rather than test/.

bq. The suggested way and sample codes look great. It consolidates 
configurations and coder options together and has an advantage that the coder 
options will also be configurable. I will use it.

Great!  Looking forward to the next revision.

Thanks again, [~drankye] and [~zhz].

> Refactor raw erasure coders
> ---
>
> Key: HADOOP-13010
> URL: https://issues.apache.org/jira/browse/HADOOP-13010
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Fix For: 3.0.0
>
> Attachments: HADOOP-13010-v1.patch, HADOOP-13010-v2.patch, 
> HADOOP-13010-v3.patch
>
>
> This will refactor raw erasure coders according to some comments received so 
> far.
> * As discussed in HADOOP-11540 and suggested by [~cmccabe], better not to 
> rely class inheritance to reuse the codes, instead they can be moved to some 
> utility.
> * Suggested by [~jingzhao] somewhere quite some time ago, better to have a 
> state holder to keep some checking results for later reuse during an 
> encode/decode call.
> This would not get rid of some inheritance levels as doing so isn't clear yet 
> for the moment and also incurs big impact. I do wish the end result by this 
> refactoring will make all the levels more clear and easier to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-13010) Refactor raw erasure coders

2016-04-18 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246922#comment-15246922
 ] 

Colin Patrick McCabe commented on HADOOP-13010:
---

{{ErasureCoderConf#setCoderOption}} / {{ErasureCoderConf#getCoderOption}}: I 
don't see why we need to have these.  If these options are generic to all 
erasure encoders, then they can just go as "regular java fields" like  
{{ErasureCoderConf#numDataUnits}}, etc.  On the other hand, if these options 
only apply to one type of Coder, then they should be stored in the particular 
type of coder they apply to.

The usual way to do this is to have your Encoder / Decoder class take a 
Configuration object as an argument, and pull out whatever values it needs.
For example, you might have code like this:
{code}
FoobarEncoder(Configuration conf) {
  this.coderConf = new ErasureCoderConf(conf);
  this.foobarity = conf.getLong("foobarity", 123);
}
{code}

The idea is that things that are specific to a class go in that class, rather 
than trying to handle it with casts to and from Object.

Also, mutable configuration is unpleasant (what happens if you call 
{{ErasureCoderConf#setCoderOption}} when the Encoder / Decoder has already been 
created?).  It seems like what we actually want to do in this case is not to 
modify the configuration, but to build a new Encoder / Decoder with a new 
configuration.

> Refactor raw erasure coders
> ---
>
> Key: HADOOP-13010
> URL: https://issues.apache.org/jira/browse/HADOOP-13010
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Fix For: 3.0.0
>
> Attachments: HADOOP-13010-v1.patch, HADOOP-13010-v2.patch, 
> HADOOP-13010-v3.patch
>
>
> This will refactor raw erasure coders according to some comments received so 
> far.
> * As discussed in HADOOP-11540 and suggested by [~cmccabe], better not to 
> rely class inheritance to reuse the codes, instead they can be moved to some 
> utility.
> * Suggested by [~jingzhao] somewhere quite some time ago, better to have a 
> state holder to keep some checking results for later reuse during an 
> encode/decode call.
> This would not get rid of some inheritance levels as doing so isn't clear yet 
> for the moment and also incurs big impact. I do wish the end result by this 
> refactoring will make all the levels more clear and easier to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-13010) Refactor raw erasure coders

2016-04-18 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246913#comment-15246913
 ] 

Colin Patrick McCabe commented on HADOOP-13010:
---

bq. Yeah, it's good to document this stateless property in JavaDoc. Note by the 
way it doesn't mean these encoder/decoder are to support concurrency though 
it's possible. I would leave this for future consideration.

Sure.  In that case, we should document that these objects are not guaranteed 
to be thread-safe, so that there is no confusion.
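
For example, the class JavaDoc could say something like:
{code}
/**
 * <p>Instances of this class are not guaranteed to be thread-safe.
 * Callers that want concurrent encode/decode operations should create
 * one coder instance per thread.
 */
{code}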

bq. Ah yes the names (EncoderState/DecoderState) are bad, actually I meant them 
to be EncodingState/DecodingState.

OK.

bq. Ok, I'm probably convinced by you. Thanks for the lots of insights. I got 
rid of the base class anyway, and introduced ErasureCoderConf for the variables 
and methods in it. As you might check the updated patch, there are some 
duplicate of small shortcuts between the encoder base class and decoder base 
class as they now lack a common base. I suppose it's acceptable.

Great.

> Refactor raw erasure coders
> ---
>
> Key: HADOOP-13010
> URL: https://issues.apache.org/jira/browse/HADOOP-13010
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Fix For: 3.0.0
>
> Attachments: HADOOP-13010-v1.patch, HADOOP-13010-v2.patch, 
> HADOOP-13010-v3.patch
>
>
> This will refactor raw erasure coders according to some comments received so 
> far.
> * As discussed in HADOOP-11540 and suggested by [~cmccabe], better not to 
> rely class inheritance to reuse the codes, instead they can be moved to some 
> utility.
> * Suggested by [~jingzhao] somewhere quite some time ago, better to have a 
> state holder to keep some checking results for later reuse during an 
> encode/decode call.
> This would not get rid of some inheritance levels as doing so isn't clear yet 
> for the moment and also incurs big impact. I do wish the end result by this 
> refactoring will make all the levels more clear and easier to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-13010) Refactor raw erasure coders

2016-04-15 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243883#comment-15243883
 ] 

Colin Patrick McCabe commented on HADOOP-13010:
---

bq. Yes you're right I meant some data to be shared between multiple concurrent 
encode or decode operations. The data only makes sense for a coder instance 
(binds a schema) so it's not suitable to be static; on the other hand it's also 
decode call specific so it's also not suitable to reside in the coder instance.

Thanks for the explanation.  It sounds like {{Encoder}} and {{Decoder}} will be 
stateless once they're created.  Basically, they just reflect an algorithm and 
some data required to implement that algorithm.  That is reasonable.  We should 
document this in the JavaDoc for the classes.

I also agree that we should add a new class to represent the state which the 
Decoder / Encoder is manipulating.  The problem with calling this class 
{{DecoderState}} is that this name suggests that it is the state of the coder, 
rather than state manipulated by the coder.  Perhaps calling this 
{{DecoderData}} or {{DecoderStream}} is more appropriate.  Having this new 
class will also avoid the need to manually pass around so many arrays and 
indices.
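
Concretely, I'm imagining something like this (the name and fields are 
illustrative only):
{code}
import java.nio.ByteBuffer;

// Sketch: group the per-call arrays into one object so they don't have
// to be passed around individually.
class DecodingData {
  final ByteBuffer[] inputs;
  final int[] erasedIndexes;
  final ByteBuffer[] outputs;

  DecodingData(ByteBuffer[] inputs, int[] erasedIndexes,
      ByteBuffer[] outputs) {
    this.inputs = inputs;
    this.erasedIndexes = erasedIndexes;
    this.outputs = outputs;
  }
}
{code}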

bq. AbstractRawErasureCoder maintains conf, schema related info like 
numDataUnits, and coderOptions. It provides public methods (9 ones) to access 
these fields. All of these are essentials to a erasure coder and common to both 
encoders and decoders. If we move the variables and methods to a utility class, 
it wouldn't look better, and we have to duplicate the methods across encoder 
and decoder.

These methods and fields are configuration methods and configuration fields.  
They belong in a class named something like "ErasureEncodingConfiguration".  I 
also feel that configuration should be immutable once 
it is created, since otherwise things get very messy.  We use this pattern in 
many other cases in Hadoop: for example in {{DfsClientConf}} and 
{{DNConf.java}}.
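
As a sketch (the config keys and defaults here are made up for illustration):
{code}
import org.apache.hadoop.conf.Configuration;

// Immutable configuration holder in the style of DfsClientConf: read
// everything from the Configuration once, then expose final fields.
final class ErasureCoderConf {
  private final int numDataUnits;
  private final int numParityUnits;

  ErasureCoderConf(Configuration conf) {
    // Key names below are illustrative only.
    this.numDataUnits = conf.getInt("io.erasurecode.num.data.units", 6);
    this.numParityUnits = conf.getInt("io.erasurecode.num.parity.units", 3);
  }

  int getNumDataUnits() { return numDataUnits; }
  int getNumParityUnits() { return numParityUnits; }
}
{code}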

Why is having the configuration in a separate object better than having the 
configuration in a base class?  A few reasons:
* It's easier to follow the flow of control.  You don't have to jump from file 
to file to figure out which method is actually getting called (any subclass 
could override the base class methods we have now).
* It's obvious what a CoderConfiguration class does.  It manages the 
configuration.  It's not obvious what the base class does without reading all 
of the source.
* The configuration class can have a way of printing itself (object-orientation)

Many of these are reasons why the gang of four recommended "*favor composition 
over inheritance*."

bq. Yes it's interesting. I just thought of an exact match for the current 
codes. In JRE sasl framework, it has interfaces SaslClient and SaslSever, 
abstract classes AbstractSaslImpl and GssKrb5Base, class GssKrb5Client extends 
GssKrb5Base implements SaslClient, and class GssKrb5Server extends GssKrb5Base 
implements SaslServer. I'm not sure we followed the style but I guess it could 
be a common pattern for a bit of complex situation. I thought that's why when 
it initially went in this way people understood the codes and I heard no other 
voice then.

We have to understand the reasons behind using a pattern.  The reason to 
separate interfaces from Abstract classes is that some implementations of the 
interface may not want to use the code in the Abstract class.  Since that is 
not the case here, it's not a good idea to copy this pattern.

bq. Generally and often, I have to admit that I'm more a OOP guy and prefer to 
have clear construct over concept and abstract, rather than mixed utilities. We 
can see many example utilities in the codebase that are rather lengthy and 
messy, which intends to break modularity. That's probably why I'm not feeling 
so well to get rid of the level and replace it with utilities here. I agree 
with you that sometimes composition is good to reuse some codes to avoid 
complex inheritance relationships, but here we do have a coder concept and the 
construct for it wouldn't be bad to have.

Using composition doesn't mean putting everything into utilities.  Often, it 
means grouping related things into objects.  Instead of having 20 fields 
sprayed into a base class that other classes inherit from, you have a small 
number of utility classes such as Configuration, Log, etc. that other classes 
reuse by composition (owning an instance of them).

This also makes it easy to change the code later.  Changing inheritance 
relationships often breaks backwards compatibility.  Removing a field or adding 
a new one almost never does.

> Refactor raw erasure coders
> ---
>
> Key: HADOOP-13010
> URL: https://issues.apache.org/jira/browse/HADOOP-13010
>  

[jira] [Commented] (HADOOP-13010) Refactor raw erasure coders

2016-04-15 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243509#comment-15243509
 ] 

Colin Patrick McCabe commented on HADOOP-13010:
---

bq. The problem is, a decoder associates expensive coding buffers and computed 
coding matrices, which would be good to stay in CPU core near enough caches for 
the performance. The cached data is per decoder, not only schema specific, but 
also erasure index specific in decode call, so it's not good to keep the cache 
out of decoder, but still makes sense to cache it because in HDFS side it's 
repeatedly called in a loop for a large block size (64k cell size -> 256mb 
block size). You might have a check about the native codes for native coders 
about the expensive buffers and data cached in every decode call. We had 
benchmarked the coders and showed this optimization obtained great speedup. 
Java InputStreams are similar to here, but not exactly because it's pure 
view-only and leverages OS/IO level caches for file reading stuffs.

If I understand correctly, you're making the case that there is data (such as 
matrices) which should be shared between multiple concurrent encode or decode 
operations.  If that's the case, then let's make that data static and share it 
between all instances.  But I still think that Encoder/Decoder should manage 
its own buffers rather than having them passed in on every call.
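
For example, a sketch of the static-sharing idea (all names are illustrative, 
and computeMatrix is a stand-in for the real matrix computation):
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch: share expensive schema-derived tables across all coder
// instances, while each Encoder/Decoder still owns its working buffers.
class MatrixCache {
  private static final ConcurrentMap<String, byte[]> CACHE =
      new ConcurrentHashMap<String, byte[]>();

  static byte[] getMatrix(String schemaKey) {
    byte[] m = CACHE.get(schemaKey);
    if (m == null) {
      m = computeMatrix(schemaKey);      // stand-in for the real work
      byte[] prev = CACHE.putIfAbsent(schemaKey, m);
      if (prev != null) {
        m = prev;                        // another thread won the race
      }
    }
    return m;
  }

  private static byte[] computeMatrix(String schemaKey) {
    return new byte[16 * 16];            // placeholder, not the real algorithm
  }
}
{code}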

bq. Having the common base class would allow encoder and decoder to share 
common properties, not just configurations, but also schema info and some 
options. We can also say that encoder and decoder are also coders, which allows 
to write some common behaviors to deal with coders, not encoder or decoder 
specific. I understand it should also work by composition, but right now I 
don't see very much benefits to switch this from one style to the other, or 
troubles if we don't.

Hmm.  The only state in {{AbstractRawErasureCoder.java}} is configuration 
state.  I don't see why we need this class.  Everything in there could and 
should be a utility function.

The benefit of getting rid of this class is that with a shallower inheritance 
hierarchy, it's easier to understand what's going on.  To continue the analogy 
with Java, InputStream and OutputStream don't share a common base class.

bq. It sounds better not to have the interfaces since the benefit is obvious. 
So in summary how about having these classes (no interface) now: still 
AbstractRawErasureCoder, RawErasureEncoder/Decoder (no Abstract prefix now, 
with the original interface combined), and all kinds of concrete inherent 
encoders/decoders. All client codes will declare RawErasureEncoder/Decoder type 
when creating instances.

It seems reasonable, but I don't see the need for AbstractRawErasureCoder.

> Refactor raw erasure coders
> ---
>
> Key: HADOOP-13010
> URL: https://issues.apache.org/jira/browse/HADOOP-13010
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Fix For: 3.0.0
>
> Attachments: HADOOP-13010-v1.patch, HADOOP-13010-v2.patch
>
>
> This will refactor raw erasure coders according to some comments received so 
> far.
> * As discussed in HADOOP-11540 and suggested by [~cmccabe], better not to 
> rely class inheritance to reuse the codes, instead they can be moved to some 
> utility.
> * Suggested by [~jingzhao] somewhere quite some time ago, better to have a 
> state holder to keep some checking results for later reuse during an 
> encode/decode call.
> This would not get rid of some inheritance levels as doing so isn't clear yet 
> for the moment and also incurs big impact. I do wish the end result by this 
> refactoring will make all the levels more clear and easier to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HADOOP-13010) Refactor raw erasure coders

2016-04-14 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15242157#comment-15242157
 ] 

Colin Patrick McCabe edited comment on HADOOP-13010 at 4/14/16 11:53 PM:
-

bq. The underlying buffer for the empty trunk is assumed read-only and will 
only be used to zero coding buffers. Making the entire function safe and also 
private is a good idea because in practice that level should be good enough.

Right.  The arrays themselves are read-only.  But we still have to control 
access to the pointer to the array, which is not read-only and which needs to 
be accessed in a thread-safe fashion.

bq. For pure Java coders that use byte array and on-heap bytebuffer, this way 
to zero buffers is efficient (perhaps the most one but I'm not totally sure); 
to zero direct bytebuffer the more efficient way would be to use an empty 
direct bytebuffer. I don't optimize this because pure Java coder is better not 
to use direct bytebuffer overall. Note native coders will prefer direct 
bytebuffer but won't need to bump into this, as we discussed in HADOOP-11540.

Yeah, the JNI encoders can be more efficient, so we don't have to worry about 
optimizing this.  I was just commenting that it's unfortunate that we have to 
keep around the empty array.

bq. Ok. Comment can be made here to tell the null indexes include erased units 
and the units that's not to read.

The function just finds null array entries.  What these entries mean is up to 
the caller.

bq. Because I want \[the first element\] to return fast considering it's the 
most often case.

I don't see any evidence that adding a special case makes this faster than just 
running the loop.  The loop starts at the first element anyway.  If the loop 
usually stops after the first iteration, I would expect the just-in-time 
compiler to optimize this code.  Let's get rid of the special case, unless we 
have some benchmarks showing that it helps.

bq. \[Decoders are\] intended not to be stateful, thus many threads can use the 
same decoder instance. I'm not sure all the existing coders are already good in 
this aspect, but effort will be made to achieve so if necessary, not sure all 
be done here.

Part of the appeal of object-oriented programming is to combine the data with 
the methods used to operate on that data.  I'm not sure why we would want to 
keep the decoder state separate from the decoder functions.  If we want to do 
multiple decode operations in parallel, we can just create multiple Decoder 
objects, right?

Java InputStreams don't have an InputStreamState that you have to pass in to 
every function.  Instead, if you want multiple views of the same file, you just 
create multiple streams.  It seems like we can take the same approach here.


was (Author: cmccabe):
bq. The underlying buffer for the empty trunk is assumed read-only and will 
only be used to zero coding buffers. Making the entire function safe and also 
private is a good idea because in practice that level should be good enough.

Right.  The arrays themselves are read-only.  But we still have to control 
access to the pointer to the array, which is not read-only and which needs to 
be accessed in a thread-safe fashion.

bq. For pure Java coders that use byte array and on-heap bytebuffer, this way 
to zero buffers is efficient (perhaps the most one but I'm not totally sure); 
to zero direct bytebuffer the more efficient way would be to use an empty 
direct bytebuffer. I don't optimize this because pure Java coder is better not 
to use direct bytebuffer overall. Note native coders will prefer direct 
bytebuffer but won't need to bump into this, as we discussed in HADOOP-11540.

Yeah, the JNI encoders can be more efficient, so we don't have to worry about 
optimizing this.  I was just commenting that it's unfortunate that we have to 
keep around the empty array.

bq. Ok. Comment can be made here to tell the null indexes include erased units 
and the units that's not to read.

The function just finds null array entries.  What these entries mean is up to 
the caller.

bq. Because I want \[the first element\] to return fast considering it's the 
most often case.

I don't see any evidence that adding a special case makes this faster than just 
running the loop.  The loop starts at the first element anyway.  If the loop 
usually stops after the first iteration, I would expect the just-in-time 
compiler to optimize this code.  Let's get rid of the special case, unless we 
have some benchmarks showing that it helps.

bq. \[Decoders are\] intended not to be stateful, thus many threads can use the 
same decoder instance. I'm not sure all the existing coders are already good in 
this aspect, but effort will be made to achieve so if necessary, not sure all 
be done here.

Part of the appeal of object-oriented programming is to combine the data with 
the methods

[jira] [Commented] (HADOOP-13010) Refactor raw erasure coders

2016-04-14 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15242157#comment-15242157
 ] 

Colin Patrick McCabe commented on HADOOP-13010:
---

bq. The underlying buffer for the empty trunk is assumed read-only and will 
only be used to zero coding buffers. Making the entire function safe and also 
private is a good idea because in practice that level should be good enough.

Right.  The arrays themselves are read-only.  But we still have to control 
access to the pointer to the array, which is not read-only and which needs to 
be accessed in a thread-safe fashion.

bq. For pure Java coders that use byte array and on-heap bytebuffer, this way 
to zero buffers is efficient (perhaps the most one but I'm not totally sure); 
to zero direct bytebuffer the more efficient way would be to use an empty 
direct bytebuffer. I don't optimize this because pure Java coder is better not 
to use direct bytebuffer overall. Note native coders will prefer direct 
bytebuffer but won't need to bump into this, as we discussed in HADOOP-11540.

Yeah, the JNI encoders can be more efficient, so we don't have to worry about 
optimizing this.  I was just commenting that it's unfortunate that we have to 
keep around the empty array.

bq. Ok. Comment can be made here to tell the null indexes include erased units 
and the units that's not to read.

The function just finds null array entries.  What these entries mean is up to 
the caller.

bq. Because I want \[the first element\] to return fast considering it's the 
most often case.

I don't see any evidence that adding a special case makes this faster than just 
running the loop.  The loop starts at the first element anyway.  If the loop 
usually stops after the first iteration, I would expect the just-in-time 
compiler to optimize this code.  Let's get rid of the special case, unless we 
have some benchmarks showing that it helps.

bq. \[Decoders are\] intended not to be stateful, thus many threads can use the 
same decoder instance. I'm not sure all the existing coders are already good in 
this aspect, but effort will be made to achieve so if necessary, not sure all 
be done here.

Part of the appeal of object-oriented programming is to combine the data with 
the methods used to operate on that data.  I'm not sure why we would want to 
keep the decoder state separate from the decoder functions.  If we want to do 
multiple decode operations in parallel, we can just create multiple Decoder 
objects, right?

> Refactor raw erasure coders
> ---
>
> Key: HADOOP-13010
> URL: https://issues.apache.org/jira/browse/HADOOP-13010
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Fix For: 3.0.0
>
> Attachments: HADOOP-13010-v1.patch, HADOOP-13010-v2.patch
>
>
> This will refactor raw erasure coders according to some comments received so 
> far.
> * As discussed in HADOOP-11540 and suggested by [~cmccabe], better not to 
> rely class inheritance to reuse the codes, instead they can be moved to some 
> utility.
> * Suggested by [~jingzhao] somewhere quite some time ago, better to have a 
> state holder to keep some checking results for later reuse during an 
> encode/decode call.
> This would not get rid of some inheritance levels as doing so isn't clear yet 
> for the moment and also incurs big impact. I do wish the end result by this 
> refactoring will make all the levels more clear and easier to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HADOOP-12975) Add jitter to CachingGetSpaceUsed's thread

2016-04-14 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241805#comment-15241805
 ] 

Colin Patrick McCabe edited comment on HADOOP-12975 at 4/14/16 8:08 PM:


bq. But a percentage is chosen as it makes the jitter scale with anyone who 
changes du periods. If it's a set number then someone with a refresh period of 
days won't get any benefit from the jitter.

Hmm.  It seems like a fixed amount of jitter still provides a benefit, even to 
someone with a longer refresh interval.  Let's say my refresh period is 7 days. 
 At the end of that, I would still appreciate having my DU processes launch at 
slightly different times on the 7th day, rather than all launching at once.

My concern with varying based on a percentage is that there will be enormous 
variations in how long different volumes go between DU operations, when longer 
refresh intervals are in use.  Like if I have a 7 day period and one volume 
refreshes after 3.5 days, and the other waits for the full 7 days, that's quite 
a variation.  Similarly, if our period is short -- like 1 hour-- having some 
datanodes refresh after only 30 minutes seems unwelcome.  That's why I 
suggested a fixed jitter amount, to be configured by the sysadmin.

I don't feel very strongly about this, though, so if you want to make it 
percentage-based, that's fine too.  As long as it's configurable and the 
defaults are reasonable.  I definitely think that a maximum jitter percentage 
of 0.15 or 0.20 seems more reasonable than 0.5.
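
To illustrate the fixed-jitter idea as a sketch (assuming {{conf}} and 
{{refreshIntervalMs}} are in scope; the config key is made up):
{code}
// Sketch: fixed jitter window configured by the sysadmin.
// "fs.du.jitter.ms" is an illustrative key, not from any patch.
long jitterMs = conf.getLong("fs.du.jitter.ms", 60 * 1000L);
long sleepMs = refreshIntervalMs;
if (jitterMs > 0) {
  // uniform in [-jitterMs, +jitterMs]
  sleepMs += ThreadLocalRandom.current().nextLong(-jitterMs, jitterMs + 1);
}
Thread.sleep(Math.max(0L, sleepMs));
{code}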


was (Author: cmccabe):
bq. But a percentage is chosen as it makes the jitter scale with anyone who 
changes du periods. If it's a set number then someone with a refresh period of 
days won't get any benefit from the jitter.

Hmm.  It seems like a fixed amount of jitter still provides a benefit, even to 
someone with a longer refresh interval.  Let's say my refresh period is 7 days. 
 At the end of that, I would still appreciate having my DU processes launch at 
slightly different times on the 7th day, rather than all launching at once.

My concern with varying based on a percentage is that there will be enormous 
variations in how long different volumes go between DU operations, when longer 
refresh intervals are in use.  Like if I have a 7 day period and one volume 
refreshes after 3.5 days, and the other waits for the full 7 days, that's quite 
a variation.  Similarly, if our period is short -- like 1 hour-- having some 
datanodes refresh after only 30 minutes seems unwelcome.  That's why I 
suggested a fixed jitter amount, to be configured by the sysadmin.

I don't feel very strongly about this, though, so if you want to make it 
percentage-based, that's fine too.  As long as it's configurable and the 
defaults are reasonable.

> Add jitter to CachingGetSpaceUsed's thread
> --
>
> Key: HADOOP-12975
> URL: https://issues.apache.org/jira/browse/HADOOP-12975
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 2.9.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HADOOP-12975v0.patch, HADOOP-12975v1.patch, 
> HADOOP-12975v2.patch
>
>
> Running DU across lots of disks is very expensive and running all of the 
> processes at the same time creates a noticeable IO spike. We should add some 
> jitter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HADOOP-12975) Add jitter to CachingGetSpaceUsed's thread

2016-04-14 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241805#comment-15241805
 ] 

Colin Patrick McCabe edited comment on HADOOP-12975 at 4/14/16 8:07 PM:


bq. But a percentage is chosen as it makes the jitter scale with anyone who 
changes du periods. If it's a set number then someone with a refresh period of 
days won't get any benefit from the jitter.

Hmm.  It seems like a fixed amount of jitter still provides a benefit, even to 
someone with a longer refresh interval.  Let's say my refresh period is 7 days. 
 At the end of that, I would still appreciate having my DU processes launch at 
slightly different times on the 7th day, rather than all launching at once.

My concern with varying based on a percentage is that there will be enormous 
variations in how long different volumes go between DU operations, when longer 
refresh intervals are in use.  Like if I have a 7 day period and one volume 
refreshes after 3.5 days, and the other waits for the full 7 days, that's quite 
a variation.  Similarly, if our period is short -- like 1 hour-- having some 
datanodes refresh after only 30 minutes seems unwelcome.  That's why I 
suggested a fixed jitter amount, to be configured by the sysadmin.

I don't feel very strongly about this, though, so if you want to make it 
percentage-based, that's fine too.  As long as it's configurable and the 
defaults are reasonable.


was (Author: cmccabe):
bq. But a percentage is chosen as it makes the jitter scale with anyone who 
changes du periods. If it's a set number then someone with a refresh period of 
days won't get any benefit from the jitter.

Hmm.  It seems like a fixed amount of jitter still provides a benefit, even to 
someone with a longer refresh interval.  Let's say my refresh period is 7 days. 
 At the end of that, I would still appreciate having my DU processes launch at 
slightly different times on the 7th day, rather than all launching at once.

My concern with varying based on a percentage is that there will be enormous 
variations in how long different volumes go between DU operations, when longer 
refresh intervals are in use.  Like if I have a 7 day period and one volume 
refreshes after 3.5 days, and the other waits for the full 7 days, that's quite 
a variation.  Similarly, if our period is short -- like 1 hour-- having some 
datanodes refresh after only 30 minutes seems unwelcome.  That's why I 
suggested a fixed jitter amount, to be configured by the sysadmin.

I don't feel very strongly about this, though, so if you want to make it 
percentage-based, that's fine too.  As long as it's configurable and the 
defaults are reasonable.

> Add jitter to CachingGetSpaceUsed's thread
> --
>
> Key: HADOOP-12975
> URL: https://issues.apache.org/jira/browse/HADOOP-12975
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 2.9.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HADOOP-12975v0.patch, HADOOP-12975v1.patch, 
> HADOOP-12975v2.patch
>
>
> Running DU across lots of disks is very expensive and running all of the 
> processes at the same time creates a noticeable IO spike. We should add some 
> jitter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HADOOP-12975) Add jitter to CachingGetSpaceUsed's thread

2016-04-14 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241805#comment-15241805
 ] 

Colin Patrick McCabe edited comment on HADOOP-12975 at 4/14/16 8:07 PM:


bq. But a percentage is chosen as it makes the jitter scale with anyone who 
changes du periods. If it's a set number then someone with a refresh period of 
days won't get any benefit from the jitter.

Hmm.  It seems like a fixed amount of jitter still provides a benefit, even to 
someone with a longer refresh interval.  Let's say my refresh period is 7 days. 
 At the end of that, I would still appreciate having my DU processes launch at 
slightly different times on the 7th day, rather than all launching at once.

My concern with varying based on a percentage is that there will be enormous 
variations in how long different volumes go between DU operations, when longer 
refresh intervals are in use.  Like if I have a 7 day period and one volume 
refreshes after 3.5 days, and the other waits for the full 7 days, that's quite 
a variation.  Similarly, if our period is short -- like 1 hour-- having some 
datanodes refresh after only 30 minutes seems unwelcome.  That's why I 
suggested a fixed jitter amount, to be configured by the sysadmin.

I don't feel very strongly about this, though, so if you want to make it 
percentage-based, that's fine too.  As long as it's configurable and the 
defaults are reasonable.


was (Author: cmccabe):
bq. But a percentage is chosen as it makes the jitter scale with anyone who 
changes du periods. If it's a set number then someone with a refresh period of 
days won't get any benefit from the jitter.

Hmm.  It seems like a fixed amount of jitter still provides a benefit, even to 
someone with a longer refresh interval.  Let's say my refresh period is 7 days. 
 At the end of that, I would still appreciate having my DU processes launch at 
slightly different times on the 7th day, rather than all launching at once.

My concern with varying based on a percentage is that there will be enormous 
variations in how long different volumes go between DU operations, when longer 
refresh intervals are in use.  Like if I have a 7 day period and one volume 
refreshes after 3.5 days, and the other waits for the full 7 days, that's quite 
a variation.  Similarly, if our period is short -- like 1 hour-- having some 
datanodes refresh after only 30 minutes seems unwelcome.  That's why I 
suggested a fixed jitter amount, to be configured by the sysadmin.

I don't feel very strongly about this, though, so if you want to make it 
percentage-based, that's fine too.  As long as it's configurable and the 
defaults are reasonable.

> Add jitter to CachingGetSpaceUsed's thread
> --
>
> Key: HADOOP-12975
> URL: https://issues.apache.org/jira/browse/HADOOP-12975
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 2.9.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HADOOP-12975v0.patch, HADOOP-12975v1.patch, 
> HADOOP-12975v2.patch
>
>
> Running DU across lots of disks is very expensive and running all of the 
> processes at the same time creates a noticeable IO spike. We should add some 
> jitter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12975) Add jitter to CachingGetSpaceUsed's thread

2016-04-14 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241805#comment-15241805
 ] 

Colin Patrick McCabe commented on HADOOP-12975:
---

bq. But a percentage is chosen as it makes the jitter scale with anyone who 
changes du periods. If it's a set number then someone with a refresh period of 
days won't get any benefit from the jitter.

Hmm.  It seems like a fixed amount of jitter still provides a benefit, even to 
someone with a longer refresh interval.  Let's say my refresh period is 7 days. 
 At the end of that, I would still appreciate having my DU processes launch at 
slightly different times on the 7th day, rather than all launching at once.

My concern with varying based on a percentage is that there will be enormous 
variations in how long different volumes go between DU operations, when longer 
refresh intervals are in use.  Like if I have a 7 day period and one volume 
refreshes after 3.5 days, and the other waits for the full 7 days, that's quite 
a variation.  Similarly, if our period is short -- like 1 hour-- having some 
datanodes refresh after only 30 minutes seems unwelcome.  That's why I 
suggested a fixed jitter amount, to be configured by the sysadmin.

I don't feel very strongly about this, though, so if you want to make it 
percentage-based, that's fine too.  As long as it's configurable and the 
defaults are reasonable.

> Add jitter to CachingGetSpaceUsed's thread
> --
>
> Key: HADOOP-12975
> URL: https://issues.apache.org/jira/browse/HADOOP-12975
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 2.9.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HADOOP-12975v0.patch, HADOOP-12975v1.patch, 
> HADOOP-12975v2.patch
>
>
> Running DU across lots of disks is very expensive and running all of the 
> processes at the same time creates a noticeable IO spike. We should add some 
> jitter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HADOOP-13010) Refactor raw erasure coders

2016-04-14 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241679#comment-15241679
 ] 

Colin Patrick McCabe edited comment on HADOOP-13010 at 4/14/16 6:29 PM:


Thanks for this, [~drankye].  Looks good overall!

I like the idea of moving some of the utility stuff into {{CoderUtil.java}}.

{code}
  static byte[] getEmptyChunk(int leastLength) {
    if (emptyChunk.length >= leastLength) {
      return emptyChunk; // In most time
    }
    synchronized (AbstractRawErasureCoder.class) {
      emptyChunk = new byte[leastLength];
    }
    return emptyChunk;
  }
{code}
This isn't safe for multiple threads, since we could be reading 
{{CoderUtil#emptyChunk}} while it's in the middle of being written.  You must 
either make this {{volatile}} or hold the lock for this entire function.

It's unfortunate that we need a function like this-- I was hoping that there 
would be some more efficient way of zeroing a ByteBuffer.  One thing that's a 
little concerning here is that a caller could modify the array returned by 
{{getEmptyChunk}}, which would cause problems for other callers.  To avoid 
this, it's probably better to make this {{private}} to {{CoderUtil.java}}.
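
For example, one way to make it safe (sketch only):
{code}
// Sketch: volatile pointer plus a re-check under the lock, so readers
// always see a fully-constructed array.
private static volatile byte[] emptyChunk = new byte[4096];

private static byte[] getEmptyChunk(int leastLength) {
  byte[] chunk = emptyChunk;   // single volatile read
  if (chunk.length >= leastLength) {
    return chunk;              // common case: no locking
  }
  synchronized (CoderUtil.class) {
    if (emptyChunk.length < leastLength) {
      emptyChunk = new byte[leastLength];
    }
    return emptyChunk;
  }
}
{code}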

{code}
  static ByteBuffer convertInputBuffer(byte[] input, int offset, int len) {
{code}
Hmm.  This name seems a bit confusing.  What this function does has nothing to 
do with whether the buffer is for "input" versus "output"-- it's just copying 
data from an array to a {{DirectByteBuffer}}.  It's also not so much a 
"conversion" as a "copy".  Maybe something like {{cloneAsDirectByteBuffer}} 
would be a better name?

{code}
  static <T> int[] getErasedOrNotToReadIndexes(T[] inputs) {
{code}
Should be named {{getNullIndexes}}?

{code}
  static <T> T findFirstValidInput(T[] inputs) {
    if (inputs.length > 0 && inputs[0] != null) {
      return inputs[0];
    }

    for (T input : inputs) {
      if (input != null) {
        return input;
      }
    }
...
{code}
Why do we need the special case for the first element here?

{code}
  static <T> void makeValidIndexes(T[] inputs, int[] validIndexes) {
{code}
Should be named {{getNonNullIndexes}}?  Also, why does this one take an array 
passed in, whereas {{getNullIndexes}} returns an array?  I also don't see how 
the caller is supposed to know how many of the array slots were used by the 
function.  If the array starts as all zeros, that is identical to the function 
putting a zero in the first element of the array and then returning, right?  
Perhaps we could mandate that the caller set all the array slots to a negative 
value before calling the function, but that seems like an awkward calling 
convention-- and certainly one that should be documented via JavaDoc.
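
Something like this seems simpler (sketch):
{code}
import java.util.Arrays;

// Sketch: return the indexes, so the caller needs no sentinel convention.
static <T> int[] getNonNullIndexes(T[] inputs) {
  int[] tmp = new int[inputs.length];
  int n = 0;
  for (int i = 0; i < inputs.length; i++) {
    if (inputs[i] != null) {
      tmp[n++] = i;
    }
  }
  return Arrays.copyOf(tmp, n);
}
{code}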

{code}
  @Override
  protected void doDecode(DecoderState decoderState, ByteBuffer[] inputs,
  int[] erasedIndexes, ByteBuffer[] outputs) {
{code}
I'm not sure why we wouldn't just store {{DecoderState}} in the {{Decoder}}?  
These are stateful objects, I assume.

Continuing my comments from earlier:
* {{AbstractRawErasureCoder}} -- why do we need this base class?  Its function 
seems to be just storing configuration values.  Perhaps we'd be better off just 
having an {{ErasureEncodingConfiguration}} class which other objects can own 
(not inherit from).  I think of a configuration as something you *own*, not 
something you *are*, which is why I think composition would make more sense 
here.  Also, is it possible for this to be immutable?  Mutable configuration is 
a huge headache (another reason I dislike {{Configured.java}})
* {{AbstractRawErasureEncoder}} / {{AbstractRawErasureDecoder}} -- why are these 
classes separate from {{RawErasureEncoder}} / {{RawErasureDecoder}}?  Do we 
expect that any encoders will implement {{RawErasureEncoder}}, but not extend 
{{AbstractRawErasureEncoder}}?  If not, it would be better just to have two 
base classes here rather than 2 classes and 2 interfaces.  Base classes are 
also easier to extend in the future than interfaces because you can add new 
methods without breaking backwards compatibility (as long as you have a default 
in the base).
* {{DummyRawDecoder}} -- {{NoOpRawDecoder}} would be a better name than 
"Dummy".  Is this intended to be used just in unit tests, or is it something 
the end-user should be able to configure?  If it is just unit tests, it should 
be under a {{test}} path, rather than a {{main}} path... i.e. 
{{hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/erasurecode/rawcoder/DummyRawDecoder.java}}


was (Author: cmccabe):
Thanks for this, [~drankye].  Looks good overall!

I like the idea of moving some of the utility stuff into {{CoderUtil.java}}.

{code}
  static byte[] getEmptyChunk(int leastLength) {
if (emptyChunk.length >= leastLength) {
  return emptyChunk; // In most time

[jira] [Commented] (HADOOP-13010) Refactor raw erasure coders

2016-04-14 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241679#comment-15241679
 ] 

Colin Patrick McCabe commented on HADOOP-13010:
---

Thanks for this, [~drankye].  Looks good overall!

I like the idea of moving some of the utility stuff into {{CoderUtil.java}}.

{code}
  static byte[] getEmptyChunk(int leastLength) {
    if (emptyChunk.length >= leastLength) {
      return emptyChunk; // In most time
    }
    synchronized (AbstractRawErasureCoder.class) {
      emptyChunk = new byte[leastLength];
    }
    return emptyChunk;
  }
{code}
This isn't safe for multiple threads, since we could be reading 
{{CoderUtil#emptyChunk}} while it's in the middle of being written.  You must 
either make this {{volatile}} or hold the lock for this entire function.

It's unfortunate that we need a function like this-- I was hoping that there 
would be some more efficient way of zeroing a ByteBuffer.  One thing that's a 
little concerning here is that a caller could modify the array returned by 
{{getEmptyChunk}}, which would cause problems for other callers.  To avoid 
this, it's probably better to make this {{private}} to {{CoderUtil.java}}.

{code}
  static ByteBuffer convertInputBuffer(byte[] input, int offset, int len) {
{code}
Hmm.  This name seems a bit confusing.  What this function does has nothing to 
do with whether the buffer is for "input" versus "output"-- it's just copying 
data from an array to a {{DirectByteBuffer}}.  It's also not so much a 
"conversion" as a "copy".  Maybe something like {{cloneAsDirectByteBuffer}} 
would be a better name?

{code}
  static <T> int[] getErasedOrNotToReadIndexes(T[] inputs) {
{code}
Should be named {{getNullIndexes}}?

{code}
  static <T> T findFirstValidInput(T[] inputs) {
    if (inputs.length > 0 && inputs[0] != null) {
      return inputs[0];
    }

    for (T input : inputs) {
      if (input != null) {
        return input;
      }
    }
...
{code}
Why do we need the special case for the first element here?

{code}
  static <T> void makeValidIndexes(T[] inputs, int[] validIndexes) {
{code}
Should be named {{getNonNullIndexes}}?  Also, why does this one take an array 
passed in, whereas {{getNullIndexes}} returns an array?  I also don't see how 
the caller is supposed to know how many of the array slots were used by the 
function.  If the array starts as all zeros, that is identical to the function 
putting a zero in the first element of the array and then returning, right?  
Perhaps we could mandate that the caller set all the array slots to a negative 
value before calling the function, but that seems like an awkward calling 
convention-- and certainly one that should be documented via JavaDoc.

{code}
  @Override
  protected void doDecode(DecoderState decoderState, ByteBuffer[] inputs,
  int[] erasedIndexes, ByteBuffer[] outputs) {
{code}
I'm not sure why we wouldn't just store {{DecoderState}} in the {{Decoder}}?  
These are stateful objects, I assume.

Continuing my comments from earlier:
* {{AbstractRawErasureCoder}} -- why do we need this base class?  Its function 
seems to be just storing configuration values.  Perhaps we'd be better off just 
having an {{ErasureEncodingConfiguration}} class which other objects can own 
(not inherit from).  I think of a configuration as something you *own*, not 
something you *are*, which is why I think composition would make more sense 
here.  Also, is it possible for this to be immutable?  Mutable configuration is 
a huge headache (another reason I dislike {{Configured.java}})
* {{AbstractRawErasure{En,De}coder}} -- why are these classes separate from 
{{RawErasureEncoder}} / {{RawErasureDecoder}}?  Do we expect that any encoders 
will implement {{RawErasureEncoder}}, but not extend 
{{AbstractRawErasureEncoder}}?  If not, it would be better just to have two 
base classes here rather than 2 classes and 2 interfaces.  Base classes are 
also easier to extend in the future than interfaces because you can add new 
methods without breaking backwards compatibility (as long as you have a default 
in the base).
* {{DummyRawDecoder}} -- {{NoOpRawDecoder}} would be a better name than 
"Dummy".  Is this intended to be used just in unit tests, or is it something 
the end-user should be able to configure?  If it is just unit tests, it should 
be under a {{test}} path, rather than a {{main}} path... i.e. 
{{hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/erasurecode/rawcoder/DummyRawDecoder.java}}

> Refactor raw erasure coders
> ---
>
> Key: HADOOP-13010
> URL: https://issues.apache.org/jira/browse/HADOOP-13010
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Fix For: 3.0.0
>
> Attachments: HADOOP-

[jira] [Updated] (HADOOP-12973) make DU pluggable

2016-04-12 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HADOOP-12973:
--
  Resolution: Fixed
   Fix Version/s: 2.8.0
Target Version/s: 2.8.0
  Status: Resolved  (was: Patch Available)

> make DU pluggable
> -
>
> Key: HADOOP-12973
> URL: https://issues.apache.org/jira/browse/HADOOP-12973
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Fix For: 2.8.0
>
> Attachments: HADOOP-12973v0.patch, HADOOP-12973v1.patch, 
> HADOOP-12973v10.patch, HADOOP-12973v11.patch, HADOOP-12973v12.patch, 
> HADOOP-12973v13.patch, HADOOP-12973v2.patch, HADOOP-12973v3.patch, 
> HADOOP-12973v5.patch, HADOOP-12973v6.patch, HADOOP-12973v7.patch, 
> HADOOP-12973v8.patch, HADOOP-12973v9.patch
>
>
> If people are concerned about replacing the call to DU. Then an easy first 
> step is to make it pluggable. Then it's possible to replace it with something 
> while leaving the default alone.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12973) make DU pluggable

2016-04-12 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15238238#comment-15238238
 ] 

Colin Patrick McCabe commented on HADOOP-12973:
---

The facts that the tests pass for me locally, that a different subset fails for 
each JVM, and the error message itself all lead me to conclude that this is a 
build infrastructure problem, not a patch problem.  Committed to trunk, 2.9, 
and 2.8.  Thanks, [~eclark].

> make DU pluggable
> -
>
> Key: HADOOP-12973
> URL: https://issues.apache.org/jira/browse/HADOOP-12973
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HADOOP-12973v0.patch, HADOOP-12973v1.patch, 
> HADOOP-12973v10.patch, HADOOP-12973v11.patch, HADOOP-12973v12.patch, 
> HADOOP-12973v13.patch, HADOOP-12973v2.patch, HADOOP-12973v3.patch, 
> HADOOP-12973v5.patch, HADOOP-12973v6.patch, HADOOP-12973v7.patch, 
> HADOOP-12973v8.patch, HADOOP-12973v9.patch
>
>
> If people are concerned about replacing the call to DU. Then an easy first 
> step is to make it pluggable. Then it's possible to replace it with something 
> while leaving the default alone.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HADOOP-12973) make DU pluggable

2016-04-11 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236389#comment-15236389
 ] 

Colin Patrick McCabe edited comment on HADOOP-12973 at 4/12/16 1:13 AM:


Cool.  Thanks, [~eclark].

Hmm... the TestDU failure looks related.  +1 pending a fix for that unit test.


was (Author: cmccabe):
Cool.  Thanks, [~eclark].  +1

> make DU pluggable
> -
>
> Key: HADOOP-12973
> URL: https://issues.apache.org/jira/browse/HADOOP-12973
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HADOOP-12973v0.patch, HADOOP-12973v1.patch, 
> HADOOP-12973v10.patch, HADOOP-12973v11.patch, HADOOP-12973v12.patch, 
> HADOOP-12973v2.patch, HADOOP-12973v3.patch, HADOOP-12973v5.patch, 
> HADOOP-12973v6.patch, HADOOP-12973v7.patch, HADOOP-12973v8.patch, 
> HADOOP-12973v9.patch
>
>
> If people are concerned about replacing the call to DU. Then an easy first 
> step is to make it pluggable. Then it's possible to replace it with something 
> while leaving the default alone.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HADOOP-12973) make DU pluggable

2016-04-11 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236389#comment-15236389
 ] 

Colin Patrick McCabe edited comment on HADOOP-12973 at 4/12/16 1:11 AM:


Cool.  Thanks, [~eclark].  +1


was (Author: cmccabe):
Cool.  Thanks, [~eclark].  +1 pending jenkins.

> make DU pluggable
> -
>
> Key: HADOOP-12973
> URL: https://issues.apache.org/jira/browse/HADOOP-12973
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HADOOP-12973v0.patch, HADOOP-12973v1.patch, 
> HADOOP-12973v10.patch, HADOOP-12973v11.patch, HADOOP-12973v12.patch, 
> HADOOP-12973v2.patch, HADOOP-12973v3.patch, HADOOP-12973v5.patch, 
> HADOOP-12973v6.patch, HADOOP-12973v7.patch, HADOOP-12973v8.patch, 
> HADOOP-12973v9.patch
>
>
> If people are concerned about replacing the call to DU. Then an easy first 
> step is to make it pluggable. Then it's possible to replace it with something 
> while leaving the default alone.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12973) make DU pluggable

2016-04-11 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236389#comment-15236389
 ] 

Colin Patrick McCabe commented on HADOOP-12973:
---

Cool.  Thanks, [~eclark].  +1 pending jenkins.

> make DU pluggable
> -
>
> Key: HADOOP-12973
> URL: https://issues.apache.org/jira/browse/HADOOP-12973
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HADOOP-12973v0.patch, HADOOP-12973v1.patch, 
> HADOOP-12973v10.patch, HADOOP-12973v11.patch, HADOOP-12973v12.patch, 
> HADOOP-12973v2.patch, HADOOP-12973v3.patch, HADOOP-12973v5.patch, 
> HADOOP-12973v6.patch, HADOOP-12973v7.patch, HADOOP-12973v8.patch, 
> HADOOP-12973v9.patch
>
>
> If people are concerned about replacing the call to DU. Then an easy first 
> step is to make it pluggable. Then it's possible to replace it with something 
> while leaving the default alone.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12973) make DU pluggable

2016-04-08 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15233265#comment-15233265
 ] 

Colin Patrick McCabe commented on HADOOP-12973:
---

Thanks, [~eclark].  Looks good!  The Windows code looks cleaner than before.

{code}
try {
  dfsUsage.close();
} catch (IOException ioe) {
  LOG.warn("Error trying to shutdown GetUsedSpace background thread", ioe);
}
{code}
Could we use {{IOUtils#cleanup}} here?
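i.e. something like this, assuming {{dfsUsage}} is {{Closeable}}:
{code}
// IOUtils#cleanup closes each Closeable and logs (rather than throws)
// any IOException, so the try/catch above collapses to one line:
IOUtils.cleanup(LOG, dfsUsage);
{code}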

I wonder if we could have {{GetSpaceUsed}} just be an interface with only one 
method... {{long getSpace()}} or something like that: a method that just 
synchronously retrieves the amount of space used, blocking for as long as it 
takes.

Then we could have another class which does all the thread management and 
value caching; that machinery seems unrelated to the {{GetSpaceUsed}} 
interface.  If I'm implementing a {{JNIGetSpaceUsed}}, I don't care about 
thread management; I just want to implement the method which gets the amount of 
space used and leave the thread management alone.  I think that's the direction 
you were going with the {{GetSpaceUsed}} base class, but it feels messy to make 
the implementation classes reach back up into the base class and fiddle with 
its atomic variables.
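
Roughly what I'm picturing -- a sketch only; {{CachingSpaceUsedWrapper}} and 
{{JNIGetSpaceUsed}} are names I'm making up here:
{code}
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

public interface GetSpaceUsed {
  /** Synchronously compute the space used, blocking as long as it takes. */
  long getSpace() throws IOException;
}

// Thread management and value caching live in a decorator, so concrete
// implementations (DU, a JNI-based one, etc.) stay trivial.
class CachingSpaceUsedWrapper implements GetSpaceUsed, Closeable {
  private final GetSpaceUsed delegate;
  private final AtomicLong cached = new AtomicLong(0);
  private final Thread refreshThread;
  private volatile boolean running = true;

  CachingSpaceUsedWrapper(final GetSpaceUsed delegate, final long refreshMs) {
    this.delegate = delegate;
    this.refreshThread = new Thread(new Runnable() {
      @Override
      public void run() {
        while (running) {
          try {
            cached.set(delegate.getSpace());
            Thread.sleep(refreshMs);
          } catch (InterruptedException e) {
            return;                      // close() interrupts us
          } catch (IOException e) {
            // keep serving the last good value
          }
        }
      }
    }, "space-used-refresh");
    refreshThread.setDaemon(true);
    refreshThread.start();
  }

  @Override
  public long getSpace() {               // never blocks
    return cached.get();
  }

  @Override
  public void close() {
    running = false;
    refreshThread.interrupt();
  }
}
{code}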

> make DU pluggable
> -
>
> Key: HADOOP-12973
> URL: https://issues.apache.org/jira/browse/HADOOP-12973
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HADOOP-12973v0.patch, HADOOP-12973v1.patch, 
> HADOOP-12973v10.patch, HADOOP-12973v11.patch, HADOOP-12973v2.patch, 
> HADOOP-12973v3.patch, HADOOP-12973v5.patch, HADOOP-12973v6.patch, 
> HADOOP-12973v7.patch, HADOOP-12973v8.patch, HADOOP-12973v9.patch
>
>
> If people are concerned about replacing the call to DU. Then an easy first 
> step is to make it pluggable. Then it's possible to replace it with something 
> while leaving the default alone.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11540) Raw Reed-Solomon coder using Intel ISA-L library

2016-04-08 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15233212#comment-15233212
 ] 

Colin Patrick McCabe commented on HADOOP-11540:
---

Thanks for your work on this, [~drankye].  It's making a lot of progress, I 
think.

> Raw Reed-Solomon coder using Intel ISA-L library
> 
>
> Key: HADOOP-11540
> URL: https://issues.apache.org/jira/browse/HADOOP-11540
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: HDFS-7285
>Reporter: Zhe Zhang
>Assignee: Kai Zheng
> Attachments: HADOOP-11540-initial.patch, HADOOP-11540-v1.patch, 
> HADOOP-11540-v10.patch, HADOOP-11540-v2.patch, HADOOP-11540-v4.patch, 
> HADOOP-11540-v5.patch, HADOOP-11540-v6.patch, HADOOP-11540-v7.patch, 
> HADOOP-11540-v8.patch, HADOOP-11540-v9.patch, 
> HADOOP-11540-with-11996-codes.patch, Native Erasure Coder Performance - Intel 
> ISAL-v1.pdf
>
>
> This is to provide RS codec implementation using Intel ISA-L library for 
> encoding and decoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11540) Raw Reed-Solomon coder using Intel ISA-L library

2016-04-07 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231306#comment-15231306
 ] 

Colin Patrick McCabe commented on HADOOP-11540:
---

Thanks, [~drankye].  Good progress here.

bq. I agree it will be easier to understand.  The only thing I'm not sure 
about is that there are at least 6 Java coders and 2 x 6 encode/decode 
functions right now; adding a loop to reset the list of output buffers in each 
function looks like a major change.  That's why I put the common code in the 
abstract class.

Hmm.  I still think changing the Java coders is the simplest thing to do.  It's 
a tiny amount of code, or should be (a single call per coder, something like 
the sketch below), and simple to understand.
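
For example -- a sketch, where {{CoderUtil}} and {{resetOutputs}} are 
hypothetical names, not from the patch:
{code}
import java.nio.ByteBuffer;

final class CoderUtil {
  /** Zero each output buffer from position to limit; positions unchanged. */
  static void resetOutputs(ByteBuffer[] outputs) {
    for (ByteBuffer output : outputs) {
      for (int i = output.position(); i < output.limit(); i++) {
        output.put(i, (byte) 0);   // absolute put does not move position
      }
    }
  }
}
{code}
Each Java coder would then call {{CoderUtil.resetOutputs(outputs)}} once at the 
top of its encode/decode function.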

bq. How about introducing AbstractJavaRawEncoder/AbstractJavaRawDecoder, 
similar to the native ones, for such things?  Then we could get rid of 
wantInitOutputs and wouldn't have to change each Java coder.

I don't think this would be a good idea.  We need to start thinking about 
simplifying the inheritance hierarchy and getting rid of some levels.  We have 
too many non-abstract base classes, which makes the hierarchy difficult to 
follow.  Inheritance should not be used to accomplish code reuse, only to 
express a genuine is-a relationship.

> Raw Reed-Solomon coder using Intel ISA-L library
> 
>
> Key: HADOOP-11540
> URL: https://issues.apache.org/jira/browse/HADOOP-11540
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: HDFS-7285
>Reporter: Zhe Zhang
>Assignee: Kai Zheng
> Attachments: HADOOP-11540-initial.patch, HADOOP-11540-v1.patch, 
> HADOOP-11540-v10.patch, HADOOP-11540-v2.patch, HADOOP-11540-v4.patch, 
> HADOOP-11540-v5.patch, HADOOP-11540-v6.patch, HADOOP-11540-v7.patch, 
> HADOOP-11540-v8.patch, HADOOP-11540-v9.patch, 
> HADOOP-11540-with-11996-codes.patch, Native Erasure Coder Performance - Intel 
> ISAL-v1.pdf
>
>
> This is to provide RS codec implementation using Intel ISA-L library for 
> encoding and decoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11540) Raw Reed-Solomon coder using Intel ISA-L library

2016-04-07 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230976#comment-15230976
 ] 

Colin Patrick McCabe commented on HADOOP-11540:
---

Thanks, [~drankye].

{code}
+  /**
+   * Convert an output bytes array buffer to direct ByteBuffer.
+   * @param output
+   * @return direct ByteBuffer
+   */
+  protected ByteBuffer convertOutputBuffer(byte[] output, int len) {
+    ByteBuffer directBuffer = ByteBuffer.allocateDirect(len);
+    return directBuffer;
+  }
{code}
Is it intentional that the "output" parameter is ignored here?

bq. For initOutputs and resetBuffer, good catch!  I initially thought of doing 
it as you suggested: instead of having initOutputs, just letting concrete 
coders override resetBuffer, which would be most flexible.  Then I realized 
that for the Java coders a default behavior can be provided and used, while for 
the native coders we can avoid having it, because at the beginning of the 
encode() call the native coder can memset the output buffers directly.  If the 
native coder instead had to provide resetBuffer, a JNI function would have to 
be added and called repeatedly to reset the output buffers.  Considering the 
overhead in both implementation and extra JNI calls, I used the initOutputs() 
approach.

Thanks for the explanation.  Why not just have the encode() function zero the 
buffers in every case?  I don't see why the pure Java code benefits from doing 
this differently-- and it is much simpler to understand if all the coders do it 
the same way.

{code}
void setCoder(JNIEnv* env, jobject thiz, IsalCoder* pCoder) {
  jclass clazz = (*env)->GetObjectClass(env, thiz);
  jfieldID fid = (*env)->GetFieldID(env, clazz, "nativeCoder", "J");
  (*env)->SetLongField(env, thiz, fid, (jlong) pCoder);
}
{code}
All these functions can fail.  You need to check for, and handle, their 
failures.

{{isAllowingChangeInputs}}, {{isAllowingVerboseDump}}: these should be 
{{allowChangeInputs}}, {{allowVerboseDump}} for clarity.

> Raw Reed-Solomon coder using Intel ISA-L library
> 
>
> Key: HADOOP-11540
> URL: https://issues.apache.org/jira/browse/HADOOP-11540
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: HDFS-7285
>Reporter: Zhe Zhang
>Assignee: Kai Zheng
> Attachments: HADOOP-11540-initial.patch, HADOOP-11540-v1.patch, 
> HADOOP-11540-v10.patch, HADOOP-11540-v2.patch, HADOOP-11540-v4.patch, 
> HADOOP-11540-v5.patch, HADOOP-11540-v6.patch, HADOOP-11540-v7.patch, 
> HADOOP-11540-v8.patch, HADOOP-11540-v9.patch, 
> HADOOP-11540-with-11996-codes.patch, Native Erasure Coder Performance - Intel 
> ISAL-v1.pdf
>
>
> This is to provide RS codec implementation using Intel ISA-L library for 
> encoding and decoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HADOOP-12973) make DU pluggable

2016-04-07 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230762#comment-15230762
 ] 

Colin Patrick McCabe edited comment on HADOOP-12973 at 4/7/16 6:31 PM:
---

bq. It makes it more obvious when someone overrides the class where things are.

Hmm.  How about making the class {{final}} instead?

Re: {{DU}} versus {{WindowsDU}}. If you really want to separate the classes, I 
don't object, but I don't want the {{WindowsDU}} to be a subclass of the Linux 
{{DU}}.  That is just weird.

bq. Shutdown is needed. So it's very strange to have a shutdown without a start.

There is a start-- in {{GetSpaceUsedBuilder}}.  Having an "init" method that 
you have to call after construction is an anti-pattern.  There is no reason 
why the user should have to care whether {{GetSpaceUsedBuilder}} contains a 
thread or not-- many implementations won't need a thread.  The fact that not 
all subclasses need threads is a good sign that thread management doesn't 
belong in the common interface.

I'm also curious how you feel about the idea of making the interface 
{{Closeable}}, as we've done with many other interfaces such as 
{{FailoverProxyProvider}}, {{ServicePlugin}}, {{BlockReader}}, {{Peer}}, 
{{PeerServer}}, {{FsVolumeReference}}, etc. etc.


was (Author: cmccabe):
bq. It makes it more obvious when someone overrides the class where things are.

Hmm.  How about making the class {{final}} instead?

Re: DU versus WindowsDU. If you really want to separate the classes, I don't 
object, but I don't want the WindowsDU to be a subclass of the Linux DU.  That 
is just weird.

bq. Shutdown is needed. So it's very strange to have a shutdown without a start.

There is a start-- in GetSpaceUsedBuilder.  Having an "init" method is an 
anti-pattern.


> make DU pluggable
> -
>
> Key: HADOOP-12973
> URL: https://issues.apache.org/jira/browse/HADOOP-12973
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HADOOP-12973v0.patch, HADOOP-12973v1.patch, 
> HADOOP-12973v10.patch, HADOOP-12973v2.patch, HADOOP-12973v3.patch, 
> HADOOP-12973v5.patch, HADOOP-12973v6.patch, HADOOP-12973v7.patch, 
> HADOOP-12973v8.patch, HADOOP-12973v9.patch
>
>
> If people are concerned about replacing the call to DU. Then an easy first 
> step is to make it pluggable. Then it's possible to replace it with something 
> while leaving the default alone.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HADOOP-12973) make DU pluggable

2016-04-07 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230762#comment-15230762
 ] 

Colin Patrick McCabe edited comment on HADOOP-12973 at 4/7/16 6:32 PM:
---

bq. It makes it more obvious when someone overrides the class where things are.

Hmm.  How about making the class {{final}} instead?

Re: {{DU}} versus {{WindowsDU}}. If you really want to separate the classes, I 
don't object, but I don't want the {{WindowsDU}} to be a subclass of the Linux 
{{DU}}.  That is just weird.

bq. Shutdown is needed. So it's very strange to have a shutdown without a start.

There is a start-- in {{GetSpaceUsedBuilder}}.  Having an "init" method that 
you have to call after construction is an anti-pattern.  There is no reason 
why the user should have to care whether {{GetSpaceUsedBuilder}} contains a 
thread or not-- many implementations won't need a thread.  The fact that not 
all subclasses need threads is a good sign that thread management doesn't 
belong in the common interface.

I'm also curious how you feel about the idea of making the interface 
{{Closeable}}, as we've done with many other interfaces such as 
{{FailoverProxyProvider}}, {{ServicePlugin}}, {{BlockReader}}, {{Peer}}, 
{{PeerServer}}, {{FsVolumeReference}}, etc. etc.  The compiler and various 
linters warn about failures to close {{Closeable}} objects in many cases, but 
not about failures to call custom shutdown functions.


was (Author: cmccabe):
bq. It makes it more obvious when someone overrides the class where things are.

Hmm.  How about making the class {{final}} instead?

Re: {{DU}} versus {{WindowsDU}}. If you really want to separate the classes, I 
don't object, but I don't want the {{WindowsDU}} to be a subclass of the Linux 
{{DU}}.  That is just weird.

bq. Shutdown is needed. So it's very strange to have a shutdown without a start.

There is a start-- in {{GetSpaceUsedBuilder}}.  Having an "init" method that 
you have to call after construction is an anti-pattern.  There is no reason 
why the user should have to care whether {{GetSpaceUsedBuilder}} contains a 
thread or not-- many implementations won't need a thread.  The fact that not 
all subclasses need threads is a good sign that thread management doesn't 
belong in the common interface.

I'm also curious how you feel about the idea of making the interface 
{{Closeable}}, as we've done with many other interfaces such as 
{{FailoverProxyProvider}}, {{ServicePlugin}}, {{BlockReader}}, {{Peer}}, 
{{PeerServer}}, {{FsVolumeReference}}, etc. etc.

> make DU pluggable
> -
>
> Key: HADOOP-12973
> URL: https://issues.apache.org/jira/browse/HADOOP-12973
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HADOOP-12973v0.patch, HADOOP-12973v1.patch, 
> HADOOP-12973v10.patch, HADOOP-12973v2.patch, HADOOP-12973v3.patch, 
> HADOOP-12973v5.patch, HADOOP-12973v6.patch, HADOOP-12973v7.patch, 
> HADOOP-12973v8.patch, HADOOP-12973v9.patch
>
>
> If people are concerned about replacing the call to DU. Then an easy first 
> step is to make it pluggable. Then it's possible to replace it with something 
> while leaving the default alone.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


  1   2   3   4   5   6   7   8   9   10   >