[jira] [Commented] (HADOOP-14973) Log StorageStatistics

Steve Loughran (JIRA) Tue, 24 Oct 2017 03:03:08 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16216638#comment-16216638
 ]


Steve Loughran commented on HADOOP-14973:
-----------------------------------------

First, Sean: tag versions, give the title a hint that it's for S3, mark it as an 
improvement, and move it under HADOOP-14831 so it can be tracked for Hadoop 3.1.

Second, you haven't called FileSystem.toString() for a while have you? Or 
FSDataInputStream.toString()? Because it prints all this stuff. How else do you 
think all the seek optimisation work was debugged?
{code}
2017-10-10 16:23:47,050 [ScalaTest-main-running-S3ADataFrameSuite] INFO  
s3.S3ADataFrameSuite (Logging.scala:logInfo(54)) - Duration of scan result list 
= 2,118,450 nS
2017-10-10 16:23:47,050 [ScalaTest-main-running-S3ADataFrameSuite] INFO  
s3.S3ADataFrameSuite (Logging.scala:logInfo(54)) - FileSystem 
S3AFileSystem{uri=s3a://hwdev-steve-ireland-new, 
workingDir=s3a://hwdev-steve-ireland-new/user/stevel, inputPolicy=random, 
partSize=8388608, enableMultiObjectsDelete=true, maxKeys=5000, 
readAhead=262144, blockSize=1048576, multiPartThreshold=2147483647, 
serverSideEncryptionAlgorithm='NONE', 
blockFactory=org.apache.hadoop.fs.s3a.S3ADataBlocks$DiskBlockFactory@64f6964f, 
metastore=NullMetadataStore, authoritative=false, useListV1=false, 
boundedExecutor=BlockingThreadPoolExecutorService{SemaphoredDelegatingExecutor{permitCount=25,
 available=25, waiting=0}, activeCount=0}, 
unboundedExecutor=java.util.concurrent.ThreadPoolExecutor@60291e59[Running, 
pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0], 
statistics {182521443 bytes read, 39004 bytes written, 207 read ops, 0 large 
read ops, 76 write ops}, metrics {{Context=S3AFileSystem} 
{FileSystemId=e62eeb1a-cced-473b-95f3-06c9910604ad-hwdev-steve-ireland-new} 
{fsURI=s3a://hwdev-steve-ireland-new} {files_created=0} {files_copied=0} 
{files_copied_bytes=0} {files_deleted=0} {fake_directories_deleted=0} 
{directories_created=0} {directories_deleted=0} {ignored_errors=0} 
{op_copy_from_local_file=0} {op_exists=0} {op_get_file_status=1} 
{op_glob_status=0} {op_is_directory=0} {op_is_file=0} {op_list_files=1} 
{op_list_located_status=0} {op_list_status=0} {op_mkdirs=0} {op_rename=0} 
{object_copy_requests=0} {object_delete_requests=0} {object_list_requests=2} 
{object_continue_list_requests=0} {object_metadata_requests=2} 
{object_multipart_aborted=0} {object_put_bytes=0} {object_put_requests=0} 
{object_put_requests_completed=0} {stream_write_failures=0} 
{stream_write_block_uploads=0} {stream_write_block_uploads_committed=0} 
{stream_write_block_uploads_aborted=0} {stream_write_total_time=0} 
{stream_write_total_data=0} {committer_commits_created=0} 
{committer_commits_completed=0} {committer_jobs_completed=0} 
{committer_jobs_failed=0} {committer_tasks_completed=0} 
{committer_tasks_failed=0} {committer_bytes_committed=0} 
{committer_bytes_uploaded=0} {committer_commits_failed=0} 
{committer_commits_aborted=0} {committer_commits_reverted=0} 
{s3guard_metadatastore_put_path_request=1} 
{s3guard_metadatastore_initialization=0} {s3guard_metadatastore_retry=0} 
{s3guard_metadatastore_throttled=0} {store_io_throttled=0} 
{object_put_requests_active=0} {object_put_bytes_pending=0} 
{stream_write_block_uploads_active=0} {stream_write_block_uploads_pending=0} 
{stream_write_block_uploads_data_pending=0} 
{S3guard_metadatastore_put_path_latencyNumOps=0} 
{S3guard_metadatastore_put_path_latency50thPercentileLatency=0} 
{S3guard_metadatastore_put_path_latency75thPercentileLatency=0} 
{S3guard_metadatastore_put_path_latency90thPercentileLatency=0} 
{S3guard_metadatastore_put_path_latency95thPercentileLatency=0} 
{S3guard_metadatastore_put_path_latency99thPercentileLatency=0} 
{S3guard_metadatastore_throttle_rateNumEvents=0} 
{S3guard_metadatastore_throttle_rate50thPercentileFrequency (Hz)=0} 
{S3guard_metadatastore_throttle_rate75thPercentileFrequency (Hz)=0} 
{S3guard_metadatastore_throttle_rate90thPercentileFrequency (Hz)=0} 
{S3guard_metadatastore_throttle_rate95thPercentileFrequency (Hz)=0} 
{S3guard_metadatastore_throttle_rate99thPercentileFrequency (Hz)=0} 
{stream_read_fully_operations=0} {stream_opened=0} 
{stream_bytes_skipped_on_seek=0} {stream_closed=0} 
{stream_bytes_backwards_on_seek=0} {stream_bytes_read=0} 
{stream_read_operations_incomplete=0} {stream_bytes_discarded_in_abort=0} 
{stream_close_operations=0} {stream_read_operations=0} {stream_aborted=0} 
{stream_forward_seek_operations=0} {stream_backward_seek_operations=0} 
{stream_seek_operations=0} {stream_bytes_read_in_close=0} 
{stream_read_exceptions=0} }}
- DataFrames
2017-10-10 16:23:47,051 [ScalaTest-main-running-S3ADataFrameSuite] INFO  
s3.S3ADataFrameSuite (Logging.scala:logInfo(54)) - Cleaning 
s3a://hwdev-steve-ireland-new/cloud-integration/DELAY_LISTING_ME/S3ADataFrameSuite
S3AOrcRelationSuite:
{code}

See? That's from a Spark {{logInfo(s"Stats $filesystem")}} instruction, with no 
changes made to the Spark codebase at all.
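
If you want the counters out of that dump rather than just eyeballing them, something like the following would do as a rough sketch. It assumes the metrics section keeps the {{{name=value}}} shape shown in the log above and that the text has not been re-wrapped by a mail client; the class name {{MetricsDumpParser}} is made up for illustration and is not part of Hadoop.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: pulls numeric {name=value} pairs out of a toString() dump. */
final class MetricsDumpParser {
    // Matches one brace-delimited pair whose value is an integer, e.g. {op_list_files=1}.
    // Pairs with non-numeric values, such as {Context=S3AFileSystem}, are skipped.
    private static final Pattern COUNTER =
        Pattern.compile("\\{([^{}=]+)=(-?\\d+)\\}");

    static Map<String, Long> parse(String dump) {
        Map<String, Long> counters = new LinkedHashMap<>();
        Matcher m = COUNTER.matcher(dump);
        while (m.find()) {
            counters.put(m.group(1).trim(), Long.parseLong(m.group(2)));
        }
        return counters;
    }
}
```

This only scrapes text, so it is a debugging aid at best; it breaks as soon as the toString() layout changes.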

W.r.t. broad stats there, what is needed is aggregate collection of stats from 
executors, where the work a specific executor does for a task carries the stats 
for that task alone, rather than the statistics summary for the entire life of 
the shared process. Same for Tez, I expect.
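
To make the "per task, not per process" point concrete, here is a minimal sketch of the snapshot/diff idea: take a copy of the counters before the task, another after, and report only the change. It works on plain string -> long maps; the class and method names are invented for the example, and how the snapshots get taken is left open.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: what changed between two counter snapshots. */
final class StatsDiff {
    /**
     * Returns after - before for every counter, dropping counters that did
     * not change. Counters missing from the "before" snapshot count as 0.
     */
    static Map<String, Long> diff(Map<String, Long> before, Map<String, Long> after) {
        Map<String, Long> delta = new TreeMap<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            long change = e.getValue() - before.getOrDefault(e.getKey(), 0L);
            if (change != 0) {
                delta.put(e.getKey(), change);
            }
        }
        return delta;
    }
}
```

Note that diffing process-wide counters still picks up whatever other threads did in the same window, which is why per-thread collection is the real requirement.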

* the _SUCCESS file in the HADOOP-13786 patch collects the VM stats and 
aggregates them; it doesn't do what is needed, which is per-thread 
collection/diff.
* There's been discussion in Spark PRs about improving how executor stats are 
collected (currently it just does a {{listFiles(task-output-dir, 
true).map(status => status.len).sum()}}). Tasks should be able to return a 
full map of string -> long of that task's stats, so that they can be aggregated.
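
The aggregation side of that is simple once each task hands back its own string -> long map. A minimal sketch, assuming the driver ends up with a list of such maps and that every counter is additive; {{TaskStatsAggregator}} is a hypothetical name, not an existing Spark or Hadoop class.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: sums per-task counter maps into one job-level map. */
final class TaskStatsAggregator {
    static Map<String, Long> aggregate(List<Map<String, Long>> perTaskStats) {
        Map<String, Long> total = new TreeMap<>();
        for (Map<String, Long> taskStats : perTaskStats) {
            // merge() inserts the value if the key is new, otherwise adds to it.
            taskStats.forEach((name, value) -> total.merge(name, value, Long::sum));
        }
        return total;
    }
}
```

Summing is only right for counters; latency percentiles and similar gauges would need a different merge rule.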

This is broader than just S3; it needs to cover all stores, plus let committers 
& executors add more data. 

[~liuml07] has done some of the initial work on chaining up StorageStats.

Anyway, if all you want is logging s3a stats, toString() does it, so I'd close 
it as a WORKSFORME. However, we do need to glue together the entire storage 
stats mechanism, finishing off Mingliang's work. Well volunteered!


> Log StorageStatistics
> ---------------------
>
>                 Key: HADOOP-14973
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14973
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>
> S3A is currently storing much more detailed metrics via StorageStatistics 
> than are logged in a MapReduce job. Eventually, it would be nice to get 
> Spark, MapReduce and other workloads to retrieve and store these metrics, but 
> it may be some time before they all do that. I'd like to consider having S3A 
> publish the metrics itself in some form. This is tricky, as S3A has no daemon 
> but lives inside various other processes.
> Perhaps writing to a log file at some configurable interval and on close() 
> would be the best we could do. Other ideas would be welcome.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
