[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics

2016-03-29 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216509#comment-15216509
 ] 

Hitesh Shah commented on HDFS-10175:


[~cmccabe] Cluster admins who keep tabs on Hive queries or Pig scripts to see 
what kind of ops they are doing rely on the job counters today. Unfortunately, 
the current job counters - read ops, write ops, etc. - are quite old and very 
high level, and cannot be used to find bad jobs which, say, create 1000s of 
files ( for partitions, etc ). The approach Jitendra and Mingliang seem to be 
proposing integrates well with MapReduce and similar frameworks, where counters 
could be created to map to the filesystem stats. Additionally, these counters 
would be available for historical analysis as needed via the job history. This 
is something which would likely be needed for all jobs.  

To some extent I do agree that not all of the 50-odd metrics being tracked will 
be useful for every use case. However, enabling this via HTrace and turning on 
HTrace for all jobs is not an option either, as it does not allow selective 
tracking of only certain info. Enabling HTrace for a given job implies in-depth 
profiling that is not really needed unless required for a very specific 
investigation. Do you have any suggestions on how the app layers above HDFS can 
obtain more in-depth stats while at the same time keeping all of these stats 
accessible for all jobs? 





> add per-operation stats to FileSystem.Statistics
> 
>
> Key: HDFS-10175
> URL: https://issues.apache.org/jira/browse/HDFS-10175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, 
> HDFS-10175.002.patch, HDFS-10175.003.patch
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are, in turn, exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing; for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.
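A minimal sketch of the proposed shape (class, enum, and method names here are illustrative, not the actual FileSystem.Statistics API): each client operation increments its own counter, and a framework can snapshot the map to publish job counters.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative per-operation statistics holder; names are hypothetical,
// not the actual FileSystem.Statistics API.
class PerOpStatistics {
    enum Op { CREATE, APPEND, CREATE_SYMLINK, DELETE, EXISTS, MKDIRS, RENAME }

    private final EnumMap<Op, AtomicLong> counts = new EnumMap<>(Op.class);

    PerOpStatistics() {
        for (Op op : Op.values()) {
            counts.put(op, new AtomicLong());
        }
    }

    // Called by the client on each operation, instead of folding
    // everything into a single coarse writeOps counter.
    void increment(Op op) {
        counts.get(op).incrementAndGet();
    }

    long get(Op op) {
        return counts.get(op).get();
    }

    // Snapshot a framework such as MapReduce could publish as job
    // counters and aggregate into the job history.
    Map<Op, Long> snapshot() {
        EnumMap<Op, Long> snap = new EnumMap<>(Op.class);
        for (Map.Entry<Op, AtomicLong> e : counts.entrySet()) {
            snap.put(e.getKey(), e.getValue().get());
        }
        return snap;
    }
}
```

With counters at this granularity, a job that creates thousands of partition files shows up directly in the CREATE count rather than hiding inside a generic write-ops total.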



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9343) Empty caller context considered invalid

2015-10-29 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14981668#comment-14981668
 ] 

Hitesh Shah commented on HDFS-9343:
---

Comments: 

isValid() - is this meant to be a public API? Should this be renamed to 
isContextValid() as it is only checking the context part? 

{code}
if (callerContext.getSignature() != null &&
    callerContext.getSignature().length <= callerSignatureMaxLen) {
{code}
   - Not checking for a zero-length string? 
   - Should the signature validity check be moved into a function too? 
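The two review points above could be addressed together by pulling the check into a named helper; the class and field names below are hypothetical, not the actual CallerContext code:

```java
// Hypothetical helper consolidating the signature validity check,
// including the zero-length guard asked about above.
class SignatureValidator {
    private final int callerSignatureMaxLen;

    SignatureValidator(int callerSignatureMaxLen) {
        this.callerSignatureMaxLen = callerSignatureMaxLen;
    }

    boolean isSignatureValid(byte[] signature) {
        return signature != null
            && signature.length > 0
            && signature.length <= callerSignatureMaxLen;
    }
}
```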

> Empty caller context considered invalid
> ---
>
> Key: HDFS-9343
> URL: https://issues.apache.org/jira/browse/HDFS-9343
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9343.000.patch
>
>
> The caller context with an empty context string is considered invalid, and it 
> should not appear in the audit log.
> Meanwhile, a signature that is too long will not be written to the audit log.





[jira] [Commented] (HDFS-9343) Empty caller context considered invalid

2015-10-29 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14981797#comment-14981797
 ] 

Hitesh Shah commented on HDFS-9343:
---

Might be safer to do an empty-string check anyway, as the logger and setSig() 
are in different parts of the codebase. 

> Empty caller context considered invalid
> ---
>
> Key: HDFS-9343
> URL: https://issues.apache.org/jira/browse/HDFS-9343
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9343.000.patch, HDFS-9343.001.patch
>
>
> The caller context with an empty context string is considered invalid, and it 
> should not appear in the audit log.
> Meanwhile, a signature that is too long will not be written to the audit log.





[jira] [Commented] (HDFS-9343) Empty caller context considered invalid

2015-10-29 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14981799#comment-14981799
 ] 

Hitesh Shah commented on HDFS-9343:
---

In any case, the patch looks fine. +1.

> Empty caller context considered invalid
> ---
>
> Key: HDFS-9343
> URL: https://issues.apache.org/jira/browse/HDFS-9343
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9343.000.patch, HDFS-9343.001.patch
>
>
> The caller context with an empty context string is considered invalid, and it 
> should not appear in the audit log.
> Meanwhile, a signature that is too long will not be written to the audit log.





[jira] [Assigned] (HDFS-4956) DistributedFileSystem does not handle CreateFlag.APPEND in create call

2013-07-03 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah reassigned HDFS-4956:
-

Assignee: Arun C Murthy

 DistributedFileSystem does not handle CreateFlag.APPEND in create call
 --

 Key: HDFS-4956
 URL: https://issues.apache.org/jira/browse/HDFS-4956
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Suresh Srinivas
Assignee: Arun C Murthy

 Currently DistributedFileSystem does not handle CreateFlag.APPEND in the 
 implementation of the FileSystem#create() method. It only supports OVERWRITE | 
 CREATE or just CREATE.
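A sketch of the flag combinations a create() implementation has to distinguish; the enum mirrors Hadoop's CreateFlag, but the dispatch itself is illustrative, not the actual DistributedFileSystem code:

```java
import java.util.EnumSet;

// Illustrative dispatch over create flags; the APPEND branch is the one
// this issue reports as missing.
class CreateFlagDispatch {
    enum CreateFlag { CREATE, OVERWRITE, APPEND }

    static String dispatch(EnumSet<CreateFlag> flags) {
        if (flags.contains(CreateFlag.APPEND)) {
            // Missing today: fall back to append semantics when the
            // caller passes CREATE | APPEND and the file exists.
            return "append";
        }
        if (flags.contains(CreateFlag.OVERWRITE)) {
            return "createOrOverwrite";   // OVERWRITE | CREATE
        }
        return "create";                  // plain CREATE
    }
}
```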

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HDFS-4956) DistributedFileSystem does not handle CreateFlag.APPEND in create call

2013-07-03 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah reassigned HDFS-4956:
-

Assignee: (was: Arun C Murthy)

 DistributedFileSystem does not handle CreateFlag.APPEND in create call
 --

 Key: HDFS-4956
 URL: https://issues.apache.org/jira/browse/HDFS-4956
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Suresh Srinivas

 Currently DistributedFileSystem does not handle CreateFlag.APPEND in the 
 implementation of the FileSystem#create() method. It only supports OVERWRITE | 
 CREATE or just CREATE.



[jira] [Commented] (HDFS-4820) Remove hdfs-default.xml

2013-05-14 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13657206#comment-13657206
 ] 

Hitesh Shah commented on HDFS-4820:
---

@Suresh, if it is just for documentation, it could be kept around in some form, 
but I think the main point that [~sseth] is trying to raise is that it should 
not be part of the jar, and should not be looked up and loaded by default into 
the Configuration object.

 Remove hdfs-default.xml
 ---

 Key: HDFS-4820
 URL: https://issues.apache.org/jira/browse/HDFS-4820
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Siddharth Seth

 Similar to YARN-673, which contains additional details.
 There are separate jiras for YARN, MR and HDFS so that enough people take a look. 
 Looking for reasons for these files to exist, other than the ones mentioned 
 in YARN-673, or a good reason to keep the files.



[jira] [Created] (HDFS-3409) Secondary namenode should expose checkpoint info via JMX

2012-05-11 Thread Hitesh Shah (JIRA)
Hitesh Shah created HDFS-3409:
-

 Summary: Secondary namenode should expose checkpoint info via JMX
 Key: HDFS-3409
 URL: https://issues.apache.org/jira/browse/HDFS-3409
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 1.0.2
Reporter: Hitesh Shah


Information such as when the last checkpoint was done should be exposed via JMX 
so that it can be easily queried via scripts. 
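A minimal sketch of what that could look like with a standard platform MXBean; the interface, attribute, and ObjectName below are made up for illustration and do not match any actual SecondaryNameNode bean:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// JMX requires the management interface to be public; the *MXBean
// naming convention marks it as an MXBean.
public interface CheckpointInfoMXBean {
    long getLastCheckpointTime();
}

class CheckpointInfo implements CheckpointInfoMXBean {
    private volatile long lastCheckpointTime;

    // Called by the checkpointing code when a checkpoint completes.
    void markCheckpoint() {
        lastCheckpointTime = System.currentTimeMillis();
    }

    @Override
    public long getLastCheckpointTime() {
        return lastCheckpointTime;
    }

    // Registering under a stable ObjectName lets scripts read the
    // attribute through any JMX client.
    static void register(CheckpointInfo info) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        server.registerMBean(info,
            new ObjectName("Hadoop:service=SecondaryNameNode,name=CheckpointInfo"));
    }
}
```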

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3409) Secondary namenode should expose checkpoint info via JMX

2012-05-11 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273618#comment-13273618
 ] 

Hitesh Shah commented on HDFS-3409:
---

@Aaron, thanks for pointing out the other JIRA. 

Regarding where the information is exposed, I would defer to anyone better 
versed with HDFS to make a call on where it makes sense to expose this 
information for 1.x ( whether in the namenode, if it tracks the last checkpoint 
time, or separately in each SNN, given that multiple SNNs are supported ). 
Having the time of the last valid checkpoint, regardless of which SNN performed 
it, 'may' be sufficient from an operational point of view.
 

 Secondary namenode should expose checkpoint info via JMX
 

 Key: HDFS-3409
 URL: https://issues.apache.org/jira/browse/HDFS-3409
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 1.0.2
Reporter: Hitesh Shah

 Information such as when the last checkpoint was done should be exposed via 
 JMX so that it can be easily queried via scripts. 
