[jira] [Created] (MAPREDUCE-5902) JobHistoryServer needs more debug logs.

2014-05-22 Thread jay vyas (JIRA)
jay vyas created MAPREDUCE-5902:
---

 Summary: JobHistoryServer needs more debug logs.
 Key: MAPREDUCE-5902
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5902
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Reporter: jay vyas


With the JobHistoryServer, it appears that it is sometimes possible to skip 
over certain history files.  I haven't been able to determine why yet, but I've 
found that some long-named .jhist files aren't getting collected into the done/ 
directory.

After tracing through the actual source and turning on DEBUG level logging, it 
became clear that this snippet is an important workhorse 
(scanDirectoryForIntermediateFiles and scanDirectoryForHistoryFiles ultimately 
boil down to scanDirectory()).  

It would be extremely useful, then, to have a couple of guarded logs at this 
level of the code, so that we can see, in the log folders, why files are being 
filtered out, i.e. whether it is due to filtering or visibility.

{noformat}

  private static List<FileStatus> scanDirectory(Path path, FileContext fc,
      PathFilter pathFilter) throws IOException {
    path = fc.makeQualified(path);
    List<FileStatus> jhStatusList = new ArrayList<FileStatus>();
    RemoteIterator<FileStatus> fileStatusIter = fc.listStatus(path);
    while (fileStatusIter.hasNext()) {
      FileStatus fileStatus = fileStatusIter.next();
      Path filePath = fileStatus.getPath();
      if (fileStatus.isFile() && pathFilter.accept(filePath)) {
        jhStatusList.add(fileStatus);
      }
    }
    return jhStatusList;
  }

{noformat}
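To make the suggestion concrete, here is a minimal, JDK-only sketch of the guarded-logging idea, using java.io.File and java.util.logging in place of Hadoop's FileContext and commons-logging (class and method names here are invented for illustration, not the actual patch):

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch: scan a directory, and log *why* each rejected entry was skipped.
// The log call is guarded by isLoggable() so the string concatenation costs
// nothing when debug logging is off.
public class DirScanner {
  private static final Logger LOG = Logger.getLogger(DirScanner.class.getName());

  public static List<File> scanDirectory(File dir, FileFilter filter) {
    List<File> accepted = new ArrayList<File>();
    File[] entries = dir.listFiles();
    if (entries == null) {
      return accepted; // directory missing or unreadable
    }
    for (File f : entries) {
      if (f.isFile() && filter.accept(f)) {
        accepted.add(f);
      } else if (LOG.isLoggable(Level.FINE)) {
        // Guarded log: record the reason the entry was filtered out.
        String reason = f.isFile() ? "rejected by path filter" : "not a regular file";
        LOG.fine("Skipping " + f.getPath() + ": " + reason);
      }
    }
    return accepted;
  }
}
```

The same two-line guard dropped into scanDirectory() above would make the history-file skips visible in the JobHistoryServer logs.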





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (MAPREDUCE-5894) Make critical YARN properties first class citizens in the build.

2014-05-17 Thread jay vyas (JIRA)
jay vyas created MAPREDUCE-5894:
---

 Summary: Make critical YARN properties first class citizens in the 
build.
 Key: MAPREDUCE-5894
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5894
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: jay vyas


We recently found, when deploying hadoop 2.2 with hadoop 2.0 values, that 
{noformat} mapreduce_shuffle {noformat} changed to {noformat} 
mapreduce.shuffle {noformat}.  

There are likewise many similar examples of parameters which become deprecated 
over time.   See 
http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/DeprecatedProperties.html

I suggest we:

1)  Put the *set of parameters which are deprecated* over time into a java 
class which ships directly with the code, maybe even as a static list inside of 
Configuration() itself, with *optional extended parameters read from a 
configurable parameter*, so that ecosystem users (i.e. like HBase, or 
alternative file systems) can add their own deprecation info.

2) Have this list *checked on yarn daemon startup*, so that unused parameters 
which are *obviously artifacts are flagged immediately* by the daemon failing 
immediately.

3) Have a list of all mandatory *current* parameters stored in the code, and 
also a list of deprecated ones. Then, have the build *automatically fail* if a 
parameter in the mandatory list is NOT accessed.  This would (a) make it so 
that unit testing of parameters does not regress and (b) force all updates to 
the code which change a parameter name to also include an update to the 
deprecated parameter list before the build passes.
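A rough, JDK-only sketch of the fail-fast startup check in (2). The class and the deprecated/replacement key pairs below are illustrative (the pairs are taken from the DeprecatedProperties page linked above); this is not an existing Hadoop API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of proposal (2): a static table of deprecated keys,
// consulted at daemon startup so obviously stale configuration fails fast.
public class DeprecationCheck {
  private static final Map<String, String> DEPRECATED = new HashMap<String, String>();
  static {
    DEPRECATED.put("mapred.job.tracker", "mapreduce.jobtracker.address");
    DEPRECATED.put("mapred.task.tracker.http.address",
        "mapreduce.tasktracker.http.address");
  }

  /** Throws if any configured key is in the deprecated table. */
  public static void failOnDeprecated(Iterable<String> configuredKeys) {
    for (String key : configuredKeys) {
      String replacement = DEPRECATED.get(key);
      if (replacement != null) {
        throw new IllegalStateException(
            "Deprecated key '" + key + "'; use '" + replacement + "' instead");
      }
    }
  }
}
```

The extensible variant in (1) would just let ecosystem projects register extra entries in the table before the check runs.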





mapreduce.framework.name -- Where is the yarn service embedded?

2014-04-11 Thread Jay Vyas
The mapred execution engine  is checked in the Cluster.java source, and
each Service implementation is scanned through and then selected based on
the match to the configuration property mapreduce.framework.name 


,,, but How and where do JDK Service implementations that encapsulate this
information get packaged into hadoop jars, ?  Is there a generic way in the
hadoop build that the JDK Service API is implemented ?
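If I understand it correctly, the mechanism is java.util.ServiceLoader: a jar that wants to plug in ships a text file under META-INF/services/ named after the fully-qualified interface, whose contents name the implementation class, and the lookup side just iterates the loaded providers. A self-contained sketch of the lookup side (the FrameworkProvider interface here is invented for illustration, not a Hadoop type):

```java
import java.util.ServiceLoader;

// Generic sketch of the JDK Service API lookup pattern. A provider jar ships
//   META-INF/services/<fully.qualified.InterfaceName>
// containing the implementation class name; ServiceLoader discovers it at
// runtime with no compile-time dependency in either direction.
public class ProviderLookup {
  public interface FrameworkProvider {
    String name();
  }

  /** Returns the provider whose name matches, or null if none is registered. */
  public static FrameworkProvider find(String wanted) {
    for (FrameworkProvider p : ServiceLoader.load(FrameworkProvider.class)) {
      if (p.name().equals(wanted)) {
        return p;
      }
    }
    return null;
  }
}
```

So the build doesn't need anything special beyond putting the services file on the classpath of the framework jar.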

Thanks.

-- 
Jay Vyas
http://jayunit100.blogspot.com


Re: Hadoop Test libraries: Where did they go ?

2013-11-25 Thread Jay Vyas
Yup, we figured it out eventually.
The artifacts now use the test-jar directive, which creates a jar file that you 
can reference in mvn using the type tag in your dependencies.

However, FYI, I haven't been able to successfully google for the quintessential 
classes in the hadoop test libs, like the fs BaseContractTest, by name, so they 
are now harder to find than before.

So I think it's unfortunate that they are not a top-level maven artifact.

It's misleading, as it's now very easy to assume from looking at hadoop in mvn 
central that hadoop-test is just an old library that nobody updates anymore.

Just a thought, but maybe hadoop-test could be rejuvenated to point to 
hadoop-common somehow?
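For anyone landing on this thread later, the consuming side of the test-jar approach looks roughly like this (the version number is illustrative):

```xml
<!-- Pull in the test classes attached to the hadoop-common artifact. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.2.0</version>
  <type>test-jar</type>
  <scope>test</scope>
</dependency>
```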


 On Nov 25, 2013, at 4:52 AM, Steve Loughran ste...@hortonworks.com wrote:
 
 I see a hadoop-common-2.2.0-tests.jar in org.apache.hadoop/hadoop-?common;
 SHA1 a9994d261d00295040a402cd2f611a2bac23972a, which resolves in a search
 engine to
 http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/2.2.0/
 
 It looks like it is now part of the hadoop-common artifacts, you just say
 you want the test bits
 
 http://maven.apache.org/guides/mini/guide-attached-tests.html
 
 
 
 On 21 November 2013 23:28, Jay Vyas jayunit...@gmail.com wrote:
 
 It appears to me that
 
 http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-test
 
 Is no longer updated
 
 Where does hadoop now package the test libraries?
 
 Looking in the .//hadoop-common-project/hadoop-common/pom.xml  file in
 the hadoop 2X branches, I'm not sure whether or not src/test is packaged into
 a jar anymore... but I fear it is not.
 
 --
 Jay Vyas
 http://jayunit100.blogspot.com
 
 -- 
 CONFIDENTIALITY NOTICE
 NOTICE: This message is intended for the use of the individual or entity to 
 which it is addressed and may contain information that is confidential, 
 privileged and exempt from disclosure under applicable law. If the reader 
 of this message is not the intended recipient, you are hereby notified that 
 any printing, copying, dissemination, distribution, disclosure or 
 forwarding of this communication is strictly prohibited. If you have 
 received this communication in error, please contact the sender immediately 
 and delete it from your system. Thank You.


[jira] [Created] (MAPREDUCE-5572) Provide alternative logic for getPos() implementation in custom RecordReader

2013-10-07 Thread jay vyas (JIRA)
jay vyas created MAPREDUCE-5572:
---

 Summary: Provide alternative logic for getPos() implementation in 
custom RecordReader
 Key: MAPREDUCE-5572
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5572
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Affects Versions: 1.2.1, 1.2.0, 1.1.1, 1.1.0, 1.1.3, 1.2.2
Reporter: jay vyas
Priority: Minor


The custom RecordReader class defines the getPos() as follows:

long currentOffset = currentStream == null ? 0 : currentStream.getPos();
...

This is meant to prevent errors when the underlying stream is null. But it is 
not guaranteed to work: the RawLocalFileSystem, for example, will correctly 
close the underlying file stream once it is consumed, and the currentStream 
will thus throw a NullPointerException when trying to access the null stream.

This is only seen when running in a context where the MapTask class, which is 
only relevant in the mapred.* API, calls getPos() twice in tandem, before and 
after reading a record.

This custom record reader should be guarded, or else eliminated, since it 
assumes something which is not in the FileSystem contract: that getPos() will 
always return an integral value.
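A hypothetical sketch of the guarding suggested here: cache the offset as bytes are read, so getPos() stays total even after the underlying stream has been consumed and discarded (class and field names are invented for illustration, not the actual example code):

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical position-caching reader: remembers the last offset seen so
// that getPos() can still return an integral value after the underlying
// stream has been consumed and nulled out.
public class CachedPosReader {
  private InputStream currentStream; // becomes null once fully consumed
  private long pos = 0;              // cached offset, survives the stream

  public CachedPosReader(InputStream in) {
    this.currentStream = in;
  }

  public int read() throws IOException {
    if (currentStream == null) {
      return -1;
    }
    int b = currentStream.read();
    if (b == -1) {
      currentStream.close();
      currentStream = null; // mimic the stream being discarded after use
    } else {
      pos++;
    }
    return b;
  }

  /** Total: returns the cached offset even after the stream is gone. */
  public long getPos() {
    return pos;
  }
}
```

With this shape, MapTask's back-to-back getPos() calls before and after a record are harmless regardless of stream state.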





[jira] [Created] (MAPREDUCE-5511) Multifilewc and the mapred.* API: Is the use of getPos() valid?

2013-09-16 Thread jay vyas (JIRA)
jay vyas created MAPREDUCE-5511:
---

 Summary: Multifilewc and the mapred.* API:  Is the use of getPos() 
valid?
 Key: MAPREDUCE-5511
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5511
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Reporter: jay vyas
Priority: Minor


The MultiFileWordCount class in the hadoop examples libraries uses a record 
reader which switches between files.  This behaviour can cause the 
RawLocalFileSystem to break in a concurrent environment because of the way 
buffering works (in RawLocalFileSystem, switching between streams results in a 
temporarily null inner stream, and that inner stream is called by the 
getPos() implementation in the custom RecordReader for MultiFileWordCount). 

There are basically 2 ways to handle this:

1) Wrap the getPos() implementation in the object returned by open() in the 
RawLocalFileSystem to cache the value of getPos() every time it is called, so 
that calls to getPos() can return a valid long even if the underlying stream is 
null. OR

2) Update the RecordReader in multifilewc to not rely on the inner input stream, 
and cache the position / return 0 if the stream cannot return a valid value. 

The final question here is: is the RecordReader for MultiFileWordCount doing 
the right thing?  Or is it breaking the contract of getPos()... and really, 
what SHOULD getPos() return if the underlying stream has already been consumed? 



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Proxying FileSystem.get()

2013-08-16 Thread Jay Vyas
Well, I want the method calls to be wrapped dynamically, rather than
individually wrapping each one of them and manually wrapping the calls.
 That way the wrapping file system can be used with any underlying
FileSystem base from any version.

If manually wrapping the underlying FileSystem, then underlying changes in
different versions of hadoop won't be reflected, and the code would require
maintenance with respect to new FileSystem contracts.
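To make the constraint concrete: java.lang.reflect.Proxy can only generate proxies for *interfaces*, which is exactly why it doesn't help when callers get a concrete class back from FileSystem.get(). A self-contained sketch of the interface-only case (the Store interface is invented purely for illustration):

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.List;

// Minimal InvocationHandler demo. This works only because Store is an
// interface: Proxy.newProxyInstance cannot subclass a concrete class like
// org.apache.hadoop.fs.FileSystem, which is the limitation discussed above.
public class ProxyDemo {
  public interface Store {
    int read(String path);
  }

  /** Wraps a Store so every call is recorded before being delegated. */
  public static Store traced(final Store real, final List<String> log) {
    return (Store) Proxy.newProxyInstance(
        Store.class.getClassLoader(),
        new Class<?>[] { Store.class },
        new InvocationHandler() {
          public Object invoke(Object proxy, Method m, Object[] args) throws Throwable {
            log.add(m.getName() + "(" + args[0] + ")"); // record the call
            return m.invoke(real, args);               // then delegate
          }
        });
  }
}
```

For a concrete class, the alternatives are the wrapping-FileSystem approach suggested below, bytecode-level AOP (e.g. AspectJ weaving), or a hand-written decorator.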




On Fri, Aug 16, 2013 at 3:24 PM, Vinod Kumar Vavilapalli 
vino...@hortonworks.com wrote:

 Not sure about your final intention, but a new FileSystem impl
 wrapping/composing the underlying file system should work. No?

 Thanks,
 +Vinod

 On Aug 16, 2013, at 11:08 AM, Jay Vyas wrote:

  Hi mapred:
 
  I'd like to proxy calls made to the FileSystem's created during mapreduce
  jobs.
 
  However, since the common way jobs work is to use FileSystem.get(..),
  it doesn't seem like an InvocationHandler will be a solution (because it
  requires use of the Proxy.newProxyInstance operation).
 
  Any good way to reroute all calls to a FileSystem so that they go
 through a
  particular dynamic proxy?  Maybe a pure AOP solution would be better,
 but I
  haven't been able to figure one out yet.
 
  This is relevant to debugging the way different FileSystem
 implementations
  behave beneath mapred.
 
  http://stackoverflow.com/questions/18279397/using-aspects-to-inject-invocationhandlers-without-proxy-class
 
  --
  Jay Vyas
  http://jayunit100.blogspot.com






-- 
Jay Vyas
http://jayunit100.blogspot.com


ConfigKeys wrappers for MapReduce source code base

2013-04-21 Thread Jay Vyas
Hi guys:

A brief check with find ./ -name *ConfigKeys* doesn't seem to indicate
that there is a MapRedConfigKeys class... Should there be one, to help get
rid of magic numbers and unify the namespace?  This seems to be the goal of
the DFSConfigKeys class in the HDFS source tree.

... Or are there differences in the way configuration values are handled in
the mapred versus hdfs code bases?

For example:

job.getInt(JobContext.IO_SORT_FACTOR, 100) in the ReduceTask class would
more typically be implemented (if in hdfs) using the DFSConfigKeys static
class, which stores defaults and configuration parameter names.

Just curious whether there is any goal to take the mapred configuration
values and unify their namespace in mapred/common, in the same way that
seems to have been done in hdfs using the DFSConfigKeys class.
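A hypothetical sketch of what a DFSConfigKeys-style class for mapred might look like. The key string and default below are taken from the IO_SORT_FACTOR example above; the class itself does not exist in the tree and its names are invented:

```java
// Hypothetical MapRedConfigKeys, mirroring the DFSConfigKeys pattern: every
// property name lives next to its default, so call sites never repeat magic
// strings or numbers.
public class MapRedConfigKeys {
  public static final String MR_TASK_IO_SORT_FACTOR_KEY =
      "mapreduce.task.io.sort.factor";
  public static final int MR_TASK_IO_SORT_FACTOR_DEFAULT = 100;

  private MapRedConfigKeys() {} // constants holder, never instantiated
}
```

The ReduceTask call site would then read job.getInt(MapRedConfigKeys.MR_TASK_IO_SORT_FACTOR_KEY, MapRedConfigKeys.MR_TASK_IO_SORT_FACTOR_DEFAULT).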

-- 
Jay Vyas
http://jayunit100.blogspot.com


[jira] [Created] (MAPREDUCE-5165) Create MiniMRCluster version which uses the mapreduce package.

2013-04-18 Thread jay vyas (JIRA)
jay vyas created MAPREDUCE-5165:
---

 Summary: Create MiniMRCluster version which uses the mapreduce 
package.
 Key: MAPREDUCE-5165
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5165
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: jay vyas
Priority: Minor


The MiniMapRedCluster class references some older mapred.* classes.  

It could be recreated in the mapreduce package to use the Configuration class 
instead of JobConf, which would make it simpler to use and integrate with new 
FS implementations and test harnesses that use new Configuration (not JobConf) 
objects to drive tests.

This could be done many ways:

1) using inheritance or else 
2) by copying the code directly

The appropriate implementation depends on: 

1) Is it okay for mapreduce.* classes to depend on mapred.* classes?
2) Is the mapred MiniMRCluster implementation going to be deprecated or 
eliminated anytime soon? 
3) What is the future of the JobConf class, which has been deprecated and then 
undeprecated?



JobConf and MiniMapRedCluster

2013-04-17 Thread Jay Vyas
Hi guys:

the MiniMapRedCluster seems like a very useful tool which I just discovered
in this blog post :

http://grepalex.com/2012/10/20/hadoop-unit-testing-with-minimrcluster/

But it looks like MiniMapRedCluster

http://svn.apache.org/viewvc/hadoop/common/tags/release-1.0.3/src/test/org/apache/hadoop/mapred/ClusterMapReduceTestCase.java?view=co

is still using JobConf instead of the Configured/Tool interface.

Any plans to update this, or should I file a JIRA?

-- 
Jay Vyas
http://jayunit100.blogspot.com


Re: JobConf and MiniMapRedCluster

2013-04-17 Thread Jay Vyas
Only one response, inline below... Certainly I will file a JIRA and some 
updates if this makes sense :)
 
Would love to bring the minimrcluster class up to date!

On Apr 18, 2013, at 12:49 AM, Harsh J ha...@cloudera.com wrote:

 Why do you imagine a test case would need the Configured and Tool
 interfaces, which are more useful for actual client apps?

Because the JobConf is deprecated, shouldn't the classes which depend 
upon it be updated to use the Configured interface?


 Or did you mean these should support running Tool apps?
 
 Any plans to update this or should file a JIRA?
 
 No plans as far as I'm aware; please do file a JIRA with a patch if
 this makes sense to improve. Also do check out trunk first.
 
 On Wed, Apr 17, 2013 at 11:36 PM, Jay Vyas jayunit...@gmail.com wrote:
 Hi guys:
 
 the MiniMapRedCluster seems like a very useful tool which I just discovered
 in this blog post :
 
 http://grepalex.com/2012/10/20/hadoop-unit-testing-with-minimrcluster/
 
 But it looks like MiniMapRedCluster
 
 http://svn.apache.org/viewvc/hadoop/common/tags/release-1.0.3/src/test/org/apache/hadoop/mapred/ClusterMapReduceTestCase.java?view=co
 
 is still using JobConf instead of the Configured/Tool interface.
 
 Any plans to update this or should file a JIRA?
 
 --
 Jay Vyas
 http://jayunit100.blogspot.com
 
 
 
 -- 
 Harsh J


Re: JobConf and MiniMapRedCluster

2013-04-17 Thread Jay Vyas
Okay, thanks, I'll look into this JIRA.

It is clear from some light googling that, at some point,

**some version of JobConf was deprecated, and then maybe it was undeprecated, 
or maybe moved**
 
I will have to look into this more formally to really determine what's going on.



On Apr 18, 2013, at 1:16 AM, Harsh J ha...@cloudera.com wrote:

 Am not sure I totally understand yet. JobConf isn't deprecated, and is
 still a (and the only) valid way to use the older mapred.* API.
 
 If you mean we should shift the tests over to the new API
 (mapreduce.*, and Job) then am all for it.
 
 The Tool+Configured extensions are good for ToolRunner.run(…) invoked
 classes, which I guess is also a good way to write a base test
 invoking class, but you'd have to end up changing a lot of test
 classes for this.
 
 On Thu, Apr 18, 2013 at 10:24 AM, Jay Vyas jayunit...@gmail.com wrote:
 Only one response, inline below... Certainly I will file a JIRA and some 
 updates if this makes sense :)
 
 Would love to bring the minimrcluster class up to date!
 
 On Apr 18, 2013, at 12:49 AM, Harsh J ha...@cloudera.com wrote:
 
 Why do you imagine a test case would need the Configured and Tool
 interfaces, which are more useful for actual client apps?
 
 Because - the JobConf is deprecated --- then shouldn't the classes which 
 depend upon it be update to use the Configured interface?
 
 
 Or did you mean these should support running Tool apps?
 
 Any plans to update this or should file a JIRA?
 
 No plans as far as I'm aware; please do file a JIRA with a patch if
 this makes sense to improve. Also do check out trunk first.
 
 On Wed, Apr 17, 2013 at 11:36 PM, Jay Vyas jayunit...@gmail.com wrote:
 Hi guys:
 
 the MiniMapRedCluster seems like a very useful tool which I just discovered
 in this blog post :
 
 http://grepalex.com/2012/10/20/hadoop-unit-testing-with-minimrcluster/
 
 But it looks like MiniMapRedCluster
 
 http://svn.apache.org/viewvc/hadoop/common/tags/release-1.0.3/src/test/org/apache/hadoop/mapred/ClusterMapReduceTestCase.java?view=co
 
 is still using JobConf instead of the Configured/Tool interface.
 
 Any plans to update this or should file a JIRA?
 
 --
 Jay Vyas
 http://jayunit100.blogspot.com
 
 
 
 --
 Harsh J
 
 
 
 -- 
 Harsh J


Mapreduce migration to mvn ?

2013-04-09 Thread Jay Vyas
Hi guys :

Seems like it would be simpler if the existing mapreduce repo had a pom.xml
for building, rather than build.xml.

Could there be a JIRA made to this effect?

-- 
Jay Vyas
http://jayunit100.blogspot.com