[jira] [Created] (MAPREDUCE-5902) JobHistoryServer needs more debug logs.
jay vyas created MAPREDUCE-5902: --- Summary: JobHistoryServer needs more debug logs. Key: MAPREDUCE-5902 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5902 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Reporter: jay vyas

With the JobHistoryServer, it appears that it is sometimes possible to skip over certain history files. I haven't been able to determine why yet, but I've found that some long-named .jhist files aren't getting collected into the done/ directory. After tracing through the actual source with DEBUG level logging turned on, it became clear that this snippet is an important workhorse (scanDirectoryForIntermediateFiles and scanDirectoryForHistoryFiles ultimately boil down to scanDirectory()). It would be extremely useful, then, to have a couple of guarded logs at this level of the code, so that we can see, in the log folders, why files are being filtered out, i.e. whether it is due to filtering or visibility.

{noformat}
private static List<FileStatus> scanDirectory(Path path, FileContext fc,
    PathFilter pathFilter) throws IOException {
  path = fc.makeQualified(path);
  List<FileStatus> jhStatusList = new ArrayList<FileStatus>();
  RemoteIterator<FileStatus> fileStatusIter = fc.listStatus(path);
  while (fileStatusIter.hasNext()) {
    FileStatus fileStatus = fileStatusIter.next();
    Path filePath = fileStatus.getPath();
    if (fileStatus.isFile() && pathFilter.accept(filePath)) {
      jhStatusList.add(fileStatus);
    }
  }
  return jhStatusList;
}
{noformat}

-- This message was sent by Atlassian JIRA (v6.2#6252)
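For example, the guarded logging being requested might look something like the sketch below (assuming a commons-logging Log field named LOG is available in the enclosing class; the log messages themselves are illustrative only):

{noformat}
private static List<FileStatus> scanDirectory(Path path, FileContext fc,
    PathFilter pathFilter) throws IOException {
  path = fc.makeQualified(path);
  List<FileStatus> jhStatusList = new ArrayList<FileStatus>();
  if (LOG.isDebugEnabled()) {
    LOG.debug("Scanning directory: " + path);
  }
  RemoteIterator<FileStatus> fileStatusIter = fc.listStatus(path);
  while (fileStatusIter.hasNext()) {
    FileStatus fileStatus = fileStatusIter.next();
    Path filePath = fileStatus.getPath();
    if (fileStatus.isFile() && pathFilter.accept(filePath)) {
      jhStatusList.add(fileStatus);
    } else if (LOG.isDebugEnabled()) {
      // Guarded log: record why the entry was skipped (not a file, or
      // rejected by the path filter) so the reason is visible in the logs.
      LOG.debug("Skipping " + filePath + ": isFile=" + fileStatus.isFile()
          + ", acceptedByFilter=" + pathFilter.accept(filePath));
    }
  }
  return jhStatusList;
}
{noformat}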
[jira] [Created] (MAPREDUCE-5894) Make critical YARN properties first class citizens in the build.
jay vyas created MAPREDUCE-5894: --- Summary: Make critical YARN properties first class citizens in the build. Key: MAPREDUCE-5894 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5894 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: jay vyas

We recently found, when deploying hadoop 2.2 with hadoop 2.0 configuration values, that {noformat} mapreduce.shuffle {noformat} had changed to {noformat} mapreduce_shuffle {noformat}. There are likewise many similar examples of parameters which become deprecated over time. See http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/DeprecatedProperties.html

I suggest we:

1) Collect the *set of parameters which are deprecated* over time into a java class which ships directly with the code, maybe even as a static list inside of Configuration() itself, with *optional extended parameters read from a configurable parameter*, so that ecosystem users (i.e. HBase, or alternative file systems) can add their own deprecation info.

2) Have this list *checked on yarn daemon startup*, so that unused parameters which are *obviously artifacts are flagged immediately* by the daemon failing immediately.

3) Have a list of all mandatory *current* parameters stored in the code, and also a list of deprecated ones. Then, have the build *automatically fail* if a parameter in the mandatory list is NOT accessed. This would (a) make it so that unit testing of parameters does not regress, and (b) force all updates to the code which change a parameter name to also include an update to the deprecated parameter list before the build passes.

-- This message was sent by Atlassian JIRA (v6.2#6252)
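To make item 1) concrete, here is a hypothetical sketch of a shipped deprecation list plus a startup check. DeprecatedKeys and checkForDeprecatedKeys() are invented names, not existing Hadoop classes; Configuration.addDeprecation() would be the natural existing hook to register these mappings with, but it is left out of the sketch:

{noformat}
import org.apache.hadoop.conf.Configuration;

// Hypothetical sketch only: a list of old key -> new key mappings shipped in
// code, consulted when a daemon starts.
public final class DeprecatedKeys {

  private static final String[][] DEPRECATIONS = {
    { "mapred.job.name", "mapreduce.job.name" },
    // ... further deprecations listed here, or loaded from an extension file ...
  };

  /** Could be called at daemon startup: fail fast if a stale key is still set. */
  public static void checkForDeprecatedKeys(Configuration conf) {
    for (String[] mapping : DEPRECATIONS) {
      // Note: a real check would need to inspect the raw property sources,
      // since Configuration may already translate deprecated keys transparently.
      if (conf.get(mapping[0]) != null) {
        throw new IllegalStateException("Configuration uses deprecated key "
            + mapping[0] + "; use " + mapping[1] + " instead.");
      }
    }
  }

  private DeprecatedKeys() {}
}
{noformat}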
mapreduce.framework.name -- Where is the yarn service embedded?
The mapred execution engine is chosen in the Cluster.java source: each Service implementation is scanned through and then selected based on a match against the configuration property mapreduce.framework.name. But how and where do the JDK Service implementations that encapsulate this information get packaged into the hadoop jars? Is there a generic way in which the hadoop build implements the JDK Service API? Thanks. -- Jay Vyas http://jayunit100.blogspot.com
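For anyone following along, this appears to be the standard JDK ServiceLoader pattern: each jar that supplies a provider lists it in a META-INF/services resource, and ServiceLoader.load() discovers it at runtime. A minimal, generic sketch of the pattern (the interface and class names below are invented for illustration and are not Hadoop's actual provider classes):

{noformat}
import java.util.ServiceLoader;

// Hypothetical service interface, analogous to the provider interface that
// Cluster.java looks up for mapreduce.framework.name.
interface FrameworkProvider {
  String frameworkName();
}

// A provider jar ships an implementation like this, and registers it by
// placing its fully-qualified class name (one per line) in a resource named
// META-INF/services/FrameworkProvider inside that same jar.
class YarnFrameworkProvider implements FrameworkProvider {
  public String frameworkName() { return "yarn"; }
}

// Any consumer can then discover every registered provider on the classpath
// and pick the one matching the configured framework name.
class FrameworkResolver {
  static FrameworkProvider resolve(String configuredName) {
    for (FrameworkProvider provider : ServiceLoader.load(FrameworkProvider.class)) {
      if (provider.frameworkName().equals(configuredName)) {
        return provider;
      }
    }
    throw new IllegalStateException("No provider registered for " + configuredName);
  }
}
{noformat}

In Hadoop's case, if memory serves, the META-INF/services entries for the provider implementations ship inside the hadoop-mapreduce-client-* jars themselves, which would be why no extra step is needed in the build.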
Re: Hadoop Test libraries: Where did they go ?
Yup, we figured it out eventually. The artifacts now use the test-jar directive, which creates a jar file that you can reference in mvn using the type tag in your dependencies. However, fyi, I haven't been able to successfully google for the quintessential classes in the hadoop test libs, like the fs BaseContractTest, by name, so they are now harder to find than before. So I think it's unfortunate that they are not a top level maven artifact. It's misleading, as it's now very easy to assume from looking at hadoop in mvn central that hadoop-test is just an old library that nobody updates anymore. Just a thought, but maybe hadoop-test could be rejuvenated to point to hadoop-common somehow?

On Nov 25, 2013, at 4:52 AM, Steve Loughran ste...@hortonworks.com wrote: I see a hadoop-common-2.2.0-tests.jar in org.apache.hadoop/hadoop-common; SHA1 a9994d261d00295040a402cd2f611a2bac23972a, which resolves in a search engine to http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/2.2.0/ It looks like it is now part of the hadoop-common artifacts; you just say you want the test bits: http://maven.apache.org/guides/mini/guide-attached-tests.html

On 21 November 2013 23:28, Jay Vyas jayunit...@gmail.com wrote: It appears to me that http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-test is no longer updated. Where does hadoop now package the test libraries? Looking in the .//hadoop-common-project/hadoop-common/pom.xml file in the hadoop 2.x branches, I'm not sure whether or not src/test is packaged into a jar anymore... but I fear it is not. -- Jay Vyas http://jayunit100.blogspot.com
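Concretely, the declaration Steve points to looks roughly like the following in a consumer's pom.xml (the version here is purely illustrative; use whichever hadoop-common release your build already targets):

{noformat}
<!-- Pulls in the test classes attached to hadoop-common.  The <type>test-jar</type>
     element is what selects the -tests.jar rather than the main artifact. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.2.0</version>
  <type>test-jar</type>
  <scope>test</scope>
</dependency>
{noformat}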
[jira] [Created] (MAPREDUCE-5572) Provide alternative logic for getPos() implementation in custom RecordReader
jay vyas created MAPREDUCE-5572: --- Summary: Provide alternative logic for getPos() implementation in custom RecordReader Key: MAPREDUCE-5572 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5572 Project: Hadoop Map/Reduce Issue Type: Bug Components: examples Affects Versions: 1.2.1, 1.2.0, 1.1.1, 1.1.0, 1.1.3, 1.2.2 Reporter: jay vyas Priority: Minor

The custom RecordReader class defines getPos() as follows: long currentOffset = currentStream == null ? 0 : currentStream.getPos(); ... This is meant to prevent errors when the underlying stream is null, but it isn't guaranteed to work: the RawLocalFileSystem, for example, currently will close the underlying file stream once it is consumed, and the getPos() call will thus throw a NullPointerException when trying to access the null stream. This is only seen when running in a context where the MapTask class (which is only relevant to the mapred.* API) calls getPos() twice in tandem, before and after reading a record. This custom record reader should be guarded, or else eliminated, since it assumes something which is not in the FileSystem contract: that getPos() will always return an integral value.

-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (MAPREDUCE-5511) Multifilewc and the mapred.* API: Is the use of getPos() valid?
jay vyas created MAPREDUCE-5511: --- Summary: Multifilewc and the mapred.* API: Is the use of getPos() valid? Key: MAPREDUCE-5511 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5511 Project: Hadoop Map/Reduce Issue Type: Bug Components: examples Reporter: jay vyas Priority: Minor

The MultiFileWordCount class in the hadoop examples libraries uses a record reader which switches between files. This behaviour can cause the RawLocalFileSystem to break in a concurrent environment because of the way buffering works (in RawLocalFileSystem, switching between streams results in a temporarily null inner stream, and that inner stream is called by the getPos() implementation in the custom RecordReader for MultiFileWordCount). There are basically 2 ways to handle this:

1) Wrap the getPos() implementation in the object returned by open() in the RawLocalFileSystem to cache the value of getPos() every time it is called, so that calls to getPos() can return a valid long even if the underlying stream is null. OR

2) Update the RecordReader in multifilewc to not rely on the inner input stream, and instead cache the position / return 0 if the stream cannot return a valid value.

The final question here is: is the RecordReader for MultiFileWordCount doing the right thing? Or is it breaking the contract of getPos()... and really, what SHOULD getPos() return if the underlying stream has already been consumed?

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
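A minimal sketch of option 2), purely illustrative (the field names are hypothetical and this is not the actual MultiFileWordCount reader):

{noformat}
// Hypothetical getPos() that never dereferences a possibly-null inner stream;
// it falls back to the last offset it was able to observe.
private FSDataInputStream currentStream; // may be null between files
private long lastKnownOffset = 0L;

public long getPos() throws IOException {
  if (currentStream != null) {
    // Cache the offset while the stream is open ...
    lastKnownOffset = currentStream.getPos();
  }
  // ... and return the cached value (or 0) once the stream has been closed or swapped.
  return lastKnownOffset;
}
{noformat}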
Re: Proxying FileSystem.get()
Well, I want the method calls to be wrapped dynamically, rather than individually wrapping each one of them and manually wrapping the calls. That way the wrapping file system can be used with any underlying FileSystem base from any version. If manually wrapping the underlying FileSystem, then underlying changes in different versions of hadoop won't be reflected - and the code would require maintenance with respect to new FileSystem contracts.

On Fri, Aug 16, 2013 at 3:24 PM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: Not sure about your final intention, but a new FileSystem impl wrapping/composing the underlying file system should work. No? Thanks, +Vinod

On Aug 16, 2013, at 11:08 AM, Jay Vyas wrote: Hi mapred: I'd like to proxy calls made to the FileSystems created during mapreduce jobs. However, since the common way jobs work is to use FileSystem.get(..), it doesn't seem like an InvocationHandler will be a solution (because it requires use of the Proxy.newProxyInstance operation). Any good way to reroute all calls to a FileSystem so that they go through a particular dynamic proxy? Maybe a pure AOP solution would be better, but I haven't been able to figure one out yet. This is relevant to debugging the way different FileSystem implementations behave beneath mapred. http://stackoverflow.com/questions/18279397/using-aspects-to-inject-invocationhandlers-without-proxy-class -- Jay Vyas http://jayunit100.blogspot.com
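For reference, a minimal sketch of the wrapping approach Vinod suggests, assuming org.apache.hadoop.fs.FilterFileSystem as the delegating base class (the TracingFileSystem name and the logging are illustrative only):

{noformat}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FilterFileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical wrapper: each overridden call is logged and then delegated.
// The drawback raised above applies: every method to be traced must be
// overridden by hand, so new FileSystem methods are not picked up automatically.
public class TracingFileSystem extends FilterFileSystem {

  public TracingFileSystem(FileSystem underlying) {
    super(underlying);
  }

  @Override
  public FSDataInputStream open(Path f, int bufferSize) throws IOException {
    System.out.println("open(" + f + ", " + bufferSize + ")");
    return super.open(f, bufferSize);
  }
}
{noformat}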
ConfigKeys wrappers for MapReduce source code base
Hi guys: A brief check with find ./ -name *ConfigKeys* doesn't seem to indicate that there is a MapRedConfigKeys class... Should there be one, to help get rid of magic numbers and unify the namespace? This seems to be the goal of the DFSConfigKeys class in the HDFS source tree... Or are there differences in the way configuration values are handled in the mapred versus hdfs code bases? For example: job.getInt(JobContext.IO_SORT_FACTOR, 100) in the ReduceTask class would more typically be implemented (if in hdfs) using the DFSConfigKeys static class, which stores defaults and configuration parameter names. Just curious whether there is any goal to take the mapred configuration values and unify their namespace in mapred/common in the same way that seems to have been done in hdfs using the DFSConfigKeys class. -- Jay Vyas http://jayunit100.blogspot.com
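To make the suggestion concrete, a hypothetical sketch of what such a class could look like (the class name, constant names, and key string are invented for illustration, following the DFSConfigKeys pattern):

{noformat}
// Hypothetical MapRedConfigKeys, mirroring the DFSConfigKeys style of pairing
// each configuration key with its default value.
public final class MapRedConfigKeys {
  public static final String IO_SORT_FACTOR_KEY = "mapreduce.task.io.sort.factor";
  public static final int IO_SORT_FACTOR_DEFAULT = 100;

  private MapRedConfigKeys() {}
}

// The ReduceTask example above would then become something like:
//   job.getInt(MapRedConfigKeys.IO_SORT_FACTOR_KEY, MapRedConfigKeys.IO_SORT_FACTOR_DEFAULT)
{noformat}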
[jira] [Created] (MAPREDUCE-5165) Create MiniMRCluster version which uses the mapreduce package.
jay vyas created MAPREDUCE-5165: --- Summary: Create MiniMRCluster version which uses the mapreduce package. Key: MAPREDUCE-5165 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5165 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: jay vyas Priority: Minor

The MiniMapRedCluster class references some older mapred.* classes. It could be recreated in the mapreduce package to use the Configuration class instead of JobConf, which would make it simpler to use and to integrate with new FS implementations and test harnesses that use plain Configuration (not JobConf) objects to drive tests. This could be done in a couple of ways: 1) using inheritance, or else 2) by copying the code directly. The appropriate implementation depends on the answers to a few questions: 1) Is it okay for mapreduce.* classes to depend on mapred.* classes? 2) Is the mapred MiniMRCluster implementation going to be deprecated or eliminated any time soon? 3) What is the future of the JobConf class, which has been deprecated and then undeprecated?

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
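As a hypothetical illustration of what the proposed Configuration-driven cluster could look like in a test (every name below is invented; this is not an existing API):

{noformat}
import org.apache.hadoop.conf.Configuration;

// Hypothetical test setup driven purely by Configuration rather than JobConf.
public class WordCountIT {
  public void testOnMiniCluster() throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "file:///"); // or whichever alternative FS is under test

    // Hypothetical mapreduce.* replacement for the old mapred MiniMRCluster:
    MiniMapReduceCluster cluster = new MiniMapReduceCluster(conf, 2 /* nodes */);
    try {
      cluster.start();
      Configuration jobConf = cluster.getConfig(); // hand this to Job.getInstance(...)
      // ... submit a Job built from jobConf and assert on its output ...
    } finally {
      cluster.stop();
    }
  }
}
{noformat}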
JobConf and MiniMapRedCluster
Hi guys: the MiniMapRedCluster seems like a very useful tool which I just discovered in this blog post: http://grepalex.com/2012/10/20/hadoop-unit-testing-with-minimrcluster/ But it looks like MiniMapRedCluster http://svn.apache.org/viewvc/hadoop/common/tags/release-1.0.3/src/test/org/apache/hadoop/mapred/ClusterMapReduceTestCase.java?view=co is still using JobConf instead of the Configured/Tool interface. Any plans to update this, or should I file a JIRA? -- Jay Vyas http://jayunit100.blogspot.com
Re: JobConf and MiniMapRedCluster
Only one response, inline below... Certainly I will file a JIRA and some updates if this makes sense :) Would love to bring the MiniMRCluster class up to date!

On Apr 18, 2013, at 12:49 AM, Harsh J ha...@cloudera.com wrote: Why do you imagine a test case would need the Configured and Tool interfaces, which are more useful for actual client apps?

Because the JobConf is deprecated - shouldn't the classes which depend upon it be updated to use the Configured interface?

Or did you mean these should support running Tool apps? Any plans to update this or should I file a JIRA? No plans as far as I'm aware; please do file a JIRA with a patch if this makes sense to improve. Also do check out trunk first.

On Wed, Apr 17, 2013 at 11:36 PM, Jay Vyas jayunit...@gmail.com wrote: Hi guys: the MiniMapRedCluster seems like a very useful tool which I just discovered in this blog post: http://grepalex.com/2012/10/20/hadoop-unit-testing-with-minimrcluster/ But it looks like MiniMapRedCluster http://svn.apache.org/viewvc/hadoop/common/tags/release-1.0.3/src/test/org/apache/hadoop/mapred/ClusterMapReduceTestCase.java?view=co is still using JobConf instead of the Configured/Tool interface. Any plans to update this, or should I file a JIRA? -- Jay Vyas http://jayunit100.blogspot.com -- Harsh J
Re: JobConf and MiniMapRedCluster
Okay, thanks, I'll look into this JIRA. It is clear from some light googling that, at some point, *some version of JobConf was deprecated, and that maybe it was later undeprecated or moved*. Will have to look into this more formally to really determine what's going on.

On Apr 18, 2013, at 1:16 AM, Harsh J ha...@cloudera.com wrote: Am not sure I totally understand yet. JobConf isn't deprecated, and is still a (and the only) valid way to use the older mapred.* API. If you mean we should shift the tests over to the new API (mapreduce.*, and Job) then am all for it. The Tool+Configured extensions are good for ToolRunner.run(…) invoked classes, which I guess is also a good way to write a base test invoking class, but you'd have to end up changing a lot of test classes for this.

On Thu, Apr 18, 2013 at 10:24 AM, Jay Vyas jayunit...@gmail.com wrote: Only one response, inline below... Certainly I will file a JIRA and some updates if this makes sense :) Would love to bring the MiniMRCluster class up to date! On Apr 18, 2013, at 12:49 AM, Harsh J ha...@cloudera.com wrote: Why do you imagine a test case would need the Configured and Tool interfaces, which are more useful for actual client apps? Because the JobConf is deprecated - shouldn't the classes which depend upon it be updated to use the Configured interface? Or did you mean these should support running Tool apps? Any plans to update this or should I file a JIRA? No plans as far as I'm aware; please do file a JIRA with a patch if this makes sense to improve. Also do check out trunk first.

On Wed, Apr 17, 2013 at 11:36 PM, Jay Vyas jayunit...@gmail.com wrote: Hi guys: the MiniMapRedCluster seems like a very useful tool which I just discovered in this blog post: http://grepalex.com/2012/10/20/hadoop-unit-testing-with-minimrcluster/ But it looks like MiniMapRedCluster http://svn.apache.org/viewvc/hadoop/common/tags/release-1.0.3/src/test/org/apache/hadoop/mapred/ClusterMapReduceTestCase.java?view=co is still using JobConf instead of the Configured/Tool interface. Any plans to update this, or should I file a JIRA? -- Jay Vyas http://jayunit100.blogspot.com -- Harsh J -- Harsh J
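For readers unfamiliar with the Tool+Configured pattern Harsh refers to, here is a minimal sketch of a ToolRunner-invoked class (the class name is illustrative and the job logic is elided):

{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// ToolRunner parses the generic options (-D key=value, -conf, -fs, ...) into
// the Configuration before run() is called, which is why the pattern suits
// client apps and job drivers.
public class MyJobDriver extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    Configuration conf = getConf();
    // ... build and submit a Job from conf, return 0 on success ...
    return 0;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new MyJobDriver(), args));
  }
}
{noformat}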
Mapreduce migration to mvn ?
Hi guys: Seems like it would be simpler if the existing mapreduce repo had a pom.xml for building, rather than build.xml. Could there be a JIRA made to this effect? -- Jay Vyas http://jayunit100.blogspot.com