[jira] [Commented] (HADOOP-15473) Configure serialFilter to avoid UnrecoverableKeyException caused by JDK-8189997
[ https://issues.apache.org/jira/browse/HADOOP-15473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477789#comment-16477789 ] Philip Zeyliger commented on HADOOP-15473: -- In dealing with the fallout of this in Impala, I was able to edit the KMS "init" script to add the relevant {{-D}} option. The change is at [https://gerrit.cloudera.org/#/c/10418/1/testdata/cluster/node_templates/cdh6/etc/init.d/kms] if you're interested. Do we need to actually expose this to our (Hadoop's) users? i.e., the only possible use of this API within a KMS process is {{org.apache.hadoop.crypto.key.JavaKeyStoreProvider}}, so could we just set this explicitly at start-up (either via the shell scripts or programmatically), and avoid exposing it to the users? (Or does a client use this API, or perhaps we have enough plugins that we need to be more careful?) > Configure serialFilter to avoid UnrecoverableKeyException caused by > JDK-8189997 > --- > > Key: HADOOP-15473 > URL: https://issues.apache.org/jira/browse/HADOOP-15473 > Project: Hadoop Common > Issue Type: Bug > Components: kms >Affects Versions: 2.7.6, 3.0.2 > Environment: JDK 8u171 >Reporter: Gabor Bota >Assignee: Gabor Bota >Priority: Critical > Attachments: HDFS-13494.001.patch, HDFS-13494.002.patch, > HDFS-13494.003.patch, org.apache.hadoop.crypto.key.TestKeyProviderFactory.txt > > > There is a new feature in JDK 8u171 called Enhanced KeyStore Mechanisms > (http://www.oracle.com/technetwork/java/javase/8u171-relnotes-430.html#JDK-8189997). 
> This is the cause of the following errors in the TestKeyProviderFactory: > {noformat} > Caused by: java.security.UnrecoverableKeyException: Rejected by the > jceks.key.serialFilter or jdk.serialFilter property > at com.sun.crypto.provider.KeyProtector.unseal(KeyProtector.java:352) > at > com.sun.crypto.provider.JceKeyStore.engineGetKey(JceKeyStore.java:136) > at java.security.KeyStore.getKey(KeyStore.java:1023) > at > org.apache.hadoop.crypto.key.JavaKeyStoreProvider.getMetadata(JavaKeyStoreProvider.java:410) > ... 28 more > {noformat} > This issue causes errors and failures in hbase tests right now (using hdfs) > and could affect other products running on this new Java version. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
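The workaround Philip describes above boils down to one {{-D}} flag in the KMS start-up script. A minimal sketch, assuming the script exports a KMS_JAVA_OPTS-style variable (the variable name and the exact class list are illustrative, not taken from the patch; the pattern syntax follows the 8u171 release notes):

```shell
# Sketch only: pre-seed the JCEKS serial filter so JDK 8u171+ does not
# reject JavaKeyStoreProvider's serialized key metadata.
# KMS_JAVA_OPTS and the class list below are assumptions.
SERIAL_FILTER='java.lang.Enum;java.security.KeyRep;java.security.KeyRep$Type;javax.crypto.spec.SecretKeySpec;org.apache.hadoop.crypto.key.JavaKeyStoreProvider$KeyMetadata;!*'
KMS_JAVA_OPTS="${KMS_JAVA_OPTS:-} -Djceks.key.serialFilter=$SERIAL_FILTER"
```

Setting the property only when it is unset would avoid clobbering an operator-supplied value, which matters if the programmatic route is taken instead of the shell-script one.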
[jira] [Commented] (HADOOP-15019) Hadoop shell script classpath de-duping ignores HADOOP_USER_CLASSPATH_FIRST
[ https://issues.apache.org/jira/browse/HADOOP-15019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16244495#comment-16244495 ] Philip Zeyliger commented on HADOOP-15019: -- bq. Or, just use 'hadoop classpath' ... Yep, you're right. I had tried and failed, but I must have gotten something else wrong. > Hadoop shell script classpath de-duping ignores HADOOP_USER_CLASSPATH_FIRST > > > Key: HADOOP-15019 > URL: https://issues.apache.org/jira/browse/HADOOP-15019 > Project: Hadoop Common > Issue Type: Bug > Components: bin >Reporter: Philip Zeyliger > > If a user sets {{HADOOP_USER_CLASSPATH_FIRST=true}} and furthermore includes > a directory that's already in Hadoop's classpath via {{HADOOP_CLASSPATH}}, > that directory will appear later than it should in the eventual $CLASSPATH. I > believe this is because the de-duping at > https://github.com/apache/hadoop/blob/cbc632d9abf08c56a7fc02be51b2718af30bad28/hadoop-common-project/hadoop-common/src/main/bin/hadoop-functions.sh#L1200 > is ignoring the "before/after" parameter. > To reproduce, first build the following trivial Java program: > {code} > $cat Test.java > public class Test { > public static void main(String[] args) { > System.out.println(System.getenv().get("CLASSPATH")); > } > } > $javac Test.java > $jar cf test.jar Test.class > {code} > With that, if you happen to have an entry in HADOOP_CLASSPATH that matches > what Hadoop would produce, you'll find the ordering not honored. It's easiest > to reproduce this with a match for HADOOP_CONF_DIR, as in the second case > below: > {code} > # As you'd expect, /usr/share is first! > $HADOOP_CONF_DIR=/etc HADOOP_USER_CLASSPATH_FIRST="true" > HADOOP_CLASSPATH=/usr/share:/tmp:/bin bin/hadoop jar test.jar Test | tr ':' > '\n' | grep -n . | grep '/usr/share' > WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete. > 1:/usr/share > # Surprise!
/usr/share is now in the 3rd line, even though it was first in > HADOOP_CLASSPATH. > $HADOOP_CONF_DIR=/usr/share HADOOP_USER_CLASSPATH_FIRST="true" > HADOOP_CLASSPATH=/usr/share:/tmp:/bin bin/hadoop jar test.jar Test | tr ':' > '\n' | grep -n . | grep '/usr/share' > WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete. > 3:/usr/share > {code} > To reiterate, what's surprising is that an entry that's first > in HADOOP_CLASSPATH can show up not first in the resulting classpath. > I ran into this configuring {{bin/hive}} with a confdir that was being used > for both HDFS and Hive, and flailing as to why my {{log4j2.properties}} > wasn't being read. The one in my conf dir was lower in my classpath than one > bundled in some Hive jar. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
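The ordering property the report asks for can be illustrated with a first-occurrence-wins de-dup: keeping the first occurrence of each entry preserves user-prepended entries. This is only a sketch of the desired behavior, not the real hadoop_add_classpath logic in hadoop-functions.sh:

```shell
# Sketch: de-dup a colon-separated classpath while keeping the FIRST
# occurrence of each entry, so entries prepended via
# HADOOP_USER_CLASSPATH_FIRST stay in front. Illustrative only.
dedup_classpath() {
  # $1: colon-separated classpath; prints the de-duped classpath
  printf '%s\n' "$1" | tr ':' '\n' | awk '!seen[$0]++' | paste -sd: -
}
```

With this rule, a duplicate of an early entry later in the list is simply dropped, so the user's ordering survives.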
[jira] [Created] (HADOOP-15019) Hadoop shell script classpath de-duping ignores HADOOP_USER_CLASSPATH_FIRST
Philip Zeyliger created HADOOP-15019: Summary: Hadoop shell script classpath de-duping ignores HADOOP_USER_CLASSPATH_FIRST Key: HADOOP-15019 URL: https://issues.apache.org/jira/browse/HADOOP-15019 Project: Hadoop Common Issue Type: Bug Components: bin Reporter: Philip Zeyliger If a user sets {{HADOOP_USER_CLASSPATH_FIRST=true}} and furthermore includes a directory that's already in Hadoop's classpath via {{HADOOP_CLASSPATH}}, that directory will appear later than it should in the eventual $CLASSPATH. I believe this is because the de-duping at https://github.com/apache/hadoop/blob/cbc632d9abf08c56a7fc02be51b2718af30bad28/hadoop-common-project/hadoop-common/src/main/bin/hadoop-functions.sh#L1200 is ignoring the "before/after" parameter. To reproduce, first build the following trivial Java program: {code} $cat Test.java public class Test { public static void main(String[] args) { System.out.println(System.getenv().get("CLASSPATH")); } } $javac Test.java $jar cf test.jar Test.class {code} With that, if you happen to have an entry in HADOOP_CLASSPATH that matches what Hadoop would produce, you'll find the ordering not honored. It's easiest to reproduce this with a match for HADOOP_CONF_DIR, as in the second case below: {code} # As you'd expect, /usr/share is first! $HADOOP_CONF_DIR=/etc HADOOP_USER_CLASSPATH_FIRST="true" HADOOP_CLASSPATH=/usr/share:/tmp:/bin bin/hadoop jar test.jar Test | tr ':' '\n' | grep -n . | grep '/usr/share' WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete. 1:/usr/share # Surprise! /usr/share is now in the 3rd line, even though it was first in HADOOP_CLASSPATH. $HADOOP_CONF_DIR=/usr/share HADOOP_USER_CLASSPATH_FIRST="true" HADOOP_CLASSPATH=/usr/share:/tmp:/bin bin/hadoop jar test.jar Test | tr ':' '\n' | grep -n . | grep '/usr/share' WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
3:/usr/share {code} To reiterate, what's surprising is that an entry that's first in HADOOP_CLASSPATH can show up not first in the resulting classpath. I ran into this configuring {{bin/hive}} with a confdir that was being used for both HDFS and Hive, and flailing as to why my {{log4j2.properties}} wasn't being read. The one in my conf dir was lower in my classpath than one bundled in some Hive jar. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-9160) Adopt Jolokia as the JMX HTTP/JSON bridge.
[ https://issues.apache.org/jira/browse/HADOOP-9160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725510#comment-13725510 ] Philip Zeyliger commented on HADOOP-9160: - I'm not a committer on this project, so my comments are obviously advisory only, and if you get support, you're welcome to disregard me. * It sounds like you're using Jolokia not so much for the JMX-ness as for its alternative authentication model. Something about the tools you're working with can't handle the SPNEGO(?) model for the HTTP-based (REST) APIs, and so you're introducing a separate HTTP server (Jolokia) to handle that. Is it the protocol that's causing interoperability issues or the authentication model? The auth models are already pluggable. I wish we'd talk about concrete tools that you're integrating--I'm surprised they speak Jolokia but not HTTP. * Similarly, I'd like to understand whether, in your ideal world, you could, say, read a file or call "hdfs upgrade" over JMX? * We've spent considerable effort getting backwards-compatible protocols in place for the wire protocol (via protocol buffers) and the client interfaces (via annotations). Opening up another layer of RPC exposes us to more issues here. > Adopt Jolokia as the JMX HTTP/JSON bridge. > -- > > Key: HADOOP-9160 > URL: https://issues.apache.org/jira/browse/HADOOP-9160 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Junping Du > Labels: features > Attachments: hadoop-9160-demo-branch-1.txt, HADOOP-9160.patch > > > The current JMX HTTP bridge has served its purpose, while a more complete > solution: Jolokia (formerly Jmx4Perl) has been developed/matured over the > years. > Jolokia provides comprehensive JMX features over HTTP/JSON including search > and list of JMX attributes and operations metadata, which helps to support > inter framework/platform compatibility. It has first class language bindings > for Perl, Python, Javascript, Java.
> It's trivial (see demo patch) to incorporate Jolokia servlet into Hadoop HTTP > servers and use the same security mechanisms. > Adopting Jolokia will substantially improve the manageability of Hadoop and > its ecosystem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9160) Adopt Jolokia as the JMX HTTP/JSON bridge.
[ https://issues.apache.org/jira/browse/HADOOP-9160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724169#comment-13724169 ] Philip Zeyliger commented on HADOOP-9160: - > third party JVM agents that provide advanced runtime monitoring and tuning > of JVM that expose JMX as the API If you're running a third party JVM agent, by all means let it expose whatever APIs it would like--it's got everything it needs to bind to addresses and listen for 'em. I have no particular objections to alternative access endpoints (e.g., NFS proxies), though it's notable that most of them are out of process proxies. Furthermore, for distributed systems like HDFS, you still often need a client shim to deal with HA and querying for data from the wrong machine. I'd never use HTTPFS from certain types of programs because then I'd have to manually re-write all the nice retry and error-handling that DFSClient provides me. I do have objections to alternatives for write access. I completely agree with Allen W: we've got to have a way to turn it off. BTW, one way to add administrative APIs is to add plugins. Hue, for example, used a plugin to datanodes and namenodes to get at some stuff. It's not pretty, but, hey, the maintenance burden is on the right place. > Adopt Jolokia as the JMX HTTP/JSON bridge. > -- > > Key: HADOOP-9160 > URL: https://issues.apache.org/jira/browse/HADOOP-9160 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Junping Du > Labels: features > Attachments: hadoop-9160-demo-branch-1.txt, HADOOP-9160.patch > > > The current JMX HTTP bridge has served its purpose, while a more complete > solution: Jolokia (formerly Jmx4Perl) has been developed/matured over the > years. > Jolokia provides comprehensive JMX features over HTTP/JSON including search > and list of JMX attributes and operations metadata, which helps to support > inter framework/platform compatibility. 
It has first class language bindings > for Perl, Python, Javascript, Java. > It's trivial (see demo patch) to incorporate Jolokia servlet into Hadoop HTTP > servers and use the same security mechanisms. > Adopting Jolokia will substantially improve the manageability of Hadoop and > its ecosystem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9160) Adopt Jolokia as the JMX HTTP/JSON bridge.
[ https://issues.apache.org/jira/browse/HADOOP-9160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723054#comment-13723054 ] Philip Zeyliger commented on HADOOP-9160: - Luke, The issue is with *write* operations. Things like "format HDFS" or "refreshNodes" or "decommission" or "enterSafemode." Those are currently done with DFSAdmin and friends. They operate over our RPC mechanism. I don't like the prospect of having yet another RPC mechanism to do the same thing. > Adopt Jolokia as the JMX HTTP/JSON bridge. > -- > > Key: HADOOP-9160 > URL: https://issues.apache.org/jira/browse/HADOOP-9160 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Junping Du > Labels: features > Attachments: hadoop-9160-demo-branch-1.txt, HADOOP-9160.patch > > > The current JMX HTTP bridge has served its purpose, while a more complete > solution: Jolokia (formerly Jmx4Perl) has been developed/matured over the > years. > Jolokia provides comprehensive JMX features over HTTP/JSON including search > and list of JMX attributes and operations metadata, which helps to support > inter framework/platform compatibility. It has first class language bindings > for Perl, Python, Javascript, Java. > It's trivial (see demo patch) to incorporate Jolokia servlet into Hadoop HTTP > servers and use the same security mechanisms. > Adopting Jolokia will substantially improve the manageability of Hadoop and > its ecosystem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9562) Create REST interface for HDFS health data
[ https://issues.apache.org/jira/browse/HADOOP-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663064#comment-13663064 ] Philip Zeyliger commented on HADOOP-9562: - Y'all are aware that {code} GET http://namenode:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo {code} returns something along the lines of {code} { "beans" : [ { "name" : "Hadoop:service=NameNode,name=NameNodeInfo", "modelerType" : "org.apache.hadoop.hdfs.server.namenode.FSNamesystem", "Threads" : 60, "Total" : 39793714372608, "ClusterId" : "cluster19", "BlockPoolId" : "BP-1187031495-172.29.122.20-1355867299076", "Used" : 19269710336134, "PercentUsed" : 48.424004, "PercentRemaining" : 31.148054, "Version" : "2.0.0-cdh4.3.0-SNAPSHOT, rcb6dc0d3d99891eb095dc6577a440d7dde067789", "Free" : 12394968694784, "Safemode" : "", "UpgradeFinalized" : true, "NonDfsUsedSpace" : 8129035341690, "BlockPoolUsedSpace" : 19269710336134, "PercentBlockPoolUsed" : 48.424004, "TotalBlocks" : 64237, "TotalFiles" : 27708, "NumberOfMissingBlocks" : 0, "LiveNodes" : "{\"p0431.mtv.cloudera.com\":{\"numBlocks\":32249,\"usedSpace\":3220494794752,\"lastContact\":1,\"capacity\":5688127021056,\"nonDfsUsedSpace\":560239116288,\"adminState\":\"In Service\"},\"p0429.mtv.cloudera.com\":{\"numBlocks\":25709,\"usedSpace\":2468956155904,\"lastContact\":2,\"capacity\":5688127021056,\"nonDfsUsedSpace\":1509498798080,\"adminState\":\"In Service\"},\"p0427.mtv.cloudera.com\":{\"numBlocks\":16919,\"usedSpace\":2056487919683,\"lastContact\":0,\"capacity\":5676539633664,\"nonDfsUsedSpace\":1999484977085,\"adminState\":\"In Service\"},\"p0430.mtv.cloudera.com\":{\"numBlocks\":31258,\"usedSpace\":3117177036800,\"lastContact\":2,\"capacity\":5688127021056,\"nonDfsUsedSpace\":987432075264,\"adminState\":\"In Service\"},\"p0432.mtv.cloudera.com\":{\"numBlocks\":20877,\"usedSpace\":1899904856064,\"lastContact\":0,\"capacity\":5688127021056,\"nonDfsUsedSpace\":1620254515200,\"adminState\":\"In 
Service\"},\"p0433.mtv.cloudera.com\":{\"numBlocks\":31995,\"usedSpace\":3204067667968,\"lastContact\":1,\"capacity\":5688127021056,\"nonDfsUsedSpace\":172768587776,\"adminState\":\"In Service\"},\"p0428.mtv.cloudera.com\":{\"numBlocks\":33680,\"usedSpace\":3302621904963,\"lastContact\":0,\"capacity\":5676539633664,\"nonDfsUsedSpace\":1279357271997,\"adminState\":\"In Service\"}}", "DeadNodes" : "{}", "DecomNodes" : "{}", "NameDirStatuses" : "{\"failed\":{},\"active\":{\"/data/1/dfs2/nn\":\"IMAGE_AND_EDITS\",\"/data/2/dfs2/nn\":\"IMAGE_AND_EDITS\"}}" } ] } {code} Is that sufficient? I'd rather whatever missing information be added to existing JMX beans, which are already accessible via HTTP, than new equivalent APIs be added. There's also a Java API in DFSClient: {code} $hdfs dfsadmin -report Configured Capacity: 39793714372608 (36.19 TB) Present Capacity: 31664679067648 (28.80 TB) DFS Remaining: 12394968727552 (11.27 TB) DFS Used: 19269710340096 (17.53 TB) DFS Used%: 60.86% Under replicated blocks: 0 Blocks with corrupt replicas: 0 Missing blocks: 0 - report: Access denied for user philip. Superuser privilege is required {code} > Create REST interface for HDFS health data > -- > > Key: HADOOP-9562 > URL: https://issues.apache.org/jira/browse/HADOOP-9562 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Affects Versions: 2.0.4-alpha >Reporter: Trevor Lorimer >Priority: Minor > Attachments: 0001-HAD-162-Final-revision-refactor.patch > > > The HDFS health screen (dfshealth.jsp) displays basic Version, Security and > Health information concerning the NameNode; currently this information is > accessible from classes in the org.apache.hadoop.hdfs.server.namenode package > and cannot be accessed outside the NameNode. This becomes a problem if the > data is required to be displayed using a new user interface. > The proposal is to create a REST interface to expose all the information > displayed on dfshealth.jsp using GET methods. 
Wrapper classes will be created > to serve the data to the REST root resource within the hadoop-hdfs project. > This will enable the HDFS health screen information to be accessed remotely. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
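As a concrete illustration of the point above: a monitoring script can already pull any of these fields out of the existing /jmx endpoint with plain text tools. The sketch below runs against a trimmed stand-in for the response; in practice the JSON would come from curl against the NameNode's HTTP port (hostname and port are placeholders):

```shell
# Sketch: extract PercentUsed from a /jmx-style response. SAMPLE is a
# trimmed stand-in; in practice the JSON would be fetched with, e.g.:
#   curl -s 'http://namenode:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo'
SAMPLE='{ "beans" : [ { "name" : "Hadoop:service=NameNode,name=NameNodeInfo", "PercentUsed" : 48.424004 } ] }'
percent_used=$(printf '%s' "$SAMPLE" | sed -n 's/.*"PercentUsed" : \([0-9.]*\).*/\1/p')
echo "$percent_used"
```

A real consumer would use a proper JSON parser rather than sed, but the point stands: the data is already one HTTP GET away.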
[jira] [Commented] (HADOOP-9194) RPC Support for QoS
[ https://issues.apache.org/jira/browse/HADOOP-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549938#comment-13549938 ] Philip Zeyliger commented on HADOOP-9194: - In a previous life, I used systems that had multiple ports open for the same protocols, and relied on both hardware and OS queueing to make one port a higher priority than the other. Sure was easy to reason about. > RPC Support for QoS > --- > > Key: HADOOP-9194 > URL: https://issues.apache.org/jira/browse/HADOOP-9194 > Project: Hadoop Common > Issue Type: New Feature > Components: ipc >Affects Versions: 2.0.2-alpha >Reporter: Luke Lu > > One of the next frontiers of Hadoop performance is QoS (Quality of Service). > We need QoS support to fight the inevitable "buffer bloat" (including various > queues, which are probably necessary for throughput) in our software stack. > This is important for mixed workload with different latency and throughput > requirements (e.g. OLTP vs OLAP, batch and even compaction I/O) against the > same DFS. > Any potential bottleneck will need to be managed by QoS mechanisms, starting > with RPC. > How about adding a one byte DS (differentiated services) field (a la the > 6-bit DS field in IP header) in the RPC header to facilitate the QoS > mechanisms (in separate JIRAs)? The byte at a fixed offset (how about 0?) of > the header is helpful for implementing high performance QoS mechanisms in > switches (software or hardware) and servers with minimum decoding effort. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
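What the "fixed offset 0" suggestion buys: a classifier only has to read one byte to pick a queue, with no protobuf decoding. A hedged sketch of that idea (the header layout here is the proposal's hypothetical one, not anything Hadoop actually ships):

```shell
# Sketch: classify an RPC message by a one-byte DS field at offset 0,
# as the proposal suggests. The layout is hypothetical.
ds_class() {
  # $1: file holding a serialized header; prints the DS byte as decimal.
  od -An -tu1 -N1 "$1" | tr -d ' '
}
```

A switch or server would map that decimal value straight onto a priority queue, which is exactly the "minimum decoding effort" property the description asks for.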
[jira] [Commented] (HADOOP-8922) Provide alternate JSONP output for JMXJsonServlet to allow javascript in browser dashboard
[ https://issues.apache.org/jira/browse/HADOOP-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476742#comment-13476742 ] Philip Zeyliger commented on HADOOP-8922: - +1 (non-binding). Patch looks good to me. Please fix the spelling of "optionnal" in the javadoc. > Provide alternate JSONP output for JMXJsonServlet to allow javascript in > browser dashboard > -- > > Key: HADOOP-8922 > URL: https://issues.apache.org/jira/browse/HADOOP-8922 > Project: Hadoop Common > Issue Type: Improvement > Components: metrics >Reporter: Damien Hardy >Priority: Trivial > Labels: newbie, patch > Attachments: HADOOP-8922-2.patch, test.html > > > JMXJsonServlet may provide a JSONP alternative to JSON to allow javascript in > browser GUIs to make requests. > For security reasons (XSS), browsers limit requests to other > domains[¹|#ref1], so metrics from cluster nodes cannot be used in a full > js interface. > An example of this kind of dashboard is the bigdesk[²|#ref2] plugin for > ElasticSearch. > To achieve that, the servlet should detect a GET parameter > (callback=<name>) and modify the response by surrounding the JSON value with > "<name>(" and ");" [³|#ref3]. > The "<name>" value is variable and should be provided by the client as the callback > parameter value. > {anchor:ref1}[1] > https://developer.mozilla.org/en-US/docs/Same_origin_policy_for_JavaScript > {anchor:ref2}[2] https://github.com/lukas-vlcek/bigdesk > {anchor:ref3}[3] http://en.wikipedia.org/wiki/JSONP -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
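The wrapping behavior the issue describes is small enough to state exactly. A sketch (function and parameter names are mine, not the patch's; the real change lives in JMXJsonServlet's Java code):

```shell
# Sketch of the JSONP convention from the issue: when a callback=
# parameter is present, surround the JSON body with "<callback>(" and ");".
wrap_jsonp() {
  # $1: callback name (empty means plain JSON), $2: JSON payload
  if [ -n "$1" ]; then
    printf '%s(%s);' "$1" "$2"
  else
    printf '%s' "$2"
  fi
}
```

A browser dashboard then loads the response as a script tag, and the named callback receives the JSON object, sidestepping the same-origin restriction for reads.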
[jira] [Updated] (HADOOP-8761) Help for FsShell's Stat incorrectly mentions "file size in blocks" (should be bytes)
[ https://issues.apache.org/jira/browse/HADOOP-8761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-8761: Attachment: HADOOP-8761.patch.txt Trivial patch attached. > Help for FsShell's Stat incorrectly mentions "file size in blocks" (should be > bytes) > > > Key: HADOOP-8761 > URL: https://issues.apache.org/jira/browse/HADOOP-8761 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Reporter: Philip Zeyliger >Priority: Minor > Attachments: HADOOP-8761.patch.txt > > > Trivial patch attached corrects the usage information. Stat.java calls > FileStatus.getLen(), which is most definitely the file size in bytes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8761) Help for FsShell's Stat incorrectly mentions "file size in blocks" (should be bytes)
[ https://issues.apache.org/jira/browse/HADOOP-8761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-8761: Assignee: Philip Zeyliger Status: Patch Available (was: Open) Submitting patch for Hudson. No tests were added. > Help for FsShell's Stat incorrectly mentions "file size in blocks" (should be > bytes) > > > Key: HADOOP-8761 > URL: https://issues.apache.org/jira/browse/HADOOP-8761 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Reporter: Philip Zeyliger >Assignee: Philip Zeyliger >Priority: Minor > Attachments: HADOOP-8761.patch.txt > > > Trivial patch attached corrects the usage information. Stat.java calls > FileStatus.getLen(), which is most definitely the file size in bytes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8761) Help for FsShell's Stat incorrectly mentions "file size in blocks" (should be bytes)
Philip Zeyliger created HADOOP-8761: --- Summary: Help for FsShell's Stat incorrectly mentions "file size in blocks" (should be bytes) Key: HADOOP-8761 URL: https://issues.apache.org/jira/browse/HADOOP-8761 Project: Hadoop Common Issue Type: Bug Components: fs Reporter: Philip Zeyliger Priority: Minor Trivial patch attached corrects the usage information. Stat.java calls FileStatus.getLen(), which is most definitely the file size in bytes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8347) Hadoop Common logs misspell 'successful'
[ https://issues.apache.org/jira/browse/HADOOP-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267231#comment-13267231 ] Philip Zeyliger commented on HADOOP-8347: - Test failures seem trashy: bq. >>> org.apache.hadoop.fs.viewfs.TestViewFsTrash.testTrash That's unrelated. > Hadoop Common logs misspell 'successful' > > > Key: HADOOP-8347 > URL: https://issues.apache.org/jira/browse/HADOOP-8347 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 2.0.0 >Reporter: Philip Zeyliger >Assignee: Philip Zeyliger > Attachments: 0001-HADOOP-8347.-Fixing-spelling-of-successful.patch > > > 'successfull' is a misspelling of 'successful.' Trivial patch attached. The > constants are private, and there doesn't seem to be any serialized form of > these comments except in log files, so this shouldn't have compatibility > issues. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8347) Hadoop Common logs misspell 'successful'
[ https://issues.apache.org/jira/browse/HADOOP-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-8347: Status: Patch Available (was: Open) > Hadoop Common logs misspell 'successful' > > > Key: HADOOP-8347 > URL: https://issues.apache.org/jira/browse/HADOOP-8347 > Project: Hadoop Common > Issue Type: Bug > Components: security >Reporter: Philip Zeyliger >Assignee: Philip Zeyliger > Attachments: 0001-HADOOP-8347.-Fixing-spelling-of-successful.patch > > > 'successfull' is a misspelling of 'successful.' Trivial patch attached. The > constants are private, and there doesn't seem to be any serialized form of > these comments except in log files, so this shouldn't have compatibility > issues. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8347) Hadoop Common logs misspell 'successful'
[ https://issues.apache.org/jira/browse/HADOOP-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-8347: Attachment: 0001-HADOOP-8347.-Fixing-spelling-of-successful.patch > Hadoop Common logs misspell 'successful' > > > Key: HADOOP-8347 > URL: https://issues.apache.org/jira/browse/HADOOP-8347 > Project: Hadoop Common > Issue Type: Bug > Components: security >Reporter: Philip Zeyliger >Assignee: Philip Zeyliger > Attachments: 0001-HADOOP-8347.-Fixing-spelling-of-successful.patch > > > 'successfull' is a misspelling of 'successful.' Trivial patch attached. The > constants are private, and there doesn't seem to be any serialized form of > these comments except in log files, so this shouldn't have compatibility > issues. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8347) Hadoop Common logs misspell 'successful'
Philip Zeyliger created HADOOP-8347: --- Summary: Hadoop Common logs misspell 'successful' Key: HADOOP-8347 URL: https://issues.apache.org/jira/browse/HADOOP-8347 Project: Hadoop Common Issue Type: Bug Components: security Reporter: Philip Zeyliger Assignee: Philip Zeyliger Attachments: 0001-HADOOP-8347.-Fixing-spelling-of-successful.patch 'successfull' is a misspelling of 'successful.' Trivial patch attached. The constants are private, and there doesn't seem to be any serialized form of these comments except in log files, so this shouldn't have compatibility issues. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8343) Allow configuration of authorization for JmxJsonServlet and MetricsServlet
Philip Zeyliger created HADOOP-8343: --- Summary: Allow configuration of authorization for JmxJsonServlet and MetricsServlet Key: HADOOP-8343 URL: https://issues.apache.org/jira/browse/HADOOP-8343 Project: Hadoop Common Issue Type: New Feature Components: util Affects Versions: 2.0.0 Reporter: Philip Zeyliger When using authorization for the daemons' web server, it would be useful to specifically control the authorization requirements for accessing /jmx and /metrics. Currently, they require administrative access. This JIRA would propose that whether or not they are available to administrators only or to all users be controlled by "hadoop.instrumentation.requires.administrator" (or similar). The default would be that administrator access is required. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
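If the proposal were adopted, the knob would presumably be set in core-site.xml. A hypothetical sketch (the property name comes from this issue's text and is not a released Hadoop configuration key; the default shown is the proposed administrators-only behavior):

```xml
<!-- Hypothetical: property name taken from this issue's proposal,
     not from a released Hadoop version. -->
<property>
  <name>hadoop.instrumentation.requires.administrator</name>
  <value>true</value>
  <description>If true (the proposed default), /jmx and /metrics require
  administrator access; if false, any authenticated user may read them.
  </description>
</property>
```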
[jira] [Commented] (HADOOP-7652) Provide a mechanism for a client Hadoop configuration to 'poison' daemon startup; i.e., disallow daemon start up on a client config.
[ https://issues.apache.org/jira/browse/HADOOP-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107546#comment-13107546 ] Philip Zeyliger commented on HADOOP-7652: - Hi Arun, As far as I can tell, service-level authorization restricts who can join in terms of unix users, but, for a non-kerberos packaged install (including both the RPMs based on 0.20.204 and various Cloudera packages), this doesn't help much. Users still type "/etc/init.d/hadoop-datanode start", and the username of the datanode process is "hdfs," thereby circumventing service level authorization (if I'm guessing correctly how it works.) -- Philip > Provide a mechanism for a client Hadoop configuration to 'poison' daemon > startup; i.e., disallow daemon start up on a client config. > > > Key: HADOOP-7652 > URL: https://issues.apache.org/jira/browse/HADOOP-7652 > Project: Hadoop Common > Issue Type: Improvement > Components: conf >Reporter: Philip Zeyliger > > We've seen folks who have been given Hadoop configuration to act as a client > accidentally type "hadoop namenode" and get things into a confused, or > incorrect state. Most recently, we've seen data corruption when users > accidentally run extra secondary namenodes > (https://issues.apache.org/jira/browse/HDFS-2305). > I'd like to propose that we introduce a configuration property, say, > "client.poison.servers", which, if set, disables the Hadoop daemons (nn, snn, > jt, tt, etc.) with a reasonable error message. Hadoop administrators can > hand out/install configs that are on machines intended to just be clients > with a little less worry that they'll accidentally get run. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-7652) Provide a mechanism for a client Hadoop configuration to 'poison' daemon startup; i.e., disallow daemon start up on a client config.
Provide a mechanism for a client Hadoop configuration to 'poison' daemon startup; i.e., disallow daemon start up on a client config. Key: HADOOP-7652 URL: https://issues.apache.org/jira/browse/HADOOP-7652 Project: Hadoop Common Issue Type: Improvement Components: conf Reporter: Philip Zeyliger We've seen folks who have been given Hadoop configuration to act as a client accidentally type "hadoop namenode" and get things into a confused, or incorrect state. Most recently, we've seen data corruption when users accidentally run extra secondary namenodes (https://issues.apache.org/jira/browse/HDFS-2305). I'd like to propose that we introduce a configuration property, say, "client.poison.servers", which, if set, disables the Hadoop daemons (nn, snn, jt, tt, etc.) with a reasonable error message. Hadoop administrators can hand out/install configs that are on machines intended to just be clients with a little less worry that they'll accidentally get run. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
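A sketch of the "client.poison.servers" idea from the description above. The property name is the one proposed in the ticket; everything else (the `Properties` stand-in for a Hadoop `Configuration`, the daemon names, the exception type) is a hypothetical illustration of how a daemon could refuse to start on a client-only config.

```java
import java.util.Arrays;
import java.util.Properties;

public class PoisonCheck {
    /** Throws if this daemon name is listed in client.poison.servers. */
    public static void checkNotPoisoned(Properties conf, String daemon) {
        String poisoned = conf.getProperty("client.poison.servers", "");
        boolean hit = Arrays.stream(poisoned.split(","))
                .map(String::trim)
                .anyMatch(daemon::equals);
        if (hit) {
            // A reasonable error message, as the ticket asks for.
            throw new IllegalStateException(
                "Refusing to start " + daemon + ": this is a client-only "
                + "configuration (client.poison.servers=" + poisoned + ")");
        }
    }
}
```

Each daemon would call this once at startup, before binding any ports, so an accidental "hadoop namenode" on a client box fails fast instead of corrupting state.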
[jira] [Commented] (HADOOP-7585) hadoop-config.sh should be changed to not rely on java6 behavior for classpath expansion since it breaks jsvc
[ https://issues.apache.org/jira/browse/HADOOP-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091519#comment-13091519 ] Philip Zeyliger commented on HADOOP-7585: - For what it's worth, I've had good luck asking the jsvc folks for compatibility fixes. https://issues.apache.org/jira/browse/DAEMON-208 was fairly similar. Might be worth filing a bug with jsvc. -- Philip > hadoop-config.sh should be changed to not rely on java6 behavior for > classpath expansion since it breaks jsvc > - > > Key: HADOOP-7585 > URL: https://issues.apache.org/jira/browse/HADOOP-7585 > Project: Hadoop Common > Issue Type: Bug >Reporter: Arun C Murthy >Assignee: Eric Yang >Priority: Blocker > Fix For: 0.23.0 > > Attachments: HADOOP-7585.patch > > > hadoop-config.sh should be changed to not rely on java6 behavior for > classpath expansion since it breaks jsvc - we need to add back the for loops > in hadoop-config.sh which were changed in HADOOP-7563 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-7357) hadoop.io.compress.TestCodec#main() should exit with non-zero exit code if test failed
[ https://issues.apache.org/jira/browse/HADOOP-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-7357: Attachment: HADOOP-7357-v2.patch.txt Removing the entire try catch, as per Todd's suggestion. > hadoop.io.compress.TestCodec#main() should exit with non-zero exit code if > test failed > -- > > Key: HADOOP-7357 > URL: https://issues.apache.org/jira/browse/HADOOP-7357 > Project: Hadoop Common > Issue Type: Bug > Components: test >Reporter: Philip Zeyliger >Assignee: Philip Zeyliger >Priority: Trivial > Attachments: HADOOP-7357-v2.patch.txt, HADOOP-7357.patch.txt > > > It's convenient to run something like > {noformat} > HADOOP_CLASSPATH=hadoop-test-0.20.2.jar bin/hadoop > org.apache.hadoop.io.compress.TestCodec -count 3 -codec fo > {noformat} > but the error code it returns isn't interesting. > 1-line patch attached fixes that. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-7357) hadoop.io.compress.TestCodec#main() should exit with non-zero exit code if test failed
[ https://issues.apache.org/jira/browse/HADOOP-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-7357: Status: Patch Available (was: Open) > hadoop.io.compress.TestCodec#main() should exit with non-zero exit code if > test failed > -- > > Key: HADOOP-7357 > URL: https://issues.apache.org/jira/browse/HADOOP-7357 > Project: Hadoop Common > Issue Type: Bug > Components: test >Reporter: Philip Zeyliger >Assignee: Philip Zeyliger >Priority: Trivial > Attachments: HADOOP-7357.patch.txt > > > It's convenient to run something like > {noformat} > HADOOP_CLASSPATH=hadoop-test-0.20.2.jar bin/hadoop > org.apache.hadoop.io.compress.TestCodec -count 3 -codec fo > {noformat} > but the error code it returns isn't interesting. > 1-line patch attached fixes that. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-7357) hadoop.io.compress.TestCodec#main() should exit with non-zero exit code if test failed
[ https://issues.apache.org/jira/browse/HADOOP-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-7357: Attachment: HADOOP-7357.patch.txt > hadoop.io.compress.TestCodec#main() should exit with non-zero exit code if > test failed > -- > > Key: HADOOP-7357 > URL: https://issues.apache.org/jira/browse/HADOOP-7357 > Project: Hadoop Common > Issue Type: Bug > Components: test >Reporter: Philip Zeyliger >Assignee: Philip Zeyliger >Priority: Trivial > Attachments: HADOOP-7357.patch.txt > > > It's convenient to run something like > {noformat} > HADOOP_CLASSPATH=hadoop-test-0.20.2.jar bin/hadoop > org.apache.hadoop.io.compress.TestCodec -count 3 -codec fo > {noformat} > but the error code it returns isn't interesting. > 1-line patch attached fixes that. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-7357) hadoop.io.compress.TestCodec#main() should exit with non-zero exit code if test failed
hadoop.io.compress.TestCodec#main() should exit with non-zero exit code if test failed -- Key: HADOOP-7357 URL: https://issues.apache.org/jira/browse/HADOOP-7357 Project: Hadoop Common Issue Type: Bug Components: test Reporter: Philip Zeyliger Assignee: Philip Zeyliger Priority: Trivial Attachments: HADOOP-7357.patch.txt It's convenient to run something like {noformat} HADOOP_CLASSPATH=hadoop-test-0.20.2.jar bin/hadoop org.apache.hadoop.io.compress.TestCodec -count 3 -codec fo {noformat} but the error code it returns isn't interesting. 1-line patch attached fixes that. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
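The shape of the one-line fix discussed here: have `main()` propagate test failure through the process exit code so a shell caller can inspect `$?`. TestCodec's real internals are elided; `run()` is a stand-in for the test body.

```java
public class ExitCodeMain {
    /** Stand-in for the test body: 0 on success, non-zero on failure. */
    static int run(String codec) {
        // Pretend only "gzip" is a known codec; anything else fails,
        // as with the bogus "-codec fo" in the example invocation.
        return "gzip".equals(codec) ? 0 : 1;
    }

    public static void main(String[] args) {
        int rc = run(args.length > 0 ? args[0] : "");
        if (rc != 0) {
            System.err.println("Test failed");
        }
        System.exit(rc);  // non-zero on failure, visible to the caller
    }
}
```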
[jira] Created: (HADOOP-7198) Hadoop defaults for web UI ports often fall smack in the middle of Linux ephemeral port range
Hadoop defaults for web UI ports often fall smack in the middle of Linux ephemeral port range - Key: HADOOP-7198 URL: https://issues.apache.org/jira/browse/HADOOP-7198 Project: Hadoop Common Issue Type: Wish Reporter: Philip Zeyliger Priority: Trivial It turns out (see http://en.wikipedia.org/wiki/Ephemeral_port and /proc/sys/net/ipv4/ip_local_port_range) that when you bind to port 0, Linux chooses an ephemeral port. On my default-ridden Ubuntu Maverick box and on CentOS 5.5, that range is 32768-61000. So, when HBase binds to 60030 or when the NameNode binds to 50070, there's a small chance that you'll conflict with, say, an FTP session, or with some other Hadoop daemon that's had a listening address configured as :0. I don't know that there's a practical resolution here, since changing the defaults seems like an ill-fated effort, but if you have any ephemeral port use, you can run into this. We've now run into it once. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
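The mechanism the ticket describes can be demonstrated directly: binding to port 0 lets the kernel pick an ephemeral port (from the 32768-61000 range on the Linux defaults mentioned above), which is exactly how a ":0" listen address can later collide with another daemon's fixed default port.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class EphemeralPort {
    /** Bind to port 0 and report the ephemeral port the OS chose. */
    public static int bindAnywhere() throws IOException {
        try (ServerSocket s = new ServerSocket(0)) {
            return s.getLocalPort();  // kernel-chosen ephemeral port
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("OS chose port " + bindAnywhere());
    }
}
```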
[jira] Commented: (HADOOP-7150) Create a module system for adding extensions to Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996169#comment-12996169 ] Philip Zeyliger commented on HADOOP-7150: - Owen, To clarify, are you referring to adding libraries for user MapReduce code (e.g., an image manipulation library) or more about adding plugins/modules to the core services (e.g., https://issues.apache.org/jira/browse/HADOOP-5257, "adding service plugins for namenode/datanode"; there's a similar MR ticket)? Both are very handy; just curious which one you're thinking about. Thanks! > Create a module system for adding extensions to Hadoop > -- > > Key: HADOOP-7150 > URL: https://issues.apache.org/jira/browse/HADOOP-7150 > Project: Hadoop Common > Issue Type: New Feature > Components: util >Reporter: Owen O'Malley >Assignee: Owen O'Malley > > Currently adding extensions to Hadoop is difficult. I propose adding the > concept of modules that can add jars and libraries to the system. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HADOOP-7001) Allow configuration changes without restarting configured nodes
[ https://issues.apache.org/jira/browse/HADOOP-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921553#action_12921553 ] Philip Zeyliger commented on HADOOP-7001: - I think that the number of reconfigurable parameters will be quite small, and that they will require extra code. (e.g., changing the size of a thread pool requires a line or two of code). So maybe it's just better to expose those things in some more direct way? At the very least, we should make it quite clear where it's going to be documented what things are re-configurable. > Allow configuration changes without restarting configured nodes > --- > > Key: HADOOP-7001 > URL: https://issues.apache.org/jira/browse/HADOOP-7001 > Project: Hadoop Common > Issue Type: Task >Reporter: Patrick Kling > > Currently, changing the configuration on a node (e.g., the name node) > requires that we restart the node. We propose a change that would allow us to > make configuration changes without restarting. Nodes that support > configuration changes at run time should implement the following interface: > interface ChangeableConfigured extends Configured { >void changeConfiguration(Configuration newConf) throws > ConfigurationChangeException; > } > The contract of changeConfiguration is as follows: > The node will compare newConf to the existing configuration. For each > configuration property that is set to a different value than in the current > configuration, the node will either adjust its behaviour to conform to the > new configuration or throw a ConfigurationChangeException if this change is > not possible at run time. If a configuration property is set in the current > configuration but is unset in newConf, the node should use its default value > for this property. After a successful invocation of changeConfiguration, the > behaviour of the configured node should be indistinguishable from the > behaviour of a node that was configured with newConf at creation. 
> It should be easy to change existing nodes to implement this interface. We > can start by throwing the exception for all changes and then gradually start > supporting more and more changes at run time. (We might even consider > replacing Configured with ChangeableConfigured entirely, but I think the > proposal above affords greater flexibility). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-7001) Allow configuration changes without restarting configured nodes
[ https://issues.apache.org/jira/browse/HADOOP-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921493#action_12921493 ] Philip Zeyliger commented on HADOOP-7001: - I might be grumpy, but I think the right way to deal with configuration changes in a distributed, fault-tolerant system is to just restart the daemon entirely. We already have to deal with the daemon suddenly crashing and that not affecting the system too much, so I'm wary of extra complexity of yet another process. In practice, many configuration variables are loaded at start time and then stored as statics: stuff like threadpool sizes, for example. Not to mention that Configuration objects get copied along, so it's hard to make sure that a configuration change propagates to all possible children. I'll point out that the namenode and the jobtracker's fair scheduler already have mechanisms for dynamic configuration changes. In namenode, that's -refreshNodes. In the jt, I think the fair scheduler re-reads its XML configuration file on occasion. Both of these make a lot of sense: these are specific endpoints for managing specific data, and the semantics of those changes are easy to understand. -- Philip > Allow configuration changes without restarting configured nodes > --- > > Key: HADOOP-7001 > URL: https://issues.apache.org/jira/browse/HADOOP-7001 > Project: Hadoop Common > Issue Type: Task >Reporter: Patrick Kling > > Currently, changing the configuration on a node (e.g., the name node) > requires that we restart the node. We propose a change that would allow us to > make configuration changes without restarting. 
Nodes that support > configuration changes at run time should implement the following interface: > interface ChangeableConfigured extends Configured { >void changeConfiguration(Configuration newConf) throws > ConfigurationChangeException; > } > The contract of changeConfiguration is as follows: > The node will compare newConf to the existing configuration. For each > configuration property that is set to a different value than in the current > configuration, the node will either adjust its behaviour to conform to the > new configuration or throw a ConfigurationChangeException if this change is > not possible at run time. If a configuration property is set in the current > configuration but is unset in newConf, the node should use its default value > for this property. After a successful invocation of changeConfiguration, the > behaviour of the configured node should be indistinguishable from the > behaviour of a node that was configured with newConf at creation. > It should be easy to change existing nodes to implement this interface. We > can start by throwing the exception for all changes and then gradually start > supporting more and more changes at run time. (We might even consider > replacing Configured with ChangeableConfigured entirely, but I think the > proposal above affords greater flexibility). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
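The contract quoted above can be sketched concretely. This is not Hadoop code: a `Map` stands in for Hadoop's `Configuration` so the example compiles on its own, and the pool-size property name is invented. It shows the proposed behavior of `changeConfiguration`: apply what can change at run time, revert unset keys to defaults, and reject everything else.

```java
import java.util.HashMap;
import java.util.Map;

class ConfigurationChangeException extends Exception {
    ConfigurationChangeException(String msg) { super(msg); }
}

class ReconfigurableNode {
    static final String POOL_KEY = "node.handler.pool.size";  // illustrative
    static final int POOL_DEFAULT = 10;
    int poolSize;

    ReconfigurableNode(Map<String, String> conf) {
        this.poolSize = intOf(conf, POOL_KEY, POOL_DEFAULT);
    }

    void changeConfiguration(Map<String, String> newConf)
            throws ConfigurationChangeException {
        for (String key : newConf.keySet()) {
            if (!key.equals(POOL_KEY)) {
                // Per the proposal: reject anything not changeable at run time.
                throw new ConfigurationChangeException(
                    key + " cannot be changed without a restart");
            }
        }
        // A key unset in newConf reverts to its default, per the contract.
        this.poolSize = intOf(newConf, POOL_KEY, POOL_DEFAULT);
    }

    private static int intOf(Map<String, String> c, String k, int dflt) {
        return c.containsKey(k) ? Integer.parseInt(c.get(k)) : dflt;
    }
}
```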
[jira] Commented: (HADOOP-6977) Herriot daemon clients should vend statistics
[ https://issues.apache.org/jira/browse/HADOOP-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915590#action_12915590 ] Philip Zeyliger commented on HADOOP-6977: - Konstantin, Sounds, well, sound. > Herriot daemon clients should vend statistics > - > > Key: HADOOP-6977 > URL: https://issues.apache.org/jira/browse/HADOOP-6977 > Project: Hadoop Common > Issue Type: Improvement > Components: test >Affects Versions: 0.22.0 >Reporter: Konstantin Boudnik >Assignee: Konstantin Boudnik > Attachments: HADOOP-6977.patch, HADOOP-6977.y20S.patch > > > The HDFS web user interface serves useful information through dfshealth.jsp > and dfsnodelist.jsp. > The Herriot interface to Hadoop cluster daemons would benefit from the > addition of some way to channel metrics information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6977) Herriot daemon clients should vend statistics
[ https://issues.apache.org/jira/browse/HADOOP-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915566#action_12915566 ] Philip Zeyliger commented on HADOOP-6977: - Separating out the presentation and the data in dfshealth.jsp would be a boon all around. It would make testing easier, and it would open the way for supporting other (XML, JSON, whatever) ways to get at those things. +1 for the idea. > Herriot daemon clients should vend statistics > - > > Key: HADOOP-6977 > URL: https://issues.apache.org/jira/browse/HADOOP-6977 > Project: Hadoop Common > Issue Type: Improvement > Components: test >Affects Versions: 0.22.0 >Reporter: Konstantin Boudnik >Assignee: Konstantin Boudnik > Attachments: HADOOP-6977.patch > > > The HDFS web user interface serves useful information through dfshealth.jsp > and dfsnodelist.jsp. > The Herriot interface to Hadoop cluster daemons would benefit from the > addition of some way to channel metrics information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6974) Configurable header buffer size for Hadoop HTTP server
[ https://issues.apache.org/jira/browse/HADOOP-6974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915131#action_12915131 ] Philip Zeyliger commented on HADOOP-6974: - I'm +1 on the idea. I've absolutely run into this limit before, when running web apps on the same host. "dfs.http.header.buffer.size" seems like the wrong name for this parameter, since HttpServer is also used by other places. Perhaps "core.http.header.buffer.size"? I would be in favor of making the limit larger by default. Typically, I believe, additions to config variables include a change to core-default.xml to document that variable. It would be appropriate to see that as part of this patch, too. -- Philip > Configurable header buffer size for Hadoop HTTP server > -- > > Key: HADOOP-6974 > URL: https://issues.apache.org/jira/browse/HADOOP-6974 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Paul Butler > Attachments: hadoop-6974.patch > > > This patch adds a configurable parameter dfs.http.header.buffer.size to > Hadoop which allows the buffer size to be configured from the xml > configuration. > This fixes an issue that came up in an environment where the Hadoop servers > share a domain with other web applications that use domain cookies. The large > cookies overwhelmed Jetty's buffer which caused it to return a 413 error. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6950) Suggest that HADOOP_CLASSPATH should be preserved in hadoop-env.sh.template
[ https://issues.apache.org/jira/browse/HADOOP-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-6950: Attachment: HADOOP-6950.patch.txt Patch attached. This is a comment/doc change. > Suggest that HADOOP_CLASSPATH should be preserved in hadoop-env.sh.template > --- > > Key: HADOOP-6950 > URL: https://issues.apache.org/jira/browse/HADOOP-6950 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Reporter: Philip Zeyliger >Priority: Trivial > Attachments: HADOOP-6950.patch.txt > > > HADOOP_CLASSPATH tends to be used to add to bin/hadoop's classpath. Because > of the way the comment is written, administrators who customize > hadoop-env.sh often inadvertently disable users' ability to use it, by not > including the present value of the variable. > I propose we change the commented out suggestion code to include the present > value. > {noformat} > # Extra Java CLASSPATH elements. Optional. > -# export HADOOP_CLASSPATH= > +# export HADOOP_CLASSPATH=":$HADOOP_CLASSPATH" > {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-6950) Suggest that HADOOP_CLASSPATH should be preserved in hadoop-env.sh.template
Suggest that HADOOP_CLASSPATH should be preserved in hadoop-env.sh.template --- Key: HADOOP-6950 URL: https://issues.apache.org/jira/browse/HADOOP-6950 Project: Hadoop Common Issue Type: Bug Components: scripts Reporter: Philip Zeyliger Priority: Trivial HADOOP_CLASSPATH tends to be used to add to bin/hadoop's classpath. Because of the way the comment is written, administrators who customize hadoop-env.sh often inadvertently disable users' ability to use it, by not including the present value of the variable. I propose we change the commented out suggestion code to include the present value. {noformat} # Extra Java CLASSPATH elements. Optional. -# export HADOOP_CLASSPATH= +# export HADOOP_CLASSPATH=":$HADOOP_CLASSPATH" {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6859) Introduce additional statistics to FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-6859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888049#action_12888049 ] Philip Zeyliger commented on HADOOP-6859: - Patch looks fine. You might want to mention in the javadoc what you mentioned here in the JIRA about what a "large read" operation may be. > Introduce additional statistics to FileSystem > - > > Key: HADOOP-6859 > URL: https://issues.apache.org/jira/browse/HADOOP-6859 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 0.22.0 >Reporter: Suresh Srinivas >Assignee: Suresh Srinivas > Fix For: 0.22.0 > > Attachments: HADOOP-6859.patch > > > Currently FileSystem#statistics tracks bytesRead and bytesWritten. Additional > statistics that gives summary of operations performed will be useful for > tracking file system use. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6579) A utility for reading and writing tokens into a URL safe string.
[ https://issues.apache.org/jira/browse/HADOOP-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837952#action_12837952 ] Philip Zeyliger commented on HADOOP-6579: - Hi Owen, Trying to keep up with some of the security jiras. You're producing a lot of code, thereby making it tricky :) I think, in general, it's not that useful to write arbitrary writables to base64-encoded strings. Most browsers limit how long URL strings can be, so you've got to be pretty careful about what you're up to. Would you consider instead making this more specific, by moving this code into Token.getAsUrlSafeString() and (static) Token.fromUrlSafeString()? Or, equivalently, leave the code here, but in redirectToRandomDataNode() (patch in HDFS-991), use a method on the Token instead of WritableUtils. (This has the additional property that one could serialize tokens however; they just have to have a URL-safe string serialization.) Looked at the code and tests. Those look clear and good. -- Philip > A utility for reading and writing tokens into a URL safe string. > > > Key: HADOOP-6579 > URL: https://issues.apache.org/jira/browse/HADOOP-6579 > Project: Hadoop Common > Issue Type: Improvement > Components: security >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 0.22.0 > > Attachments: c-6579-hdfs.patch, c-6579-mr.patch, c-6579.patch, > c-6579.patch > > > We need to include HDFS delegation tokens in the URLs while browsing the file > system. Therefore, we need a url-safe way to encode and decode them. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
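The utility under discussion can be sketched with the JDK's URL-safe Base64 alphabet, which avoids the '+', '/', and '=' characters that need escaping in URLs. The method names mirror the `Token.getAsUrlSafeString()` / `Token.fromUrlSafeString()` shape Philip suggests, but they and the sample token bytes are hypothetical, not Hadoop's actual API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class UrlSafeToken {
    /** Encode opaque token bytes as a URL-safe string. */
    public static String toUrlSafeString(byte[] tokenBytes) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(tokenBytes);
    }

    /** Decode a string produced by toUrlSafeString back into bytes. */
    public static byte[] fromUrlSafeString(String encoded) {
        return Base64.getUrlDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        byte[] raw = "kind=HDFS_DELEGATION_TOKEN;seq=42"
                .getBytes(StandardCharsets.UTF_8);
        // Safe to embed in a URL: no '+', '/', or '=' in the output.
        System.out.println(toUrlSafeString(raw));
    }
}
```

Note the browser URL-length concern raised above still applies: URL-safe encoding makes the bytes transportable, but it does nothing to keep them small.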
[jira] Commented: (HADOOP-6537) Proposal for exceptions thrown by FileContext and Abstract File System
[ https://issues.apache.org/jira/browse/HADOOP-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831199#action_12831199 ] Philip Zeyliger commented on HADOOP-6537: - @Suresh, if the current proposal isn't using FileSystemException, then my point is moot. All I was trying to say is that, when possible, we should avoid having multiple classes named the same thing but in different packages. Specifically, having both java.io.FileSystemException and org.apache.hadoop.io.FileSystemException is a bit confusing because a casual user won't know which to import, and why they're different. -- Philip > Proposal for exceptions thrown by FileContext and Abstract File System > -- > > Key: HADOOP-6537 > URL: https://issues.apache.org/jira/browse/HADOOP-6537 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Jitendra Nath Pandey >Assignee: Suresh Srinivas > Fix For: 0.22.0 > > Attachments: hdfs-717.1.patch, hdfs-717.patch, hdfs-717.patch > > > Currently the APIs in FileContext throw only IOException. Going forward these > APIs will throw more specific exceptions. > This jira proposes following hierarchy of exceptions to be thrown by > FileContext and AFS (Abstract File System) classes. > InterruptedException (java.lang.InterruptedException) > IOException > /* Following exceptions extend IOException */ > FileNotFoundException > FileAlreadyExistsException > DirectoryNotEmptyException > NotDirectoryException > AccessDeniedException > IsDirectoryException > InvalidPathNameException > > FileSystemException > /* Following exceptions extend > FileSystemException */ > FileSystemNotReadyException > ReadOnlyFileSystemException > QuotaExceededException > OutOfSpaceException > RemoteException (java.rmi.RemoteException) > Most of the IOExceptions above are caused by invalid user input, while > FileSystemException is thrown when FS is in such a state that the requested > operation cannot proceed. 
> Please note that the proposed RemoteException is from standard java rmi > package, which also extends IOException. > > HDFS throws many exceptions which are not in the above list. The DFSClient > will unwrap the exceptions thrown by HDFS, and any exception not in the above > list will be thrown as IOException or FileSystemException. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6537) Proposal for exceptions thrown by FileContext and Abstract File System
[ https://issues.apache.org/jira/browse/HADOOP-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830471#action_12830471 ] Philip Zeyliger commented on HADOOP-6537: - It's hard to tell where the current proposal is, but if FileSystemException is going to be in Java 1.7, it might be nice to not name Hadoop's exception by the same name (unless they're one and the same). Code that uses DFSClient might also be using the local file system using the regular Java APIs, and it might be nice to separate the two systems (local and Hadoop). > Proposal for exceptions thrown by FileContext and Abstract File System > -- > > Key: HADOOP-6537 > URL: https://issues.apache.org/jira/browse/HADOOP-6537 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Jitendra Nath Pandey >Assignee: Suresh Srinivas > Fix For: 0.22.0 > > Attachments: hdfs-717.1.patch, hdfs-717.patch, hdfs-717.patch > > > Currently the APIs in FileContext throw only IOException. Going forward these > APIs will throw more specific exceptions. > This jira proposes following hierarchy of exceptions to be thrown by > FileContext and AFS (Abstract File System) classes. > InterruptedException (java.lang.InterruptedException) > IOException > /* Following exceptions extend IOException */ > FileNotFoundException > FileAlreadyExistsException > DirectoryNotEmptyException > NotDirectoryException > AccessDeniedException > IsDirectoryException > InvalidPathNameException > > FileSystemException > /* Following exceptions extend > FileSystemException */ > FileSystemNotReadyException > ReadOnlyFileSystemException > QuotaExceededException > OutOfSpaceException > RemoteException (java.rmi.RemoteException) > Most of the IOExceptions above are caused by invalid user input, while > FileSystemException is thrown when FS is in such a state that the requested > operation cannot proceed. 
> Please note that the proposed RemoteException is from standard java rmi > package, which also extends IOException. > > HDFS throws many exceptions which are not in the above list. The DFSClient > will unwrap the exceptions thrown by HDFS, and any exception not in the above > list will be thrown as IOException or FileSystemException. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
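A skeleton of the proposed hierarchy, showing the split the ticket describes: user-input errors extend `IOException` directly, while FS-state errors hang off a `FileSystemException` branch. The class names are from the list above; the `classify` helper and catch-ordering demo are illustrative. (This compiles only because nothing imports the similarly named JDK class, which is exactly the naming clash debated in these comments.)

```java
import java.io.FileNotFoundException;
import java.io.IOException;

class FileSystemException extends IOException {
    FileSystemException(String msg) { super(msg); }
}

class ReadOnlyFileSystemException extends FileSystemException {
    ReadOnlyFileSystemException(String msg) { super(msg); }
}

public class ExceptionHierarchyDemo {
    /** FS-state problems (back off and retry) vs. bad user input (report). */
    static String classify(IOException e) {
        return (e instanceof FileSystemException) ? "fs-state" : "user-input";
    }

    public static void main(String[] args) {
        System.out.println(classify(new ReadOnlyFileSystemException("ro fs")));
        System.out.println(classify(new FileNotFoundException("/no/such")));
    }
}
```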
[jira] Commented: (HADOOP-6419) Change RPC layer to support SASL based mutual authentication
[ https://issues.apache.org/jira/browse/HADOOP-6419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828314#action_12828314 ] Philip Zeyliger commented on HADOOP-6419: - Thanks, Kan and Owen, for answering my questions. Kan, I looked through the new patch very briefly. Thanks for addressing my concerns! -- Philip > Change RPC layer to support SASL based mutual authentication > > > Key: HADOOP-6419 > URL: https://issues.apache.org/jira/browse/HADOOP-6419 > Project: Hadoop Common > Issue Type: New Feature > Components: security >Reporter: Kan Zhang >Assignee: Kan Zhang > Attachments: c6419-26.patch, c6419-39.patch, c6419-45.patch, > c6419-66.patch, c6419-67.patch, c6419-69.patch, c6419-70.patch, c6419-72.patch > > > The authentication mechanism to use will be SASL DIGEST-MD5 (see RFC- and > RFC-2831) or SASL GSSAPI/Kerberos. Since J2SE 5, Sun provides a SASL > implementation by default. Both our delegation token and job token can be > used as credentials for SASL DIGEST-MD5 authentication. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6419) Change RPC layer to support SASL based mutual authentication
[ https://issues.apache.org/jira/browse/HADOOP-6419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828011#action_12828011 ] Philip Zeyliger commented on HADOOP-6419: - Hi, Coming late to the game here. I've been reading SASL RFCs (oy), so wanted to take a look (for my own education and to advise Avro-SASL) at how one implements it in Java. I've got some comments and quite a few questions; thanks in advance for your patience. bq. public static enum AuthMethod RFC4422 calls these "mechanisms". Admittedly, in their land, mechanisms are NUL-terminated C-strings, and not enums. I think it's fine that we restrict the implementation to support only one mechanism per protocol. {noformat} saslClient = Sasl.createSaslClient( new String[] { SaslRpcServer.SASL_DIGEST_MECH }, null, null, SaslRpcServer.SASL_DEFAULT_REALM, SaslRpcServer.SASL_PROPS, new SaslClientCallbackHandler(token)); {noformat} Instead of SaslRpcServer.SASL_DIGEST_MECH, you could put the constant inside the AuthMethod enum, which you have available here. {noformat} saslClient = Sasl.createSaslClient( new String[] { SaslRpcServer.SASL_KERBEROS_MECH }, null, names[0], names[1], SaslRpcServer.SASL_PROPS, null); {noformat} It's pretty unintuitive that you had to pass the two parts of the server's kerberos identity as the protocol and server parameters here. How did you figure that out? I couldn't find any documentation for it, outside of the source for GssKrb5Client.java. bq. useSasl = false Am I right in guessing that the reason we don't use the "plain" mechanism for SASL is that we wish to avoid SASL's extra framing? bq. TokenSelector, TokenIdentifier Could you explain (perhaps in the javadoc) why you need both of these classes. The implementation of TestTokenSelector suggests to me that all TokenSelectors are just going to compare (kind, service) Text objects, and that's all. Are there likely to be different types of TokenSelectors? 
Likewise, when would TokenIdentifier not just be (kind, username)? I think you're going for some type safety by using generics there, but I'm missing what it's buying you. bq. byte[] token = saslServer.unwrap(saslToken, 0, saslToken.length); bq.processUnwrappedData(token); At this point, token's not a token, but merely data, no? You might rename that variable, to avoid confusion. (I was confused.) bq. setupResponse(): if (call.connection.useSasl) { The code here would be clearer if you extracted this into a "void wrapWithSasl(response)" method. I missed that response was being re-used, and was scratching my head for a while :) bq. SaslInputStream Isn't it totally bizarre that the SaslServer javaDoc talks about "SecureInputStream", and there doesn't seem to be such a thing? I think they must have meant com.sun.jndi.ldap.sasl.SaslInputStream, which seems to be part of OpenJDK and GPL'd, so never mind. {noformat} @KerberosInfo( serverPrincipalKey = SERVER_PRINCIPAL_KEY ) @TokenInfo( identifierClass = TestTokenIdentifier.class, secretManagerClass = TestTokenSecretManager.class, selectorClass = TestTokenSelector.class ) {noformat} With my "how-much-work-is-this-going-to-be-to-port-to-AVRO" hat on, I've been thinking about whether these annotations should be on the protocol (like they are), or just part of RPC.getProxy()/RPC.getServer(). I think they're fine as annotations: Hadoop's protocols are closely tied with the type of authentication they expect. That said: there's a lot of implicit information being passed in this annotation (and Client.java is correspondingly complicated). Could this just be @TokenRpcAuth(enum) and @KerberosRpcAuth(SERVER_PRINCIPAL_KEY)? I can't imagine a case where one of the three parameters for the @TokenInfo annotation wouldn't imply the other two, but I might be missing something. I'll also point out that your test works by a little bit of trickery: I initially thought that if @TokenInfo is specified, Client.java would use that. 
Turns out it will fall back to Kerberos if the token's not present. This is all fine; it was just a bit complicated to figure out how your test tries to cover both cases. (It wouldn't be crazy to assert that only one non-plain authentication type is supported, but maybe there are protocols where you could do either...) bq. static void testKerberosRpc I take it that this is a main() test and not a @Test test because Kerberos doesn't exist on Hudson? Might be appropriate to call that out. bq. SaslInputStream/SaslOutputStream Should these have tests? Thanks for your patience! -- Philip > Change RPC layer to support SASL based mutual authentication > > > Key: HADOOP-6419 > URL: https://issues.apache.org/jira/browse/HADOOP-64
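To make the protocol/serverName question above concrete: for the GSSAPI mechanism, the two createSaslClient parameters come from splitting a Kerberos service principal of the form service/host@REALM. A minimal, hypothetical sketch (this is not the patch's code; the class and method names are invented for illustration):

```java
// Illustrative only: how "nn/namenode.example.com@EXAMPLE.COM" splits into the
// (protocol, serverName) pair that Sasl.createSaslClient expects for GSSAPI.
public class PrincipalParts {
    /** Returns {serviceName, hostName} from a "service/host@REALM" principal. */
    public static String[] splitPrincipal(String fullPrincipal) {
        String withoutRealm = fullPrincipal.split("@")[0]; // "nn/namenode.example.com"
        String[] names = withoutRealm.split("/");          // {"nn", "namenode.example.com"}
        if (names.length != 2) {
            throw new IllegalArgumentException("expected service/host@REALM: " + fullPrincipal);
        }
        // names[0] would be passed as the "protocol" argument and names[1] as
        // "serverName", mirroring the createSaslClient call quoted above.
        return names;
    }
}
```

GssKrb5Client builds the acceptor's name from these two pieces as a host-based service name, which is why the split matters even though the parameter names don't suggest it.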
[jira] Commented: (HADOOP-4487) Security features for Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795482#action_12795482 ] Philip Zeyliger commented on HADOOP-4487: - I'm surprised I'm the first to comment: is the discussion going on elsewhere? I read the design document over Christmas. Great to see a document with so much detail, thanks! I had some questions, and thought a couple of places could be clearer; my comments are below. ** One thing that hasn't been covered (outside of assumptions) is more detail about how to operationally secure a Hadoop cluster in Unix-land. The assumptions section lays out some of these ("root" needs to be secure). Some things that I thought about: (1) data nodes need to write their data as a unix user that users don't have access to, and with appropriate permissions (or umask). (Looking at my local system, the DataNode has left blocks world-readable.) (2) We assume that the JT and NN are also run under unix accounts which users do not have access to. Since Data Nodes and the NameNode share a key, it's important to limit cluster membership. (This is critical for task trackers, too, since an evil task tracker could do nasty things.) What's the mechanism to limit cluster participation? Is there a central registry of what users can access HDFS and queues? Is there an "HDFS" superuser? In existing Hadoop, it's the username corresponding to the uid running the NameNode process. bq. If the token doesn't exist in memory, which indicates NameNode has restarted It could also mean that the token is expired, no? I think this is made clearer in the following sentences. bq. READ, WRITE, COPY, REPLACE What is the COPY access mode used for? bq. "only the user will be able to kill their own jobs and tasks" Somewhere else in the document, there's discussion of jobs having owners/groups, not just owners. Surely a superuser or cluster manager can kill jobs with appropriate permissions? bq.
API and environment changes Will users still be able to use Hadoop in a "non-secure" manner? How much work would be involved in using a different security model? This is probably answered by the patch itself :) > Security features for Hadoop > > > Key: HADOOP-4487 > URL: https://issues.apache.org/jira/browse/HADOOP-4487 > Project: Hadoop Common > Issue Type: New Feature > Components: security >Reporter: Kan Zhang >Assignee: Kan Zhang > Attachments: security-design.pdf > > > This is a top-level tracking JIRA for security work we are doing in Hadoop. > Please add reference to this when opening new security related JIRAs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6457) Set Hadoop User/Group by System properties or environment variables
[ https://issues.apache.org/jira/browse/HADOOP-6457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792623#action_12792623 ] Philip Zeyliger commented on HADOOP-6457: - Do note that users can be in more than one group, so hadoop.group.name should probably be hadoop.group.names, and be comma-delimited. > Set Hadoop User/Group by System properties or environment variables > --- > > Key: HADOOP-6457 > URL: https://issues.apache.org/jira/browse/HADOOP-6457 > Project: Hadoop Common > Issue Type: New Feature > Components: security >Reporter: issei yoshida > Attachments: 6457.patch > > > Hadoop User/Group can be set by System properties or environment variables. > For example, in environment variables, > export HADOOP_USER=test > export HADOOP_GROUP=user > or in your MapReduce, > System.setProperty("hadoop.user.name", "test"); > System.setProperty("hadoop.group.name", "user"); -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
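The comma-delimited hadoop.group.names suggestion above could be parsed along these lines. This is a sketch of the proposal only, not shipped Hadoop code; the property format is the one suggested in the comment:

```java
import java.util.Arrays;
import java.util.List;

// Sketch: parse a comma-delimited group list such as the proposed
// hadoop.group.names value. Trimming makes "staff, admin" and
// "staff,admin" equivalent.
public class GroupNames {
    public static List<String> parseGroups(String value) {
        String[] parts = value.split(",");
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return Arrays.asList(parts);
    }
}
```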
[jira] Commented: (HADOOP-6436) Remove auto-generated native build files
[ https://issues.apache.org/jira/browse/HADOOP-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789429#action_12789429 ] Philip Zeyliger commented on HADOOP-6436: - +1. > Remove auto-generated native build files > - > > Key: HADOOP-6436 > URL: https://issues.apache.org/jira/browse/HADOOP-6436 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Eli Collins >Assignee: Eli Collins > > The repo currently includes the automake and autoconf generated files for the > native build. Per discussion on HADOOP-6421 let's remove them and use the > host's automake and autoconf. We should also do this for libhdfs and > fuse-dfs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-4998) Implement a native OS runtime for Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784048#action_12784048 ] Philip Zeyliger commented on HADOOP-4998: - Figured I'd mention that Tomcat has some support for calling into APR: Javadoc: http://tomcat.apache.org/tomcat-6.0-doc/api/org/apache/tomcat/jni/package-tree.html Google codesearch link: http://www.google.com/codesearch/p?hl=en#cM_OVOKybvs/tomcat/tomcat-6/v6.0.10/src/apache-tomcat-6.0.10-src.zip|KNqCNnRERSg/apache-tomcat-6.0.10-src/java/org/apache/tomcat/jni/User.java&q=tomcat%20apr&d=10 > Implement a native OS runtime for Hadoop > > > Key: HADOOP-4998 > URL: https://issues.apache.org/jira/browse/HADOOP-4998 > Project: Hadoop Common > Issue Type: New Feature > Components: native >Reporter: Arun C Murthy >Assignee: Arun C Murthy > Fix For: 0.21.0 > > > It would be useful to implement a JNI-based runtime for Hadoop to get access > to the native OS runtime. This would allow us to stop relying on exec'ing > bash to get access to information such as user-groups, process limits etc. > and for features such as chown/chgrp (org.apache.hadoop.util.Shell). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6318) Upgrade to Avro 1.2.0
[ https://issues.apache.org/jira/browse/HADOOP-6318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1276#action_1276 ] Philip Zeyliger commented on HADOOP-6318: - Could you also update the Eclipse classpath {noformat} $cat .eclipse.templates/.classpath | grep avro {noformat} when you commit this? Thanks! > Upgrade to Avro 1.2.0 > - > > Key: HADOOP-6318 > URL: https://issues.apache.org/jira/browse/HADOOP-6318 > Project: Hadoop Common > Issue Type: Improvement > Components: io, ipc >Reporter: Doug Cutting >Assignee: Doug Cutting > Attachments: HADOOP-6318.java > > > Avro 1.2 has been released. The API's Hadoop Common uses have been > simplified, and it should be upgraded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
[ https://issues.apache.org/jira/browse/HADOOP-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-6241: Status: Patch Available (was: Open) > Script to split configurations from one file into multiple, according to > templates. > --- > > Key: HADOOP-6241 > URL: https://issues.apache.org/jira/browse/HADOOP-6241 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Philip Zeyliger > Attachments: HADOOP-6241-v2.patch.txt, HADOOP-6241-v3.patch.txt, > HADOOP-6241-v4.patch.txt, HADOOP-6241.patch.txt > > > This script moves properties from hadoop-site.xml into common-site, > mapred-site, and hdfs-site in a reasonably generic way. This is useful for > upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
[ https://issues.apache.org/jira/browse/HADOOP-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-6241: Status: Open (was: Patch Available) > Script to split configurations from one file into multiple, according to > templates. > --- > > Key: HADOOP-6241 > URL: https://issues.apache.org/jira/browse/HADOOP-6241 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Philip Zeyliger > Attachments: HADOOP-6241-v2.patch.txt, HADOOP-6241-v3.patch.txt, > HADOOP-6241-v4.patch.txt, HADOOP-6241.patch.txt > > > This script moves properties from hadoop-site.xml into common-site, > mapred-site, and hdfs-site in a reasonably generic way. This is useful for > upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
[ https://issues.apache.org/jira/browse/HADOOP-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-6241: Attachment: HADOOP-6241-v4.patch.txt bq. # ./bin/split_config.py shows nice usage information including the same invocation. split_config.py --help lacks the _doc_ info which I found helpful. Can you convince OptionParser to print that for all usage notices? done bq. In dist releases, the -default.xml files don't exist. If we expect this to be useful for users and not just developers, it should be able to look inside the jars for the -default.xml files. Since jars are just zips, you can probably do this pretty easily using the zipfile module. Done. Added another option to specify templates this way. Also, here's how I've been testing this (still has to be run manually): {noformat} $ PYTHONPATH=bin:$PYTHONPATH python src/test/bin/split_config_test.py Could not find template file to place property 'notintemplates'. Ignoring. Writing out to Writing out to . -- Ran 1 test in 0.003s OK {noformat} > Script to split configurations from one file into multiple, according to > templates. > --- > > Key: HADOOP-6241 > URL: https://issues.apache.org/jira/browse/HADOOP-6241 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Philip Zeyliger > Attachments: HADOOP-6241-v2.patch.txt, HADOOP-6241-v3.patch.txt, > HADOOP-6241-v4.patch.txt, HADOOP-6241.patch.txt > > > This script moves properties from hadoop-site.xml into common-site, > mapred-site, and hdfs-site in a reasonably generic way. This is useful for > upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
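On the "jars are just zips" point above: the script itself would use Python's zipfile module, but the same idea in Java (the language used for examples elsewhere in this digest) looks roughly like this. The jar path and entry name below are illustrative, not taken from the patch:

```java
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Sketch: pull a -default.xml template straight out of a jar,
// since jar files are ordinary zip archives.
public class DefaultXmlFromJar {
    public static String readEntry(String jarPath, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(jarPath)) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new FileNotFoundException(entryName + " not in " + jarPath);
            }
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (InputStream in = zip.getInputStream(entry)) {
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    bos.write(buf, 0, n);
                }
            }
            return bos.toString("UTF-8");
        }
    }
}
```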
[jira] Commented: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
[ https://issues.apache.org/jira/browse/HADOOP-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765210#action_12765210 ] Philip Zeyliger commented on HADOOP-6241: - Nicholas, Cool. Is this ready for submission, then? Looks like Hudson has looked at the last patch. Thanks! -- Philip > Script to split configurations from one file into multiple, according to > templates. > --- > > Key: HADOOP-6241 > URL: https://issues.apache.org/jira/browse/HADOOP-6241 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Philip Zeyliger > Attachments: HADOOP-6241-v2.patch.txt, HADOOP-6241-v3.patch.txt, > HADOOP-6241.patch.txt > > > This script moves properties from hadoop-site.xml into common-site, > mapred-site, and hdfs-site in a reasonably generic way. This is useful for > upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
[ https://issues.apache.org/jira/browse/HADOOP-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764525#action_12764525 ] Philip Zeyliger commented on HADOOP-6241: - Nicholas, Unfortunately how we load pyAntTasks is a bit moot: split_config.py won't run without lxml installed, and that's just not part of the standard python installation (though is very commonly available via package managers, etc.) -- Philip > Script to split configurations from one file into multiple, according to > templates. > --- > > Key: HADOOP-6241 > URL: https://issues.apache.org/jira/browse/HADOOP-6241 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Philip Zeyliger > Attachments: HADOOP-6241-v2.patch.txt, HADOOP-6241-v3.patch.txt, > HADOOP-6241.patch.txt > > > This script moves properties from hadoop-site.xml into common-site, > mapred-site, and hdfs-site in a reasonably generic way. This is useful for > upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
[ https://issues.apache.org/jira/browse/HADOOP-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764252#action_12764252 ] Philip Zeyliger commented on HADOOP-6241: - I managed to hack together the ant stuff to get the test to "run", but it fails, even on my machine, with an import error. ("[py-test] ImportError: No module named lxml.etree"). > Script to split configurations from one file into multiple, according to > templates. > --- > > Key: HADOOP-6241 > URL: https://issues.apache.org/jira/browse/HADOOP-6241 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Philip Zeyliger > Attachments: HADOOP-6241-v2.patch.txt, HADOOP-6241-v3.patch.txt, > HADOOP-6241.patch.txt > > > This script moves properties from hadoop-site.xml into common-site, > mapred-site, and hdfs-site in a reasonably generic way. This is useful for > upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
[ https://issues.apache.org/jira/browse/HADOOP-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764242#action_12764242 ] Philip Zeyliger commented on HADOOP-6241: - Nicholas, Even if I pull in pyAntTasks, the build machine (unless it magically has python2.5 and/or some extra packages) won't pass the tests, because it won't have the python lxml package installed. I can't think of an easy way around that... I could only run the tests if lxml is there, but that seems like cheating. > Script to split configurations from one file into multiple, according to > templates. > --- > > Key: HADOOP-6241 > URL: https://issues.apache.org/jira/browse/HADOOP-6241 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Philip Zeyliger > Attachments: HADOOP-6241-v2.patch.txt, HADOOP-6241-v3.patch.txt, > HADOOP-6241.patch.txt > > > This script moves properties from hadoop-site.xml into common-site, > mapred-site, and hdfs-site in a reasonably generic way. This is useful for > upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
[ https://issues.apache.org/jira/browse/HADOOP-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-6241: Status: Open (was: Patch Available) > Script to split configurations from one file into multiple, according to > templates. > --- > > Key: HADOOP-6241 > URL: https://issues.apache.org/jira/browse/HADOOP-6241 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Philip Zeyliger > Attachments: HADOOP-6241-v2.patch.txt, HADOOP-6241-v3.patch.txt, > HADOOP-6241.patch.txt > > > This script moves properties from hadoop-site.xml into common-site, > mapred-site, and hdfs-site in a reasonably generic way. This is useful for > upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
[ https://issues.apache.org/jira/browse/HADOOP-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-6241: Status: Patch Available (was: Open) > Script to split configurations from one file into multiple, according to > templates. > --- > > Key: HADOOP-6241 > URL: https://issues.apache.org/jira/browse/HADOOP-6241 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Philip Zeyliger > Attachments: HADOOP-6241-v2.patch.txt, HADOOP-6241-v3.patch.txt, > HADOOP-6241.patch.txt > > > This script moves properties from hadoop-site.xml into common-site, > mapred-site, and hdfs-site in a reasonably generic way. This is useful for > upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
[ https://issues.apache.org/jira/browse/HADOOP-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-6241: Attachment: HADOOP-6241-v3.patch.txt Attaching another version, where I've split out the test file. Nicholas--I tried to update the ant tasks, but there was a deeper problem: I'm using lxml.etree, which isn't part of the standard python distro (though is a commonly installed package). I tried to backport to xml.etree.ElementTree, which worked (though I couldn't preserve the comments anymore, since ElementTree drops them at parse), but then I realized that that package isn't even in python 2.4, so it would still be a burden to find machines with the right prerequisites. My original motivation here was to share a script that I found useful after splitting a couple of clusters' configuration files into multiple files. I used python and lxml, since those tools were quickest for me. In terms of integration with the build/test system, Java would have obviously been better :/ I'm ok with deciding that this isn't the right place for this script, and just throwing it up on the web somewhere else. Do you think that's the way we should go? BTW, if I were to update the build file, I'd need to pull in a dependency on pyAntTasks (which Avro already uses). -- Philip > Script to split configurations from one file into multiple, according to > templates. > --- > > Key: HADOOP-6241 > URL: https://issues.apache.org/jira/browse/HADOOP-6241 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Philip Zeyliger > Attachments: HADOOP-6241-v2.patch.txt, HADOOP-6241-v3.patch.txt, > HADOOP-6241.patch.txt > > > This script moves properties from hadoop-site.xml into common-site, > mapred-site, and hdfs-site in a reasonably generic way. This is useful for > upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
[ https://issues.apache.org/jira/browse/HADOOP-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764152#action_12764152 ] Philip Zeyliger commented on HADOOP-6241: - bq. -1 core tests org.apache.hadoop.io.TestUTF8.testGetBytes is failing, but has nothing to do with this patch. bq. -1 tests included There's an embedded python test, runnable with nose. {quote} $nosetests -v bin/split-config.py Test for shuffle_properties. ... ok -- Ran 1 test in 0.003s OK {quote} > Script to split configurations from one file into multiple, according to > templates. > --- > > Key: HADOOP-6241 > URL: https://issues.apache.org/jira/browse/HADOOP-6241 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Philip Zeyliger > Attachments: HADOOP-6241-v2.patch.txt, HADOOP-6241.patch.txt > > > This script moves properties from hadoop-site.xml into common-site, > mapred-site, and hdfs-site in a reasonably generic way. This is useful for > upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
[ https://issues.apache.org/jira/browse/HADOOP-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-6241: Status: Patch Available (was: Open) > Script to split configurations from one file into multiple, according to > templates. > --- > > Key: HADOOP-6241 > URL: https://issues.apache.org/jira/browse/HADOOP-6241 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Philip Zeyliger > Attachments: HADOOP-6241-v2.patch.txt, HADOOP-6241.patch.txt > > > This script moves properties from hadoop-site.xml into common-site, > mapred-site, and hdfs-site in a reasonably generic way. This is useful for > upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
[ https://issues.apache.org/jira/browse/HADOOP-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-6241: Attachment: HADOOP-6241-v2.patch.txt Attaching a new version that tries to preserve comments. > Script to split configurations from one file into multiple, according to > templates. > --- > > Key: HADOOP-6241 > URL: https://issues.apache.org/jira/browse/HADOOP-6241 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Philip Zeyliger > Attachments: HADOOP-6241-v2.patch.txt, HADOOP-6241.patch.txt > > > This script moves properties from hadoop-site.xml into common-site, > mapred-site, and hdfs-site in a reasonably generic way. This is useful for > upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6287) Support units for configuration knobs
[ https://issues.apache.org/jira/browse/HADOOP-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760665#action_12760665 ] Philip Zeyliger commented on HADOOP-6287: - The URI is the type that you'll get if you get the value of the configuration variable with "fsDefaultName.get(Configuration)". Different configuration variables will have different types there. Today, Configuration.get() returns String, and there are helper methods to get integers, etc. The basic types we'd want to support right off are Boolean, List, URI, String, List, Integer, and probably a few others. In the context of this ticket, we might want to support a Size type, which has a Size.getValue(SizeUnit.BYTES) accessor. The annotation is used to process the code to generate the documentation. There's more than one way to do this, but it's pretty easy to ask a variant of javac to spit out all the annotated variables. > Support units for configuration knobs > - > > Key: HADOOP-6287 > URL: https://issues.apache.org/jira/browse/HADOOP-6287 > Project: Hadoop Common > Issue Type: Improvement > Components: conf >Reporter: Arun C Murthy >Assignee: Eli Collins > Attachments: hadoop-6287-1.patch > > > We should add support for units in our Configuration system so that we can > specify values to be *1GB* or *1MB* rather than forcing every component which > consumes such values to be aware of these. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
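A hypothetical sketch of the Size type floated above. The enum, constructor, and parsing rules here are invented for illustration; only the Size.getValue(SizeUnit.BYTES) accessor shape comes from the comment itself:

```java
// Hypothetical Size config type: parses "1GB" / "512MB" style values so that
// consumers never hand-roll unit arithmetic. Names are illustrative only.
public class Size {
    public enum SizeUnit {
        BYTES(1L), KB(1L << 10), MB(1L << 20), GB(1L << 30);
        final long bytes;
        SizeUnit(long bytes) { this.bytes = bytes; }
    }

    private final long bytes;

    private Size(long bytes) { this.bytes = bytes; }

    /** Parses strings like "1GB", "512MB", or "42" (plain bytes). */
    public static Size parse(String s) {
        String v = s.trim().toUpperCase();
        for (SizeUnit u : SizeUnit.values()) {
            if (u != SizeUnit.BYTES && v.endsWith(u.name())) {
                long n = Long.parseLong(v.substring(0, v.length() - u.name().length()).trim());
                return new Size(n * u.bytes);
            }
        }
        return new Size(Long.parseLong(v));
    }

    /** The getValue(SizeUnit.BYTES)-style accessor mentioned above. */
    public long getValue(SizeUnit unit) { return bytes / unit.bytes; }
}
```

With something like this, a config value of "1GB" reaches the consumer as a number, and validation failures surface at parse time rather than deep inside whatever reads the value.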
[jira] Commented: (HADOOP-6287) Support units for configuration knobs
[ https://issues.apache.org/jira/browse/HADOOP-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760628#action_12760628 ] Philip Zeyliger commented on HADOOP-6287: - I'm +1 on Eric's suggestion: I think config documentation and code should be co-located, and should generate things. I'd like to do something like this: {code} @ConfigVariableDeclaration public final static ConfigVariable fsDefaultName = ConfigVariableBuilder.newURI() .setDefault(null) .setHelp("Default filesystem") .setVisibility(Visibility.PUBLIC) .addAccessor("fs.default.name") .addDeprecatedAccessor("core.default.fs", "Use foo instead") .addValidator(new ValidateSupportedFilesystem()); {code} (See more at https://issues.apache.org/jira/browse/HADOOP-6105#action_12727143) Yes, this imposes a type system: fsDefaultName now has to be a URI. There could be other units/validators for time, disk, etc. You should then access the variable by using "fsDefaultName.get(Configuration)", which means that it's easy to find all usages of the configuration. I'd very much like Hadoop to find configuration errors early, and for those errors to be sensible. (It would help both with NPEs deep inside of things that aren't obviously looking at configuration, and with the wide array of ways we store comma-delimited values in Configuration: we should only be writing that logic correctly once, and, you know, we should support filenames with commas in them.) Steve, I actually think having a schema on configuration would make LDAP support easier, not harder. For reasons of backwards compatibility, we're going to support toString/fromString for a long time, and if you know what you're storing, the mapping functions to/from LDAP are going to be considerably easier to write.
-- Philip > Support units for configuration knobs > - > > Key: HADOOP-6287 > URL: https://issues.apache.org/jira/browse/HADOOP-6287 > Project: Hadoop Common > Issue Type: Improvement > Components: conf >Reporter: Arun C Murthy >Assignee: Eli Collins > Attachments: hadoop-6287-1.patch > > > We should add support for units in our Configuration system so that we can > specify values to be *1GB* or *1MB* rather than forcing every component which > consumes such values to be aware of these. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6288) Improve Configuration to support default and minimum values
[ https://issues.apache.org/jira/browse/HADOOP-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760075#action_12760075 ] Philip Zeyliger commented on HADOOP-6288: - I haven't had much time to work on it, but in https://issues.apache.org/jira/browse/HADOOP-6105#action_12727143 I suggested an approach that would give us typed configuration, including default values and some validation. If there was some buy-in, I'd be very happy to bang out my prototype code into submission. -- Philip > Improve Configuration to support default and minimum values > > > Key: HADOOP-6288 > URL: https://issues.apache.org/jira/browse/HADOOP-6288 > Project: Hadoop Common > Issue Type: Improvement > Components: conf >Reporter: Arun C Murthy > > HADOOP-6105 provided a way to automatically handle deprecation of configs - > I'd like to take this further to support default values, minimum values etc. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6170) add Avro-based RPC serialization
[ https://issues.apache.org/jira/browse/HADOOP-6170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756167#action_12756167 ] Philip Zeyliger commented on HADOOP-6170: - I took a look. Overall, looks good. I'm sufficiently behind on my Avro-fu that I had a hard time following some of the magic. I'm writing down below what my current understanding is. Let me know where I'm totally wrong. Do add some JavaDoc per class. TunnelProtocol is the meta-interface that's used by Hadoop IPC. BufferListWritable is used by Hadoop IPC to ship the Avro byte-stream around. Its serialization format is: {code} data := size (buffer)[size] buffer := len (bytes)[len] size, len are writable integers bytes are raw bytes x[count] indicates count repetitions of x. {code} ClientTransceiver delegates shipping of bytes via Hadoop IPC. ServerTransceiver doesn't do much of anything except store the response data. New instances of it are created at every call in TunnelResponder. TunnelResponder implements the Hadoop IPC interface (TunnelProtocol); it's the thing running on the server, and converts calls into Avro RPC calls. This class would be clearer if, instead of extending ReflectResponder, you kept a private ReflectResponder, and called that explicitly in call. Then you would rename TunnelResponder to TunnelProtocolImpl. Can a single RPC server satisfy multiple protocols? Does that work with AvroRPC? I don't think it can right now, but I think that's necessary, since several daemons implement a handful of protocols. Some specific notes: bq. versioning features to for inter-Java RPCs. Typo. AvroTestProtocol is generated code, yes? Or does ReflectData.getProtocol() figure it out from reflection and paranamer data? If the former, should the source schema be checked in? Should the generation be done in build.xml? bq. assertEquals(intResult, 3); Total nit: JUnit prefers the expected argument on the left. bq.
public BufferListWritable() {} Eclipse tells me this is unused. Might be worth a quick comment indicating that it's required because Writables get instantiated via reflection. bq. getRemoteName() { return "remote"; } Should these be customized to the Protocol being used? Cheers, -- Philip > add Avro-based RPC serialization > > > Key: HADOOP-6170 > URL: https://issues.apache.org/jira/browse/HADOOP-6170 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Doug Cutting >Assignee: Doug Cutting > Fix For: 0.21.0 > > Attachments: HADOOP-6170.patch, HADOOP-6170.patch, HADOOP-6170.patch > > > Permit RPC protocols to use Avro to serialize requests and responses, so that > protocols may better evolve without breaking compatibility. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
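The buffer-list wire format described above (data := size (buffer)[size]; buffer := len (bytes)[len]) can be sketched with plain DataOutput. Note this is a self-contained sketch, not the patch's BufferListWritable: the real class uses Hadoop's variable-length writable integers, while fixed-width writeInt is used here for brevity:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch of the buffer-list format: a count, then each buffer as a
// length-prefixed run of raw bytes.
public class BufferList {
    public static byte[] serialize(List<byte[]> buffers) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(buffers.size());            // size
        for (byte[] b : buffers) {
            out.writeInt(b.length);              // len
            out.write(b);                        // bytes
        }
        out.flush();
        return bos.toByteArray();
    }

    public static List<byte[]> deserialize(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int size = in.readInt();
        List<byte[]> buffers = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            byte[] b = new byte[in.readInt()];
            in.readFully(b);
            buffers.add(b);
        }
        return buffers;
    }
}
```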
[jira] Updated: (HADOOP-6257) Two TestFileSystem classes are confusing hadoop-hdfs-hdfwithmr
[ https://issues.apache.org/jira/browse/HADOOP-6257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-6257: Status: Patch Available (was: Open) > Two TestFileSystem classes are confusing hadoop-hdfs-hdfwithmr > -- > > Key: HADOOP-6257 > URL: https://issues.apache.org/jira/browse/HADOOP-6257 > Project: Hadoop Common > Issue Type: Bug > Components: build, fs, test >Reporter: Philip Zeyliger >Priority: Minor > Attachments: HADOOP-6257.patch > > > I propose to rename > hadoop-common/src/test/core/org/apache/hadoop/fs/TestFileSystem.java -> > src/test/core/org/apache/hadoop/fs/TestFileSystemCaching.java. Otherwise, it > conflicts with > hadoop-hdfs/src/test/hdfs-with-mr/org/apache/hadoop/fs/TestFileSystem.java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6257) Two TestFileSystem classes are confusing hadoop-hdfs-hdfwithmr
[ https://issues.apache.org/jira/browse/HADOOP-6257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-6257: Attachment: HADOOP-6257.patch Trivial rename. BTW, the conflict showed up in the recent HADOOP-6231. And I ran into it because hadoop-hdfs-hdfswithmr-test's driver was finding the wrong TestFileSystem.
{code}
$ bin/hadoop jar lib/hadoop-hdfs-hdfswithmr-test-0.21.0-dev.jar
java.lang.NoSuchMethodException: org.apache.hadoop.fs.TestFileSystem.main([Ljava.lang.String;)
        at java.lang.Class.getMethod(Class.java:1605)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.<init>(ProgramDriver.java:56)
        at org.apache.hadoop.util.ProgramDriver.addClass(ProgramDriver.java:99)
        at org.apache.hadoop.test.HdfsWithMRTestDriver.<init>(HdfsWithMRTestDriver.java:47)
        at org.apache.hadoop.test.HdfsWithMRTestDriver.<init>(HdfsWithMRTestDriver.java:39)
        at org.apache.hadoop.test.HdfsWithMRTestDriver.main(HdfsWithMRTestDriver.java:77)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
An example program must be given as the first argument.
Valid program names are:
  nnbench: A benchmark that stresses the namenode.
{code}
> Two TestFileSystem classes are confusing hadoop-hdfs-hdfwithmr > -- > > Key: HADOOP-6257 > URL: https://issues.apache.org/jira/browse/HADOOP-6257 > Project: Hadoop Common > Issue Type: Bug > Components: build, fs, test >Reporter: Philip Zeyliger >Priority: Minor > Attachments: HADOOP-6257.patch > > > I propose to rename > hadoop-common/src/test/core/org/apache/hadoop/fs/TestFileSystem.java -> > src/test/core/org/apache/hadoop/fs/TestFileSystemCaching.java. 
Otherwise, it > conflicts with > hadoop-hdfs/src/test/hdfs-with-mr/org/apache/hadoop/fs/TestFileSystem.java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-6257) Two TestFileSystem classes are confusing hadoop-hdfs-hdfwithmr
Two TestFileSystem classes are confusing hadoop-hdfs-hdfwithmr -- Key: HADOOP-6257 URL: https://issues.apache.org/jira/browse/HADOOP-6257 Project: Hadoop Common Issue Type: Bug Components: build, fs, test Reporter: Philip Zeyliger Priority: Minor I propose to rename hadoop-common/src/test/core/org/apache/hadoop/fs/TestFileSystem.java -> src/test/core/org/apache/hadoop/fs/TestFileSystemCaching.java. Otherwise, it conflicts with hadoop-hdfs/src/test/hdfs-with-mr/org/apache/hadoop/fs/TestFileSystem.java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
[ https://issues.apache.org/jira/browse/HADOOP-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-6241: Attachment: HADOOP-6241.patch.txt I'm attaching a quick script I wrote for this purpose. The help is pasted in below for lighter reading. I'm very open to suggestions about what to name the script (split-config.py is a bit generic). Is bin/ an appropriate place for this, or is there a better place? To run, the script requires that lxml for python is installed on your system. It's not included and is BSD-licensed. http://codespeak.net/lxml/index.html#license . There's a built-in test that should be run with the nose test runner (http://pypi.python.org/pypi/nose/0.11.1 -- LGPL), though that dependency could be removed quite easily.
{noformat}
This script separates a single Hadoop XML configuration file into multiple
ones, by finding properties that are in supplied templates and storing them
in the designated output files.

This script comes about to solve the problem of splitting up "hadoop-site.xml"
into "core-site.xml", "mapred-site.xml", and "hdfs-site.xml", and, in fact, a
common usage would be

  split-config.py --input hadoop-site.xml \
    --template core-default.xml \
    --template hdfs-default.xml \
    --template mapred-default.xml \
    --output core-site.xml \
    --output hdfs-site.xml \
    --output mapred-site.xml
{noformat}
> Script to split configurations from one file into multiple, according to > templates. > --- > > Key: HADOOP-6241 > URL: https://issues.apache.org/jira/browse/HADOOP-6241 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Philip Zeyliger > Attachments: HADOOP-6241.patch.txt > > > This script moves properties from hadoop-site.xml into common-site, > mapred-site, and hdfs-site in a reasonably generic way. This is useful for > upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
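The selection logic the help text describes (each property lands in the output file whose template mentions its key) can be sketched without any XML at all. This hypothetical ConfigSplitter works on plain maps; the real split-config.py parses the files with lxml, and what it does with properties no template claims is not specified here, so this sketch just returns them in a trailing map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical core of a config-splitting tool: route each input
// property to the first template (in order) that contains its key.
public class ConfigSplitter {
  public static List<Map<String, String>> split(
      Map<String, String> input, List<Set<String>> templates) {
    List<Map<String, String>> outputs = new ArrayList<>();
    for (int i = 0; i <= templates.size(); i++) {
      outputs.add(new LinkedHashMap<>());   // last slot holds unclaimed keys
    }
    for (Map.Entry<String, String> e : input.entrySet()) {
      int slot = templates.size();          // default: unclaimed
      for (int i = 0; i < templates.size(); i++) {
        if (templates.get(i).contains(e.getKey())) {
          slot = i;
          break;
        }
      }
      outputs.get(slot).put(e.getKey(), e.getValue());
    }
    return outputs;
  }
}
```

With templates for core-default and mapred-default, a hadoop-site map splits into a core map, a mapred map, and whatever keys neither template knows about.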
[jira] Commented: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
[ https://issues.apache.org/jira/browse/HADOOP-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751667#action_12751667 ] Philip Zeyliger commented on HADOOP-6241: - Also, if we commit this, we should consider committing to the 0.20 branch, since that's when people are likely to be migrating from one set of configurations to the other. > Script to split configurations from one file into multiple, according to > templates. > --- > > Key: HADOOP-6241 > URL: https://issues.apache.org/jira/browse/HADOOP-6241 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Philip Zeyliger > Attachments: HADOOP-6241.patch.txt > > > This script moves properties from hadoop-site.xml into common-site, > mapred-site, and hdfs-site in a reasonably generic way. This is useful for > upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
Script to split configurations from one file into multiple, according to templates. --- Key: HADOOP-6241 URL: https://issues.apache.org/jira/browse/HADOOP-6241 Project: Hadoop Common Issue Type: New Feature Reporter: Philip Zeyliger This script moves properties from hadoop-site.xml into common-site, mapred-site, and hdfs-site in a reasonably generic way. This is useful for upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6105) Provide a way to automatically handle backward compatibility of deprecated keys
[ https://issues.apache.org/jira/browse/HADOOP-6105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733919#action_12733919 ] Philip Zeyliger commented on HADOOP-6105: - I think you have to think about how Hadoop's notion of a "final" flag interacts with this, too. If a system administrator has set either A or B to be final, then that value must override any user-submitted value, regardless of which was set first. > Provide a way to automatically handle backward compatibility of deprecated > keys > --- > > Key: HADOOP-6105 > URL: https://issues.apache.org/jira/browse/HADOOP-6105 > Project: Hadoop Common > Issue Type: Improvement > Components: conf >Reporter: Hemanth Yamijala > > There are cases when we have had to deprecate configuration keys. Use cases > include, changing the names of variables to better match intent, splitting a > single parameter into two - for maps, reduces etc. > In such cases, we typically provide a backwards compatible option for the old > keys. The handling of such cases might typically be common enough to actually > add support for it in a generic fashion in the Configuration class. Some > initial discussion around this started in HADOOP-5919, but since the project > split happened in between we decided to open this issue to fix it in common. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6105) Provide a way to automatically handle backward compatibility of deprecated keys
[ https://issues.apache.org/jira/browse/HADOOP-6105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732002#action_12732002 ] Philip Zeyliger commented on HADOOP-6105: - You might consider logging a warning every time you run into (either via get or set) a "set" deprecated key. > Provide a way to automatically handle backward compatibility of deprecated > keys > --- > > Key: HADOOP-6105 > URL: https://issues.apache.org/jira/browse/HADOOP-6105 > Project: Hadoop Common > Issue Type: Improvement > Components: conf >Reporter: Hemanth Yamijala > > There are cases when we have had to deprecate configuration keys. Use cases > include, changing the names of variables to better match intent, splitting a > single parameter into two - for maps, reduces etc. > In such cases, we typically provide a backwards compatible option for the old > keys. The handling of such cases might typically be common enough to actually > add support for it in a generic fashion in the Configuration class. Some > initial discussion around this started in HADOOP-5919, but since the project > split happened in between we decided to open this issue to fix it in common. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
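The warning suggested above might look like the following. This is a hypothetical stand-in for Hadoop's Configuration (a plain map, with System.err in place of a logger), not the actual implementation; the class and key names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: any get or set of a deprecated key resolves to the new key
// and emits a warning, so both spellings read and write the same value.
public class DeprecationAwareConf {
  private final Map<String, String> values = new HashMap<>();
  private final Map<String, String> deprecatedToNew = new HashMap<>();

  public void deprecate(String oldKey, String newKey) {
    deprecatedToNew.put(oldKey, newKey);
  }

  private String resolve(String key) {
    String newKey = deprecatedToNew.get(key);
    if (newKey != null) {
      System.err.println("WARN: key " + key + " is deprecated; use " + newKey);
      return newKey;
    }
    return key;
  }

  public void set(String key, String value) {
    values.put(resolve(key), value);
  }

  public String get(String key) {
    return values.get(resolve(key));
  }
}
```

Because resolution happens inside both accessors, old configs keep working while every touch of a deprecated key leaves a trace in the logs.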
[jira] Commented: (HADOOP-6105) Provide a way to automatically handle backward compatibility of deprecated keys
[ https://issues.apache.org/jira/browse/HADOOP-6105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731560#action_12731560 ] Philip Zeyliger commented on HADOOP-6105: - Owen, Apologies for missing this e-mail for so long. I'm behind on the "all-jira" bucket, and I failed to set a watch. Hemanth, you should definitely forge ahead with the simple, expedient solution. I'd like to convince you and Owen that the more complicated proposal is a net win (and I've used a similar system in the past), but I think the best way to do that is to actually write the code and transform a few usages. I've been busy with some other deadlines, so when I get there, I'll file a JIRA and bother you all again. (To answer Owen's questions: the couple of classes for ConfigVariable go into the configuration package; users are welcome to use the same classes to set their variables, or they can set them manually; the documentation for the variables themselves is generated, the documentation for the system lives in JavaDoc on the individual classes and the package.) -- Philip > Provide a way to automatically handle backward compatibility of deprecated > keys > --- > > Key: HADOOP-6105 > URL: https://issues.apache.org/jira/browse/HADOOP-6105 > Project: Hadoop Common > Issue Type: Improvement > Components: conf >Reporter: Hemanth Yamijala > > There are cases when we have had to deprecate configuration keys. Use cases > include, changing the names of variables to better match intent, splitting a > single parameter into two - for maps, reduces etc. > In such cases, we typically provide a backwards compatible option for the old > keys. The handling of such cases might typically be common enough to actually > add support for it in a generic fashion in the Configuration class. Some > initial discussion around this started in HADOOP-5919, but since the project > split happened in between we decided to open this issue to fix it in common. 
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6140) DistributedCache.addArchiveToClassPath doesn't work in 0.18.x branch
[ https://issues.apache.org/jira/browse/HADOOP-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729870#action_12729870 ] Philip Zeyliger commented on HADOOP-6140: - One creates clusters in unit tests using MiniMRCluster, but that's too heavy-weight: I think it's ok to extract the relevant function and test it via unit test. +1 to the tests for (1). -- Philip > DistributedCache.addArchiveToClassPath doesn't work in 0.18.x branch > > > Key: HADOOP-6140 > URL: https://issues.apache.org/jira/browse/HADOOP-6140 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 0.18.3 >Reporter: Vladimir Klimontovich > Attachments: HADOOP-6140-ver2.patch, HADOOP-6140.patch > > > addArchiveToClassPath is a method of the DistributedCache class. It should be > called before running a task. It accepts a path to a jar file on a DFS. After that, > this method should put the jar file on the distributed cache and then add the > file to the classpath of each map/reduce process on the job tracker. > This method didn't work. > Bug 1: > addArchiveToClassPath adds the DFS path of the archive to the > mapred.job.classpath.archives property. It uses > System.getProperty("path.separator") as the delimiter of multiple paths. > getFileClassPaths, which is called from TaskRunner, splits > mapred.job.classpath.archives using System.getProperty("path.separator"). > On unix systems System.getProperty("path.separator") equals ":". DFS-path > URLs look like hdfs://host:port/path, which means the result of a split will be > [ hdfs, //host, port/path ]. > Suggested solution: use "," instead of > Bug 2: > in TaskRunner there is an algorithm that looks for correspondence between DFS > paths and local paths in the distributed cache. > It compares >if (archives[i].getPath().equals( > > archiveClasspaths[j].toString())){ > instead of > if (archives[i].toString().equals( > > archiveClasspaths[j].toString())) -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
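Bug 1 above is easy to demonstrate in isolation: on a Unix JVM, path.separator is ":", and splitting an hdfs:// URL on ":" shreds it, while the "," the patch proposes leaves it whole. A minimal sketch:

```java
import java.util.regex.Pattern;

// Splitting a classpath property the way TaskRunner does.  On Unix the
// path.separator is ":", which collides with the colons inside an
// "hdfs://host:port/path" URL; "," does not.
public class SeparatorBug {
  public static String[] splitClasspath(String property, String separator) {
    // String.split takes a regex, so quote the literal separator.
    return property.split(Pattern.quote(separator));
  }
}
```

Splitting "hdfs://host:8020/cache/lib.jar" on ":" yields three fragments, none of which is a usable path; splitting it on "," returns the URL intact.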
[jira] Commented: (HADOOP-6140) DistributedCache.addArchiveToClassPath doesn't work in 0.18.x branch
[ https://issues.apache.org/jira/browse/HADOOP-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729812#action_12729812 ] Philip Zeyliger commented on HADOOP-6140: - Vladimir, Patch looks good. It would be nice to have a test for (2). It may also be appropriate to add an exception if someone passes in a filename with a comma. -- Philip > DistributedCache.addArchiveToClassPath doesn't work in 0.18.x branch > > > Key: HADOOP-6140 > URL: https://issues.apache.org/jira/browse/HADOOP-6140 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 0.18.3 >Reporter: Vladimir Klimontovich > Attachments: HADOOP-6140.patch > > > addArchiveToClassPath is a method of the DistributedCache class. It should be > called before running a task. It accepts a path to a jar file on a DFS. After that, > this method should put the jar file on the distributed cache and then add the > file to the classpath of each map/reduce process on the job tracker. > This method didn't work. > Bug 1: > addArchiveToClassPath adds the DFS path of the archive to the > mapred.job.classpath.archives property. It uses > System.getProperty("path.separator") as the delimiter of multiple paths. > getFileClassPaths, which is called from TaskRunner, splits > mapred.job.classpath.archives using System.getProperty("path.separator"). > On unix systems System.getProperty("path.separator") equals ":". DFS-path > URLs look like hdfs://host:port/path, which means the result of a split will be > [ hdfs, //host, port/path ]. > Suggested solution: use "," instead of > Bug 2: > in TaskRunner there is an algorithm that looks for correspondence between DFS > paths and local paths in the distributed cache. > It compares >if (archives[i].getPath().equals( > > archiveClasspaths[j].toString())){ > instead of > if (archives[i].toString().equals( > > archiveClasspaths[j].toString())) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6125) extend DistributedCache to work locally (LocalJobRunner) (common half)
[ https://issues.apache.org/jira/browse/HADOOP-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-6125: Resolution: Won't Fix Status: Resolved (was: Patch Available) MAPREDUCE-711 moves the distributed cache to common, so separating the patch in MAPREDUCE-476 into two (this being one half of it) is no longer necessary. > extend DistributedCache to work locally (LocalJobRunner) (common half) > -- > > Key: HADOOP-6125 > URL: https://issues.apache.org/jira/browse/HADOOP-6125 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Philip Zeyliger >Assignee: Philip Zeyliger >Priority: Minor > Attachments: HADOOP-6125.patch > > > This is the co-ticket to MAPREDUCE-476, covering the significant part of > DistributedCache that's in common. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6105) Provide a way to automatically handle backward compatibility of deprecated keys
[ https://issues.apache.org/jira/browse/HADOOP-6105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727615#action_12727615 ] Philip Zeyliger commented on HADOOP-6105: - Hemanth, This JIRA is about backwards-compatibility of deprecated keys, which is something my comment addresses, so I thought it fit in well here. Think of it as an alternative solution to the problem you're trying to solve by keeping the map of deprecated keys in Configuration.java. Keeping a deprecation map is expedient and simple, but I think it may hamper a better, longer-term solution. The design goals above are "out of thin air" (in the sense that they haven't been discussed on JIRA outside of the JIRAs mentioned above and MAPREDUCE-475), though I hope they're reasonable. They were discussed a bit at http://wiki.apache.org/hadoop/DeveloperOffsite20090612, too. That said, I hope they help to frame the conversation a bit. I very much want there to be a path to be able to rename configuration keys, but I want to make sure that the solution that comes out of this JIRA is compatible with some future work. -- Philip > Provide a way to automatically handle backward compatibility of deprecated > keys > --- > > Key: HADOOP-6105 > URL: https://issues.apache.org/jira/browse/HADOOP-6105 > Project: Hadoop Common > Issue Type: Improvement > Components: conf >Reporter: Hemanth Yamijala > > There are cases when we have had to deprecate configuration keys. Use cases > include, changing the names of variables to better match intent, splitting a > single parameter into two - for maps, reduces etc. > In such cases, we typically provide a backwards compatible option for the old > keys. The handling of such cases might typically be common enough to actually > add support for it in a generic fashion in the Configuration class. 
Some > initial discussion around this started in HADOOP-5919, but since the project > split happened in between we decided to open this issue to fix it in common. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6105) Provide a way to automatically handle backward compatibility of deprecated keys
[ https://issues.apache.org/jira/browse/HADOOP-6105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727143#action_12727143 ] Philip Zeyliger commented on HADOOP-6105: - I'm not enamored of this approach and would like to propose a slightly heavier-weight, but, I think, cleaner approach than stuffing more logic into the Configuration class. My apologies for coming to this conversation a bit late. If you don't want to read a long e-mail, skip down to the code examples at the bottom. :) Before I get to the proposal, I wanted to lay out what I think the goals are. Note that HADOOP-475 is also related. * Standardization of configuration names, documentation, and value formats. Today, the names tend to appear in the code, or, at best, in constants in the code, and the documentation, when it exists, may be in -default.xml. It would be nice if it was very difficult to avoid writing documentation for the variable you're introducing. Right now there are and have been a handful of bugs where the default in the code is different than the default in the XML file, and that gets really confusing. * Backwards compatibility. We'd love to rename "mapred.foo" and "mr.bar" to be consistent, but we want to maintain backwards compatibility. This ticket is all about that. * Availability to user code. Users should be able to use configuration the same way the core does. Users pass information to their jobs via Configuration, and they should use the same mechanism. This is true today. * Type-safety. Configurations have a handful of recurring types: number of bytes, filename, URI, hostport combination, arrays of paths, etc. The parsing is done in an ad-hoc fashion, which is a shame, since it doesn't have to be. It would be nice to have some generic runtime checking of configuration parameters, too, and perhaps even ranges (that number can't be negative!). * Upgradeability to a different configuration format. 
I don't think we'll leave a place where configuration has to be a key->value map (especially because of "availability to user code"), but it would eventually be nice if configuration could be queried from other places, or if the values could have a bit more structure. (For example, we could use XML to separate out a list of paths, instead of blindly using comma-delimited, unescaped text.) * Development ease. It ought to be easier to find the places where configuration is used. Today the best we can do is a grep, and then follow references manually. * Autogenerated documentation. No-brainer. * Ability to specify visibility, scope, and stability. Along the lines of HADOOP-5073, configuration variables should be classified as deprecated, unstable, evolving, and stable. It would be nice to introduce variables (say, that were used for tuning), with the expectation that they are not part of the public API. Use at your own risk sort of thing. My proposal is to represent every configuration variable that's accessed in the Hadoop code by a static instance of a ConfigVariable class. The interface is something like:
{code}
public interface ConfigValue<T> {
  T get(Configuration conf);
  T getDefault();
  void set(Configuration conf, T value);
  String getHelp();
}
{code}
There's more than one way to implement this. Here's one proposal that uses Java annotations:
{code}
@ConfigDescription(help="Some help text", visibility=Visibility.PUBLIC)
@ConfigAccessors({
  @ConfigAccessor(name="common.sample"),
  @ConfigAccessor(name="core.sample", deprecated="Use common.sample instead")
})
public final static ConfigVariable<Integer> myConfigVariable =
    ConfigVariables.newIntConfigVariable(15 /* default value */);
{code}
This approach would require pre-processing (at build time) the annotations into a data file, and then, at runtime, querying this data file. (It's not easily possible to get at the annotations on the field from within myConfigVariable.)
I'm half-way to getting this working, and I actually think something like the following would be better:
{code}
@ConfigVariableDeclaration
public final static ConfigVariable<URI> fsDefaultName =
    ConfigVariableBuilder.newURI()
        .setDefault(null)
        .setHelp("Default filesystem")
        .setVisibility(Visibility.PUBLIC)
        .addAccessor("fs.default.name")
        .addDeprecatedAccessor("core.default.fs", "Use foo instead")
        .addValidator(new ValidateSupportedFilesystem());
{code}
This would still require build-time preprocessing (javac supports this) to find the variables, instantiate them, and output the documentation, but the rest of the processing is easy at runtime. A drawback of this approach is how to handle the defaults that default to other variables. Perhaps the easiest thing to do is to handle the same syntax we support now, like 'addIndirectDefault("${default.dir}/mapred")', but something that references the other variable directly is m
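A minimal runnable sketch of the typed ConfigVariable idea from the proposal above, with the annotations, builder, deprecation handling, and validators left out. The names and the Map-backed stand-in for Hadoop's Configuration are illustrative only.

```java
import java.util.Map;
import java.util.function.Function;

// Typed config accessor: the key, the default, and the parser live in
// exactly one place, instead of being repeated at every call site.
public class ConfigVariable<T> {
  private final String key;
  private final T defaultValue;
  private final Function<String, T> parser;

  public ConfigVariable(String key, T defaultValue, Function<String, T> parser) {
    this.key = key;
    this.defaultValue = defaultValue;
    this.parser = parser;
  }

  // A Map stands in for Hadoop's Configuration in this sketch.
  public T get(Map<String, String> conf) {
    String raw = conf.get(key);
    return raw == null ? defaultValue : parser.apply(raw);
  }

  public T getDefault() {
    return defaultValue;
  }

  public static ConfigVariable<Integer> ofInt(String key, int dflt) {
    return new ConfigVariable<>(key, dflt, Integer::parseInt);
  }
}
```

One payoff of centralizing the default like this: the "default in the code differs from the default in the XML file" class of bug mentioned in the goals above has nowhere to hide, because there is only one default.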
[jira] Updated: (HADOOP-6125) extend DistributedCache to work locally (LocalJobRunner) (common half)
[ https://issues.apache.org/jira/browse/HADOOP-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-6125: Status: Patch Available (was: Open) > extend DistributedCache to work locally (LocalJobRunner) (common half) > -- > > Key: HADOOP-6125 > URL: https://issues.apache.org/jira/browse/HADOOP-6125 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Philip Zeyliger >Assignee: Philip Zeyliger >Priority: Minor > Attachments: HADOOP-6125.patch > > > This is the co-ticket to MAPREDUCE-476, covering the significant part of > DistributedCache that's in common. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6125) extend DistributedCache to work locally (LocalJobRunner) (common half)
[ https://issues.apache.org/jira/browse/HADOOP-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-6125: Attachment: HADOOP-6125.patch Regenerated patches from HADOOP-2914. I used: bq. cat HADOOP-2914-v3.patch | sed -e 's,src/core/,src/java/,' | patch -p0 > extend DistributedCache to work locally (LocalJobRunner) (common half) > -- > > Key: HADOOP-6125 > URL: https://issues.apache.org/jira/browse/HADOOP-6125 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Philip Zeyliger >Assignee: Philip Zeyliger >Priority: Minor > Attachments: HADOOP-6125.patch > > > This is the co-ticket to MAPREDUCE-476, covering the significant part of > DistributedCache that's in common. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-6125) extend DistributedCache to work locally (LocalJobRunner) (common half)
extend DistributedCache to work locally (LocalJobRunner) (common half) -- Key: HADOOP-6125 URL: https://issues.apache.org/jira/browse/HADOOP-6125 Project: Hadoop Common Issue Type: Improvement Reporter: Philip Zeyliger Assignee: Philip Zeyliger Priority: Minor This is the co-ticket to MAPREDUCE-476, covering the significant part of DistributedCache that's in common. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.