[jira] [Commented] (HADOOP-15473) Configure serialFilter to avoid UnrecoverableKeyException caused by JDK-8189997
[ https://issues.apache.org/jira/browse/HADOOP-15473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477789#comment-16477789 ] Philip Zeyliger commented on HADOOP-15473: -- In dealing with the fallout of this in Impala, I was able to edit the KMS "init" script to add the relevant {{-D}} option. The change is at [https://gerrit.cloudera.org/#/c/10418/1/testdata/cluster/node_templates/cdh6/etc/init.d/kms] if you're interested. Do we need to actually expose this to our (Hadoop's) users? i.e., the only possible use of this API within a KMS process is {{org.apache.hadoop.crypto.key.JavaKeyStoreProvider}}, so could we just set this explicitly at start-up (either via the shell scripts or programmatically), and avoid exposing it to the users? (Or does a client use this API, or perhaps we have enough plugins that we need to be more careful?) > Configure serialFilter to avoid UnrecoverableKeyException caused by > JDK-8189997 > --- > > Key: HADOOP-15473 > URL: https://issues.apache.org/jira/browse/HADOOP-15473 > Project: Hadoop Common > Issue Type: Bug > Components: kms >Affects Versions: 2.7.6, 3.0.2 > Environment: JDK 8u171 >Reporter: Gabor Bota >Assignee: Gabor Bota >Priority: Critical > Attachments: HDFS-13494.001.patch, HDFS-13494.002.patch, > HDFS-13494.003.patch, org.apache.hadoop.crypto.key.TestKeyProviderFactory.txt > > > There is a new feature in JDK 8u171 called Enhanced KeyStore Mechanisms > (http://www.oracle.com/technetwork/java/javase/8u171-relnotes-430.html#JDK-8189997). 
> This is the cause of the following errors in the TestKeyProviderFactory: > {noformat} > Caused by: java.security.UnrecoverableKeyException: Rejected by the > jceks.key.serialFilter or jdk.serialFilter property > at com.sun.crypto.provider.KeyProtector.unseal(KeyProtector.java:352) > at > com.sun.crypto.provider.JceKeyStore.engineGetKey(JceKeyStore.java:136) > at java.security.KeyStore.getKey(KeyStore.java:1023) > at > org.apache.hadoop.crypto.key.JavaKeyStoreProvider.getMetadata(JavaKeyStoreProvider.java:410) > ... 28 more > {noformat} > This issue causes errors and failures in hbase tests right now (using hdfs) > and could affect other products running on this new Java version. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
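The workaround Philip describes above boils down to one {{-D}} flag in the KMS start-up script. A minimal sketch, assuming the script exports a KMS_JAVA_OPTS-style variable (the variable name and the exact class list are illustrative, not taken from the patch; the pattern syntax follows the 8u171 release notes):

```shell
# Sketch only: pre-seed the JCEKS serial filter so JDK 8u171+ does not
# reject JavaKeyStoreProvider's serialized key metadata.
# KMS_JAVA_OPTS and the class list below are assumptions.
SERIAL_FILTER='java.lang.Enum;java.security.KeyRep;java.security.KeyRep$Type;javax.crypto.spec.SecretKeySpec;org.apache.hadoop.crypto.key.JavaKeyStoreProvider$KeyMetadata;!*'
KMS_JAVA_OPTS="${KMS_JAVA_OPTS:-} -Djceks.key.serialFilter=$SERIAL_FILTER"
```

Setting the property only when it is unset would avoid clobbering an operator-supplied value, which matters if the programmatic route is taken instead of the shell-script one.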
[jira] [Commented] (HADOOP-15019) Hadoop shell script classpath de-duping ignores HADOOP_USER_CLASSPATH_FIRST
[ https://issues.apache.org/jira/browse/HADOOP-15019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16244495#comment-16244495 ] Philip Zeyliger commented on HADOOP-15019: -- bq. Or, just use 'hadoop classpath' ... Yep, you're right. I had tried and failed, but I must have gotten something else wrong. > Hadoop shell script classpath de-duping ignores HADOOP_USER_CLASSPATH_FIRST > > > Key: HADOOP-15019 > URL: https://issues.apache.org/jira/browse/HADOOP-15019 > Project: Hadoop Common > Issue Type: Bug > Components: bin >Reporter: Philip Zeyliger > > If a user sets {{HADOOP_USER_CLASSPATH_FIRST=true}} and furthermore includes > a directory that's already in Hadoop's classpath via {{HADOOP_CLASSPATH}}, > that directory will appear later than it should in the eventual $CLASSPATH. I > believe this is because the de-duping at > https://github.com/apache/hadoop/blob/cbc632d9abf08c56a7fc02be51b2718af30bad28/hadoop-common-project/hadoop-common/src/main/bin/hadoop-functions.sh#L1200 > is ignoring the "before/after" parameter. > To reproduce, first build the following trivial Java program: > {code} > $cat Test.java > public class Test { > public static void main(String[] args) { > System.out.println(System.getenv().get("CLASSPATH")); > } > } > $javac Test.java > $jar cf test.jar Test.class > {code} > With that, if you happen to have an entry in HADOOP_CLASSPATH that matches > what Hadoop would produce, you'll find the ordering not honored. It's easiest > to reproduce this with a match for HADOOP_CONF_DIR, as in the second case > below: > {code} > # As you'd expect, /usr/share is first! > $HADOOP_CONF_DIR=/etc HADOOP_USER_CLASSPATH_FIRST="true" > HADOOP_CLASSPATH=/usr/share:/tmp:/bin bin/hadoop jar test.jar Test | tr ':' > '\n' | grep -n . | grep '/usr/share' > WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete. > 1:/usr/share > # Surprise!
/usr/share is now in the 3rd line, even though it was first in > HADOOP_CLASSPATH. > $HADOOP_CONF_DIR=/usr/share HADOOP_USER_CLASSPATH_FIRST="true" > HADOOP_CLASSPATH=/usr/share:/tmp:/bin bin/hadoop jar test.jar Test | tr ':' > '\n' | grep -n . | grep '/usr/share' > WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete. > 3:/usr/share > {code} > To reiterate, what's surprising is that an entry that's first > in HADOOP_CLASSPATH can show up not first in the resulting classpath. > I ran into this configuring {{bin/hive}} with a confdir that was being used > for both HDFS and Hive, and flailing as to why my {{log4j2.properties}} > wasn't being read. The one in my conf dir was lower in my classpath than one > bundled in some Hive jar. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
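The ordering property the report asks for can be illustrated with a first-occurrence-wins de-dup: keeping the first occurrence of each entry preserves user-prepended entries. This is only a sketch of the desired behavior, not the real hadoop_add_classpath logic in hadoop-functions.sh:

```shell
# Sketch: de-dup a colon-separated classpath while keeping the FIRST
# occurrence of each entry, so entries prepended via
# HADOOP_USER_CLASSPATH_FIRST stay in front. Illustrative only.
dedup_classpath() {
  # $1: colon-separated classpath; prints the de-duped classpath
  printf '%s\n' "$1" | tr ':' '\n' | awk '!seen[$0]++' | paste -sd: -
}
```

With this rule, a duplicate of an early entry later in the list is simply dropped, so the user's ordering survives.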
[jira] [Created] (HADOOP-15019) Hadoop shell script classpath de-duping ignores HADOOP_USER_CLASSPATH_FIRST
Philip Zeyliger created HADOOP-15019: Summary: Hadoop shell script classpath de-duping ignores HADOOP_USER_CLASSPATH_FIRST Key: HADOOP-15019 URL: https://issues.apache.org/jira/browse/HADOOP-15019 Project: Hadoop Common Issue Type: Bug Components: bin Reporter: Philip Zeyliger If a user sets {{HADOOP_USER_CLASSPATH_FIRST=true}} and furthermore includes a directory that's already in Hadoop's classpath via {{HADOOP_CLASSPATH}}, that directory will appear later than it should in the eventual $CLASSPATH. I believe this is because the de-duping at https://github.com/apache/hadoop/blob/cbc632d9abf08c56a7fc02be51b2718af30bad28/hadoop-common-project/hadoop-common/src/main/bin/hadoop-functions.sh#L1200 is ignoring the "before/after" parameter. To reproduce, first build the following trivial Java program: {code} $cat Test.java public class Test { public static void main(String[] args) { System.out.println(System.getenv().get("CLASSPATH")); } } $javac Test.java $jar cf test.jar Test.class {code} With that, if you happen to have an entry in HADOOP_CLASSPATH that matches what Hadoop would produce, you'll find the ordering not honored. It's easiest to reproduce this with a match for HADOOP_CONF_DIR, as in the second case below: {code} # As you'd expect, /usr/share is first! $HADOOP_CONF_DIR=/etc HADOOP_USER_CLASSPATH_FIRST="true" HADOOP_CLASSPATH=/usr/share:/tmp:/bin bin/hadoop jar test.jar Test | tr ':' '\n' | grep -n . | grep '/usr/share' WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete. 1:/usr/share # Surprise! /usr/share is now in the 3rd line, even though it was first in HADOOP_CLASSPATH. $HADOOP_CONF_DIR=/usr/share HADOOP_USER_CLASSPATH_FIRST="true" HADOOP_CLASSPATH=/usr/share:/tmp:/bin bin/hadoop jar test.jar Test | tr ':' '\n' | grep -n . | grep '/usr/share' WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
3:/usr/share {code} To reiterate, what's surprising is that an entry that's first in HADOOP_CLASSPATH can show up not first in the resulting classpath. I ran into this configuring {{bin/hive}} with a confdir that was being used for both HDFS and Hive, and flailing as to why my {{log4j2.properties}} wasn't being read. The one in my conf dir was lower in my classpath than one bundled in some Hive jar. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-9160) Adopt Jolokia as the JMX HTTP/JSON bridge.
[ https://issues.apache.org/jira/browse/HADOOP-9160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725510#comment-13725510 ] Philip Zeyliger commented on HADOOP-9160: - I'm not a committer on this project, so my comments are obviously advisory only, and if you get support, you're welcome to disregard me. * It sounds like you're using Jolokia not so much for the JMX-ness as for its alternative authentication model. Something about the tools you're working with can't handle the SPNEGO(?) model for the HTTP-based (REST) APIs, and so you're introducing a separate HTTP server (Jolokia) to handle that. Is it the protocol that's causing interoperability issues or the authentication model? The auth models are already pluggable. I wish we'd talk about concrete tools that you're integrating--I'm surprised they speak Jolokia but not HTTP. * Similarly, I'd like to understand whether, in your ideal world, you could, say, read a file or call "hdfs upgrade" over JMX? * We've spent considerable effort getting backwards-compatible protocols in place for the wire protocol (via protocol buffers) and the client interfaces (via annotations). Opening up another layer of RPC exposes us to more issues here. > Adopt Jolokia as the JMX HTTP/JSON bridge. > -- > > Key: HADOOP-9160 > URL: https://issues.apache.org/jira/browse/HADOOP-9160 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Junping Du > Labels: features > Attachments: hadoop-9160-demo-branch-1.txt, HADOOP-9160.patch > > > The current JMX HTTP bridge has served its purpose, while a more complete > solution: Jolokia (formerly Jmx4Perl) has been developed/matured over the > years. > Jolokia provides comprehensive JMX features over HTTP/JSON including search > and list of JMX attributes and operations metadata, which helps to support > inter framework/platform compatibility. It has first class language bindings > for Perl, Python, Javascript, Java.
> It's trivial (see demo patch) to incorporate Jolokia servlet into Hadoop HTTP > servers and use the same security mechanisms. > Adopting Jolokia will substantially improve the manageability of Hadoop and > its ecosystem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9160) Adopt Jolokia as the JMX HTTP/JSON bridge.
[ https://issues.apache.org/jira/browse/HADOOP-9160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724169#comment-13724169 ] Philip Zeyliger commented on HADOOP-9160: - > third party JVM agents that provide advanced runtime monitoring and tuning > of JVM that expose JMX as the API If you're running a third party JVM agent, by all means let it expose whatever APIs it would like--it's got everything it needs to bind to addresses and listen for 'em. I have no particular objections to alternative access endpoints (e.g., NFS proxies), though it's notable that most of them are out of process proxies. Furthermore, for distributed systems like HDFS, you still often need a client shim to deal with HA and querying for data from the wrong machine. I'd never use HTTPFS from certain types of programs because then I'd have to manually re-write all the nice retry and error-handling that DFSClient provides me. I do have objections to alternatives for write access. I completely agree with Allen W: we've got to have a way to turn it off. BTW, one way to add administrative APIs is to add plugins. Hue, for example, used a plugin to datanodes and namenodes to get at some stuff. It's not pretty, but, hey, the maintenance burden is on the right place. > Adopt Jolokia as the JMX HTTP/JSON bridge. > -- > > Key: HADOOP-9160 > URL: https://issues.apache.org/jira/browse/HADOOP-9160 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Junping Du > Labels: features > Attachments: hadoop-9160-demo-branch-1.txt, HADOOP-9160.patch > > > The current JMX HTTP bridge has served its purpose, while a more complete > solution: Jolokia (formerly Jmx4Perl) has been developed/matured over the > years. > Jolokia provides comprehensive JMX features over HTTP/JSON including search > and list of JMX attributes and operations metadata, which helps to support > inter framework/platform compatibility. 
It has first class language bindings > for Perl, Python, Javascript, Java. > It's trivial (see demo patch) to incorporate Jolokia servlet into Hadoop HTTP > servers and use the same security mechanisms. > Adopting Jolokia will substantially improve the manageability of Hadoop and > its ecosystem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9160) Adopt Jolokia as the JMX HTTP/JSON bridge.
[ https://issues.apache.org/jira/browse/HADOOP-9160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723054#comment-13723054 ] Philip Zeyliger commented on HADOOP-9160: - Luke, The issue is with *write* operations. Things like "format HDFS" or "refreshNodes" or "decommission" or "enterSafemode." Those are currently done with DFSAdmin and friends. They operate over our RPC mechanism. I don't like the prospect of having yet another RPC mechanism to do the same thing. > Adopt Jolokia as the JMX HTTP/JSON bridge. > -- > > Key: HADOOP-9160 > URL: https://issues.apache.org/jira/browse/HADOOP-9160 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Junping Du > Labels: features > Attachments: hadoop-9160-demo-branch-1.txt, HADOOP-9160.patch > > > The current JMX HTTP bridge has served its purpose, while a more complete > solution: Jolokia (formerly Jmx4Perl) has been developed/matured over the > years. > Jolokia provides comprehensive JMX features over HTTP/JSON including search > and list of JMX attributes and operations metadata, which helps to support > inter framework/platform compatibility. It has first class language bindings > for Perl, Python, Javascript, Java. > It's trivial (see demo patch) to incorporate Jolokia servlet into Hadoop HTTP > servers and use the same security mechanisms. > Adopting Jolokia will substantially improve the manageability of Hadoop and > its ecosystem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9562) Create REST interface for HDFS health data
[ https://issues.apache.org/jira/browse/HADOOP-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663064#comment-13663064 ] Philip Zeyliger commented on HADOOP-9562: - Y'all are aware that {code} GET http://namenode:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo {code} returns something along the lines of {code} { "beans" : [ { "name" : "Hadoop:service=NameNode,name=NameNodeInfo", "modelerType" : "org.apache.hadoop.hdfs.server.namenode.FSNamesystem", "Threads" : 60, "Total" : 39793714372608, "ClusterId" : "cluster19", "BlockPoolId" : "BP-1187031495-172.29.122.20-1355867299076", "Used" : 19269710336134, "PercentUsed" : 48.424004, "PercentRemaining" : 31.148054, "Version" : "2.0.0-cdh4.3.0-SNAPSHOT, rcb6dc0d3d99891eb095dc6577a440d7dde067789", "Free" : 12394968694784, "Safemode" : "", "UpgradeFinalized" : true, "NonDfsUsedSpace" : 8129035341690, "BlockPoolUsedSpace" : 19269710336134, "PercentBlockPoolUsed" : 48.424004, "TotalBlocks" : 64237, "TotalFiles" : 27708, "NumberOfMissingBlocks" : 0, "LiveNodes" : "{\"p0431.mtv.cloudera.com\":{\"numBlocks\":32249,\"usedSpace\":3220494794752,\"lastContact\":1,\"capacity\":5688127021056,\"nonDfsUsedSpace\":560239116288,\"adminState\":\"In Service\"},\"p0429.mtv.cloudera.com\":{\"numBlocks\":25709,\"usedSpace\":2468956155904,\"lastContact\":2,\"capacity\":5688127021056,\"nonDfsUsedSpace\":1509498798080,\"adminState\":\"In Service\"},\"p0427.mtv.cloudera.com\":{\"numBlocks\":16919,\"usedSpace\":2056487919683,\"lastContact\":0,\"capacity\":5676539633664,\"nonDfsUsedSpace\":1999484977085,\"adminState\":\"In Service\"},\"p0430.mtv.cloudera.com\":{\"numBlocks\":31258,\"usedSpace\":3117177036800,\"lastContact\":2,\"capacity\":5688127021056,\"nonDfsUsedSpace\":987432075264,\"adminState\":\"In Service\"},\"p0432.mtv.cloudera.com\":{\"numBlocks\":20877,\"usedSpace\":1899904856064,\"lastContact\":0,\"capacity\":5688127021056,\"nonDfsUsedSpace\":1620254515200,\"adminState\":\"In 
Service\"},\"p0433.mtv.cloudera.com\":{\"numBlocks\":31995,\"usedSpace\":3204067667968,\"lastContact\":1,\"capacity\":5688127021056,\"nonDfsUsedSpace\":172768587776,\"adminState\":\"In Service\"},\"p0428.mtv.cloudera.com\":{\"numBlocks\":33680,\"usedSpace\":3302621904963,\"lastContact\":0,\"capacity\":5676539633664,\"nonDfsUsedSpace\":1279357271997,\"adminState\":\"In Service\"}}", "DeadNodes" : "{}", "DecomNodes" : "{}", "NameDirStatuses" : "{\"failed\":{},\"active\":{\"/data/1/dfs2/nn\":\"IMAGE_AND_EDITS\",\"/data/2/dfs2/nn\":\"IMAGE_AND_EDITS\"}}" } ] } {code} Is that sufficient? I'd rather whatever missing information be added to existing JMX beans, which are already accessible via HTTP, than new equivalent APIs be added. There's also a Java API in DFSClient: {code} $hdfs dfsadmin -report Configured Capacity: 39793714372608 (36.19 TB) Present Capacity: 31664679067648 (28.80 TB) DFS Remaining: 12394968727552 (11.27 TB) DFS Used: 19269710340096 (17.53 TB) DFS Used%: 60.86% Under replicated blocks: 0 Blocks with corrupt replicas: 0 Missing blocks: 0 - report: Access denied for user philip. Superuser privilege is required {code} > Create REST interface for HDFS health data > -- > > Key: HADOOP-9562 > URL: https://issues.apache.org/jira/browse/HADOOP-9562 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Affects Versions: 2.0.4-alpha >Reporter: Trevor Lorimer >Priority: Minor > Attachments: 0001-HAD-162-Final-revision-refactor.patch > > > The HDFS health screen (dfshealth.jsp) displays basic Version, Security and > Health information concerning the NameNode; currently this information is > accessible from classes in the org.apache.hadoop.hdfs.server.namenode package > and cannot be accessed outside the NameNode. This becomes a problem if the > data is required to be displayed using a new user interface. > The proposal is to create a REST interface to expose all the information > displayed on dfshealth.jsp using GET methods. 
Wrapper classes will be created > to serve the data to the REST root resource within the hadoop-hdfs project. > This will enable the HDFS health screen information to be accessed remotely. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
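As a concrete illustration of the point above: a monitoring script can already pull any of these fields out of the existing /jmx endpoint with plain text tools. The sketch below runs against a trimmed stand-in for the response; in practice the JSON would come from curl against the NameNode's HTTP port (hostname and port are placeholders):

```shell
# Sketch: extract PercentUsed from a /jmx-style response. SAMPLE is a
# trimmed stand-in; in practice the JSON would be fetched with, e.g.:
#   curl -s 'http://namenode:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo'
SAMPLE='{ "beans" : [ { "name" : "Hadoop:service=NameNode,name=NameNodeInfo", "PercentUsed" : 48.424004 } ] }'
percent_used=$(printf '%s' "$SAMPLE" | sed -n 's/.*"PercentUsed" : \([0-9.]*\).*/\1/p')
echo "$percent_used"
```

A real consumer would use a proper JSON parser rather than sed, but the point stands: the data is already one HTTP GET away.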
[jira] [Commented] (HADOOP-9194) RPC Support for QoS
[ https://issues.apache.org/jira/browse/HADOOP-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549938#comment-13549938 ] Philip Zeyliger commented on HADOOP-9194: - In a previous life, I used systems that had multiple ports open for the same protocols, and relied on both hardware and OS queueing to make one port a higher priority than the other. Sure was easy to reason about. > RPC Support for QoS > --- > > Key: HADOOP-9194 > URL: https://issues.apache.org/jira/browse/HADOOP-9194 > Project: Hadoop Common > Issue Type: New Feature > Components: ipc >Affects Versions: 2.0.2-alpha >Reporter: Luke Lu > > One of the next frontiers of Hadoop performance is QoS (Quality of Service). > We need QoS support to fight the inevitable "buffer bloat" (including various > queues, which are probably necessary for throughput) in our software stack. > This is important for mixed workload with different latency and throughput > requirements (e.g. OLTP vs OLAP, batch and even compaction I/O) against the > same DFS. > Any potential bottleneck will need to be managed by QoS mechanisms, starting > with RPC. > How about adding a one byte DS (differentiated services) field (a la the > 6-bit DS field in IP header) in the RPC header to facilitate the QoS > mechanisms (in separate JIRAs)? The byte at a fixed offset (how about 0?) of > the header is helpful for implementing high performance QoS mechanisms in > switches (software or hardware) and servers with minimum decoding effort. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
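What the "fixed offset 0" suggestion buys: a classifier only has to read one byte to pick a queue, with no protobuf decoding. A hedged sketch of that idea (the header layout here is the proposal's hypothetical one, not anything Hadoop actually ships):

```shell
# Sketch: classify an RPC message by a one-byte DS field at offset 0,
# as the proposal suggests. The layout is hypothetical.
ds_class() {
  # $1: file holding a serialized header; prints the DS byte as decimal.
  od -An -tu1 -N1 "$1" | tr -d ' '
}
```

A switch or server would map that decimal value straight onto a priority queue, which is exactly the "minimum decoding effort" property the description asks for.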
[jira] [Commented] (HADOOP-8922) Provide alternate JSONP output for JMXJsonServlet to allow javascript in browser dashboard
[ https://issues.apache.org/jira/browse/HADOOP-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476742#comment-13476742 ] Philip Zeyliger commented on HADOOP-8922: - +1 (non-binding). Patch looks good to me. Please fix the spelling of "optionnal" in the javadoc. > Provide alternate JSONP output for JMXJsonServlet to allow javascript in > browser dashboard > -- > > Key: HADOOP-8922 > URL: https://issues.apache.org/jira/browse/HADOOP-8922 > Project: Hadoop Common > Issue Type: Improvement > Components: metrics >Reporter: Damien Hardy >Priority: Trivial > Labels: newbie, patch > Attachments: HADOOP-8922-2.patch, test.html > > > JMXJsonServlet may provide a JSONP alternative to JSON to allow javascript in > browser GUIs to make requests. > For security reasons (XSS), browsers limit requests to other > domains[¹|#ref1], so metrics from cluster nodes cannot be used in a full > js interface. > An example of this kind of dashboard is the bigdesk[²|#ref2] plugin for > ElasticSearch. > To achieve that, the servlet should detect a GET parameter > (callback=<name>) and modify the response by surrounding the JSON value with > "<name>(" and ");" [³|#ref3]. > The "<name>" value is variable and should be provided by the client as the callback > parameter value. > {anchor:ref1}[1] > https://developer.mozilla.org/en-US/docs/Same_origin_policy_for_JavaScript > {anchor:ref2}[2] https://github.com/lukas-vlcek/bigdesk > {anchor:ref3}[3] http://en.wikipedia.org/wiki/JSONP -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
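The wrapping behavior the issue describes is small enough to state exactly. A sketch (function and parameter names are mine, not the patch's; the real change lives in JMXJsonServlet's Java code):

```shell
# Sketch of the JSONP convention from the issue: when a callback=
# parameter is present, surround the JSON body with "<callback>(" and ");".
wrap_jsonp() {
  # $1: callback name (empty means plain JSON), $2: JSON payload
  if [ -n "$1" ]; then
    printf '%s(%s);' "$1" "$2"
  else
    printf '%s' "$2"
  fi
}
```

A browser dashboard then loads the response as a script tag, and the named callback receives the JSON object, sidestepping the same-origin restriction for reads.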
[jira] [Updated] (HADOOP-8761) Help for FsShell's Stat incorrectly mentions "file size in blocks" (should be bytes)
[ https://issues.apache.org/jira/browse/HADOOP-8761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-8761: Attachment: HADOOP-8761.patch.txt Trivial patch attached. > Help for FsShell's Stat incorrectly mentions "file size in blocks" (should be > bytes) > > > Key: HADOOP-8761 > URL: https://issues.apache.org/jira/browse/HADOOP-8761 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Reporter: Philip Zeyliger >Priority: Minor > Attachments: HADOOP-8761.patch.txt > > > Trivial patch attached corrects the usage information. Stat.java calls > FileStatus.getLen(), which is most definitely the file size in bytes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8761) Help for FsShell's Stat incorrectly mentions "file size in blocks" (should be bytes)
[ https://issues.apache.org/jira/browse/HADOOP-8761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-8761: Assignee: Philip Zeyliger Status: Patch Available (was: Open) Submitting patch for Hudson. No tests were added. > Help for FsShell's Stat incorrectly mentions "file size in blocks" (should be > bytes) > > > Key: HADOOP-8761 > URL: https://issues.apache.org/jira/browse/HADOOP-8761 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Reporter: Philip Zeyliger >Assignee: Philip Zeyliger >Priority: Minor > Attachments: HADOOP-8761.patch.txt > > > Trivial patch attached corrects the usage information. Stat.java calls > FileStatus.getLen(), which is most definitely the file size in bytes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8761) Help for FsShell's Stat incorrectly mentions "file size in blocks" (should be bytes)
Philip Zeyliger created HADOOP-8761: --- Summary: Help for FsShell's Stat incorrectly mentions "file size in blocks" (should be bytes) Key: HADOOP-8761 URL: https://issues.apache.org/jira/browse/HADOOP-8761 Project: Hadoop Common Issue Type: Bug Components: fs Reporter: Philip Zeyliger Priority: Minor Trivial patch attached corrects the usage information. Stat.java calls FileStatus.getLen(), which is most definitely the file size in bytes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8347) Hadoop Common logs misspell 'successful'
[ https://issues.apache.org/jira/browse/HADOOP-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267231#comment-13267231 ] Philip Zeyliger commented on HADOOP-8347: - Test failures seem trashy: bq. >>> org.apache.hadoop.fs.viewfs.TestViewFsTrash.testTrash That's unrelated. > Hadoop Common logs misspell 'successful' > > > Key: HADOOP-8347 > URL: https://issues.apache.org/jira/browse/HADOOP-8347 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 2.0.0 >Reporter: Philip Zeyliger >Assignee: Philip Zeyliger > Attachments: 0001-HADOOP-8347.-Fixing-spelling-of-successful.patch > > > 'successfull' is a misspelling of 'successful.' Trivial patch attached. The > constants are private, and there doesn't seem to be any serialized form of > these comments except in log files, so this shouldn't have compatibility > issues. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8347) Hadoop Common logs misspell 'successful'
[ https://issues.apache.org/jira/browse/HADOOP-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-8347: Status: Patch Available (was: Open) > Hadoop Common logs misspell 'successful' > > > Key: HADOOP-8347 > URL: https://issues.apache.org/jira/browse/HADOOP-8347 > Project: Hadoop Common > Issue Type: Bug > Components: security >Reporter: Philip Zeyliger >Assignee: Philip Zeyliger > Attachments: 0001-HADOOP-8347.-Fixing-spelling-of-successful.patch > > > 'successfull' is a misspelling of 'successful.' Trivial patch attached. The > constants are private, and there doesn't seem to be any serialized form of > these comments except in log files, so this shouldn't have compatibility > issues. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8347) Hadoop Common logs misspell 'successful'
[ https://issues.apache.org/jira/browse/HADOOP-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-8347: Attachment: 0001-HADOOP-8347.-Fixing-spelling-of-successful.patch > Hadoop Common logs misspell 'successful' > > > Key: HADOOP-8347 > URL: https://issues.apache.org/jira/browse/HADOOP-8347 > Project: Hadoop Common > Issue Type: Bug > Components: security >Reporter: Philip Zeyliger >Assignee: Philip Zeyliger > Attachments: 0001-HADOOP-8347.-Fixing-spelling-of-successful.patch > > > 'successfull' is a misspelling of 'successful.' Trivial patch attached. The > constants are private, and there doesn't seem to be any serialized form of > these comments except in log files, so this shouldn't have compatibility > issues. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8347) Hadoop Common logs misspell 'successful'
Philip Zeyliger created HADOOP-8347: --- Summary: Hadoop Common logs misspell 'successful' Key: HADOOP-8347 URL: https://issues.apache.org/jira/browse/HADOOP-8347 Project: Hadoop Common Issue Type: Bug Components: security Reporter: Philip Zeyliger Assignee: Philip Zeyliger Attachments: 0001-HADOOP-8347.-Fixing-spelling-of-successful.patch 'successfull' is a misspelling of 'successful.' Trivial patch attached. The constants are private, and there doesn't seem to be any serialized form of these comments except in log files, so this shouldn't have compatibility issues. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8343) Allow configuration of authorization for JmxJsonServlet and MetricsServlet
Philip Zeyliger created HADOOP-8343: --- Summary: Allow configuration of authorization for JmxJsonServlet and MetricsServlet Key: HADOOP-8343 URL: https://issues.apache.org/jira/browse/HADOOP-8343 Project: Hadoop Common Issue Type: New Feature Components: util Affects Versions: 2.0.0 Reporter: Philip Zeyliger When using authorization for the daemons' web server, it would be useful to specifically control the authorization requirements for accessing /jmx and /metrics. Currently, they require administrative access. This JIRA would propose that whether or not they are available to administrators only or to all users be controlled by "hadoop.instrumentation.requires.administrator" (or similar). The default would be that administrator access is required. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
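If the proposal were adopted, the knob would presumably be set in core-site.xml. A hypothetical sketch (the property name comes from this issue's text and is not a released Hadoop configuration key; the default shown is the proposed administrators-only behavior):

```xml
<!-- Hypothetical: property name taken from this issue's proposal,
     not from a released Hadoop version. -->
<property>
  <name>hadoop.instrumentation.requires.administrator</name>
  <value>true</value>
  <description>If true (the proposed default), /jmx and /metrics require
  administrator access; if false, any authenticated user may read them.
  </description>
</property>
```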
[jira] [Commented] (HADOOP-7652) Provide a mechanism for a client Hadoop configuration to 'poison' daemon startup; i.e., disallow daemon start up on a client config.
[ https://issues.apache.org/jira/browse/HADOOP-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107546#comment-13107546 ] Philip Zeyliger commented on HADOOP-7652: - Hi Arun, As far as I can tell, service-level authorization restricts who can join in terms of unix users, but, for a non-kerberos packaged install (including both the RPMs based on 0.20.204 and various Cloudera packages), this doesn't help much. Users still type "/etc/init.d/hadoop-datanode start", and the username of the datanode process is "hdfs," thereby circumventing service level authorization (if I'm guessing correctly how it works.) -- Philip > Provide a mechanism for a client Hadoop configuration to 'poison' daemon > startup; i.e., disallow daemon start up on a client config. > > > Key: HADOOP-7652 > URL: https://issues.apache.org/jira/browse/HADOOP-7652 > Project: Hadoop Common > Issue Type: Improvement > Components: conf >Reporter: Philip Zeyliger > > We've seen folks who have been given Hadoop configuration to act as a client > accidentally type "hadoop namenode" and get things into a confused, or > incorrect state. Most recently, we've seen data corruption when users > accidentally run extra secondary namenodes > (https://issues.apache.org/jira/browse/HDFS-2305). > I'd like to propose that we introduce a configuration property, say, > "client.poison.servers", which, if set, disables the Hadoop daemons (nn, snn, > jt, tt, etc.) with a reasonable error message. Hadoop administrators can > hand out/install configs that are on machines intended to just be clients > with a little less worry that they'll accidentally get run. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-7652) Provide a mechanism for a client Hadoop configuration to 'poison' daemon startup; i.e., disallow daemon start up on a client config.
Provide a mechanism for a client Hadoop configuration to 'poison' daemon startup; i.e., disallow daemon start up on a client config. Key: HADOOP-7652 URL: https://issues.apache.org/jira/browse/HADOOP-7652 Project: Hadoop Common Issue Type: Improvement Components: conf Reporter: Philip Zeyliger We've seen folks who have been given Hadoop configuration to act as a client accidentally type "hadoop namenode" and get things into a confused, or incorrect state. Most recently, we've seen data corruption when users accidentally run extra secondary namenodes (https://issues.apache.org/jira/browse/HDFS-2305). I'd like to propose that we introduce a configuration property, say, "client.poison.servers", which, if set, disables the Hadoop daemons (nn, snn, jt, tt, etc.) with a reasonable error message. Hadoop administrators can hand out/install configs that are on machines intended to just be clients with a little less worry that they'll accidentally get run. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
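A sketch of the "client.poison.servers" idea from the description above. The property name is the one proposed in the ticket; everything else (the `Properties` stand-in for a Hadoop `Configuration`, the daemon names, the exception type) is a hypothetical illustration of how a daemon could refuse to start on a client-only config.

```java
import java.util.Arrays;
import java.util.Properties;

public class PoisonCheck {
    /** Throws if this daemon name is listed in client.poison.servers. */
    public static void checkNotPoisoned(Properties conf, String daemon) {
        String poisoned = conf.getProperty("client.poison.servers", "");
        boolean hit = Arrays.stream(poisoned.split(","))
                .map(String::trim)
                .anyMatch(daemon::equals);
        if (hit) {
            // A reasonable error message, as the ticket asks for.
            throw new IllegalStateException(
                "Refusing to start " + daemon + ": this is a client-only "
                + "configuration (client.poison.servers=" + poisoned + ")");
        }
    }
}
```

Each daemon would call this once at startup, before binding any ports, so an accidental "hadoop namenode" on a client box fails fast instead of corrupting state.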
[jira] [Commented] (HADOOP-7585) hadoop-config.sh should be changed to not rely on java6 behavior for classpath expansion since it breaks jsvc
[ https://issues.apache.org/jira/browse/HADOOP-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091519#comment-13091519 ] Philip Zeyliger commented on HADOOP-7585: - For what it's worth, I've had good luck asking the jsvc folks for compatibility fixes. https://issues.apache.org/jira/browse/DAEMON-208 was fairly similar. Might be worth filing a bug with jsvc. -- Philip > hadoop-config.sh should be changed to not rely on java6 behavior for > classpath expansion since it breaks jsvc > - > > Key: HADOOP-7585 > URL: https://issues.apache.org/jira/browse/HADOOP-7585 > Project: Hadoop Common > Issue Type: Bug >Reporter: Arun C Murthy >Assignee: Eric Yang >Priority: Blocker > Fix For: 0.23.0 > > Attachments: HADOOP-7585.patch > > > hadoop-config.sh should be changed to not rely on java6 behavior for > classpath expansion since it breaks jsvc - we need to add back the for loops > in hadoop-config.sh which were changed in HADOOP-7563 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-7357) hadoop.io.compress.TestCodec#main() should exit with non-zero exit code if test failed
[ https://issues.apache.org/jira/browse/HADOOP-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-7357: Attachment: HADOOP-7357-v2.patch.txt Removing the entire try catch, as per Todd's suggestion. > hadoop.io.compress.TestCodec#main() should exit with non-zero exit code if > test failed > -- > > Key: HADOOP-7357 > URL: https://issues.apache.org/jira/browse/HADOOP-7357 > Project: Hadoop Common > Issue Type: Bug > Components: test >Reporter: Philip Zeyliger >Assignee: Philip Zeyliger >Priority: Trivial > Attachments: HADOOP-7357-v2.patch.txt, HADOOP-7357.patch.txt > > > It's convenient to run something like > {noformat} > HADOOP_CLASSPATH=hadoop-test-0.20.2.jar bin/hadoop > org.apache.hadoop.io.compress.TestCodec -count 3 -codec fo > {noformat} > but the error code it returns isn't interesting. > 1-line patch attached fixes that. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-7357) hadoop.io.compress.TestCodec#main() should exit with non-zero exit code if test failed
[ https://issues.apache.org/jira/browse/HADOOP-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-7357: Status: Patch Available (was: Open) > hadoop.io.compress.TestCodec#main() should exit with non-zero exit code if > test failed > -- > > Key: HADOOP-7357 > URL: https://issues.apache.org/jira/browse/HADOOP-7357 > Project: Hadoop Common > Issue Type: Bug > Components: test >Reporter: Philip Zeyliger >Assignee: Philip Zeyliger >Priority: Trivial > Attachments: HADOOP-7357.patch.txt > > > It's convenient to run something like > {noformat} > HADOOP_CLASSPATH=hadoop-test-0.20.2.jar bin/hadoop > org.apache.hadoop.io.compress.TestCodec -count 3 -codec fo > {noformat} > but the error code it returns isn't interesting. > 1-line patch attached fixes that. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-7357) hadoop.io.compress.TestCodec#main() should exit with non-zero exit code if test failed
[ https://issues.apache.org/jira/browse/HADOOP-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-7357: Attachment: HADOOP-7357.patch.txt > hadoop.io.compress.TestCodec#main() should exit with non-zero exit code if > test failed > -- > > Key: HADOOP-7357 > URL: https://issues.apache.org/jira/browse/HADOOP-7357 > Project: Hadoop Common > Issue Type: Bug > Components: test >Reporter: Philip Zeyliger >Assignee: Philip Zeyliger >Priority: Trivial > Attachments: HADOOP-7357.patch.txt > > > It's convenient to run something like > {noformat} > HADOOP_CLASSPATH=hadoop-test-0.20.2.jar bin/hadoop > org.apache.hadoop.io.compress.TestCodec -count 3 -codec fo > {noformat} > but the error code it returns isn't interesting. > 1-line patch attached fixes that. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-7357) hadoop.io.compress.TestCodec#main() should exit with non-zero exit code if test failed
hadoop.io.compress.TestCodec#main() should exit with non-zero exit code if test failed -- Key: HADOOP-7357 URL: https://issues.apache.org/jira/browse/HADOOP-7357 Project: Hadoop Common Issue Type: Bug Components: test Reporter: Philip Zeyliger Assignee: Philip Zeyliger Priority: Trivial Attachments: HADOOP-7357.patch.txt It's convenient to run something like {noformat} HADOOP_CLASSPATH=hadoop-test-0.20.2.jar bin/hadoop org.apache.hadoop.io.compress.TestCodec -count 3 -codec fo {noformat} but the error code it returns isn't interesting. 1-line patch attached fixes that. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
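The shape of the one-line fix discussed here: have `main()` propagate test failure through the process exit code so a shell caller can inspect `$?`. TestCodec's real internals are elided; `run()` is a stand-in for the test body.

```java
public class ExitCodeMain {
    /** Stand-in for the test body: 0 on success, non-zero on failure. */
    static int run(String codec) {
        // Pretend only "gzip" is a known codec; anything else fails,
        // as with the bogus "-codec fo" in the example invocation.
        return "gzip".equals(codec) ? 0 : 1;
    }

    public static void main(String[] args) {
        int rc = run(args.length > 0 ? args[0] : "");
        if (rc != 0) {
            System.err.println("Test failed");
        }
        System.exit(rc);  // non-zero on failure, visible to the caller
    }
}
```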
[jira] Created: (HADOOP-7198) Hadoop defaults for web UI ports often fall smack in the middle of Linux ephemeral port range
Hadoop defaults for web UI ports often fall smack in the middle of Linux ephemeral port range - Key: HADOOP-7198 URL: https://issues.apache.org/jira/browse/HADOOP-7198 Project: Hadoop Common Issue Type: Wish Reporter: Philip Zeyliger Priority: Trivial It turns out (see http://en.wikipedia.org/wiki/Ephemeral_port and /proc/sys/net/ipv4/ip_local_port_range) that when you bind to port 0, Linux chooses an ephemeral port. On my default-ridden Ubuntu Maverick box and on CentOS 5.5, that range is 32768-61000. So, when HBase binds to 60030 or when the NameNode binds to 50070, there's a small chance that you'll conflict with, say, an FTP session, or with some other Hadoop daemon that's had a listening address configured as :0. I don't know that there's a practical resolution here, since changing the defaults seems like an ill-fated effort, but if you have any ephemeral port use, you can run into this. We've now run into it once. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
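The mechanism the ticket describes can be demonstrated directly: binding to port 0 lets the kernel pick an ephemeral port (from the 32768-61000 range on the Linux defaults mentioned above), which is exactly how a ":0" listen address can later collide with another daemon's fixed default port.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class EphemeralPort {
    /** Bind to port 0 and report the ephemeral port the OS chose. */
    public static int bindAnywhere() throws IOException {
        try (ServerSocket s = new ServerSocket(0)) {
            return s.getLocalPort();  // kernel-chosen ephemeral port
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("OS chose port " + bindAnywhere());
    }
}
```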
[jira] Commented: (HADOOP-7150) Create a module system for adding extensions to Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996169#comment-12996169 ] Philip Zeyliger commented on HADOOP-7150: - Owen, To clarify, are you referring to adding libraries for user MapReduce code (e.g., an image manipulation library) or more about adding plugins/modules to the core services (e.g., https://issues.apache.org/jira/browse/HADOOP-5257, "adding service plugins for namenode/datanode"; there's a similar MR ticket)? Both are very handy; just curious which one you're thinking about. Thanks! > Create a module system for adding extensions to Hadoop > -- > > Key: HADOOP-7150 > URL: https://issues.apache.org/jira/browse/HADOOP-7150 > Project: Hadoop Common > Issue Type: New Feature > Components: util >Reporter: Owen O'Malley >Assignee: Owen O'Malley > > Currently adding extensions to Hadoop is difficult. I propose adding the > concept of modules that can add jars and libraries to the system. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HADOOP-7001) Allow configuration changes without restarting configured nodes
[ https://issues.apache.org/jira/browse/HADOOP-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921553#action_12921553 ] Philip Zeyliger commented on HADOOP-7001: - I think that the number of reconfigurable parameters will be quite small, and that they will require extra code. (e.g., changing the size of a thread pool requires a line or two of code). So maybe it's just better to expose those things in some more direct way? At the very least, we should make it quite clear where it's going to be documented what things are re-configurable. > Allow configuration changes without restarting configured nodes > --- > > Key: HADOOP-7001 > URL: https://issues.apache.org/jira/browse/HADOOP-7001 > Project: Hadoop Common > Issue Type: Task >Reporter: Patrick Kling > > Currently, changing the configuration on a node (e.g., the name node) > requires that we restart the node. We propose a change that would allow us to > make configuration changes without restarting. Nodes that support > configuration changes at run time should implement the following interface: > interface ChangeableConfigured extends Configured { >void changeConfiguration(Configuration newConf) throws > ConfigurationChangeException; > } > The contract of changeConfiguration is as follows: > The node will compare newConf to the existing configuration. For each > configuration property that is set to a different value than in the current > configuration, the node will either adjust its behaviour to conform to the > new configuration or throw a ConfigurationChangeException if this change is > not possible at run time. If a configuration property is set in the current > configuration but is unset in newConf, the node should use its default value > for this property. After a successful invocation of changeConfiguration, the > behaviour of the configured node should be indistinguishable from the > behaviour of a node that was configured with newConf at creation. 
> It should be easy to change existing nodes to implement this interface. We > can start by throwing the exception for all changes and then gradually start > supporting more and more changes at run time. (We might even consider > replacing Configured with ChangeableConfigured entirely, but I think the > proposal above affords greater flexibility). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-7001) Allow configuration changes without restarting configured nodes
[ https://issues.apache.org/jira/browse/HADOOP-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921493#action_12921493 ] Philip Zeyliger commented on HADOOP-7001: - I might be grumpy, but I think the right way to deal with configuration changes in a distributed, fault-tolerant system is to just restart the daemon entirely. We already have to deal with the daemon suddenly crashing and that not affecting the system too much, so I'm wary of extra complexity of yet another process. In practice, many configuration variables are loaded at start time and then stored as statics: stuff like threadpool sizes, for example. Not to mention that Configuration objects get copied along, so it's hard to make sure that a configuration change propagates to all possible children. I'll point out that the namenode and the jobtracker's fair scheduler already have mechanisms for dynamic configuration changes. In namenode, that's -refreshNodes. In the jt, I think the fair scheduler re-reads its XML configuration file on occasion. Both of these make a lot of sense: these are specific endpoints for managing specific data, and the semantics of those changes are easy to understand. -- Philip > Allow configuration changes without restarting configured nodes > --- > > Key: HADOOP-7001 > URL: https://issues.apache.org/jira/browse/HADOOP-7001 > Project: Hadoop Common > Issue Type: Task >Reporter: Patrick Kling > > Currently, changing the configuration on a node (e.g., the name node) > requires that we restart the node. We propose a change that would allow us to > make configuration changes without restarting. 
Nodes that support > configuration changes at run time should implement the following interface: > interface ChangeableConfigured extends Configured { >void changeConfiguration(Configuration newConf) throws > ConfigurationChangeException; > } > The contract of changeConfiguration is as follows: > The node will compare newConf to the existing configuration. For each > configuration property that is set to a different value than in the current > configuration, the node will either adjust its behaviour to conform to the > new configuration or throw a ConfigurationChangeException if this change is > not possible at run time. If a configuration property is set in the current > configuration but is unset in newConf, the node should use its default value > for this property. After a successful invocation of changeConfiguration, the > behaviour of the configured node should be indistinguishable from the > behaviour of a node that was configured with newConf at creation. > It should be easy to change existing nodes to implement this interface. We > can start by throwing the exception for all changes and then gradually start > supporting more and more changes at run time. (We might even consider > replacing Configured with ChangeableConfigured entirely, but I think the > proposal above affords greater flexibility). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
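The contract quoted above can be sketched concretely. This is not Hadoop code: a `Map` stands in for Hadoop's `Configuration` so the example compiles on its own, and the pool-size property name is invented. It shows the proposed behavior of `changeConfiguration`: apply what can change at run time, revert unset keys to defaults, and reject everything else.

```java
import java.util.HashMap;
import java.util.Map;

class ConfigurationChangeException extends Exception {
    ConfigurationChangeException(String msg) { super(msg); }
}

class ReconfigurableNode {
    static final String POOL_KEY = "node.handler.pool.size";  // illustrative
    static final int POOL_DEFAULT = 10;
    int poolSize;

    ReconfigurableNode(Map<String, String> conf) {
        this.poolSize = intOf(conf, POOL_KEY, POOL_DEFAULT);
    }

    void changeConfiguration(Map<String, String> newConf)
            throws ConfigurationChangeException {
        for (String key : newConf.keySet()) {
            if (!key.equals(POOL_KEY)) {
                // Per the proposal: reject anything not changeable at run time.
                throw new ConfigurationChangeException(
                    key + " cannot be changed without a restart");
            }
        }
        // A key unset in newConf reverts to its default, per the contract.
        this.poolSize = intOf(newConf, POOL_KEY, POOL_DEFAULT);
    }

    private static int intOf(Map<String, String> c, String k, int dflt) {
        return c.containsKey(k) ? Integer.parseInt(c.get(k)) : dflt;
    }
}
```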
[jira] Commented: (HADOOP-6977) Herriot daemon clients should vend statistics
[ https://issues.apache.org/jira/browse/HADOOP-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915590#action_12915590 ] Philip Zeyliger commented on HADOOP-6977: - Konstantin, Sounds, well, sound. > Herriot daemon clients should vend statistics > - > > Key: HADOOP-6977 > URL: https://issues.apache.org/jira/browse/HADOOP-6977 > Project: Hadoop Common > Issue Type: Improvement > Components: test >Affects Versions: 0.22.0 >Reporter: Konstantin Boudnik >Assignee: Konstantin Boudnik > Attachments: HADOOP-6977.patch, HADOOP-6977.y20S.patch > > > The HDFS web user interface serves useful information through dfshealth.jsp > and dfsnodelist.jsp. > The Herriot interface to Hadoop cluster daemons would benefit from the > addition of some way to channel metrics information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6977) Herriot daemon clients should vend statistics
[ https://issues.apache.org/jira/browse/HADOOP-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915566#action_12915566 ] Philip Zeyliger commented on HADOOP-6977: - Separating out the presentation and the data in dfshealth.jsp would be a boon all around. It would make testing easier, and it would open the way for supporting other (XML, JSON, whatever) ways to get at those things. +1 for the idea. > Herriot daemon clients should vend statistics > - > > Key: HADOOP-6977 > URL: https://issues.apache.org/jira/browse/HADOOP-6977 > Project: Hadoop Common > Issue Type: Improvement > Components: test >Affects Versions: 0.22.0 >Reporter: Konstantin Boudnik >Assignee: Konstantin Boudnik > Attachments: HADOOP-6977.patch > > > The HDFS web user interface serves useful information through dfshealth.jsp > and dfsnodelist.jsp. > The Herriot interface to Hadoop cluster daemons would benefit from the > addition of some way to channel metrics information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6974) Configurable header buffer size for Hadoop HTTP server
[ https://issues.apache.org/jira/browse/HADOOP-6974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915131#action_12915131 ] Philip Zeyliger commented on HADOOP-6974: - I'm +1 on the idea. I've absolutely run into this limit before, when running web apps on the same host. "dfs.http.header.buffer.size" seems like the wrong name for this parameter, since HttpServer is also used by other places. Perhaps "core.http.header.buffer.size"? I would be in favor of making the limit larger by default. Typically, I believe, additions to config variables include a change to core-default.xml to document that variable. It would be appropriate to see that as part of this patch, too. -- Philip > Configurable header buffer size for Hadoop HTTP server > -- > > Key: HADOOP-6974 > URL: https://issues.apache.org/jira/browse/HADOOP-6974 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Paul Butler > Attachments: hadoop-6974.patch > > > This patch adds a configurable parameter dfs.http.header.buffer.size to > Hadoop which allows the buffer size to be configured from the xml > configuration. > This fixes an issue that came up in an environment where the Hadoop servers > share a domain with other web applications that use domain cookies. The large > cookies overwhelmed Jetty's buffer which caused it to return a 413 error. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6950) Suggest that HADOOP_CLASSPATH should be preserved in hadoop-env.sh.template
[ https://issues.apache.org/jira/browse/HADOOP-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-6950: Attachment: HADOOP-6950.patch.txt Patch attached. This is a comment/doc change. > Suggest that HADOOP_CLASSPATH should be preserved in hadoop-env.sh.template > --- > > Key: HADOOP-6950 > URL: https://issues.apache.org/jira/browse/HADOOP-6950 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Reporter: Philip Zeyliger >Priority: Trivial > Attachments: HADOOP-6950.patch.txt > > > HADOOP_CLASSPATH tends to be used to add to bin/hadoop's classpath. Because > of the way the comment is written, administrators who customize > hadoop-env.sh often inadvertently disable users' ability to use it, by not > including the present value of the variable. > I propose we change the commented out suggestion code to include the present > value. > {noformat} > # Extra Java CLASSPATH elements. Optional. > -# export HADOOP_CLASSPATH= > +# export HADOOP_CLASSPATH=":$HADOOP_CLASSPATH" > {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-6950) Suggest that HADOOP_CLASSPATH should be preserved in hadoop-env.sh.template
Suggest that HADOOP_CLASSPATH should be preserved in hadoop-env.sh.template --- Key: HADOOP-6950 URL: https://issues.apache.org/jira/browse/HADOOP-6950 Project: Hadoop Common Issue Type: Bug Components: scripts Reporter: Philip Zeyliger Priority: Trivial HADOOP_CLASSPATH tends to be used to add to bin/hadoop's classpath. Because of the way the comment is written, administrators who customize hadoop-env.sh often inadvertently disable users' ability to use it, by not including the present value of the variable. I propose we change the commented out suggestion code to include the present value. {noformat} # Extra Java CLASSPATH elements. Optional. -# export HADOOP_CLASSPATH= +# export HADOOP_CLASSPATH=":$HADOOP_CLASSPATH" {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6859) Introduce additional statistics to FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-6859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888049#action_12888049 ] Philip Zeyliger commented on HADOOP-6859: - Patch looks fine. You might want to mention in the javadoc what you mentioned here in the JIRA about what a "large read" operation may be. > Introduce additional statistics to FileSystem > - > > Key: HADOOP-6859 > URL: https://issues.apache.org/jira/browse/HADOOP-6859 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 0.22.0 >Reporter: Suresh Srinivas >Assignee: Suresh Srinivas > Fix For: 0.22.0 > > Attachments: HADOOP-6859.patch > > > Currently FileSystem#statistics tracks bytesRead and bytesWritten. Additional > statistics that gives summary of operations performed will be useful for > tracking file system use. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6579) A utility for reading and writing tokens into a URL safe string.
[ https://issues.apache.org/jira/browse/HADOOP-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837952#action_12837952 ] Philip Zeyliger commented on HADOOP-6579: - Hi Owen, Trying to keep up with some of the security jiras. You're producing a lot of code, thereby making it tricky :) I think, in general, it's not that useful to write arbitrary writables to base64-encoded strings. Most browsers limit how long URL strings can be, so you've got to be pretty careful about what you're up to. Would you consider instead making this more specific, by moving this code into Token.getAsUrlSafeString() and (static) Token.fromUrlSafeString()? Or, equivalently, leave the code here, but in redirectToRandomDataNode() (patch in HDFS-991), use a method on the Token instead of WritableUtils. (This has the additional property that one could serialize tokens however; they just have to have a URL-safe string serialization.) Looked at the code and tests. Those look clear and good. -- Philip > A utility for reading and writing tokens into a URL safe string. > > > Key: HADOOP-6579 > URL: https://issues.apache.org/jira/browse/HADOOP-6579 > Project: Hadoop Common > Issue Type: Improvement > Components: security >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 0.22.0 > > Attachments: c-6579-hdfs.patch, c-6579-mr.patch, c-6579.patch, > c-6579.patch > > > We need to include HDFS delegation tokens in the URLs while browsing the file > system. Therefore, we need a url-safe way to encode and decode them. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
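The utility under discussion can be sketched with the JDK's URL-safe Base64 alphabet, which avoids the '+', '/', and '=' characters that need escaping in URLs. The method names mirror the `Token.getAsUrlSafeString()` / `Token.fromUrlSafeString()` shape Philip suggests, but they and the sample token bytes are hypothetical, not Hadoop's actual API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class UrlSafeToken {
    /** Encode opaque token bytes as a URL-safe string. */
    public static String toUrlSafeString(byte[] tokenBytes) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(tokenBytes);
    }

    /** Decode a string produced by toUrlSafeString back into bytes. */
    public static byte[] fromUrlSafeString(String encoded) {
        return Base64.getUrlDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        byte[] raw = "kind=HDFS_DELEGATION_TOKEN;seq=42"
                .getBytes(StandardCharsets.UTF_8);
        // Safe to embed in a URL: no '+', '/', or '=' in the output.
        System.out.println(toUrlSafeString(raw));
    }
}
```

Note the browser URL-length concern raised above still applies: URL-safe encoding makes the bytes transportable, but it does nothing to keep them small.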
[jira] Commented: (HADOOP-6537) Proposal for exceptions thrown by FileContext and Abstract File System
[ https://issues.apache.org/jira/browse/HADOOP-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831199#action_12831199 ] Philip Zeyliger commented on HADOOP-6537: - @Suresh, if the current proposal isn't using FileSystemException, then my point is moot. All I was trying to say is that, when possible, we should avoid having multiple classes named the same thing but in different packages. Specifically, having both java.io.FileSystemException and org.apache.hadoop.io.FileSystemException is a bit confusing because a casual user won't know which to import, and why they're different. -- Philip > Proposal for exceptions thrown by FileContext and Abstract File System > -- > > Key: HADOOP-6537 > URL: https://issues.apache.org/jira/browse/HADOOP-6537 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Jitendra Nath Pandey >Assignee: Suresh Srinivas > Fix For: 0.22.0 > > Attachments: hdfs-717.1.patch, hdfs-717.patch, hdfs-717.patch > > > Currently the APIs in FileContext throw only IOException. Going forward these > APIs will throw more specific exceptions. > This jira proposes following hierarchy of exceptions to be thrown by > FileContext and AFS (Abstract File System) classes. > InterruptedException (java.lang.InterruptedException) > IOException > /* Following exceptions extend IOException */ > FileNotFoundException > FileAlreadyExistsException > DirectoryNotEmptyException > NotDirectoryException > AccessDeniedException > IsDirectoryException > InvalidPathNameException > > FileSystemException > /* Following exceptions extend > FileSystemException */ > FileSystemNotReadyException > ReadOnlyFileSystemException > QuotaExceededException > OutOfSpaceException > RemoteException (java.rmi.RemoteException) > Most of the IOExceptions above are caused by invalid user input, while > FileSystemException is thrown when FS is in such a state that the requested > operation cannot proceed. 
> Please note that the proposed RemoteException is from standard java rmi > package, which also extends IOException. > > HDFS throws many exceptions which are not in the above list. The DFSClient > will unwrap the exceptions thrown by HDFS, and any exception not in the above > list will be thrown as IOException or FileSystemException. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6537) Proposal for exceptions thrown by FileContext and Abstract File System
[ https://issues.apache.org/jira/browse/HADOOP-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830471#action_12830471 ] Philip Zeyliger commented on HADOOP-6537: - It's hard to tell where the current proposal is, but if FileSystemException is going to be in Java 1.7, it might be nice to not name Hadoop's exception by the same name (unless they're one and the same). Code that uses DFSClient might also be using the local file system using the regular Java APIs, and it might be nice to separate the two systems (local and Hadoop). > Proposal for exceptions thrown by FileContext and Abstract File System > -- > > Key: HADOOP-6537 > URL: https://issues.apache.org/jira/browse/HADOOP-6537 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Jitendra Nath Pandey >Assignee: Suresh Srinivas > Fix For: 0.22.0 > > Attachments: hdfs-717.1.patch, hdfs-717.patch, hdfs-717.patch > > > Currently the APIs in FileContext throw only IOException. Going forward these > APIs will throw more specific exceptions. > This jira proposes following hierarchy of exceptions to be thrown by > FileContext and AFS (Abstract File System) classes. > InterruptedException (java.lang.InterruptedException) > IOException > /* Following exceptions extend IOException */ > FileNotFoundException > FileAlreadyExistsException > DirectoryNotEmptyException > NotDirectoryException > AccessDeniedException > IsDirectoryException > InvalidPathNameException > > FileSystemException > /* Following exceptions extend > FileSystemException */ > FileSystemNotReadyException > ReadOnlyFileSystemException > QuotaExceededException > OutOfSpaceException > RemoteException (java.rmi.RemoteException) > Most of the IOExceptions above are caused by invalid user input, while > FileSystemException is thrown when FS is in such a state that the requested > operation cannot proceed. 
> Please note that the proposed RemoteException is from standard java rmi > package, which also extends IOException. > > HDFS throws many exceptions which are not in the above list. The DFSClient > will unwrap the exceptions thrown by HDFS, and any exception not in the above > list will be thrown as IOException or FileSystemException. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
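A skeleton of the proposed hierarchy, showing the split the ticket describes: user-input errors extend `IOException` directly, while FS-state errors hang off a `FileSystemException` branch. The class names are from the list above; the `classify` helper and catch-ordering demo are illustrative. (This compiles only because nothing imports the similarly named JDK class, which is exactly the naming clash debated in these comments.)

```java
import java.io.FileNotFoundException;
import java.io.IOException;

class FileSystemException extends IOException {
    FileSystemException(String msg) { super(msg); }
}

class ReadOnlyFileSystemException extends FileSystemException {
    ReadOnlyFileSystemException(String msg) { super(msg); }
}

public class ExceptionHierarchyDemo {
    /** FS-state problems (back off and retry) vs. bad user input (report). */
    static String classify(IOException e) {
        return (e instanceof FileSystemException) ? "fs-state" : "user-input";
    }

    public static void main(String[] args) {
        System.out.println(classify(new ReadOnlyFileSystemException("ro fs")));
        System.out.println(classify(new FileNotFoundException("/no/such")));
    }
}
```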
[jira] Commented: (HADOOP-6419) Change RPC layer to support SASL based mutual authentication
[ https://issues.apache.org/jira/browse/HADOOP-6419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828314#action_12828314 ] Philip Zeyliger commented on HADOOP-6419: - Thanks, Kan and Owen, for answering my questions. Kan, I looked through the new patch very briefly. Thanks for addressing my concerns! -- Philip > Change RPC layer to support SASL based mutual authentication > > > Key: HADOOP-6419 > URL: https://issues.apache.org/jira/browse/HADOOP-6419 > Project: Hadoop Common > Issue Type: New Feature > Components: security >Reporter: Kan Zhang >Assignee: Kan Zhang > Attachments: c6419-26.patch, c6419-39.patch, c6419-45.patch, > c6419-66.patch, c6419-67.patch, c6419-69.patch, c6419-70.patch, c6419-72.patch > > > The authentication mechanism to use will be SASL DIGEST-MD5 (see RFC- and > RFC-2831) or SASL GSSAPI/Kerberos. Since J2SE 5, Sun provides a SASL > implementation by default. Both our delegation token and job token can be > used as credentials for SASL DIGEST-MD5 authentication. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6419) Change RPC layer to support SASL based mutual authentication
[ https://issues.apache.org/jira/browse/HADOOP-6419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828011#action_12828011 ] Philip Zeyliger commented on HADOOP-6419: - Hi, Coming late to the game here. I've been reading SASL RFCs (oy), so wanted to take a look (for my own education and to advise Avro-SASL) at how one implements it in Java. I've got some comments and quite a few questions; thanks in advance for your patience. bq. public static enum AuthMethod RFC4422 calls these "mechanisms". Admittedly, in their land, mechanisms are NUL-terminated C-strings, and not enums. I think it's fine that we restrict the implementation to support only one mechanism per protocol. {noformat} saslClient = Sasl.createSaslClient( new String[] { SaslRpcServer.SASL_DIGEST_MECH }, null, null, SaslRpcServer.SASL_DEFAULT_REALM, SaslRpcServer.SASL_PROPS, new SaslClientCallbackHandler(token)); {noformat} Instead of SaslRpcServer.SASL_DIGEST_MECH, you could put the constant inside the AuthMethod enum, which you have available here. {noformat} saslClient = Sasl.createSaslClient( new String[] { SaslRpcServer.SASL_KERBEROS_MECH }, null, names[0], names[1], SaslRpcServer.SASL_PROPS, null); {noformat} It's pretty unintuitive that you had to pass the two parts of the server's kerberos identity as the protocol and server parameters here. How did you figure that out? I couldn't find any documentation for it, outside of the source for GssKrb5Client.java. bq. useSasl = false Am I right in guessing that the reason we don't use the "plain" mechanism for SASL is that we wish to avoid SASL's extra framing? bq. TokenSelector, TokenIdentifier Could you explain (perhaps in the javadoc) why you need both of these classes. The implementation of TestTokenSelector suggests to me that all TokenSelectors are just going to compare (kind, service) Text objects, and that's all. Are there likely to be different types of TokenSelectors? 
Likewise, when would TokenIdentifier not just be (kind, username)? I think you're going for some type safety by using generics there, but I'm missing what it's buying you. bq. byte[] token = saslServer.unwrap(saslToken, 0, saslToken.length); bq.processUnwrappedData(token); At this point, token's not a token, but merely data, no? You might rename that variable, to avoid confusion. (I was confused.) bq. setupResponse(): if (call.connection.useSasl) { The code here would be clearer if you extracted this into a "void wrapWithSasl(response)" method. I missed that response was being re-used, and was scratching my head for a while :) bq. SaslInputStream Isn't it totally bizarre that the SaslServer javaDoc talks about "SecureInputStream", and there doesn't seem to be such a thing? I think they must have meant com.sun.jndi.ldap.sasl.SaslInputStream, which seems to be part of OpenJDK and GPL'd, so never mind. {noformat} @KerberosInfo( serverPrincipalKey = SERVER_PRINCIPAL_KEY ) @TokenInfo( identifierClass = TestTokenIdentifier.class, secretManagerClass = TestTokenSecretManager.class, selectorClass = TestTokenSelector.class ) {noformat} With my "how-much-work-is-this-going-to-be-to-port-to-AVRO" hat on, I've been thinking about whether these annotations should be on the protocol (like they are), or just part of RPC.getProxy()/RPC.getServer(). I think they're fine as annotations: Hadoop's protocols are closely tied with the type of authentication they expect. That said: there's a lot of implicit information being passed in this annotation (and Client.java is correspondingly complicated). Could this just be @TokenRpcAuth(enum) and @KerberosRpcAuth(SERVER_PRINCIPAL_KEY)? I can't imagine a case where one of the three parameters for the @TokenInfo annotation wouldn't imply the other two, but I might be missing something. I'll also point out that your test works by a little bit of trickery: I initially thought that if @TokenInfo is specified, Client.java would use that. 
Turns out it will fall back to Kerberos if the token's not present. This is all fine; it was just a bit complicated to figure out how your test tries to cover both cases. (It wouldn't be crazy to assert that only one non-plain authentication type is supported, but maybe there are protocols where you could do either...) bq. static void testKerberosRpc I take it that this is a main() test and not a @Test test because Kerberos doesn't exist on Hudson? Might be appropriate to call that out. bq. SaslInputStream/SaslOutputStream Should these have tests? Thanks for your patience! -- Philip > Change RPC layer to support SASL based mutual authentication > > > Key: HADOOP-6419 > URL: https://issues.apache.org/jira/browse/HADOOP-64
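To make the protocol/serverName question above concrete: for the GSSAPI mechanism, the two createSaslClient parameters come from splitting a Kerberos service principal of the form service/host@REALM. A minimal, hypothetical sketch (this is not the patch's code; the class and method names are invented for illustration):

```java
// Illustrative only: how "nn/namenode.example.com@EXAMPLE.COM" splits into the
// (protocol, serverName) pair that Sasl.createSaslClient expects for GSSAPI.
public class PrincipalParts {
    /** Returns {serviceName, hostName} from a "service/host@REALM" principal. */
    public static String[] splitPrincipal(String fullPrincipal) {
        String withoutRealm = fullPrincipal.split("@")[0]; // "nn/namenode.example.com"
        String[] names = withoutRealm.split("/");          // {"nn", "namenode.example.com"}
        if (names.length != 2) {
            throw new IllegalArgumentException("expected service/host@REALM: " + fullPrincipal);
        }
        // names[0] would be passed as the "protocol" argument and names[1] as
        // "serverName", mirroring the createSaslClient call quoted above.
        return names;
    }
}
```

GssKrb5Client builds the acceptor's name from these two pieces as a host-based service name, which is why the split matters even though the parameter names don't suggest it.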
[jira] Commented: (HADOOP-4487) Security features for Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795482#action_12795482 ] Philip Zeyliger commented on HADOOP-4487: - I'm surprised I'm the first to comment: is the discussion going on elsewhere? I read the design document over Christmas. Great to see a document with so much detail, thanks! I had some questions, and thought a couple of places could be clearer; my comments are below. ** One thing that hasn't been covered (outside of assumptions) is more detail about how to operationally secure a Hadoop cluster in Unix-land. The assumptions section lays out some of these ("root" needs to be secure). Some things that I thought about: (1) data nodes need to write their data as a unix user that users don't have access to, and with appropriate permissions (or umask). (Looking at my local system, the DataNode has left blocks world-readable.) (2) We assume that the JT and NN are also run under unix accounts which users do not have access to. Since Data Nodes and the NameNode share a key, it's important to limit cluster membership. (This is critical for task trackers, too, since an evil task tracker could do nasty things.) What's the mechanism to limit cluster participation? Is there a central registry of what users can access HDFS and queues? Is there an "HDFS" superuser? In existing Hadoop, it's the username corresponding to the uid running the NameNode process. bq. If the token doesn't exist in memory, which indicates NameNode has restarted It could also mean that the token is expired, no? I think this is made clearer in the following sentences. bq. READ, WRITE, COPY, REPLACE What is the COPY access mode used for? bq. "only the user will be able to kill their own jobs and tasks" Somewhere else in the document, there's discussion of jobs having owners/groups, not just owners. Surely a superuser or cluster manager can kill jobs with appropriate permissions? bq.
API and environment changes Will users still be able to use Hadoop in a "non-secure" manner? How much work would be involved in using a different security model? This is probably answered by the patch itself :) > Security features for Hadoop > > > Key: HADOOP-4487 > URL: https://issues.apache.org/jira/browse/HADOOP-4487 > Project: Hadoop Common > Issue Type: New Feature > Components: security >Reporter: Kan Zhang >Assignee: Kan Zhang > Attachments: security-design.pdf > > > This is a top-level tracking JIRA for security work we are doing in Hadoop. > Please add reference to this when opening new security related JIRAs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6457) Set Hadoop User/Group by System properties or environment variables
[ https://issues.apache.org/jira/browse/HADOOP-6457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792623#action_12792623 ] Philip Zeyliger commented on HADOOP-6457: - Do note that users can be in more than one group, so hadoop.group.name should probably be hadoop.group.names, and be comma-delimited. > Set Hadoop User/Group by System properties or environment variables > --- > > Key: HADOOP-6457 > URL: https://issues.apache.org/jira/browse/HADOOP-6457 > Project: Hadoop Common > Issue Type: New Feature > Components: security >Reporter: issei yoshida > Attachments: 6457.patch > > > Hadoop User/Group can be set by System properties or environment variables. > For example, in environment variables, > export HADOOP_USER=test > export HADOOP_GROUP=user > or in your MapReduce, > System.setProperty("hadoop.user.name", "test"); > System.setProperty("hadoop.group.name", "user"); -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
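The comma-delimited hadoop.group.names suggestion above could be parsed along these lines. This is a sketch of the proposal only, not shipped Hadoop code; the property format is the one suggested in the comment:

```java
import java.util.Arrays;
import java.util.List;

// Sketch: parse a comma-delimited group list such as the proposed
// hadoop.group.names value. Trimming makes "staff, admin" and
// "staff,admin" equivalent.
public class GroupNames {
    public static List<String> parseGroups(String value) {
        String[] parts = value.split(",");
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return Arrays.asList(parts);
    }
}
```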
[jira] Commented: (HADOOP-6436) Remove auto-generated native build files
[ https://issues.apache.org/jira/browse/HADOOP-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789429#action_12789429 ] Philip Zeyliger commented on HADOOP-6436: - +1. > Remove auto-generated native build files > - > > Key: HADOOP-6436 > URL: https://issues.apache.org/jira/browse/HADOOP-6436 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Eli Collins >Assignee: Eli Collins > > The repo currently includes the automake and autoconf generated files for the > native build. Per discussion on HADOOP-6421 let's remove them and use the > host's automake and autoconf. We should also do this for libhdfs and > fuse-dfs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-4998) Implement a native OS runtime for Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784048#action_12784048 ] Philip Zeyliger commented on HADOOP-4998: - Figured I'd mention that Tomcat has some support for calling into APR: Javadoc: http://tomcat.apache.org/tomcat-6.0-doc/api/org/apache/tomcat/jni/package-tree.html Google codesearch link: http://www.google.com/codesearch/p?hl=en#cM_OVOKybvs/tomcat/tomcat-6/v6.0.10/src/apache-tomcat-6.0.10-src.zip|KNqCNnRERSg/apache-tomcat-6.0.10-src/java/org/apache/tomcat/jni/User.java&q=tomcat%20apr&d=10 > Implement a native OS runtime for Hadoop > > > Key: HADOOP-4998 > URL: https://issues.apache.org/jira/browse/HADOOP-4998 > Project: Hadoop Common > Issue Type: New Feature > Components: native >Reporter: Arun C Murthy >Assignee: Arun C Murthy > Fix For: 0.21.0 > > > It would be useful to implement a JNI-based runtime for Hadoop to get access > to the native OS runtime. This would allow us to stop relying on exec'ing > bash to get access to information such as user-groups, process limits etc. > and for features such as chown/chgrp (org.apache.hadoop.util.Shell). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6318) Upgrade to Avro 1.2.0
[ https://issues.apache.org/jira/browse/HADOOP-6318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1276#action_1276 ] Philip Zeyliger commented on HADOOP-6318: - Could you also update the Eclipse classpath {noformat} $cat .eclipse.templates/.classpath | grep avro {noformat} when you commit this? Thanks! > Upgrade to Avro 1.2.0 > - > > Key: HADOOP-6318 > URL: https://issues.apache.org/jira/browse/HADOOP-6318 > Project: Hadoop Common > Issue Type: Improvement > Components: io, ipc >Reporter: Doug Cutting >Assignee: Doug Cutting > Attachments: HADOOP-6318.java > > > Avro 1.2 has been released. The API's Hadoop Common uses have been > simplified, and it should be upgraded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
[ https://issues.apache.org/jira/browse/HADOOP-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-6241: Status: Patch Available (was: Open) > Script to split configurations from one file into multiple, according to > templates. > --- > > Key: HADOOP-6241 > URL: https://issues.apache.org/jira/browse/HADOOP-6241 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Philip Zeyliger > Attachments: HADOOP-6241-v2.patch.txt, HADOOP-6241-v3.patch.txt, > HADOOP-6241-v4.patch.txt, HADOOP-6241.patch.txt > > > This script moves properties from hadoop-site.xml into common-site, > mapred-site, and hdfs-site in a reasonably generic way. This is useful for > upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
[ https://issues.apache.org/jira/browse/HADOOP-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-6241: Status: Open (was: Patch Available) > Script to split configurations from one file into multiple, according to > templates. > --- > > Key: HADOOP-6241 > URL: https://issues.apache.org/jira/browse/HADOOP-6241 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Philip Zeyliger > Attachments: HADOOP-6241-v2.patch.txt, HADOOP-6241-v3.patch.txt, > HADOOP-6241-v4.patch.txt, HADOOP-6241.patch.txt > > > This script moves properties from hadoop-site.xml into common-site, > mapred-site, and hdfs-site in a reasonably generic way. This is useful for > upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
[ https://issues.apache.org/jira/browse/HADOOP-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-6241: Attachment: HADOOP-6241-v4.patch.txt bq. # ./bin/split_config.py shows nice usage information including the same invocation. split_config.py --help lacks the _doc_ info which I found helpful. Can you convince OptionParser to print that for all usage notices? done bq. In dist releases, the -default.xml files don't exist. If we expect this to be useful for users and not just developers, it should be able to look inside the jars for the -default.xml files. Since jars are just zips, you can probably do this pretty easily using the zipfile module. Done. Added another option to specify templates this way. Also, here's how I've been testing this (still has to be run manually): {noformat} $ PYTHONPATH=bin:$PYTHONPATH python src/test/bin/split_config_test.py Could not find template file to place property 'notintemplates'. Ignoring. Writing out to Writing out to . -- Ran 1 test in 0.003s OK {noformat} > Script to split configurations from one file into multiple, according to > templates. > --- > > Key: HADOOP-6241 > URL: https://issues.apache.org/jira/browse/HADOOP-6241 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Philip Zeyliger > Attachments: HADOOP-6241-v2.patch.txt, HADOOP-6241-v3.patch.txt, > HADOOP-6241-v4.patch.txt, HADOOP-6241.patch.txt > > > This script moves properties from hadoop-site.xml into common-site, > mapred-site, and hdfs-site in a reasonably generic way. This is useful for > upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
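On the "jars are just zips" point above: the script itself would use Python's zipfile module, but the same idea in Java (the language used for examples elsewhere in this digest) looks roughly like this. The jar path and entry name below are illustrative, not taken from the patch:

```java
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Sketch: pull a -default.xml template straight out of a jar,
// since jar files are ordinary zip archives.
public class DefaultXmlFromJar {
    public static String readEntry(String jarPath, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(jarPath)) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new FileNotFoundException(entryName + " not in " + jarPath);
            }
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (InputStream in = zip.getInputStream(entry)) {
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    bos.write(buf, 0, n);
                }
            }
            return bos.toString("UTF-8");
        }
    }
}
```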
[jira] Commented: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
[ https://issues.apache.org/jira/browse/HADOOP-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765210#action_12765210 ] Philip Zeyliger commented on HADOOP-6241: - Nicholas, Cool. Is this ready for submission, then? Looks like Hudson has looked at the last patch. Thanks! -- Philip > Script to split configurations from one file into multiple, according to > templates. > --- > > Key: HADOOP-6241 > URL: https://issues.apache.org/jira/browse/HADOOP-6241 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Philip Zeyliger > Attachments: HADOOP-6241-v2.patch.txt, HADOOP-6241-v3.patch.txt, > HADOOP-6241.patch.txt > > > This script moves properties from hadoop-site.xml into common-site, > mapred-site, and hdfs-site in a reasonably generic way. This is useful for > upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
[ https://issues.apache.org/jira/browse/HADOOP-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764525#action_12764525 ] Philip Zeyliger commented on HADOOP-6241: - Nicholas, Unfortunately how we load pyAntTasks is a bit moot: split_config.py won't run without lxml installed, and that's just not part of the standard python installation (though is very commonly available via package managers, etc.) -- Philip > Script to split configurations from one file into multiple, according to > templates. > --- > > Key: HADOOP-6241 > URL: https://issues.apache.org/jira/browse/HADOOP-6241 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Philip Zeyliger > Attachments: HADOOP-6241-v2.patch.txt, HADOOP-6241-v3.patch.txt, > HADOOP-6241.patch.txt > > > This script moves properties from hadoop-site.xml into common-site, > mapred-site, and hdfs-site in a reasonably generic way. This is useful for > upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
[ https://issues.apache.org/jira/browse/HADOOP-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764252#action_12764252 ] Philip Zeyliger commented on HADOOP-6241: - I managed to hack together the ant stuff to get the test to "run", but it fails, even on my machine, with an import error. ("[py-test] ImportError: No module named lxml.etree"). > Script to split configurations from one file into multiple, according to > templates. > --- > > Key: HADOOP-6241 > URL: https://issues.apache.org/jira/browse/HADOOP-6241 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Philip Zeyliger > Attachments: HADOOP-6241-v2.patch.txt, HADOOP-6241-v3.patch.txt, > HADOOP-6241.patch.txt > > > This script moves properties from hadoop-site.xml into common-site, > mapred-site, and hdfs-site in a reasonably generic way. This is useful for > upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
[ https://issues.apache.org/jira/browse/HADOOP-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764242#action_12764242 ] Philip Zeyliger commented on HADOOP-6241: - Nicholas, Even if I pull in pyAntTasks, the build machine (unless it magically has python2.5 and/or some extra packages) won't pass the tests, because it won't have the python lxml package installed. I can't think of an easy way around that... I could only run the tests if lxml is there, but that seems like cheating. > Script to split configurations from one file into multiple, according to > templates. > --- > > Key: HADOOP-6241 > URL: https://issues.apache.org/jira/browse/HADOOP-6241 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Philip Zeyliger > Attachments: HADOOP-6241-v2.patch.txt, HADOOP-6241-v3.patch.txt, > HADOOP-6241.patch.txt > > > This script moves properties from hadoop-site.xml into common-site, > mapred-site, and hdfs-site in a reasonably generic way. This is useful for > upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
[ https://issues.apache.org/jira/browse/HADOOP-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-6241: Status: Open (was: Patch Available) > Script to split configurations from one file into multiple, according to > templates. > --- > > Key: HADOOP-6241 > URL: https://issues.apache.org/jira/browse/HADOOP-6241 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Philip Zeyliger > Attachments: HADOOP-6241-v2.patch.txt, HADOOP-6241-v3.patch.txt, > HADOOP-6241.patch.txt > > > This script moves properties from hadoop-site.xml into common-site, > mapred-site, and hdfs-site in a reasonably generic way. This is useful for > upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
[ https://issues.apache.org/jira/browse/HADOOP-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-6241: Status: Patch Available (was: Open) > Script to split configurations from one file into multiple, according to > templates. > --- > > Key: HADOOP-6241 > URL: https://issues.apache.org/jira/browse/HADOOP-6241 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Philip Zeyliger > Attachments: HADOOP-6241-v2.patch.txt, HADOOP-6241-v3.patch.txt, > HADOOP-6241.patch.txt > > > This script moves properties from hadoop-site.xml into common-site, > mapred-site, and hdfs-site in a reasonably generic way. This is useful for > upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
[ https://issues.apache.org/jira/browse/HADOOP-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-6241: Attachment: HADOOP-6241-v3.patch.txt Attaching another version, where I've split out the test file. Nicholas--I tried to update the ant tasks, but there was a deeper problem: I'm using lxml.etree, which isn't part of the standard python distro (though is a commonly installed package). I tried to backport to xml.etree.ElementTree, which worked (though I couldn't preserve the comments anymore, since ElementTree drops them at parse), but then I realized that that package isn't even in python 2.4, so it would still be a burden to find machines with the right prerequisites. My original motivation here was to share a script that I found useful after splitting a couple of clusters' configuration files into multiple files. I used python and lxml, since those tools were quickest for me. In terms of integration with the build/test system, Java would have obviously been better :/ I'm ok with deciding that this isn't the right place for this script, and just throwing it up on the web somewhere else. Do you think that's the way we should go? BTW, if I were to update the build file, I'd need to pull in a dependency on pyAntTasks (which Avro already uses). -- Philip > Script to split configurations from one file into multiple, according to > templates. > --- > > Key: HADOOP-6241 > URL: https://issues.apache.org/jira/browse/HADOOP-6241 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Philip Zeyliger > Attachments: HADOOP-6241-v2.patch.txt, HADOOP-6241-v3.patch.txt, > HADOOP-6241.patch.txt > > > This script moves properties from hadoop-site.xml into common-site, > mapred-site, and hdfs-site in a reasonably generic way. This is useful for > upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
[ https://issues.apache.org/jira/browse/HADOOP-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764152#action_12764152 ] Philip Zeyliger commented on HADOOP-6241: - bq. -1 core tests org.apache.hadoop.io.TestUTF8.testGetBytes is failing, but has nothing to do with this patch. bq. -1 tests included There's an embedded python test, runnable with nose. {quote} $nosetests -v bin/split-config.py Test for shuffle_properties. ... ok -- Ran 1 test in 0.003s OK {quote} > Script to split configurations from one file into multiple, according to > templates. > --- > > Key: HADOOP-6241 > URL: https://issues.apache.org/jira/browse/HADOOP-6241 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Philip Zeyliger > Attachments: HADOOP-6241-v2.patch.txt, HADOOP-6241.patch.txt > > > This script moves properties from hadoop-site.xml into common-site, > mapred-site, and hdfs-site in a reasonably generic way. This is useful for > upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
[ https://issues.apache.org/jira/browse/HADOOP-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-6241: Status: Patch Available (was: Open) > Script to split configurations from one file into multiple, according to > templates. > --- > > Key: HADOOP-6241 > URL: https://issues.apache.org/jira/browse/HADOOP-6241 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Philip Zeyliger > Attachments: HADOOP-6241-v2.patch.txt, HADOOP-6241.patch.txt > > > This script moves properties from hadoop-site.xml into common-site, > mapred-site, and hdfs-site in a reasonably generic way. This is useful for > upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
[ https://issues.apache.org/jira/browse/HADOOP-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-6241: Attachment: HADOOP-6241-v2.patch.txt Attaching a new version that tries to preserve comments. > Script to split configurations from one file into multiple, according to > templates. > --- > > Key: HADOOP-6241 > URL: https://issues.apache.org/jira/browse/HADOOP-6241 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Philip Zeyliger > Attachments: HADOOP-6241-v2.patch.txt, HADOOP-6241.patch.txt > > > This script moves properties from hadoop-site.xml into common-site, > mapred-site, and hdfs-site in a reasonably generic way. This is useful for > upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6287) Support units for configuration knobs
[ https://issues.apache.org/jira/browse/HADOOP-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760665#action_12760665 ] Philip Zeyliger commented on HADOOP-6287: - The URI is the type that you'll get if you get the value of the configuration variable with "fsDefaultName.get(Configuration)". Different configuration variables will have different types there. Today, Configuration.get() returns String, and there are helper methods to get integers, etc. The basic types we'd want to support right off are Boolean, List, URI, String, List, Integer, and probably a few others. In the context of this ticket, we might want to support a Size type, which has a Size.getValue(SizeUnit.BYTES) accessor. The annotation is used to process the code to generate the documentation. There's more than one way to do this, but it's pretty easy to ask a variant of javac to spit out all the annotated variables. > Support units for configuration knobs > - > > Key: HADOOP-6287 > URL: https://issues.apache.org/jira/browse/HADOOP-6287 > Project: Hadoop Common > Issue Type: Improvement > Components: conf >Reporter: Arun C Murthy >Assignee: Eli Collins > Attachments: hadoop-6287-1.patch > > > We should add support for units in our Configuration system so that we can > specify values to be *1GB* or *1MB* rather than forcing every component which > consumes such values to be aware of these. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
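A hypothetical sketch of the Size type floated above. The enum, constructor, and parsing rules here are invented for illustration; only the Size.getValue(SizeUnit.BYTES) accessor shape comes from the comment itself:

```java
// Hypothetical Size config type: parses "1GB" / "512MB" style values so that
// consumers never hand-roll unit arithmetic. Names are illustrative only.
public class Size {
    public enum SizeUnit {
        BYTES(1L), KB(1L << 10), MB(1L << 20), GB(1L << 30);
        final long bytes;
        SizeUnit(long bytes) { this.bytes = bytes; }
    }

    private final long bytes;

    private Size(long bytes) { this.bytes = bytes; }

    /** Parses strings like "1GB", "512MB", or "42" (plain bytes). */
    public static Size parse(String s) {
        String v = s.trim().toUpperCase();
        for (SizeUnit u : SizeUnit.values()) {
            if (u != SizeUnit.BYTES && v.endsWith(u.name())) {
                long n = Long.parseLong(v.substring(0, v.length() - u.name().length()).trim());
                return new Size(n * u.bytes);
            }
        }
        return new Size(Long.parseLong(v));
    }

    /** The getValue(SizeUnit.BYTES)-style accessor mentioned above. */
    public long getValue(SizeUnit unit) { return bytes / unit.bytes; }
}
```

With something like this, a config value of "1GB" reaches the consumer as a number, and validation failures surface at parse time rather than deep inside whatever reads the value.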
[jira] Commented: (HADOOP-6287) Support units for configuration knobs
[ https://issues.apache.org/jira/browse/HADOOP-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760628#action_12760628 ] Philip Zeyliger commented on HADOOP-6287: - I'm +1 on Eric's suggestion: I think config documentation and code should be co-located, and should generate things. I'd like to do something like this: {code} @ConfigVariableDeclaration public final static ConfigVariable fsDefaultName = ConfigVariableBuilder.newURI() .setDefault(null) .setHelp("Default filesystem") .setVisibility(Visibility.PUBLIC) .addAccessor("fs.default.name") .addDeprecatedAccessor("core.default.fs", "Use foo instead") .addValidator(new ValidateSupportedFilesystem()); {code} (See more at https://issues.apache.org/jira/browse/HADOOP-6105#action_12727143) Yes, this imposes a type system: fsDefaultName now has to be a URI. There could be other units/validators for time, disk, etc. You should then access the variable by using "fsDefaultName.get(Configuration)", which means that it's easy to find all usages of the configuration. I'd very much like Hadoop to find configuration errors early, and for those errors to be sensible. (It would help both with NPEs deep inside of things that aren't obviously looking at configuration, and with the wide array of ways we store comma-delimited values in Configuration: we should only be writing that logic correctly once, and, you know, we should support filenames with commas in them.) Steve, I actually think having a schema on configuration would make LDAP support easier, not harder. For reasons of backwards compatibility, we're going to support toString/fromString for a long time, and if you know what you're storing, the mapping functions to/from LDAP are going to be considerably easier to write.
-- Philip > Support units for configuration knobs > - > > Key: HADOOP-6287 > URL: https://issues.apache.org/jira/browse/HADOOP-6287 > Project: Hadoop Common > Issue Type: Improvement > Components: conf >Reporter: Arun C Murthy >Assignee: Eli Collins > Attachments: hadoop-6287-1.patch > > > We should add support for units in our Configuration system so that we can > specify values to be *1GB* or *1MB* rather than forcing every component which > consumes such values to be aware of these. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6288) Improve Configuration to support default and minimum values
[ https://issues.apache.org/jira/browse/HADOOP-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760075#action_12760075 ] Philip Zeyliger commented on HADOOP-6288: - I haven't had much time to work on it, but in https://issues.apache.org/jira/browse/HADOOP-6105#action_12727143 I suggested an approach that would give us typed configuration, including default values and some validation. If there was some buy-in, I'd be very happy to bang out my prototype code into submission. -- Philip > Improve Configuration to support default and minimum values > > > Key: HADOOP-6288 > URL: https://issues.apache.org/jira/browse/HADOOP-6288 > Project: Hadoop Common > Issue Type: Improvement > Components: conf >Reporter: Arun C Murthy > > HADOOP-6105 provided a way to automatically handle deprecation of configs - > I'd like to take this further to support default values, minimum values etc. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6170) add Avro-based RPC serialization
[ https://issues.apache.org/jira/browse/HADOOP-6170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756167#action_12756167 ] Philip Zeyliger commented on HADOOP-6170: - I took a look. Overall, looks good. I'm sufficiently behind on my Avro-fu that I had a hard time following some of the magic. I'm writing down below what my current understanding is. Let me know where I'm totally wrong. Do add some JavaDoc per class. TunnelProtocol is the meta-interface that's used by Hadoop IPC. BufferListWritable is used by Hadoop IPC to ship the Avro byte-stream around. Its serialization format is: {code} data := size (buffer)[size] buffer := len (bytes)[len] size, len are writable integers bytes are raw bytes x[count] indicates count repetitions of x. {code} ClientTransceiver delegates shipping of bytes via Hadoop IPC. ServerTransceiver doesn't do much of anything except store the response data. New instances of it are created at every call in TunnelResponder. TunnelResponder implements the Hadoop IPC interface (TunnelProtocol); it's the thing running on the server, and converts calls into Avro RPC calls. This class would be clearer if, instead of extending ReflectResponder, you kept a private ReflectResponder, and called that explicitly in call. Then you would rename TunnelResponder to TunnelProtocolImpl. Can a single RPC server satisfy multiple protocols? Does that work with AvroRPC? I don't think it can right now, but I think that's necessary, since several daemons implement a handful of protocols. Some specific notes: bq. versioning features to for inter-Java RPCs. Typo. AvroTestProtocol is generated code, yes? Or does ReflectData.getProtocol() figure it out from reflection and paranamer data? If the former, should the source schema be checked in? Should the generation be done in build.xml? bq. assertEquals(intResult, 3); Total nit: JUnit prefers the expected argument on the left. bq.
public BufferListWritable() {} Eclipse tells me this is unused. Might be worth a quick comment indicating that it's required because Writables get instantiated via reflection. bq. getRemoteName() { return "remote"; } Should these be customized to the Protocol being used? Cheers, -- Philip > add Avro-based RPC serialization > > > Key: HADOOP-6170 > URL: https://issues.apache.org/jira/browse/HADOOP-6170 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Doug Cutting >Assignee: Doug Cutting > Fix For: 0.21.0 > > Attachments: HADOOP-6170.patch, HADOOP-6170.patch, HADOOP-6170.patch > > > Permit RPC protocols to use Avro to serialize requests and responses, so that > protocols may better evolve without breaking compatibility. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
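The buffer-list wire format described above (data := size (buffer)[size]; buffer := len (bytes)[len]) can be sketched with plain DataOutput. Note this is a self-contained sketch, not the patch's BufferListWritable: the real class uses Hadoop's variable-length writable integers, while fixed-width writeInt is used here for brevity:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch of the buffer-list format: a count, then each buffer as a
// length-prefixed run of raw bytes.
public class BufferList {
    public static byte[] serialize(List<byte[]> buffers) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(buffers.size());            // size
        for (byte[] b : buffers) {
            out.writeInt(b.length);              // len
            out.write(b);                        // bytes
        }
        out.flush();
        return bos.toByteArray();
    }

    public static List<byte[]> deserialize(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int size = in.readInt();
        List<byte[]> buffers = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            byte[] b = new byte[in.readInt()];
            in.readFully(b);
            buffers.add(b);
        }
        return buffers;
    }
}
```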
[jira] Updated: (HADOOP-6257) Two TestFileSystem classes are confusing hadoop-hdfs-hdfwithmr
[ https://issues.apache.org/jira/browse/HADOOP-6257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-6257: Status: Patch Available (was: Open) > Two TestFileSystem classes are confusing hadoop-hdfs-hdfwithmr > -- > > Key: HADOOP-6257 > URL: https://issues.apache.org/jira/browse/HADOOP-6257 > Project: Hadoop Common > Issue Type: Bug > Components: build, fs, test >Reporter: Philip Zeyliger >Priority: Minor > Attachments: HADOOP-6257.patch > > > I propose to rename > hadoop-common/src/test/core/org/apache/hadoop/fs/TestFileSystem.java -> > src/test/core/org/apache/hadoop/fs/TestFileSystemCaching.java. Otherwise, it > conflicts with > hadoop-hdfs/src/test/hdfs-with-mr/org/apache/hadoop/fs/TestFileSystem.java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6257) Two TestFileSystem classes are confusing hadoop-hdfs-hdfwithmr
[ https://issues.apache.org/jira/browse/HADOOP-6257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-6257: Attachment: HADOOP-6257.patch Trivial rename. BTW, the conflict showed up in the recent HADOOP-6231. And I ran into it because hadoop-hdfs-hdfswithmr-test's driver was finding the wrong TestFileSystem.
{code}
$ bin/hadoop jar lib/hadoop-hdfs-hdfswithmr-test-0.21.0-dev.jar
java.lang.NoSuchMethodException: org.apache.hadoop.fs.TestFileSystem.main([Ljava.lang.String;)
        at java.lang.Class.getMethod(Class.java:1605)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.<init>(ProgramDriver.java:56)
        at org.apache.hadoop.util.ProgramDriver.addClass(ProgramDriver.java:99)
        at org.apache.hadoop.test.HdfsWithMRTestDriver.<init>(HdfsWithMRTestDriver.java:47)
        at org.apache.hadoop.test.HdfsWithMRTestDriver.<init>(HdfsWithMRTestDriver.java:39)
        at org.apache.hadoop.test.HdfsWithMRTestDriver.main(HdfsWithMRTestDriver.java:77)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
An example program must be given as the first argument.
Valid program names are:
  nnbench: A benchmark that stresses the namenode.
{code}
> Two TestFileSystem classes are confusing hadoop-hdfs-hdfwithmr > -- > > Key: HADOOP-6257 > URL: https://issues.apache.org/jira/browse/HADOOP-6257 > Project: Hadoop Common > Issue Type: Bug > Components: build, fs, test >Reporter: Philip Zeyliger >Priority: Minor > Attachments: HADOOP-6257.patch > > > I propose to rename > hadoop-common/src/test/core/org/apache/hadoop/fs/TestFileSystem.java -> > src/test/core/org/apache/hadoop/fs/TestFileSystemCaching.java. 
Otherwise, it > conflicts with > hadoop-hdfs/src/test/hdfs-with-mr/org/apache/hadoop/fs/TestFileSystem.java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-6257) Two TestFileSystem classes are confusing hadoop-hdfs-hdfwithmr
Two TestFileSystem classes are confusing hadoop-hdfs-hdfwithmr -- Key: HADOOP-6257 URL: https://issues.apache.org/jira/browse/HADOOP-6257 Project: Hadoop Common Issue Type: Bug Components: build, fs, test Reporter: Philip Zeyliger Priority: Minor I propose to rename hadoop-common/src/test/core/org/apache/hadoop/fs/TestFileSystem.java -> src/test/core/org/apache/hadoop/fs/TestFileSystemCaching.java. Otherwise, it conflicts with hadoop-hdfs/src/test/hdfs-with-mr/org/apache/hadoop/fs/TestFileSystem.java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
[ https://issues.apache.org/jira/browse/HADOOP-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-6241: Attachment: HADOOP-6241.patch.txt I'm attaching a quick script I wrote for this purpose. The help is pasted in below for lighter reading. I'm very open to suggestions about what to name the script (split-config.py is a bit generic). Is bin/ an appropriate place for this, or is there a better place? To run, the script requires that lxml for python is installed on your system. It's not included and is BSD-licensed. http://codespeak.net/lxml/index.html#license . There's a built-in test that should be run with the nose test runner (http://pypi.python.org/pypi/nose/0.11.1 -- LGPL), though that dependency could be removed quite easily.
{noformat}
This script separates a single Hadoop XML configuration file into multiple
ones, by finding properties that are in supplied templates and storing them
in the designated output files.

This script comes about to solve the problem of splitting up "hadoop-site.xml"
into "core-site.xml", "mapred-site.xml", and "hdfs-site.xml", and, in fact, a
common usage would be

  split-config.py --input hadoop-site.xml \
    --template core-default.xml \
    --template hdfs-default.xml \
    --template mapred-default.xml \
    --output core-site.xml \
    --output hdfs-site.xml \
    --output mapred-site.xml
{noformat}
> Script to split configurations from one file into multiple, according to > templates. > --- > > Key: HADOOP-6241 > URL: https://issues.apache.org/jira/browse/HADOOP-6241 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Philip Zeyliger > Attachments: HADOOP-6241.patch.txt > > > This script moves properties from hadoop-site.xml into common-site, > mapred-site, and hdfs-site in a reasonably generic way. This is useful for > upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
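The selection logic the help text describes (each property lands in the output file whose template mentions its key) can be sketched without any XML at all. This hypothetical ConfigSplitter works on plain maps; the real split-config.py parses the files with lxml, and what it does with properties no template claims is not specified here, so this sketch just returns them in a trailing map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical core of a config-splitting tool: route each input
// property to the first template (in order) that contains its key.
public class ConfigSplitter {
  public static List<Map<String, String>> split(
      Map<String, String> input, List<Set<String>> templates) {
    List<Map<String, String>> outputs = new ArrayList<>();
    for (int i = 0; i <= templates.size(); i++) {
      outputs.add(new LinkedHashMap<>());   // last slot holds unclaimed keys
    }
    for (Map.Entry<String, String> e : input.entrySet()) {
      int slot = templates.size();          // default: unclaimed
      for (int i = 0; i < templates.size(); i++) {
        if (templates.get(i).contains(e.getKey())) {
          slot = i;
          break;
        }
      }
      outputs.get(slot).put(e.getKey(), e.getValue());
    }
    return outputs;
  }
}
```

With templates for core-default and mapred-default, a hadoop-site map splits into a core map, a mapred map, and whatever keys neither template knows about.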
[jira] Commented: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
[ https://issues.apache.org/jira/browse/HADOOP-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751667#action_12751667 ] Philip Zeyliger commented on HADOOP-6241: - Also, if we commit this, we should consider committing to the 0.20 branch, since that's when people are likely to be migrating from one set of configurations to the other. > Script to split configurations from one file into multiple, according to > templates. > --- > > Key: HADOOP-6241 > URL: https://issues.apache.org/jira/browse/HADOOP-6241 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Philip Zeyliger > Attachments: HADOOP-6241.patch.txt > > > This script moves properties from hadoop-site.xml into common-site, > mapred-site, and hdfs-site in a reasonably generic way. This is useful for > upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-6241) Script to split configurations from one file into multiple, according to templates.
Script to split configurations from one file into multiple, according to templates. --- Key: HADOOP-6241 URL: https://issues.apache.org/jira/browse/HADOOP-6241 Project: Hadoop Common Issue Type: New Feature Reporter: Philip Zeyliger This script moves properties from hadoop-site.xml into common-site, mapred-site, and hdfs-site in a reasonably generic way. This is useful for upgrading clusters from 0.18 to 0.20 and 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6105) Provide a way to automatically handle backward compatibility of deprecated keys
[ https://issues.apache.org/jira/browse/HADOOP-6105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733919#action_12733919 ] Philip Zeyliger commented on HADOOP-6105: - I think you have to think about how Hadoop's notion of a "final" flag interacts with this, too. If a system administrator has set either A or B to be final, then that value must override any user-submitted value, regardless of which was set first. > Provide a way to automatically handle backward compatibility of deprecated > keys > --- > > Key: HADOOP-6105 > URL: https://issues.apache.org/jira/browse/HADOOP-6105 > Project: Hadoop Common > Issue Type: Improvement > Components: conf >Reporter: Hemanth Yamijala > > There are cases when we have had to deprecate configuration keys. Use cases > include, changing the names of variables to better match intent, splitting a > single parameter into two - for maps, reduces etc. > In such cases, we typically provide a backwards compatible option for the old > keys. The handling of such cases might typically be common enough to actually > add support for it in a generic fashion in the Configuration class. Some > initial discussion around this started in HADOOP-5919, but since the project > split happened in between we decided to open this issue to fix it in common. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6105) Provide a way to automatically handle backward compatibility of deprecated keys
[ https://issues.apache.org/jira/browse/HADOOP-6105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732002#action_12732002 ] Philip Zeyliger commented on HADOOP-6105: - You might consider logging a warning every time you run into (either via get or set) a "set" deprecated key. > Provide a way to automatically handle backward compatibility of deprecated > keys > --- > > Key: HADOOP-6105 > URL: https://issues.apache.org/jira/browse/HADOOP-6105 > Project: Hadoop Common > Issue Type: Improvement > Components: conf >Reporter: Hemanth Yamijala > > There are cases when we have had to deprecate configuration keys. Use cases > include, changing the names of variables to better match intent, splitting a > single parameter into two - for maps, reduces etc. > In such cases, we typically provide a backwards compatible option for the old > keys. The handling of such cases might typically be common enough to actually > add support for it in a generic fashion in the Configuration class. Some > initial discussion around this started in HADOOP-5919, but since the project > split happened in between we decided to open this issue to fix it in common. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
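The warning suggested above might look like the following. This is a hypothetical stand-in for Hadoop's Configuration (a plain map, with System.err in place of a logger), not the actual implementation; the class and key names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: any get or set of a deprecated key resolves to the new key
// and emits a warning, so both spellings read and write the same value.
public class DeprecationAwareConf {
  private final Map<String, String> values = new HashMap<>();
  private final Map<String, String> deprecatedToNew = new HashMap<>();

  public void deprecate(String oldKey, String newKey) {
    deprecatedToNew.put(oldKey, newKey);
  }

  private String resolve(String key) {
    String newKey = deprecatedToNew.get(key);
    if (newKey != null) {
      System.err.println("WARN: key " + key + " is deprecated; use " + newKey);
      return newKey;
    }
    return key;
  }

  public void set(String key, String value) {
    values.put(resolve(key), value);
  }

  public String get(String key) {
    return values.get(resolve(key));
  }
}
```

Because resolution happens inside both accessors, old configs keep working while every touch of a deprecated key leaves a trace in the logs.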
[jira] Commented: (HADOOP-6105) Provide a way to automatically handle backward compatibility of deprecated keys
[ https://issues.apache.org/jira/browse/HADOOP-6105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731560#action_12731560 ] Philip Zeyliger commented on HADOOP-6105: - Owen, Apologies for missing this e-mail for so long. I'm behind on the "all-jira" bucket, and I failed to set a watch. Hemanth, you should definitely forge ahead with the simple, expedient solution. I'd like to convince you and Owen that the more complicated proposal is a net win (and I've used a similar system in the past), but I think the best way to do that is to actually write the code and transform a few usages. I've been busy with some other deadlines, so when I get there, I'll file a JIRA and bother you all again. (To answer Owen's questions: the couple of classes for ConfigVariable go into the configuration package; users are welcome to use the same classes to set their variables, or they can set them manually; the documentation for the variables themselves is generated, the documentation for the system lives in JavaDoc on the individual classes and the package.) -- Philip > Provide a way to automatically handle backward compatibility of deprecated > keys > --- > > Key: HADOOP-6105 > URL: https://issues.apache.org/jira/browse/HADOOP-6105 > Project: Hadoop Common > Issue Type: Improvement > Components: conf >Reporter: Hemanth Yamijala > > There are cases when we have had to deprecate configuration keys. Use cases > include, changing the names of variables to better match intent, splitting a > single parameter into two - for maps, reduces etc. > In such cases, we typically provide a backwards compatible option for the old > keys. The handling of such cases might typically be common enough to actually > add support for it in a generic fashion in the Configuration class. Some > initial discussion around this started in HADOOP-5919, but since the project > split happened in between we decided to open this issue to fix it in common. 
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6140) DistributedCache.addArchiveToClassPath doesn't work in 0.18.x branch
[ https://issues.apache.org/jira/browse/HADOOP-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729870#action_12729870 ] Philip Zeyliger commented on HADOOP-6140: - One creates clusters in unit tests using MiniMRCluster, but that's too heavy-weight: I think it's ok to extract the relevant function and test it via unit test. +1 to the tests for (1). -- Philip > DistributedCache.addArchiveToClassPath doesn't work in 0.18.x branch > > > Key: HADOOP-6140 > URL: https://issues.apache.org/jira/browse/HADOOP-6140 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 0.18.3 >Reporter: Vladimir Klimontovich > Attachments: HADOOP-6140-ver2.patch, HADOOP-6140.patch > > > addArchiveToClassPath is a method of the DistributedCache class. It should be > called before running a task. It accepts a path to a jar file on a DFS. After that, > this method should put the jar file on the distributed cache and then add the > file to the classpath of each map/reduce process on the job tracker. > This method didn't work. > Bug 1: > addArchiveToClassPath adds the DFS path of the archive to the > mapred.job.classpath.archives property. It uses > System.getProperty("path.separator") as the delimiter of multiple paths. > getFileClassPaths, which is called from TaskRunner, splits > mapred.job.classpath.archives using System.getProperty("path.separator"). > On unix systems System.getProperty("path.separator") equals ":". DFS-path > URLs look like hdfs://host:port/path, which means the result of a split will be > [ hdfs, //host, port/path ]. > Suggested solution: use "," instead of > Bug 2: > in TaskRunner there is an algorithm that looks for correspondence between DFS > paths and local paths in the distributed cache. > It compares >if (archives[i].getPath().equals( > > archiveClasspaths[j].toString())){ > instead of > if (archives[i].toString().equals( > > archiveClasspaths[j].toString())) -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
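Bug 1 above is easy to demonstrate in isolation: on a Unix JVM, path.separator is ":", and splitting an hdfs:// URL on ":" shreds it, while the "," the patch proposes leaves it whole. A minimal sketch:

```java
import java.util.regex.Pattern;

// Splitting a classpath property the way TaskRunner does.  On Unix the
// path.separator is ":", which collides with the colons inside an
// "hdfs://host:port/path" URL; "," does not.
public class SeparatorBug {
  public static String[] splitClasspath(String property, String separator) {
    // String.split takes a regex, so quote the literal separator.
    return property.split(Pattern.quote(separator));
  }
}
```

Splitting "hdfs://host:8020/cache/lib.jar" on ":" yields three fragments, none of which is a usable path; splitting it on "," returns the URL intact.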
[jira] Commented: (HADOOP-6140) DistributedCache.addArchiveToClassPath doesn't work in 0.18.x branch
[ https://issues.apache.org/jira/browse/HADOOP-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729812#action_12729812 ] Philip Zeyliger commented on HADOOP-6140: - Vladimir, Patch looks good. It would be nice to have a test for (2). It may also be appropriate to add an exception if someone passes in a filename with a comma. -- Philip > DistributedCache.addArchiveToClassPath doesn't work in 0.18.x branch > > > Key: HADOOP-6140 > URL: https://issues.apache.org/jira/browse/HADOOP-6140 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 0.18.3 >Reporter: Vladimir Klimontovich > Attachments: HADOOP-6140.patch > > > addArchiveToClassPath is a method of the DistributedCache class. It should be > called before running a task. It accepts a path to a jar file on a DFS. After that, > this method should put the jar file on the distributed cache and then add the > file to the classpath of each map/reduce process on the job tracker. > This method didn't work. > Bug 1: > addArchiveToClassPath adds the DFS path of the archive to the > mapred.job.classpath.archives property. It uses > System.getProperty("path.separator") as the delimiter of multiple paths. > getFileClassPaths, which is called from TaskRunner, splits > mapred.job.classpath.archives using System.getProperty("path.separator"). > On unix systems System.getProperty("path.separator") equals ":". DFS-path > URLs look like hdfs://host:port/path, which means the result of a split will be > [ hdfs, //host, port/path ]. > Suggested solution: use "," instead of > Bug 2: > in TaskRunner there is an algorithm that looks for correspondence between DFS > paths and local paths in the distributed cache. > It compares >if (archives[i].getPath().equals( > > archiveClasspaths[j].toString())){ > instead of > if (archives[i].toString().equals( > > archiveClasspaths[j].toString())) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6125) extend DistributedCache to work locally (LocalJobRunner) (common half)
[ https://issues.apache.org/jira/browse/HADOOP-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-6125: Resolution: Won't Fix Status: Resolved (was: Patch Available) MAPREDUCE-711 moves the distributed cache to common, so separating the patch in MAPREDUCE-476 into two (this being one half of it) is no longer necessary. > extend DistributedCache to work locally (LocalJobRunner) (common half) > -- > > Key: HADOOP-6125 > URL: https://issues.apache.org/jira/browse/HADOOP-6125 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Philip Zeyliger >Assignee: Philip Zeyliger >Priority: Minor > Attachments: HADOOP-6125.patch > > > This is the co-ticket to MAPREDUCE-476, covering the significant part of > DistributedCache that's in common. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6105) Provide a way to automatically handle backward compatibility of deprecated keys
[ https://issues.apache.org/jira/browse/HADOOP-6105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727615#action_12727615 ] Philip Zeyliger commented on HADOOP-6105: - Hemanth, This JIRA is about backwards-compatibility of deprecated keys, which is something my comment addresses, so I thought it fit in well here. Think of it as an alternative solution to the problem you're trying to solve by keeping the map of deprecated keys in Configuration.java. Keeping a deprecation map is expedient and simple, but I think it may hamper a better, longer-term solution. The design goals above are "out of thin air" (in the sense that they haven't been discussed on JIRA outside of the JIRAs mentioned above and MAPREDUCE-475), though I hope they're reasonable. They were discussed a bit at http://wiki.apache.org/hadoop/DeveloperOffsite20090612, too. That said, I hope they help to frame the conversation a bit. I very much want there to be a path to be able to rename configuration keys, but I want to make sure that the solution that comes out of this JIRA is compatible with some future work. -- Philip > Provide a way to automatically handle backward compatibility of deprecated > keys > --- > > Key: HADOOP-6105 > URL: https://issues.apache.org/jira/browse/HADOOP-6105 > Project: Hadoop Common > Issue Type: Improvement > Components: conf >Reporter: Hemanth Yamijala > > There are cases when we have had to deprecate configuration keys. Use cases > include, changing the names of variables to better match intent, splitting a > single parameter into two - for maps, reduces etc. > In such cases, we typically provide a backwards compatible option for the old > keys. The handling of such cases might typically be common enough to actually > add support for it in a generic fashion in the Configuration class. 
Some > initial discussion around this started in HADOOP-5919, but since the project > split happened in between we decided to open this issue to fix it in common. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6105) Provide a way to automatically handle backward compatibility of deprecated keys
[ https://issues.apache.org/jira/browse/HADOOP-6105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727143#action_12727143 ] Philip Zeyliger commented on HADOOP-6105: - I'm not enamored of this approach and would like to propose a slightly heavier-weight, but, I think, cleaner approach than stuffing more logic into the Configuration class. My apologies for coming to this conversation a bit late. If you don't want to read a long e-mail, skip down to the code examples at the bottom. :) Before I get to the proposal, I wanted to lay out what I think the goals are. Note that HADOOP-475 is also related. * Standardization of configuration names, documentation, and value formats. Today, the names tend to appear in the code, or, at best, in constants in the code, and the documentation, when it exists, may be in -default.xml. It would be nice if it was very difficult to avoid writing documentation for the variable you're introducing. Right now there are and have been a handful of bugs where the default in the code is different than the default in the XML file, and that gets really confusing. * Backwards compatibility. We'd love to rename "mapred.foo" and "mr.bar" to be consistent, but we want to maintain backwards compatibility. This ticket is all about that. * Availability to user code. Users should be able to use configuration the same way the core does. Users pass information to their jobs via Configuration, and they should use the same mechanism. This is true today. * Type-safety. Configurations have a handful of recurring types: number of bytes, filename, URI, hostport combination, arrays of paths, etc. The parsing is done in an ad-hoc fashion, which is a shame, since it doesn't have to be. It would be nice to have some generic runtime checking of configuration parameters, too, and perhaps even ranges (that number can't be negative!). * Upgradeability to a different configuration format. 
I don't think we'll leave a place where configuration has to be a key->value map (especially because of "availability to user code"), but it would eventually be nice if configuration could be queried from other places, or if the values could have a bit more structure. (For example, we could use XML to separate out a list of paths, instead of blindly using comma-delimited, unescaped text.) * Development ease. It ought to be easier to find the places where configuration is used. Today the best we can do is a grep, and then follow references manually. * Autogenerated documentation. No-brainer. * Ability to specify visibility, scope, and stability. Along the lines of HADOOP-5073, configuration variables should be classified as deprecated, unstable, evolving, and stable. It would be nice to introduce variables (say, that were used for tuning), with the expectation that they are not part of the public API. Use at your own risk sort of thing. My proposal is to represent every configuration variable that's accessed in the Hadoop code by a static instance of a ConfigVariable class. The interface is something like:
{code}
public interface ConfigValue<T> {
  T get(Configuration conf);
  T getDefault();
  void set(Configuration conf, T value);
  String getHelp();
}
{code}
There's more than one way to implement this. Here's one proposal that uses Java annotations:
{code}
@ConfigDescription(help="Some help text", visibility=Visibility.PUBLIC)
@ConfigAccessors({
  @ConfigAccessor(name="common.sample"),
  @ConfigAccessor(name="core.sample", deprecated="Use common.sample instead")
})
public final static ConfigVariable<Integer> myConfigVariable =
    ConfigVariables.newIntConfigVariable(15 /* default value */);
{code}
This approach would require pre-processing (at build time) the annotations into a data file, and then, at runtime, querying this data file. (It's not easily possible to get at the annotations on the field from within myConfigVariable.)
I'm half-way to getting this working, and I actually think something like the following would be better:
{code}
@ConfigVariableDeclaration
public final static ConfigVariable<URI> fsDefaultName =
    ConfigVariableBuilder.newURI()
        .setDefault(null)
        .setHelp("Default filesystem")
        .setVisibility(Visibility.PUBLIC)
        .addAccessor("fs.default.name")
        .addDeprecatedAccessor("core.default.fs", "Use foo instead")
        .addValidator(new ValidateSupportedFilesystem());
{code}
This would still require build-time preprocessing (javac supports this) to find the variables, instantiate them, and output the documentation, but the rest of the processing is easy at runtime. A drawback of this approach is how to handle the defaults that default to other variables. Perhaps the easiest thing to do is to handle the same syntax we support now, like 'addIndirectDefault("${default.dir}/mapred")', but something that references the other variable directly is m
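A minimal runnable sketch of the typed ConfigVariable idea from the proposal above, with the annotations, builder, deprecation handling, and validators left out. The names and the Map-backed stand-in for Hadoop's Configuration are illustrative only.

```java
import java.util.Map;
import java.util.function.Function;

// Typed config accessor: the key, the default, and the parser live in
// exactly one place, instead of being repeated at every call site.
public class ConfigVariable<T> {
  private final String key;
  private final T defaultValue;
  private final Function<String, T> parser;

  public ConfigVariable(String key, T defaultValue, Function<String, T> parser) {
    this.key = key;
    this.defaultValue = defaultValue;
    this.parser = parser;
  }

  // A Map stands in for Hadoop's Configuration in this sketch.
  public T get(Map<String, String> conf) {
    String raw = conf.get(key);
    return raw == null ? defaultValue : parser.apply(raw);
  }

  public T getDefault() {
    return defaultValue;
  }

  public static ConfigVariable<Integer> ofInt(String key, int dflt) {
    return new ConfigVariable<>(key, dflt, Integer::parseInt);
  }
}
```

One payoff of centralizing the default like this: the "default in the code differs from the default in the XML file" class of bug mentioned in the goals above has nowhere to hide, because there is only one default.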
[jira] Updated: (HADOOP-6125) extend DistributedCache to work locally (LocalJobRunner) (common half)
[ https://issues.apache.org/jira/browse/HADOOP-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-6125: Status: Patch Available (was: Open) > extend DistributedCache to work locally (LocalJobRunner) (common half) > -- > > Key: HADOOP-6125 > URL: https://issues.apache.org/jira/browse/HADOOP-6125 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Philip Zeyliger >Assignee: Philip Zeyliger >Priority: Minor > Attachments: HADOOP-6125.patch > > > This is the co-ticket to MAPREDUCE-476, covering the significant part of > DistributedCache that's in common. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6125) extend DistributedCache to work locally (LocalJobRunner) (common half)
[ https://issues.apache.org/jira/browse/HADOOP-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Zeyliger updated HADOOP-6125: Attachment: HADOOP-6125.patch Regenerated patches from HADOOP-2914. I used: bq. cat HADOOP-2914-v3.patch | sed -e 's,src/core/,src/java/,' | patch -p0 > extend DistributedCache to work locally (LocalJobRunner) (common half) > -- > > Key: HADOOP-6125 > URL: https://issues.apache.org/jira/browse/HADOOP-6125 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Philip Zeyliger >Assignee: Philip Zeyliger >Priority: Minor > Attachments: HADOOP-6125.patch > > > This is the co-ticket to MAPREDUCE-476, covering the significant part of > DistributedCache that's in common. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-6125) extend DistributedCache to work locally (LocalJobRunner) (common half)
extend DistributedCache to work locally (LocalJobRunner) (common half) -- Key: HADOOP-6125 URL: https://issues.apache.org/jira/browse/HADOOP-6125 Project: Hadoop Common Issue Type: Improvement Reporter: Philip Zeyliger Assignee: Philip Zeyliger Priority: Minor This is the co-ticket to MAPREDUCE-476, covering the significant part of DistributedCache that's in common. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.