[jira] [Reopened] (HADOOP-10434) Is it possible to use "df" to calculate the dfs usage instead of "du"

2016-12-18 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J reopened HADOOP-10434:
--

Reopening so this can be closed as a Duplicate rather than Fixed.

> Is it possible to use "df" to calculate the dfs usage instead of "du"
> -
>
> Key: HADOOP-10434
> URL: https://issues.apache.org/jira/browse/HADOOP-10434
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Affects Versions: 2.3.0
>Reporter: MaoYuan Xian
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HADOOP-10434-1.patch
>
>
> When we run the datanode on a machine with a large disk volume, we find that 
> the du operations from org.apache.hadoop.fs.DU's DURefreshThread cost a lot of 
> disk performance.
> As we use the whole disk for hdfs storage, it is possible to calculate volume 
> usage via the "df" command. Is it necessary to add a "df" option for usage 
> calculation in hdfs 
> (org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice)?
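
For illustration only, a minimal sketch of the "df"-style calculation, assuming the whole partition backing the volume directory is dedicated to HDFS block storage (the class and method names here are hypothetical; this is not the attached patch):

{code}
import java.io.File;

public class DfStyleUsage {
  // Approximates "dfs used" from filesystem statistics (the "df" view)
  // instead of recursively walking the block directories (the "du" view).
  public static long usedBytes(File volumeDir) {
    long total = volumeDir.getTotalSpace();    // partition size in bytes
    long usable = volumeDir.getUsableSpace();  // free space, as df reports it
    return total - usable;
  }

  public static void main(String[] args) {
    File volume = new File(args.length > 0 ? args[0] : "/data/dfs/dn");
    System.out.println("Approximate used bytes: " + usedBytes(volume));
  }
}
{code}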



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-10434) Is it possible to use "df" to calculate the dfs usage instead of "du"

2016-12-18 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-10434.
--
Resolution: Duplicate

> Is it possible to use "df" to calculate the dfs usage instead of "du"
> -
>
> Key: HADOOP-10434
> URL: https://issues.apache.org/jira/browse/HADOOP-10434
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Affects Versions: 2.3.0
>Reporter: MaoYuan Xian
>Priority: Minor
>  Labels: BB2015-05-TBR
> Fix For: 2.8.0
>
> Attachments: HADOOP-10434-1.patch
>
>
> When we run the datanode on a machine with a large disk volume, we find that 
> the du operations from org.apache.hadoop.fs.DU's DURefreshThread cost a lot of 
> disk performance.
> As we use the whole disk for hdfs storage, it is possible to calculate volume 
> usage via the "df" command. Is it necessary to add a "df" option for usage 
> calculation in hdfs 
> (org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice)?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-13817) Add a finite shell command timeout to ShellBasedUnixGroupsMapping

2016-11-14 Thread Harsh J (JIRA)
Harsh J created HADOOP-13817:


 Summary: Add a finite shell command timeout to 
ShellBasedUnixGroupsMapping
 Key: HADOOP-13817
 URL: https://issues.apache.org/jira/browse/HADOOP-13817
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 2.6.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor


The ShellBasedUnixGroupsMapping runs various {{id}} commands via the 
ShellCommandExecutor modules without a timeout set (it's set to 0, which implies 
infinite).

If this command hangs for a long time on the OS end due to an unresponsive 
groups backend or other reasons, it also blocks the handlers that use it on the 
NameNode (or other services that use this class). That inadvertently causes odd 
timeout troubles on the client end where it's forced to retry (only to likely 
run into such hangs again with every attempt until at least one command 
returns).

It would be helpful to have a finite command timeout after which we may give up 
on the command and return the result equivalent of no groups found.
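
As an illustration of the idea, a hedged sketch using the plain JDK rather than the Hadoop Shell API (the class name and the {{id -Gn}} command here are assumptions for the example):

{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class GroupLookupWithTimeout {
  // Runs "id -Gn <user>" with a finite timeout; if the command hangs, we give
  // up and return the equivalent of "no groups found" instead of blocking.
  static List<String> groupsFor(String user, long timeoutSeconds) {
    try {
      Process p = new ProcessBuilder("id", "-Gn", user).start();
      if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
        p.destroyForcibly();
        return Collections.emptyList();
      }
      BufferedReader r =
          new BufferedReader(new InputStreamReader(p.getInputStream()));
      String line = r.readLine();
      return (line == null || line.trim().isEmpty())
          ? Collections.emptyList()
          : Arrays.asList(line.trim().split("\\s+"));
    } catch (Exception e) {
      return Collections.emptyList();
    }
  }

  public static void main(String[] args) {
    System.out.println(groupsFor(System.getProperty("user.name"), 5));
  }
}
{code}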



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-8134) DNS claims to return a hostname but returns a PTR record in some cases

2016-10-26 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-8134.
-
Resolution: Not A Problem
  Assignee: (was: Harsh J)

This hasn't proven to be a problem of late. Closing as stale.

> DNS claims to return a hostname but returns a PTR record in some cases
> --
>
> Key: HADOOP-8134
> URL: https://issues.apache.org/jira/browse/HADOOP-8134
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: util
>Affects Versions: 0.23.0
>Reporter: Harsh J
>Priority: Minor
>
> Per Shrijeet on HBASE-4109:
> {quote}
> If you are using an interface other than 'default' (literally that 
> keyword), DNS.java's getDefaultHost will return a string which has a 
> trailing period at the end. It seems the javadoc of reverseDns in DNS.java (see 
> below) conflicts with what that function is actually doing. 
> It returns a PTR record while claiming it returns a hostname. The PTR 
> record always has a period at the end, RFC: 
> http://irbs.net/bog-4.9.5/bog47.html
> We call DNS.getDefaultHost in more than one place and treat the result as the 
> actual hostname.
> Quoting HRegionServer for example
> String machineName = DNS.getDefaultHost(conf.get(
> "hbase.regionserver.dns.interface", "default"), conf.get(
> "hbase.regionserver.dns.nameserver", "default"));
> We may want to sanitize the string returned from DNS class. Or better we can 
> take a path of overhauling the way we do DNS name matching all over.
> {quote}
> While HBase has worked around the issue, we should fix the methods that 
> aren't doing what they are intended to do.
> 1. We fix the method. This may be an 'incompatible change', but I do not know 
> who outside of us uses the DNS classes.
> 2. We fix HDFS's DN at the calling end, because that is affected by the 
> trailing period in its reporting back to the NN as well (just affects NN->DN 
> weblinks, non critical).
> For 2, we can close this and open a HDFS JIRA.
> Thoughts?
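
For illustration, a minimal sketch of the sanitisation being discussed (not the committed fix; the class and method names are hypothetical):

{code}
public class HostnameSanitizer {
  // A PTR lookup yields an absolute name such as "host01.example.com." -
  // strip the trailing dot before treating it as a hostname.
  static String stripTrailingDot(String name) {
    return (name != null && name.endsWith("."))
        ? name.substring(0, name.length() - 1)
        : name;
  }

  public static void main(String[] args) {
    System.out.println(stripTrailingDot("host01.example.com."));
  }
}
{code}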



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-7505) EOFException in RPC stack should have a nicer error message

2016-10-26 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-7505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-7505.
-
Resolution: Duplicate
  Assignee: (was: Harsh J)

This seems to be taken care of (in part) via HADOOP-7346.

> EOFException in RPC stack should have a nicer error message
> ---
>
> Key: HADOOP-7505
> URL: https://issues.apache.org/jira/browse/HADOOP-7505
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: ipc
>Affects Versions: 0.23.0
>Reporter: Eli Collins
>Priority: Minor
>
> Lots of user logs involve a user running mismatched versions, and for one 
> reason or another, they get an EOFException instead of a proper version mismatch 
> exception. We should be able to catch this at appropriate points, and have a 
> nicer exception message explaining that it's a possible version mismatch, or 
> that they're trying to connect to the incorrect port.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-8579) Websites for HDFS and MapReduce both send users to video training resource which is non-public

2016-10-26 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-8579.
-
Resolution: Not A Problem
  Assignee: (was: Harsh J)

This does not appear to be a problem after the project re-merge.

> Websites for HDFS and MapReduce both send users to video training resource 
> which is non-public
> --
>
> Key: HADOOP-8579
> URL: https://issues.apache.org/jira/browse/HADOOP-8579
> Project: Hadoop Common
>  Issue Type: Bug
> Environment: website
>Reporter: David L. Willson
>Priority: Minor
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> The main pages for HDFS and MapReduce send new users to an unavailable 
> training resource.
> These two pages:
> http://hadoop.apache.org/mapreduce/
> http://hadoop.apache.org/hdfs/
> Link to this page:
> http://vimeo.com/3584536
> That page is not public, not shared with all registered Vimeo users, and I 
> see nothing indicating how to ask for access to the resource.
> Please make the vids public, or remove the link of disappointment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-8863) Eclipse plugin may not be working on Juno due to changes in it

2016-10-26 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-8863.
-
Resolution: Won't Fix
  Assignee: (was: Harsh J)

The Eclipse plugin has formally been removed from the project.

> Eclipse plugin may not be working on Juno due to changes in it
> --
>
> Key: HADOOP-8863
> URL: https://issues.apache.org/jira/browse/HADOOP-8863
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: contrib/eclipse-plugin
>Affects Versions: 1.2.0
>Reporter: Harsh J
>
> We need to debug/investigate why it is so.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-13515) Redundant transitionToActive call can cause a NameNode to crash

2016-08-18 Thread Harsh J (JIRA)
Harsh J created HADOOP-13515:


 Summary: Redundant transitionToActive call can cause a NameNode to 
crash
 Key: HADOOP-13515
 URL: https://issues.apache.org/jira/browse/HADOOP-13515
 Project: Hadoop Common
  Issue Type: Bug
  Components: ha
Affects Versions: 2.5.0
Reporter: Harsh J
Priority: Minor


The situation in parts is similar to HADOOP-8217, but the cause is different 
and so is the result.

Consider this situation:

- At the beginning NN1 is Active, NN2 is Standby
- ZKFC1 faces a ZK disconnect (not a session timeout, just a socket disconnect) 
and thereby reconnects

{code}
2016-08-11 07:00:46,068 INFO org.apache.zookeeper.ClientCnxn: Client session 
timed out, have not heard from server in 4000ms for sessionid 
0x4566f0c97500bd9, closing socket connection and attempting reconnect
2016-08-11 07:00:46,169 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session 
disconnected. Entering neutral mode...
…
2016-08-11 07:00:46,610 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session 
connected.
{code}

- The reconnection on ZKFC1 triggers the elector code, and the elector 
re-run finds that NN1 should be the new active (a redundant decision, because NN1 
is already active)

{code}
2016-08-11 07:00:46,615 INFO org.apache.hadoop.ha.ActiveStandbyElector: 
Checking for any old active which needs to be fenced...
2016-08-11 07:00:46,630 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old 
node exists: …
2016-08-11 07:00:46,630 INFO org.apache.hadoop.ha.ActiveStandbyElector: But old 
node has our own data, so don't need to fence it.
{code}

- The ZKFC1 sets the new ZK data, and fires a NN1 RPC call of transitionToActive

{code}
2016-08-11 07:00:46,630 INFO org.apache.hadoop.ha.ActiveStandbyElector: Writing 
znode /hadoop-ha/nameservice1/ActiveBreadCrumb to indicate that the local node 
is the most recent active...
2016-08-11 07:00:46,649 TRACE org.apache.hadoop.ipc.ProtobufRpcEngine: 175: 
Call -> nn01/10.10.10.10:8022: transitionToActive {reqInfo { reqSource: 
REQUEST_BY_ZKFC }}
{code}

- At the same time as the transitionToActive call is in progress at NN1, but 
not complete yet, the ZK session of ZKFC1 is timed out by ZK Quorum, and a 
watch notification is sent to ZKFC2

{code}
2016-08-11 07:01:00,003 DEBUG org.apache.zookeeper.ClientCnxn: Got notification 
sessionid:0x4566f0c97500bde
2016-08-11 07:01:00,004 DEBUG org.apache.zookeeper.ClientCnxn: Got WatchedEvent 
state:SyncConnected type:NodeDeleted 
path:/hadoop-ha/nameservice1/ActiveStandbyElectorLock for sessionid 
0x4566f0c97500bde
{code}

- ZKFC2 responds by fencing the old active, marking NN1 as standby, which 
succeeds (NN1 hasn't handled the transitionToActive call yet due to busy status, 
but has handled transitionToStandby before it)

{code}
2016-08-11 07:01:00,013 INFO org.apache.hadoop.ha.ActiveStandbyElector: 
Checking for any old active which needs to be fenced...
2016-08-11 07:01:00,018 INFO org.apache.hadoop.ha.ZKFailoverController: Should 
fence: NameNode at nn01/10.10.10.10:8022
2016-08-11 07:01:00,020 TRACE org.apache.hadoop.ipc.ProtobufRpcEngine: 412: 
Call -> nn01/10.10.10.10:8022: transitionToStandby {reqInfo { reqSource: 
REQUEST_BY_ZKFC }}
2016-08-11 07:01:03,880 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: 
transitionToStandby took 3860ms
{code}

- ZKFC2 then marks NN2 as active, and NN2 begins its transition (is in midst of 
it, not done yet at this point)

{code}
2016-08-11 07:01:03,894 INFO org.apache.hadoop.ha.ZKFailoverController: Trying 
to make NameNode at nn02/11.11.11.11:8022 active...
2016-08-11 07:01:03,895 TRACE org.apache.hadoop.ipc.ProtobufRpcEngine: 412: 
Call -> nn02/11.11.11.11:8022: transitionToActive {reqInfo { reqSource: 
REQUEST_BY_ZKFC }}
…
{code}

{code}
2016-08-11 07:01:09,558 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required 
for active state
…
2016-08-11 07:01:19,968 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will take over writing 
edit logs at txnid 5635
{code}

- At the same time, in parallel, NN1 finally processes the transitionToActive 
request and becomes active

{code}
2016-08-11 07:01:13,281 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required 
for active state
…
2016-08-11 07:01:19,599 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will take over writing 
edit logs at txnid 5635
…
2016-08-11 07:01:19,602 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: 
Starting log segment at 5635
{code}

- NN2's active transition fails as a result of this parallel active transition 
on NN1, which completed right before NN2 tries to take over

{code}
2016-08-11 07:01:19,968 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will take over writing 
edit logs at txnid 5635
2016-08-11 07:01:22,799 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: 
Error encounter

[jira] [Created] (HADOOP-13056) Print expected values when rejecting a server's determined principal

2016-04-22 Thread Harsh J (JIRA)
Harsh J created HADOOP-13056:


 Summary: Print expected values when rejecting a server's 
determined principal
 Key: HADOOP-13056
 URL: https://issues.apache.org/jira/browse/HADOOP-13056
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Affects Versions: 2.5.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Trivial


When a service principal that a client constructs from the server address does 
not match a provided pattern or the configured principal property, the error is 
very uninformative about the specific cause. Currently the only error printed, 
in both cases, is:

{code}
 java.lang.IllegalArgumentException: Server has invalid Kerberos principal: 
hdfs/host.internal@REALM
{code}
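
A sketch of the kind of message being asked for (illustrative only; the class and variable names are hypothetical):

{code}
public class PrincipalRejectionMessage {
  static String describe(String serverPrincipal,
                         String principalPattern,
                         String configuredPrincipal) {
    // Include the expected pattern and the configured principal so the
    // operator can tell which check failed and why.
    return String.format(
        "Server has invalid Kerberos principal: %s (does not match pattern "
            + "'%s' or configured principal '%s')",
        serverPrincipal, principalPattern, configuredPrincipal);
  }

  public static void main(String[] args) {
    System.out.println(describe(
        "hdfs/host.internal@REALM", "hdfs/*@REALM", "hdfs/_HOST@REALM"));
  }
}
{code}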



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-13051) Test for special characters in path being respected during globPaths

2016-04-22 Thread Harsh J (JIRA)
Harsh J created HADOOP-13051:


 Summary: Test for special characters in path being respected 
during globPaths
 Key: HADOOP-13051
 URL: https://issues.apache.org/jira/browse/HADOOP-13051
 Project: Hadoop Common
  Issue Type: Test
  Components: fs
Affects Versions: 3.0.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor


On {{branch-2}}, the below is the (incorrect) behaviour today, where paths with 
special characters get dropped during globStatus calls:

{code}
bin/hdfs dfs -mkdir /foo
bin/hdfs dfs -touchz /foo/foo1
bin/hdfs dfs -touchz $'/foo/foo1\r'
bin/hdfs dfs -ls '/foo/*'
-rw-r--r--   3 harsh supergroup  0 2016-04-22 17:35 /foo/foo1
-rw-r--r--   3 harsh supergroup  0 2016-04-22 17:35 /foo/foo1^M
bin/hdfs dfs -ls '/foo/*'
-rw-r--r--   3 harsh supergroup  0 2016-04-22 17:35 /foo/foo1
{code}

Whereas trunk has the right behaviour, subtly fixed via the pattern library 
change of HADOOP-12436:

{code}
bin/hdfs dfs -mkdir /foo
bin/hdfs dfs -touchz /foo/foo1
bin/hdfs dfs -touchz $'/foo/foo1\r'
bin/hdfs dfs -ls '/foo/*'
-rw-r--r--   3 harsh supergroup  0 2016-04-22 17:35 /foo/foo1
-rw-r--r--   3 harsh supergroup  0 2016-04-22 17:35 /foo/foo1^M
bin/hdfs dfs -ls '/foo/*'
-rw-r--r--   3 harsh supergroup  0 2016-04-22 17:35 /foo/foo1
-rw-r--r--   3 harsh supergroup  0 2016-04-22 17:35 /foo/foo1^M
{code}

(I've placed a ^M explicitly to indicate presence of the intentional hidden 
character)

We should still add a simple test-case to cover this situation for future 
regressions.
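
A rough shape of such a test, sketched against the local filesystem (illustrative only, not the committed test case; paths and class name are examples):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GlobSpecialCharsCheck {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.getLocal(new Configuration());
    Path dir = new Path("/tmp/globtest/foo");
    fs.mkdirs(dir);
    fs.create(new Path(dir, "foo1")).close();
    fs.create(new Path(dir, "foo1\r")).close();  // name with a hidden CR

    // The glob must return both entries; on branch-2 the CR-suffixed one is dropped.
    FileStatus[] matched = fs.globStatus(new Path("/tmp/globtest/foo/*"));
    System.out.println("Matched " + matched.length + " paths (expected 2)");
  }
}
{code}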



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12970) Intermittent signature match failures in S3AFileSystem due connection closure

2016-03-28 Thread Harsh J (JIRA)
Harsh J created HADOOP-12970:


 Summary: Intermittent signature match failures in S3AFileSystem 
due connection closure
 Key: HADOOP-12970
 URL: https://issues.apache.org/jira/browse/HADOOP-12970
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 2.7.0
Reporter: Harsh J
Assignee: Harsh J


S3AFileSystem's use of the {{ObjectMetadata#clone()}} method inside the 
{{copyFile}} implementation may fail in circumstances where the connection used 
for obtaining the metadata is closed by the server (i.e. the response carries a 
{{Connection: close}} header). Because this header is not stripped away when the 
{{ObjectMetadata}} is created, and because we clone it for use in the next 
{{CopyObjectRequest}}, the copy request itself ends up carrying a 
{{Connection: close}} header.

This causes signer related exceptions because the client now includes the 
{{Connection}} header as part of the {{SignedHeaders}}, but the S3 server does 
not receive the same value for it ({{Connection}} headers are likely stripped 
away before the S3 Server tries to match signature hashes), causing a failure 
like below:

{code}
2016-03-29 19:59:30,120 DEBUG [s3a-transfer-shared--pool1-t35] 
org.apache.http.wire: >> "Authorization: AWS4-HMAC-SHA256 
Credential=XXX/20160329/eu-central-1/s3/aws4_request, 
SignedHeaders=accept-ranges;connection;content-length;content-type;etag;host;last-modified;user-agent;x-amz-acl;x-amz-content-sha256;x-amz-copy-source;x-amz-date;x-amz-metadata-directive;x-amz-server-side-encryption;x-amz-version-id,
 Signature=MNOPQRSTUVWXYZ[\r][\n]"
…
com.amazonaws.services.s3.model.AmazonS3Exception: The request signature we 
calculated does not match the signature you provided. Check your key and 
signing method. (Service: Amazon S3; Status Code: 403; Error Code: 
SignatureDoesNotMatch; Request ID: ABC), S3 Extended Request ID: XYZ
{code}

This is intermittent because the S3 Server does not always add a {{Connection: 
close}} directive in its response, but whenever we receive it AND we clone it, 
the above exception would happen for the copy request. The copy request is 
often used in the context of FileOutputCommitter, when a lot of the MR attempt 
files on {{s3a://}} destination filesystem are to be moved to their parent 
directories post-commit.

I've also submitted a fix upstream with AWS Java SDK to strip out the 
{{Connection}} headers when dealing with {{ObjectMetadata}}, which is pending 
acceptance and release at: https://github.com/aws/aws-sdk-java/pull/669, but 
until that release is available and can be used by us, we'll need to work around 
the clone approach by manually excluding the {{Connection}} header (not 
straight-forward due to the {{metadata}} object being private with no mutable 
access). We can remove such a change in future when there's a release available 
with the upstream fix.
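
A hedged sketch of that manual exclusion (assuming the AWS SDK's raw-metadata accessors on ObjectMetadata; this is illustrative, not the actual committed change):

{code}
import com.amazonaws.services.s3.model.ObjectMetadata;
import java.util.Map;

public class MetadataCopyWithoutConnection {
  // Rebuild the metadata for the copy request, dropping the hop-by-hop
  // "Connection" header instead of cloning the source metadata verbatim.
  static ObjectMetadata copyWithoutConnectionHeader(ObjectMetadata source) {
    ObjectMetadata copy = new ObjectMetadata();
    for (Map.Entry<String, Object> e : source.getRawMetadata().entrySet()) {
      if (!"Connection".equalsIgnoreCase(e.getKey())) {
        copy.setHeader(e.getKey(), e.getValue());
      }
    }
    copy.setUserMetadata(source.getUserMetadata());
    return copy;
  }
}
{code}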



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12894) Add yarn.app.mapreduce.am.log.level to mapred-default.xml

2016-03-05 Thread Harsh J (JIRA)
Harsh J created HADOOP-12894:


 Summary: Add yarn.app.mapreduce.am.log.level to mapred-default.xml
 Key: HADOOP-12894
 URL: https://issues.apache.org/jira/browse/HADOOP-12894
 Project: Hadoop Common
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.9.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Trivial






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12549) Extend HDFS-7456 default generically to all pattern lookups

2015-11-03 Thread Harsh J (JIRA)
Harsh J created HADOOP-12549:


 Summary: Extend HDFS-7456 default generically to all pattern 
lookups
 Key: HADOOP-12549
 URL: https://issues.apache.org/jira/browse/HADOOP-12549
 Project: Hadoop Common
  Issue Type: Improvement
  Components: ipc, security
Affects Versions: 2.7.1
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor


In HDFS-7546 we added an hdfs-default.xml property to bring back the regular 
behaviour of trusting all principals (as was the case before HADOOP-9789). 
However, the change only targeted HDFS users, and also only those that used the 
default-loading mechanism of the Configuration class (i.e. not {{new 
Configuration(false)}} users).

I'd like to propose adding the same default to the generic RPC client code 
also, so the default affects all forms of clients equally.
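
For illustration, the effect of such a default from a client's point of view (a sketch only; the property name is taken from HDFS-7546 and used here as an example, not a spec):

{code}
import org.apache.hadoop.conf.Configuration;

public class PrincipalPatternDefault {
  public static void main(String[] args) {
    // A bare Configuration(false) loads no *-default.xml, so it misses the
    // HDFS-7546 default entirely; a generic client-side default of "*" would
    // restore the pre-HADOOP-9789 behaviour of trusting any server principal.
    Configuration conf = new Configuration(false);
    String pattern = conf.get("dfs.namenode.kerberos.principal.pattern", "*");
    System.out.println("Effective principal pattern: " + pattern);
  }
}
{code}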



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-9461) JobTracker and NameNode both grant delegation tokens to non-secure clients

2015-03-25 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-9461.
-
Resolution: Won't Fix

Not an issue on trunk/branch-2.

> JobTracker and NameNode both grant delegation tokens to non-secure clients
> --
>
> Key: HADOOP-9461
> URL: https://issues.apache.org/jira/browse/HADOOP-9461
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
>
> If one looks at the logic MAPREDUCE-1516 added in JobTracker.java's 
> isAllowedDelegationTokenOp() method, and applies the non-secure state of 
> UGI.isSecurityEnabled == false and authMethod == SIMPLE, the return result is 
> true when the intention is false (due to the short-circuited conditionals).
> This allows non-secure JobClients to easily request and use 
> DelegationTokens and causes unwanted errors to be printed in the JobTracker 
> when the renewer attempts to run. Ideally such clients ought to get an error 
> if they request a DT in non-secure mode.
> HDFS in trunk and branch-1 both too have the same problem. Trunk MR 
> (HistoryServer) and YARN are however, unaffected due to a simpler, inlined 
> logic instead of reuse of this faulty method.
> Note that fixing this will break Oozie today, due to the merged logic of 
> OOZIE-734. Oozie will require a fix as well if this is to be fixed in 
> branch-1. As a result, I'm going to mark this as an Incompatible Change.
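
A simplified illustration of the short-circuit described above (not the exact JobTracker source; the names are condensed for the example):

{code}
public class DelegationTokenCheckSketch {
  // Mirrors the shape of isAllowedDelegationTokenOp(): when security is off,
  // the guard is skipped entirely and the method falls through to "allowed".
  static boolean isAllowedDelegationTokenOp(boolean securityEnabled,
                                            String authMethod) {
    if (securityEnabled
        && !"KERBEROS".equals(authMethod)
        && !"KERBEROS_SSL".equals(authMethod)
        && !"CERTIFICATE".equals(authMethod)) {
      return false;
    }
    // securityEnabled == false with authMethod == "SIMPLE" lands here and
    // returns true, i.e. a non-secure client is granted a delegation token.
    return true;
  }

  public static void main(String[] args) {
    System.out.println(isAllowedDelegationTokenOp(false, "SIMPLE")); // true
  }
}
{code}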



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11512) Use getTrimmedStrings when reading serialization keys

2015-01-27 Thread Harsh J (JIRA)
Harsh J created HADOOP-11512:


 Summary: Use getTrimmedStrings when reading serialization keys
 Key: HADOOP-11512
 URL: https://issues.apache.org/jira/browse/HADOOP-11512
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.6.0
Reporter: Harsh J
Priority: Minor


In the file 
{{hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/serializer/SerializationFactory.java}},
 we grab the IO_SERIALIZATIONS_KEY config via Configuration#getStrings(…), which 
does not trim the input. This could cause confusing user issues if someone 
manually overrides the key in the XML files/Configuration object without using 
the dynamic approach.

The call should instead use Configuration#getTrimmedStrings(…), so the 
whitespace is trimmed before the class names are searched on the classpath.
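
A small sketch of the difference (illustrative; the override value shown is an example):

{code}
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;

public class TrimmedStringsDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.set("io.serializations",
        " org.apache.hadoop.io.serializer.WritableSerialization ,\n"
        + "  org.apache.hadoop.io.serializer.JavaSerialization");
    // getStrings() preserves the surrounding whitespace, so the later
    // class-name lookup of each entry fails:
    System.out.println(Arrays.toString(conf.getStrings("io.serializations")));
    // getTrimmedStrings() yields clean class names:
    System.out.println(
        Arrays.toString(conf.getTrimmedStrings("io.serializations")));
  }
}
{code}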



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11488) Difference in default connection timeout for S3A FS

2015-01-18 Thread Harsh J (JIRA)
Harsh J created HADOOP-11488:


 Summary: Difference in default connection timeout for S3A FS
 Key: HADOOP-11488
 URL: https://issues.apache.org/jira/browse/HADOOP-11488
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 2.6.0
Reporter: Harsh J
Priority: Minor


The core-default.xml defines fs.s3a.connection.timeout as 5000, and the code 
under hadoop-tools/hadoop-aws defines it as 50000.

We should update the former to 50s so it gets taken properly, as we're also 
noticing that 5s is often too low, especially in cases such as large DistCp 
operations (which fail with {{Read timed out}} errors from the S3 service).
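
Until the defaults agree, an explicit override works around the mismatch; a minimal sketch (value in milliseconds, matching the 50s suggested above):

{code}
import org.apache.hadoop.conf.Configuration;

public class S3aTimeoutOverride {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setInt("fs.s3a.connection.timeout", 50000);  // 50s instead of 5s
    System.out.println("fs.s3a.connection.timeout = "
        + conf.getInt("fs.s3a.connection.timeout", -1));
  }
}
{code}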



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: a friendly suggestion for developers when uploading patches

2014-12-04 Thread Harsh J
I've added you in as YongjunZhang. Please let me know if you are still
unable to edit after a relogin.

On Wed, Dec 3, 2014 at 1:43 AM, Yongjun Zhang  wrote:
> Thanks Allen, Andrew and Tsuyoshi.
>
> My wiki user name is YongjunZhang, I will appreciate it very much if
> someone can give me the permission to edit the wiki pages. Thanks.
>
> --Yongjun
>
> On Tue, Dec 2, 2014 at 11:04 AM, Andrew Wang 
> wrote:
>
>> I just updated the wiki to say that the version number format is preferred.
>> Yongjun, if you email out your wiki username, someone (?) can give you
>> privs.
>>
>> On Tue, Dec 2, 2014 at 10:16 AM, Allen Wittenauer 
>> wrote:
>>
>> > I think people forget we have a wiki that documents this and other things
>> > ...
>> >
>> > https://wiki.apache.org/hadoop/HowToContribute#Naming_your_patch
>> >
>> > On Dec 2, 2014, at 10:01 AM, Tsuyoshi OZAWA 
>> > wrote:
>> >
>> > >> <jiraNameId>.[branchName.].<patchRevisionNum>.patch
>> > >
>> > > +1 for this format. Thanks for starting the discussion, Yongjun.
>> > >
>> > > - Tsuyoshi
>> > >
>> > > On Tue, Dec 2, 2014 at 9:34 AM, Yongjun Zhang 
>> > wrote:
>> > >> Thank you all for the feedback.
>> > >>
>> > >> About how many digits to use, I personally find it's not annoying to
>> > type
>> > >> one extra digit, but as long as we have the rev number, it achieves
>> the
>> > >> goal of identifying individual patch.
>> > >>
>> > >> About the rest of the name, as long as we keep it the same for the
>> same
>> > >> patch, it would work fine.
>> > >>
>> > >> This boils down to patch naming guideline:
>> > >>
>> > >> *<jiraNameId>.[branchName.].<patchRevisionNum>.patch*
>> > >>
>> > >> - Example jiraNameId: HADOOP-1234, HDFS-4321
>> > >> - When the patch is targeted for trunk, then there is no need for
>> > the
>> > >> branchName portion, otherwise, specify the branchName accordingly.
>> > Example:
>> > >> branch1, branch2.
>> > >> - It's recommended to use three digits for <patchRevisionNum> for
>> better
>> > >> sorting of different versions of patches.
>> > >>
>> > >> Would anyone who has the privilege please help to modify the following
>> > page
>> > >>
>> > >> http://wiki.apache.org/hadoop/HowToContribute#Naming_your_patch
>> > >>
>> > >> accordingly?
>> > >>
>> > >> Thanks a lot.
>> > >>
>> > >> --Yongjun
>> > >>
>> > >> On Mon, Dec 1, 2014 at 10:22 AM, Colin McCabe > >
>> > >> wrote:
>> > >>
>> > >>> On Wed, Nov 26, 2014 at 2:58 PM, Karthik Kambatla <
>> ka...@cloudera.com>
>> > >>> wrote:
>> > >>>
>> > >>>> Yongjun, thanks for starting this thread. I personally like Steve's
>> > >>>> suggestions, but think two digits should be enough.
>> > >>>>
>> > >>>> I propose we limit the restrictions to versioning the patches with
>> > >>> version
>> > >>>> numbers and .patch extension. People have their own preferences for
>> > the
>> > >>>> rest of the name (e.g. MAPREDUCE, MapReduce, MR, mr, mapred) and I
>> > don't
>> > >>>> see a gain in forcing everyone to use one.
>> > >>>>
>> > >>>> Putting the suggestions (tight and loose) on the wiki would help new
>> > >>>> contributors as well.
>> > >>>>
>> > >>>>
>> > >>> +1
>> > >>>
>> > >>> best,
>> > >>> Colin
>> > >>>
>> > >>>
>> > >>>> On Wed, Nov 26, 2014 at 2:43 PM, Eric Payne
>> > >>> > > >>>>>
>> > >>>> wrote:
>> > >>>>
>> > >>>>> +1.The "different color for newest patch" doesn't work very well if
>> > you
>> > >>>>> are color blind, so I do appreciate a revision number in the name.
>> > >>>>>
>> > >>>>>  From: Yongjun Zhang 
>> > >>>>> To: common-dev@hadoop.apache.

Re: a friendly suggestion for developers when uploading patches

2014-11-25 Thread Harsh J
For the same filename, you can also observe that JIRA automatically colors the
latest one differently from the older ones - this is
what I rely on.

On Sat, Nov 22, 2014 at 12:36 AM, Yongjun Zhang  wrote:
> Hi,
>
> When I look at patches uploaded to jiras, from time to time I notice that
> different revisions of the patch are uploaded with the same patch file name,
> sometimes quite a few times. It's confusing which is which.
>
> I'd suggest that as a guideline, we do the following when uploading a patch:
>
>- include a revision number in the patch file name.
>- include a comment, stating that a new patch is uploaded, including the
>revision number of the patch in the comment.
>
> This way, it's easier to refer to a specific version of a patch, and to
> know which patch a comment is made about.
>
> Hope that makes sense to you.
>
> Thanks.
>
> --Yongjun



-- 
Harsh J


[jira] [Created] (HADOOP-11224) Improve error messages for all permission related failures

2014-10-23 Thread Harsh J (JIRA)
Harsh J created HADOOP-11224:


 Summary: Improve error messages for all permission related failures
 Key: HADOOP-11224
 URL: https://issues.apache.org/jira/browse/HADOOP-11224
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.2.0
Reporter: Harsh J
Priority: Trivial


If a bad file create request fails, you get a juicy error that almost 
self-describes the reason:

{code}Caused by: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
 Permission denied: user=root, access=WRITE, 
inode="/":hdfs:supergroup:drwxr-xr-x{code}

However, if a setPermission fails, one only gets a vague:

{code}Caused by: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
 Permission denied{code}

It would be nicer if all forms of permission failures logged the accessed inode 
and current ownership and permissions in the same way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-8719) Workaround for kerberos-related log errors upon running any hadoop command on OSX

2014-07-10 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-8719.
-

Resolution: Fixed

When this was committed, OSX was not a targeted platform for security or native 
support. If that has changed recently, let's revert this fix over a new JIRA - I 
see no issues with doing that. The fix here merely got rid of a verbose warning 
appearing unnecessarily over unsecured pseudo-distributed clusters running on 
OSX.

Re-resolving. Thanks!

> Workaround for kerberos-related log errors upon running any hadoop command on 
> OSX
> -
>
> Key: HADOOP-8719
> URL: https://issues.apache.org/jira/browse/HADOOP-8719
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 2.0.0-alpha
> Environment: Mac OS X 10.7, Java 1.6.0_26
>Reporter: Jianbin Wei
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: HADOOP-8719.patch, HADOOP-8719.patch, HADOOP-8719.patch, 
> HADOOP-8719.patch
>
>
> When starting Hadoop on OS X 10.7 ("Lion") using start-all.sh, Hadoop logs 
> the following errors:
> 2011-07-28 11:45:31.469 java[77427:1a03] Unable to load realm info from 
> SCDynamicStore
> Hadoop does seem to function properly despite this.
> The workaround takes only 10 minutes.
> There are numerous discussions about this:
> google "Unable to load realm mapping info from SCDynamicStore" returns 1770 
> hits.  Each one has many discussions.  
> Assume each discussion take only 5 minute, a 10-minute fix can save ~150 
> hours.  This does not count much search of this issue and its 
> solution/workaround, which can easily hit (wasted) thousands of hours!!!



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Edit permission to Hadoop Wiki page

2014-06-17 Thread Harsh J
Hi,

You should be able to edit pages on the wiki.apache.org/hadoop wiki as
your username's in there (thanks Steve!). Are you unable to? Let us
know.

On Tue, Jun 17, 2014 at 1:55 AM, Asokan, M  wrote:
> I would like to update the page 
> http://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support with 
> my company's Hadoop related offerings.
>
> My Wiki user id is: masokan
>
> Can someone point out how I can get edit permission?
>
> Thanks in advance.
>
> -- Asokan
>
>
>
> 
>
>
>



-- 
Harsh J


[jira] [Resolved] (HADOOP-10707) support bzip2 in python avro tool

2014-06-17 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-10707.
--

Resolution: Invalid

Moved to AVRO-1527

> support bzip2 in python avro tool
> -
>
> Key: HADOOP-10707
> URL: https://issues.apache.org/jira/browse/HADOOP-10707
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools
>Reporter: Eustache
>Priority: Minor
>  Labels: avro
>
> The Python tool to decode avro files is currently missing support for bzip2 
> compression.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: I couldn't assign ticket to myself. Can someone add me to the contributors list.

2014-05-04 Thread Harsh J
Done and done. Looking forward to your contribution!

On Mon, May 5, 2014 at 12:33 AM, Anandha L Ranganathan
 wrote:
> Can someone add me to the contributors list so that I can I want to assign
> the ticket to myself.
>
> https://issues.apache.org/jira/browse/YARN-1918
>
> Thanks
> Anand



-- 
Harsh J


[jira] [Created] (HADOOP-10572) Example NFS mount command must pass noacl as it isn't supported by the server yet

2014-05-03 Thread Harsh J (JIRA)
Harsh J created HADOOP-10572:


 Summary: Example NFS mount command must pass noacl as it isn't 
supported by the server yet
 Key: HADOOP-10572
 URL: https://issues.apache.org/jira/browse/HADOOP-10572
 Project: Hadoop Common
  Issue Type: Improvement
  Components: nfs
Affects Versions: 2.4.0
Reporter: Harsh J
Priority: Trivial


Use of the documented default mount command results in the below server side 
log WARN event, because the client tries to locate the ACL program (#100227):

{code}
12:26:11.975 AM TRACE   org.apache.hadoop.oncrpc.RpcCall
Xid:-1114380537, messageType:RPC_CALL, rpcVersion:2, program:100227, version:3, 
procedure:0, credential:(AuthFlavor:AUTH_NONE), verifier:(AuthFlavor:AUTH_NONE)
12:26:11.976 AM TRACE   org.apache.hadoop.oncrpc.RpcProgram 
NFS3 procedure #0
12:26:11.976 AM WARNorg.apache.hadoop.oncrpc.RpcProgram 
Invalid RPC call program 100227
{code}

The client mount command must pass {{noacl}} to avoid this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: To be able to edit Hadoop distributions and commercial support page

2014-04-27 Thread Harsh J
Hello Amol,

Certainly - What is your wiki username, so we may add you to the can-edit
list?


On Thu, Apr 24, 2014 at 1:57 AM, Amol Kekre  wrote:

>
> Can someone give me edit writes to the following page
> https://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support
>
> DataTorrent is a Hadoop native platform and I want to maintain our blurb
> on it.
>
> Thanks,
> Amol
>
>


-- 
Harsh J


Re: Wiki Edit Permission

2014-04-27 Thread Harsh J
User "Zhijie Shen" has been added to contributors group on wiki. Let us
know if you still face issues!


On Sat, Apr 26, 2014 at 12:20 AM, Zhijie Shen  wrote:

> To whom it may concern,
>
> would you mind granting me Wiki edit permission? My username is "Zhijie
> Shen".
>
> Thanks,
> Zhijie
>
> --
> Zhijie Shen
> Hortonworks Inc.
> http://hortonworks.com/
>
>



-- 
Harsh J


Re: Hadoop v1.8 data transfer protocol

2014-04-06 Thread Harsh J
There's been no Apache Hadoop release versioned v1.8 historically, nor
is one upcoming. Do you mean 0.18?

Either way, can you point to the specific code lines in BlockSender
which have you confused? The sendBlock and sendPacket methods would
interest you I assume, but they appear to be well constructed/named
internally and commented in a few important spots.

On Mon, Apr 7, 2014 at 6:39 AM, Dhaivat Pandya  wrote:
> Hi,
>
> I'm trying to figure out how data is transferred between client and
> DataNode in Hadoop v1.8.
>
> This is my understanding so far:
>
> The client first fires an OP_READ_BLOCK request. The DataNode responds with
> a status code, checksum header, chunk offset, packet length, sequence
> number, the last packet boolean, the length and the data (in that order).
>
> However, I'm running into an issue. First of all, which of these lengths
> describes the length of the data? I tried both PacketLength and Length; it
> seems that they leave data on the stream (I tried to "cat" a file with the
> numbers 1-1000 in it).
>
> Also, how does the DataNode signal the start of another packet? After
> "Length" number of bytes have been read, I assumed that the header would be
> repeated, but this is not the case (I'm not getting sane values for any of
> the fields of the header).
>
> I've looked through the DataXceiver, BlockSender, DFSClient
> (RemoteBlockReader) classes but I still can't quite grasp how this data
> transfer is conducted.
>
> Any help would be appreciated,
>
> Dhaivat Pandya



-- 
Harsh J


[jira] [Resolved] (HADOOP-6287) Support units for configuration knobs

2014-03-05 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-6287.
-

Resolution: Duplicate

Resolving as duplicate (duplicate JIRAs linked).

> Support units for configuration knobs
> -
>
> Key: HADOOP-6287
> URL: https://issues.apache.org/jira/browse/HADOOP-6287
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: conf
>Reporter: Arun C Murthy
> Attachments: hadoop-6287-1.patch
>
>
> We should add support for units in our Configuration system so that we can 
> specify values to be *1GB* or *1MB* rather than forcing every component which 
> consumes such values to be aware of these.
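
As a sketch of what such support looks like from the caller's side (later Hadoop releases grew helpers along these lines; shown here for illustration, assuming Configuration#getLongBytes and #getTimeDuration are available, with hypothetical key names):

{code}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;

public class ConfigUnitsDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.set("example.buffer.size", "1m");       // hypothetical keys
    conf.set("example.flush.interval", "30s");

    long bytes = conf.getLongBytes("example.buffer.size", 0);
    long millis = conf.getTimeDuration("example.flush.interval",
        0, TimeUnit.MILLISECONDS);

    System.out.println(bytes + " bytes, " + millis + " ms");
  }
}
{code}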



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Permission to edit wiki?

2014-02-19 Thread Harsh J
Moving to common-dev@, right place to request wiki edit access.

I've added you into the contributor's group. Let us know here if you
continue to face issues!


On Tue, Feb 18, 2014 at 11:55 PM, Ben Connors wrote:

> Hi,
>
> I want to make a contribution to
> https://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support
>
> I am a registered user, but I get this error message that I do not have
> permission to edit the page.  How can I get permission?
>
>
>
>
>


-- 
Harsh J


Re: Wiki Editing

2014-02-15 Thread Harsh J
Hey Steve,

(-user@)

Sorry on the delay, missed this one. You should be all set now - do
report problems here, if any. Thanks!

On Sat, Feb 15, 2014 at 2:47 PM, Steve Kallestad  wrote:
> Quick note.  As of yet, I have not received write permissions on the Hadoop
> Wiki.
>
>
>
> My login name is SteveKallestad.
>
>
>
> I appreciate any help getting started.
>
>
>
> Thanks,
>
> Steve
>
>
>
> From: Arpit Agarwal [mailto:aagar...@hortonworks.com]
> Sent: Tuesday, February 11, 2014 10:37 AM
> To: common-dev@hadoop.apache.org
> Subject: Re: Wiki Editing
>
>
>
> +common-dev, bcc user
>
>
>
> Hi Steve,
>
>
>
> I'm wondering if someone wouldn't mind adding my user to the list so I can
> add my (small) contribution to the project.
>
>
>
> A wiki admin should be able to do this for you (a few of them are on this
> mailing list). Feel free to send a reminder to the list if no one has added
> you in a day or so.
>
>
>
> Additionally, I'd like to help update the maven site documentation to add
> some clarity, but I know I'll have to look into how to get going on that
> side of the street.  Correct me if I'm wrong, but the process there would be
> to submit bugs with a patch into Jira, and there is probably a utility
> somewhere that I can run which will ensure that whatever changes I propose
> meet the project standards.
>
>
>
> Documentation patches are always welcome. There is a test-patch.sh script in
> the source tree which can be used to validate your patch.
>
>
>
> Alternatively if you generate your patch against trunk you can cheat and
> click 'Submit Patch' in the Jira to have Jenkins validate the patch for you.
> To build and stage the site locally you can run something like "mvn
> site:stage -DstagingDirectory=/tmp/myhadoopsite". This is useful to manually
> verify the formatting looks as expected.
>
>
>
> Thanks,
>
> Arpit
>
>
>
> On Tue, Feb 11, 2014 at 6:01 AM, One Box  wrote:
>
> I wanted to contribute to the Wiki tonight, but once I created an account it
> shows that all of the pages are immutable.
>
>
>
> I never did receive an email confirmation, but it did allow me to log in.
>
>
>
> After reading through some of the help documentation, I saw that with some
> ASF projects you have to be added to a list of Wiki Editors manually in
> order to prevent spam.
>
>
>
> I'm wondering if someone wouldn't mind adding my user to the list so I can
> add my (small) contribution to the project.
>
>
>
> My login name is SteveKallestad.
>
>
>
> There is a page that spells out instructions for building from source on
> Windows.  I struggled a bit building  on Ubuntu.  I documented the process
> and I'd like to add it.
>
>
>
> Additionally, I'd like to help update the maven site documentation to add
> some clarity, but I know I'll have to look into how to get going on that
> side of the street.  Correct me if I'm wrong, but the process there would be
> to submit bugs with a patch into Jira, and there is probably a utility
> somewhere that I can run which will ensure that whatever changes I propose
> meet the project standards.
>
>
>
> Any help to get me going is appreciated.
>
>
>
> Thanks,
>
> Steve
>
>
>
>



-- 
Harsh J


Re: write permissions

2014-02-13 Thread Harsh J
Sorry on delay - done. Can you retry now to see if you're able to? Let us
know if you face any issues.


On Fri, Feb 14, 2014 at 12:53 AM, Kevin Wincott  wrote:

>  please?
>
> On 04/02/14 12:25, Kevin Wincott wrote:
>
>  Hello
>
>
>
> Please can I have write permissions on the wiki for user kevinwincott so
> that I may add us to the Hadoop Users list
>
>
>
> Kevin Wincott
>
> *Data Architect*
>
> T: 0800 471 4701
>
> www.sthenica.com
>
>
>
>


-- 
Harsh J


Re: Datanode registration, port number

2013-12-23 Thread Harsh J
Hi,

On Mon, Dec 23, 2013 at 9:41 AM, Dhaivat Pandya  wrote:
> Hi,
>
> I'm currently trying to build a cache layer that should sit "on top" of the
> datanode. Essentially, the namenode should know the port number of the
> cache layer instead of that of the datanode (since the namenode then relays
> this information to the default HDFS client). All of the communication
> between the datanode and the namenode currently flows through my cache
> layer (including heartbeats, etc.)

Curious Q: What does your cache layer aim to do btw? If it's a data
cache, have you checked out the design being implemented currently by
https://issues.apache.org/jira/browse/HDFS-4949?

> *First question*: is there a way to tell the namenode where a datanode
> should be? Any way to trick it into thinking that the datanode is on a port
> number where it actually isn't? As far as I can tell, the port number is
> obtained from the DatanodeId object; can this be set in the configuration
> so that the port number derived is that of the cache layer?

The NN receives a DN host and port from the DN directly. The DN sends
it whatever its running on. See
https://github.com/apache/hadoop-common/blob/release-2.2.0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L690

> I spent quite a bit of time on the above question and I could not find any
> sort of configuration option that would let me do that. So, I delved into
> the HDFS source code and tracked down the DatanodeRegistration class.
> However, I can't seem to find out *how* the NameNode figures out the
> Datanode's port number or if I could somehow change the packets to reflect
> the port number of cache layer?

See 
https://github.com/apache/hadoop-common/blob/release-2.2.0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L690
(as above) for how the DN emits it. And no, IMO, that ("packet
changes") is not the right way to go about it if you're planning an
overhaul. It's easier and more supportable to make proper code changes
instead.

> *Second question: *how does the namenode
> figure out a newly-registered Datanode's port number?

Same as before. Registration sends the service addresses (so NN may
use them for sending to clients), beyond which the DN's heartbeats are
mere client-like connections to the NN, carried out on regular
ephemeral ports.

-- 
Harsh J


Re: How can I get FSNamesystem of running NameNode in cluster?

2013-12-09 Thread Harsh J
Hi Yoonmin,

Yes, your conclusions here are correct. The FSNamesystem is an object
internal to the NameNode server runtime.
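
For a test context, the minicluster route mentioned in the quoted thread below is the usual way in; a minimal sketch, assuming the hadoop-hdfs test artifacts are on the classpath:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.hdfs.server.namenode.FSNamesystem;

public class FsNamesystemFromMiniCluster {
  public static void main(String[] args) throws Exception {
    MiniDFSCluster cluster =
        new MiniDFSCluster.Builder(new Configuration()).numDataNodes(1).build();
    try {
      // Only possible in-process: the namesystem never leaves the NN's JVM.
      FSNamesystem ns = cluster.getNamesystem();
      System.out.println("Block pool: " + ns.getBlockPoolId());
    } finally {
      cluster.shutdown();
    }
  }
}
{code}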

On Mon, Dec 9, 2013 at 8:49 PM, Yoonmin Nam  wrote:
> Oh, I see. However a minicluster cannot replace the namenode, right?
> I knew that the minicluster is for testing components of hadoop.
>
> Then, the only way of implementing some features which use namenode or
> datanode is just in internal of namenode or datanode.
> Am I right?
>
> Thanks!
>
> -Original Message-
> From: Daryn Sharp [mailto:da...@yahoo-inc.com]
> Sent: Monday, December 09, 2013 11:42 PM
> To: 
> Subject: Re: How can I get FSNamesystem of running NameNode in cluster?
>
> Are you adding something internal to the NN?  If not, you cannot get the
> namesystem instance via a client unless you are using a minicluster object.
>
> Daryn
>
> On Dec 9, 2013, at 7:11 AM, Yoonmin Nam  wrote:
>
>> I want to get a running instance of FSNamesystem of HDFS. However, it
>> is somewhat complicated than I expected.
>>
>> If I can get NameNode instance of running cluster, then it can be
>> solved because there is a method "getNamespace()".
>>
>> Is there anyone who know about this stuff?
>>
>> I thought that using Servlet stuff is not normal way to do this
>> because my program is not web-application.
>>
>> Thanks!
>>
>>
>>
>
>
>
>
>
>



-- 
Harsh J


[jira] [Resolved] (HADOOP-10002) Tool's config option wouldn't work on secure clusters

2013-09-28 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-10002.
--

   Resolution: Duplicate
Fix Version/s: 2.0.3-alpha

Sorry about the noise. This should be fixed by HADOOP-9021 - turns out I wasn't 
looking at the right 2.0.x sources when debugging this.

> Tool's config option wouldn't work on secure clusters
> -
>
> Key: HADOOP-10002
> URL: https://issues.apache.org/jira/browse/HADOOP-10002
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security, util
>Affects Versions: 2.0.0-alpha
>Reporter: Harsh J
>Priority: Minor
> Fix For: 2.0.3-alpha
>
>
> The Tool framework provides a way for clients to run without classpath 
> *-site.xml configs, by letting users pass "-conf <configuration file>" to parse 
> into the app's Configuration object.
> In a secure cluster config setup, such a runner will not work because the 
> UserGroupInformation.isSecurityEnabled() check, which Server.java uses 
> to determine what form of communication to use, statically loads a {{new 
> Configuration()}} object to inspect whether security is turned on during its 
> initialization; this ignores the application config object, tries to load 
> from the classpath, and ends up loading non-secure defaults.
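
A hedged sketch of the pattern that avoids the problem, pushing the Tool's -conf supplied Configuration into UserGroupInformation before any RPC is made (shown for illustration, not as the committed fix):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class SecureConfTool extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    Configuration conf = getConf();               // includes the -conf file
    UserGroupInformation.setConfiguration(conf);  // make the static check see it
    System.out.println("hadoop.security.authentication = "
        + conf.get("hadoop.security.authentication"));
    return 0;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new SecureConfTool(), args));
  }
}
{code}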



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HADOOP-10002) Tool's config option wouldn't work on secure clusters

2013-09-27 Thread Harsh J (JIRA)
Harsh J created HADOOP-10002:


 Summary: Tool's config option wouldn't work on secure clusters
 Key: HADOOP-10002
 URL: https://issues.apache.org/jira/browse/HADOOP-10002
 Project: Hadoop Common
  Issue Type: Bug
  Components: security, util
Affects Versions: 2.0.6-alpha
Reporter: Harsh J
Priority: Minor


The Tool framework provides a way for clients to run without classpath 
*-site.xml configs, by letting users pass "-conf <configuration file>" to parse 
into the app's Configuration object.

In a secure cluster config setup, such a runner will not work because the 
UserGroupInformation.isSecurityEnabled() check, which Server.java uses to 
determine what form of communication to use, statically loads a {{new 
Configuration()}} object to inspect whether security is turned on during its 
initialization; this ignores the application config object, tries to load 
from the classpath, and ends up loading non-secure defaults.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: question about when do resource matching in YARN

2013-09-24 Thread Harsh J
Yes, but the heartbeat coupling isn't necessary I think. One could
even use a ZK write/watch approach for faster assignment of regular
work?

On Tue, Sep 24, 2013 at 2:24 PM, Steve Loughran  wrote:
> On 21 September 2013 09:19, Sandy Ryza  wrote:
>
>> I don't believe there is any reason scheduling decisions need to be coupled
>> with NodeManager heartbeats.  It doesn't sidestep any race conditions
>> because a NodeManager could die immediately after heartbeating.
>>
>>
> historically it's been done for scale: you don't need the JT reaching out to
> 4K TT's just to give them work to do; instead, let them connect in anyway
> and get work that way. And once they start reporting in completion, then
> they can be given more work. It's very biased towards "worker nodes talk
> to the master" over "master approaches workers"
>



-- 
Harsh J


Re: issue of building with native

2013-09-18 Thread Harsh J
/velocity/velocity/1.5/velocity-1.5.jar
> [ERROR] urls[16] = file:/home/hhf/.m2/repository/oro/oro/2.0.8/oro-2.0.8.jar
> [ERROR] urls[17] = 
> file:/home/hhf/.m2/repository/velocity/velocity/1.5/velocity-1.5.jar
> [ERROR] urls[18] = 
> file:/home/hhf/.m2/repository/org/sonatype/aether/aether-util/1.7/aether-util-1.7.jar
> [ERROR] urls[19] = 
> file:/home/hhf/.m2/repository/org/sonatype/sisu/sisu-inject-bean/1.4.2/sisu-inject-bean-1.4.2.jar
> [ERROR] urls[20] = 
> file:/home/hhf/.m2/repository/org/sonatype/sisu/sisu-guice/2.1.7/sisu-guice-2.1.7-noaop.jar
> [ERROR] urls[21] = 
> file:/home/hhf/.m2/repository/org/codehaus/plexus/plexus-interpolation/1.14/plexus-interpolation-1.14.jar
> [ERROR] urls[22] = 
> file:/home/hhf/.m2/repository/org/codehaus/plexus/plexus-component-annotations/1.5.5/plexus-component-annotations-1.5.5.jar
> [ERROR] urls[23] = 
> file:/home/hhf/.m2/repository/org/sonatype/plexus/plexus-sec-dispatcher/1.3/plexus-sec-dispatcher-1.3.jar
> [ERROR] urls[24] = 
> file:/home/hhf/.m2/repository/org/sonatype/plexus/plexus-cipher/1.4/plexus-cipher-1.4.jar
> [ERROR] urls[25] = 
> file:/home/hhf/.m2/repository/org/apache/maven/doxia/doxia-sink-api/1.2/doxia-sink-api-1.2.jar
> [ERROR] urls[26] = 
> file:/home/hhf/.m2/repository/org/apache/maven/doxia/doxia-logging-api/1.2/doxia-logging-api-1.2.jar
> [ERROR] urls[27] = 
> file:/home/hhf/.m2/repository/junit/junit/3.8.1/junit-3.8.1.jar
> [ERROR] urls[28] = 
> file:/home/hhf/.m2/repository/org/apache/maven/doxia/doxia-core/1.2/doxia-core-1.2.jar
> [ERROR] urls[29] = 
> file:/home/hhf/.m2/repository/xerces/xercesImpl/2.9.1/xercesImpl-2.9.1.jar
> [ERROR] urls[30] = 
> file:/home/hhf/.m2/repository/xml-apis/xml-apis/1.3.04/xml-apis-1.3.04.jar
> [ERROR] urls[31] = 
> file:/home/hhf/.m2/repository/org/apache/httpcomponents/httpclient/4.0.2/httpclient-4.0.2.jar
> [ERROR] urls[32] = 
> file:/home/hhf/.m2/repository/org/apache/httpcomponents/httpcore/4.0.1/httpcore-4.0.1.jar
> [ERROR] urls[33] = 
> file:/home/hhf/.m2/repository/commons-codec/commons-codec/1.3/commons-codec-1.3.jar
> [ERROR] urls[34] = 
> file:/home/hhf/.m2/repository/org/apache/maven/doxia/doxia-module-xhtml/1.2/doxia-module-xhtml-1.2.jar
> [ERROR] urls[35] = 
> file:/home/hhf/.m2/repository/org/apache/maven/doxia/doxia-module-apt/1.2/doxia-module-apt-1.2.jar
> [ERROR] urls[36] = 
> file:/home/hhf/.m2/repository/org/apache/maven/doxia/doxia-module-xdoc/1.2/doxia-module-xdoc-1.2.jar
> [ERROR] urls[37] = 
> file:/home/hhf/.m2/repository/org/apache/maven/doxia/doxia-module-fml/1.2/doxia-module-fml-1.2.jar
> [ERROR] urls[38] = 
> file:/home/hhf/.m2/repository/javax/servlet/servlet-api/2.5/servlet-api-2.5.jar
> [ERROR] urls[39] = 
> file:/home/hhf/.m2/repository/org/apache/maven/doxia/doxia-decoration-model/1.2/doxia-decoration-model-1.2.jar
> [ERROR] urls[40] = 
> file:/home/hhf/.m2/repository/org/apache/maven/doxia/doxia-site-renderer/1.2/doxia-site-renderer-1.2.jar
> [ERROR] urls[41] = 
> file:/home/hhf/.m2/repository/commons-collections/commons-collections/3.2.1/commons-collections-3.2.1.jar
> [ERROR] urls[42] = 
> file:/home/hhf/.m2/repository/org/apache/maven/shared/maven-doxia-tools/1.4/maven-doxia-tools-1.4.jar
> [ERROR] urls[43] = 
> file:/home/hhf/.m2/repository/org/codehaus/plexus/plexus-archiver/1.0/plexus-archiver-1.0.jar
> [ERROR] urls[44] = 
> file:/home/hhf/.m2/repository/org/codehaus/plexus/plexus-io/1.0/plexus-io-1.0.jar
> [ERROR] urls[45] = 
> file:/home/hhf/.m2/repository/org/codehaus/plexus/plexus-i18n/1.0-beta-7/plexus-i18n-1.0-beta-7.jar
> [ERROR] urls[46] = 
> file:/home/hhf/.m2/repository/org/codehaus/plexus/plexus-velocity/1.1.8/plexus-velocity-1.1.8.jar
> [ERROR] urls[47] = 
> file:/home/hhf/.m2/repository/org/codehaus/plexus/plexus-utils/1.5.10/plexus-utils-1.5.10.jar
> [ERROR] urls[48] = 
> file:/home/hhf/.m2/repository/org/mortbay/jetty/jetty/6.1.25/jetty-6.1.25.jar
> [ERROR] urls[49] = 
> file:/home/hhf/.m2/repository/org/mortbay/jetty/servlet-api/2.5-20081211/servlet-api-2.5-20081211.jar
> [ERROR] urls[50] = 
> file:/home/hhf/.m2/repository/org/mortbay/jetty/jetty-util/6.1.25/jetty-util-6.1.25.jar
> [ERROR] urls[51] = 
> file:/home/hhf/.m2/repository/commons-lang/commons-lang/2.5/commons-lang-2.5.jar
> [ERROR] urls[52] = 
> file:/home/hhf/.m2/repository/commons-io/commons-io/1.4/commons-io-1.4.jar
> [ERROR] Number of foreign imports: 1
> [ERROR] import: Entry[import  from realm ClassRealm[maven.api, parent: null]]
> [ERROR]
> [ERROR] -: 
> org.sonatype.aether.graph.DependencyFilter



-- 
Harsh J


[jira] [Resolved] (HADOOP-9878) getting rid of all the 'bin/../' from all the paths

2013-08-26 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-9878.
-

Resolution: Duplicate

> getting rid of all the 'bin/../' from all the paths
> ---
>
> Key: HADOOP-9878
> URL: https://issues.apache.org/jira/browse/HADOOP-9878
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: conf
>Reporter: kaveh minooie
>Priority: Trivial
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> by simply replacing line 34 of libexec/hadoop-config.sh from:
> {quote}
> export HADOOP_PREFIX=`dirname "$this"`/..
> {quote}
> to 
> {quote}
> export HADOOP_PREFIX=$( cd "$config_bin/.."; pwd -P )
> {quote}
> we can eliminate all the annoying 'bin/../' from the library paths and make 
> the output of commands like ps a lot more readable. not to mention that OS  
> would do just a bit less work as well. I can post a patch for it as well if 
> it is needed

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Reopened] (HADOOP-9878) getting rid of all the 'bin/../' from all the paths

2013-08-26 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J reopened HADOOP-9878:
-


> getting rid of all the 'bin/../' from all the paths
> ---
>
> Key: HADOOP-9878
> URL: https://issues.apache.org/jira/browse/HADOOP-9878
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: conf
>Reporter: kaveh minooie
>Priority: Trivial
> Fix For: 2.1.0-beta
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> by simply replacing line 34 of libexec/hadoop-config.sh from:
> {quote}
> export HADOOP_PREFIX=`dirname "$this"`/..
> {quote}
> to 
> {quote}
> export HADOOP_PREFIX=$( cd "$config_bin/.."; pwd -P )
> {quote}
> we can eliminate all the annoying 'bin/../' from the library paths and make 
> the output of commands like ps a lot more readable. not to mention that OS  
> would do just a bit less work as well. I can post a patch for it as well if 
> it is needed

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HADOOP-9346) Upgrading to protoc 2.5.0 fails the build

2013-08-13 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-9346.
-

Resolution: Duplicate

Thanks for pinging, Ravi. I'd discussed with Alejandro that this could be 
closed. Looks like we added a dupe link but failed to close it. Closing now.

> Upgrading to protoc 2.5.0 fails the build
> -
>
> Key: HADOOP-9346
> URL: https://issues.apache.org/jira/browse/HADOOP-9346
> Project: Hadoop Common
>  Issue Type: Task
>  Components: build
>Affects Versions: 3.0.0
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
>  Labels: protobuf
> Attachments: HADOOP-9346.patch
>
>
> Reported over the impala lists, one of the errors received is:
> {code}
> src/hadoop-common-project/hadoop-common/target/generated-sources/java/org/apache/hadoop/ha/proto/ZKFCProtocolProtos.java:[104,37]
>  can not find symbol.
> symbol: class Parser
> location: package com.google.protobuf
> {code}
> Worth looking into as we'll eventually someday bump our protobuf deps.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: protobuf upgrade on Jenkins slaves causing build failures?

2013-08-12 Thread Harsh J
Hey Chris,

Yep the protobuf version on the machine was upped. We have a parallel
ongoing discussion on the same lists under the title "Upgrade to
protobuf 2.5.0 for the 2.1.0 release" which you can follow for the
updates being done.

On Mon, Aug 12, 2013 at 11:00 PM, Chris Nauroth
 wrote:
> I'm curious if protobuf may have been upgraded to 2.5.0 on the Jenkins
> slaves, ahead of committing the Hadoop code's dependency upgrade to 2.5.0.
>  We've started to see build failures due to cannot find symbol
> com.google.protobuf.Parser.  This is the earliest example I could find,
> which happened 8/11 10:31 AM:
>
> https://builds.apache.org/job/Hadoop-Yarn-trunk/298/
>
> The Parser class does not exist in 2.4.1, but it does exist in 2.5.0, which
> leads me to believe that the Jenkins machines were upgraded to start using
> a 2.5.0 protoc binary.
>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/



-- 
Harsh J


[jira] [Created] (HADOOP-9861) Invert ReflectionUtils' stack trace

2013-08-10 Thread Harsh J (JIRA)
Harsh J created HADOOP-9861:
---

 Summary: Invert ReflectionUtils' stack trace
 Key: HADOOP-9861
 URL: https://issues.apache.org/jira/browse/HADOOP-9861
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Affects Versions: 2.0.5-alpha
Reporter: Harsh J


Often an MR task (as an example) may fail at the configure stage due to a 
misconfiguration or whatever, and the only thing a user gets, because MR pulls 
only a limited number of bytes of the diagnostic error data, is the top part of 
the stack trace:

{code}
java.lang.RuntimeException: Error in configuring object
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
{code}

This is absolutely useless to a user, who then goes ahead and blames the 
framework for having an issue, rather than (non-intuitively) going to the whole 
task log for the full trace, especially the last part.

Hundreds of times it's been a mere class that's missing, etc., but there's just 
too much pain involved here to troubleshoot.

It would be much, much better if we inverted the trace. For example, here's what 
Hive could return if we did so, for a random problem I pulled from the web:

{code}
java.lang.RuntimeException: Error in configuring object
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector.toString(StructObjectInspector.java:64)
at java.lang.String.valueOf(String.java:2826)
at java.lang.StringBuilder.append(StringBuilder.java:115)
at 
org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:110)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451)
at 
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:407)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:186)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:563)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:100)
... 22 more
{code}

This way the user can at least be sure what part's really failing, and not get 
lost trying to work their way through reflection utils and upwards/downwards.
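
A minimal sketch of a helper that could do this inversion when rendering the 
diagnostics (the helper name and placement are made up; this is not a patch):

{code}
import java.io.PrintWriter;
import java.io.StringWriter;

public class InvertedTrace {
  /** Renders t so the root cause's frames come right after the top-level message. */
  public static String render(Throwable t) {
    Throwable root = t;
    while (root.getCause() != null) {
      root = root.getCause();
    }
    StringWriter out = new StringWriter();
    PrintWriter pw = new PrintWriter(out);
    pw.println(t);            // e.g. "java.lang.RuntimeException: Error in configuring object"
    pw.print("Caused by: ");
    root.printStackTrace(pw); // the part a user actually needs to see first
    pw.flush();
    return out.toString();
  }
}
{code}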

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Problem building branch-2

2013-05-24 Thread Harsh J
You seem to be using protoc-2.5.0, which is known not to work with
Hadoop yet: https://issues.apache.org/jira/browse/HADOOP-9346

Downgrading to 2.4.1 should let you go ahead.

On Sat, May 25, 2013 at 12:21 AM, Ralph Castain  wrote:
> Hi folks
>
> I'm trying to build the head of branch-2 on a CentOS box and hitting a rash 
> of errors like the following (all from the protobuf support area):
>
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:2.5.1:compile 
> (default-compile) on project hadoop-common: Compilation failure: Compilation 
> failure:
> [ERROR] 
> /home/common/hadoop/hadoop-common/hadoop-common-project/hadoop-common/target/generated-sources/java/org/apache/hadoop/ipc/protobuf/RpcHeaderProtos.java:[278,37]
>  error: cannot find symbol
> [ERROR] symbol:   class Parser
> [ERROR] location: package com.google.protobuf
>
> Per the BUILDING.txt instructions, I was using a command line of "mvn install 
> -DskipTests" from the top level directory.
>
> Any suggestions? I assume I must have some path incorrectly set or need to 
> build the sub-projects manually in some order, but I'm unsure of the nature 
> of the problem.
>
> Thanks
> Ralph
>



--
Harsh J


Re: Non existent config file to 'fs -conf'

2013-05-23 Thread Harsh J
The "quiet" behavior sorta goes all the way back to the very first
import of Nutch into Apache Incubator:
http://svn.apache.org/viewvc?view=revision&revision=155829 and seems
to deal with being relaxed about not finding added resources other
than required defaults. The behavior has almost been the same for over
8 years now :-)

The quiet flag is code-settable, but the output it would produce is
pretty verbose. I suppose we could turn it on at the FsShell level,
while also making those "parsing…", etc. INFO-level logs into
guarded DEBUG-level logs. Would that suffice?
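
For illustration, a minimal sketch of flipping that flag from code (this
assumes the public Configuration.setQuietMode(boolean) setter; the
surrounding wiring is made up):

{code}
import org.apache.hadoop.conf.Configuration;

public class NoisyConfDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setQuietMode(false);                  // surface missing/failed resources
    conf.addResource("NONEXISTENT_FILE.xml");  // would otherwise be ignored silently
    conf.get("some.key");                      // loading is lazy; this triggers it
  }
}
{code}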

On Fri, May 24, 2013 at 12:52 AM, Ashwin Shankar  wrote:
> Hi,
> I'm working on  
> HADOOP-9582<https://issues.apache.org/jira/browse/HADOOP-9582>  and I have a 
> question about the current implementation in hadoop-common.
> Here is a brief background about the bug : Basically if I give a non-exitent 
> file to "hadoop fs  –conf NONEXISTENT_FILE",
> the current implementation never complains.
> But looking at the code(Configuration.loadResources()) it seems that in-fact 
> we check if input file exists and we throw an exception if the 'quiet' flag 
> is false.
> Problem is the 'quiet' flag is always true.
> Can somebody explain the rationale behind this behavior ? Would we break any 
> use-case if we complain when non-exitent file is given as input?
>
> Why we want this fixed :  say the user makes a typo and gives the wrong path 
> ,the code is just going to ignore this,not complain
> and use the default conf files(if the env variables are set). This would 
> confuse the user when he finds that the configs are different from what he 
> gave as input(typo) .
> Thoughts?
>
> Thanks,
> Ashwin
>



-- 
Harsh J


Re: [DISCUSS] - Committing client code to 3rd Party FileSystems within Hadoop Common

2013-05-23 Thread Harsh J
I think we do a fairly good job of maintaining a stable and public FileSystem
and FileContext API for third-party plugins to exist outside of Apache
Hadoop but still work well across versions.

The question of testing pops up though, specifically that of testing against
trunk to catch regressions across the various implementations, but it'd be a
lot of work for us to also maintain glusterfs dependencies and mechanisms as
part of trunk.

We do provide trunk build snapshot artifacts publicly for downstream
projects to test against, which I think may help cover the continuous
testing concerns, if there are those.

Right now, I don't think the S3 FS we maintain really works all that well.
I also recall, per recent conversations on the lists, that AMZN has started
shipping their own library for a better implementation rather than
perfecting the implementation we have here (correct me if I am wrong but I
think the changes were not all contributed back). I see some work going on
for OpenStack's Swift, for which I think Steve also raised a similar
discussion here: http://search-hadoop.com/m/W1S5h2SrxlG, but I don't recall
if the conversation proceeded at the time.

What's your perspective as the releaser, though? Would you not find
maintaining this outside of Hadoop easier, especially in terms of maintaining
your code with quicker releases, for both bug fixes and features - also given
that you can CI it against Apache Hadoop trunk at the same time?


On Thu, May 23, 2013 at 11:47 PM, Stephen Watt  wrote:

> (Resending - I think the first time I sent this out it got lost within all
> the ByLaws voting)
>
> Hi Folks
>
> My name is Steve Watt and I am presently working on enabling glusterfs to
> be used as a Hadoop FileSystem. Most of the work thus far has involved
> developing a Hadoop FileSystem plugin for glusterfs. I'm getting to the
> point where the plugin is becoming stable and I've been trying to
> understand where the right place is to host/manage/version it.
>
> Steve Loughran was kind enough to point out a few past threads in the
> community (such as
> http://lucene.472066.n3.nabble.com/Need-to-add-fs-shim-to-use-QFS-td4012118.html)
> that show a project disposition to move away from Hadoop Common containing
> client code (plugins) for 3rd party FileSystems. This makes sense and
> allows the filesystem plugin developer more autonomy as well as reduces
> Hadoop Common's dependence on 3rd Party libraries.
>
> Before I embark down that path, can the PMC/Committers verify that the
> preference is still to have client code for 3rd Party FileSystems hosted
> and managed outside of Hadoop Common?
>
> Regards
> Steve Watt
>



-- 
Harsh J


Re: [VOTE] Plan to create release candidate for 0.23.8

2013-05-17 Thread Harsh J
+1

On Sat, May 18, 2013 at 2:40 AM, Thomas Graves  wrote:
> Hello all,
>
> We've had a few critical issues come up in 0.23.7 that I think warrants a
> 0.23.8 release. The main one is MAPREDUCE-5211.  There are a couple of
> other issues that I want finished up and get in before we spin it.  Those
> include HDFS-3875, HDFS-4805, and HDFS-4835.  I think those are on track
> to finish up early next week.   So I hope to spin 0.23.8 soon after this
> vote completes.
>
> Please vote '+1' to approve this plan. Voting will close on Friday May
> 24th at 2:00pm PDT.
>
> Thanks,
> Tom Graves
>



-- 
Harsh J


[jira] [Created] (HADOOP-9567) Provide auto-renewal for keytab based logins

2013-05-16 Thread Harsh J (JIRA)
Harsh J created HADOOP-9567:
---

 Summary: Provide auto-renewal for keytab based logins
 Key: HADOOP-9567
 URL: https://issues.apache.org/jira/browse/HADOOP-9567
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Priority: Minor


We do a renewal for cached tickets (obtained via kinit before using a Hadoop 
application) but we explicitly seem to avoid doing a renewal for keytab based 
logins (done from within the client code) when we could do that as well via a 
similar thread.
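
A rough sketch of the kind of renewal thread being suggested (the principal, 
keytab path and interval below are placeholders, and this is not a patch):

{code}
import org.apache.hadoop.security.UserGroupInformation;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class KeytabReloginThread {
  public static void start() throws Exception {
    // Placeholder principal and keytab path, for illustration only.
    UserGroupInformation.loginUserFromKeytab("svc/host@EXAMPLE.COM", "/etc/svc.keytab");
    ScheduledExecutorService renewer = Executors.newSingleThreadScheduledExecutor();
    renewer.scheduleWithFixedDelay(new Runnable() {
      @Override
      public void run() {
        try {
          // No-op if the TGT is still fresh; re-logs in from the keytab otherwise.
          UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
        } catch (Exception e) {
          // Log and retry on the next tick.
        }
      }
    }, 1, 1, TimeUnit.HOURS);
  }
}
{code}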

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: build failure: A required class is missing: org/apache/maven/surefire/util/NestedCheckedException

2013-05-08 Thread Harsh J
What version of Apache Maven are you using?

On Thu, May 9, 2013 at 11:44 AM, samar.opensource
 wrote:
>
> Hi Devs,
>
>  I am getting the following error building the trunk .
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-surefire-plugin:2.12.3:test
> (default-test) on project hadoop-annotations: Execution default-test of
> goal org.apache.maven.plugins:maven-surefire-plugin:2.12.3:test failed:
> Unable to load the mojo 'test' in the plugin
> 'org.apache.maven.plugins:maven-surefire-plugin:2.12.3'. A required
> class is missing: org/apache/maven/surefire/util/NestedCheckedException
> [ERROR] -
> [ERROR] realm = plugin>org.apache.maven.plugins:maven-surefire-plugin:2.12.3
> [ERROR] strategy =
> org.codehaus.plexus.classworlds.strategy.SelfFirstStrategy
>
> I have pulled the code from git(https://github.com/apache/hadoop-common.git)
>
> I used mvn clean install to build
>
> I tried deleting my local maven repository but dint seem to help
> much(http://jira.codehaus.org/browse/SUREFIRE-85)
>
>
>
> Regards,
> Samar
>
>
>



-- 
Harsh J


Re: Issue HADOOP-8905

2013-05-08 Thread Harsh J
Hi Steve,

To my knowledge, no one is currently working on this or has planned
to. The issue is also unassigned, so you can go right ahead!

Do ping the common-dev@ with any review requests, or discussion
requests, should no one respond to the JIRA comments in time.

On Wed, May 8, 2013 at 4:19 AM,   wrote:
> Hi Dev list,
>
> I am looking into implementing Add metrics for HTTP Server (HADOOP-8905) and
> would first like to seek clarification that no one else has covered this off
> to their knowledge within an existing JIRA or nobody has the intention to
> cover this off shortly.
>
> Kind Regards
> Steve
>



-- 
Harsh J


Re: Failing to run ant test on clean Hadoop branch-1 checkout

2013-04-27 Thread Harsh J
Hi Amit,

The common-dev list is more suited for Apache Hadoop
development-related questions, so I've moved it to that and bcc'd
user@. Each failed test also produces a log under the build directory
for the real reason of failure - can you also inspect that to
determine the reason behind the failures? If there are genuine bugs
from your analysis, and the failures are consistent, please do file a
JIRA as well.

On Sun, Apr 28, 2013 at 12:05 AM, Amit Sela  wrote:
> Hi all,
>
> I'm trying to run ant test on a clean Hadoop branch-1 checkout.
> ant works fine but when I run ant test I get a lot of failures:
>
> Test org.apache.hadoop.cli.TestCLI FAILED
> Test org.apache.hadoop.fs.TestFileUtil FAILED
> Test org.apache.hadoop.fs.TestHarFileSystem FAILED
> Test org.apache.hadoop.fs.TestUrlStreamHandler FAILED
> Test org.apache.hadoop.hdfs.TestAbandonBlock FAILED
> Test org.apache.hadoop.hdfs.TestBlocksScheduledCounter FAILED
> Test org.apache.hadoop.hdfs.TestDFSShell FAILED
> Test org.apache.hadoop.hdfs.TestDFSShellGenericOptions FAILED
> Test org.apache.hadoop.hdfs.TestDataTransferProtocol FAILED
> Test org.apache.hadoop.hdfs.TestDatanodeReport FAILED
> Test org.apache.hadoop.hdfs.TestDistributedFileSystem FAILED
> Test org.apache.hadoop.hdfs.TestFSInputChecker FAILED
> Test org.apache.hadoop.hdfs.TestFSOutputSummer FAILED
> Test org.apache.hadoop.hdfs.TestFileAppend FAILED
> Test org.apache.hadoop.hdfs.TestFileAppend2 FAILED
> Test org.apache.hadoop.hdfs.TestFileAppend3 FAILED
> Test org.apache.hadoop.hdfs.TestFileCorruption FAILED
> Test org.apache.hadoop.hdfs.TestFileStatus FAILED
> Test org.apache.hadoop.hdfs.TestGetBlocks FAILED
> Test org.apache.hadoop.hdfs.TestHDFSTrash FAILED
> Test org.apache.hadoop.hdfs.TestLease FAILED
> Test org.apache.hadoop.hdfs.TestLeaseRecovery FAILED
> Test org.apache.hadoop.hdfs.TestLocalDFS FAILED
> Test org.apache.hadoop.hdfs.TestMissingBlocksAlert FAILED
> Test org.apache.hadoop.hdfs.TestPread FAILED
> Test org.apache.hadoop.hdfs.TestQuota FAILED
> Test org.apache.hadoop.hdfs.TestRestartDFS FAILED
>
> and more
>
> I do get some warnings before the tests start:
> Clover not found. Code coverage reports disabled.
> warning: [options] bootstrap class path not set in conjunction with -source
> 1.6
> 3/04/27 21:29:44 INFO mortbay.log: Logging to
> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
> org.mortbay.log.Slf4jLog
> Trying to override old definition of task jsp-compile
> warning: [options] bootstrap class path not set in conjunction with -source
> 1.6
> Note: Some input files use unchecked or unsafe operations.
> Note: Recompile with -Xlint:unchecked for details.
>
> Thanks,
>
> Amit.
>



-- 
Harsh J


[jira] [Resolved] (HADOOP-9510) DU command should provide a -h flag to display a more human readable format.

2013-04-25 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-9510.
-

Resolution: Not A Problem

This is already available in the revamped shell apps under 2.x releases today; 
see 
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#du

> DU command should provide a -h flag to display a more human readable format.
> 
>
> Key: HADOOP-9510
> URL: https://issues.apache.org/jira/browse/HADOOP-9510
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Corey J. Nolet
>Priority: Minor
>
> Would be useful to have the sizes print out as 500M or 3.4G instead of bytes 
> only.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HADOOP-9496) Bad merge of HADOOP-9450 on branch-2 breaks all bin/hadoop calls that need HADOOP_CLASSPATH

2013-04-23 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-9496.
-

   Resolution: Fixed
Fix Version/s: 2.0.5-beta

Committed revision 1471230 to fix this properly.

> Bad merge of HADOOP-9450 on branch-2 breaks all bin/hadoop calls that need 
> HADOOP_CLASSPATH 
> 
>
> Key: HADOOP-9496
> URL: https://issues.apache.org/jira/browse/HADOOP-9496
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: bin
>Affects Versions: 2.0.5-beta
>Reporter: Gopal V
>Assignee: Harsh J
>Priority: Critical
> Fix For: 2.0.5-beta
>
> Attachments: HADOOP-9496.patch
>
>
> Merge of HADOOP-9450 to branch-2 is broken for hadoop-config.sh
> on trunk
> http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/bin/hadoop-config.sh?r1=1453486&r2=1469214&pathrev=1469214
> vs on branch-2
> http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-common-project/hadoop-common/src/main/bin/hadoop-config.sh?r1=1390222&r2=1469215
> This is breaking all hadoop client code which needs HADOOP_CLASSPATH to be 
> set correctly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Created a new Fix Version for 1.3.0

2013-04-18 Thread Harsh J
Just an FYI. Since 1.2 got branched and I couldn't find a fix version for 1.3,
I went ahead and created one under the HADOOP and HDFS JIRA projects (it is
already in use under MAPREDUCE). I also updated bugs HDFS-4622 and HDFS-4581
to reference their correct fix versions.

Thanks,
--
Harsh J


Re: [VOTE] Release Apache Hadoop 0.23.7

2013-04-16 Thread Harsh J
+1

Downloaded sources, built successfully, stood up a 1-node cluster and
ran a Pi MR job.

On Wed, Apr 17, 2013 at 2:27 AM, Hitesh Shah  wrote:
> +1.
>
> Downloaded source, built and ran a couple of sample jobs on a single node 
> cluster.
>
> -- Hitesh
>
> On Apr 11, 2013, at 12:55 PM, Thomas Graves wrote:
>
>> I've created a release candidate (RC0) for hadoop-0.23.7 that I would like
>> to release.
>>
>> This release is a sustaining release with several important bug fixes in
>> it.
>>
>> The RC is available at:
>> http://people.apache.org/~tgraves/hadoop-0.23.7-candidate-0/
>> The RC tag in svn is here:
>> http://svn.apache.org/viewvc/hadoop/common/tags/release-0.23.7-rc0/
>>
>> The maven artifacts are available via repository.apache.org.
>>
>> Please try the release and vote; the vote will run for the usual 7 days.
>>
>> thanks,
>> Tom Graves
>>
>



-- 
Harsh J


Re: [VOTE] Release Apache Hadoop 2.0.4-alpha

2013-04-16 Thread Harsh J
+1

Built from source successfully, verified signatures, stood up a 1-node
cluster with CS, ran one Pi MR job, and the DistributedShell
application.

On Wed, Apr 17, 2013 at 6:10 AM, Sandy Ryza  wrote:
> +1 (non-binding)
>
> Built from source and ran sample jobs concurrently with the fair scheduler
> on a single node cluster.
>
>
> On Fri, Apr 12, 2013 at 2:56 PM, Arun C Murthy  wrote:
>
>> Folks,
>>
>> I've created a release candidate (RC2) for hadoop-2.0.4-alpha that I would
>> like to release.
>>
>> The RC is available at:
>> http://people.apache.org/~acmurthy/hadoop-2.0.4-alpha-rc2/
>> The RC tag in svn is here:
>> http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.0.4-alpha-rc2
>>
>> The maven artifacts are available via repository.apache.org.
>>
>> Please try the release and vote; the vote will run for the usual 7 days.
>>
>> thanks,
>> Arun
>>
>>
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>
>>



-- 
Harsh J


Re: git clone hadoop taking too much time almost 12 hrs

2013-04-10 Thread Harsh J
I once blogged about cloning big repositories after experiencing how
mammoth Android's repos were:
http://www.harshj.com/2010/08/29/a-less-known-thing-about-cloning-git-repositories/

Try a git clone with a --depth=1 option, to reduce total download by not
getting all the history objects. This would have some side-effects vs. a
regular clone, but should be fine for contributions.


On Wed, Apr 10, 2013 at 11:53 PM, mugisha moses  wrote:

> The whole repo is like 290 mb  so make sure you have a decent internet
> connection
>
>
> On Wed, Apr 10, 2013 at 9:03 PM, maisnam ns  wrote:
>
> > Thanks Andrew for your suggestion,I will clone it from the mirror.
> >
> > Regards
> > Niranjan Singh
> >
> >
> > On Wed, Apr 10, 2013 at 11:04 PM, Andrew Wang  > >wrote:
> >
> > > Hi Niranjan,
> > >
> > > Try doing your initial clone from the github mirror instead, I found it
> > to
> > > be much faster:
> > >
> > > https://github.com/apache/hadoop-common
> > >
> > > I use the apache git for subsequent pulls.
> > >
> > > Best,
> > > Andrew
> > >
> > >
> > > On Tue, Apr 9, 2013 at 6:15 PM, maisnam ns 
> wrote:
> > >
> > > > Hi,
> > > >
> > > > I am trying to execute  - git clone git://
> > > > git.apache.org/hadoop-common.git so that I could setup a development
> > > > environment for Hadoop under the Eclipse IDE but it is taking too
> much
> > > > time.
> > > >
> > > > Can somebody let me know why it is taking too much time, I have a
> high
> > > > speed internet connection and I don't think connectivity is the issue
> > > here.
> > > >
> > > > Thanks
> > > > Niranjan Singh
> > > >
> > >
> >
>



-- 
Harsh J


Re: Building Hadoop from source code

2013-04-09 Thread Harsh J
need to start hadoop
>> > > >>> 1. Go to that bin directory and format your namenode
>> > > >>> 2. configure you site-xmls
>> > > >>> 3. start-all.sh  note there are other ways also
>> > > >>> 4. do jps to see if all daemons are running
>> > > >>>
>> > > >>>
>> > > >>>
>> > > >>> On Tue, Apr 9, 2013 at 4:54 PM, Mohammad Mustaqeem <
>> > > >>> 3m.mustaq...@gmail.com> wrote:
>> > > >>>
>> > > >>>> Please, anyone tell me where is the build hadoop??
>> > > >>>> that can be used to install the hadoop cluster.
>> > > >>>>
>> > > >>>>
>> > > >>>> On Tue, Apr 9, 2013 at 2:32 PM, Mohammad Mustaqeem
>> > > >>>> <3m.mustaq...@gmail.com>wrote:
>> > > >>>>
>> > > >>>> > @Ling, you mean to say to run "find -name *tar*"??
>> > > >>>> > I run but don't know which file will be used to install Hadoop.
>> > > >>>> >
>> > > >>>> > @Niranjan
>> > > >>>> > I haven't changed anything I just executed "svn checkout
>> > > >>>> >
>> http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-trunk";
>> > > >>>> to get
>> > > >>>> > the Hadoop source code in the "hadoop-trunk" directory.
>> > > >>>> > After that I executed the "cd hadoop-trunk" and finally executed
>> > > "mvn
>> > > >>>> > package -Pdist -Dtar -DskipTests -Dmaven.javadoc.skip=true".
>> > > >>>> > I want to ask what is location of tar file created which can be
>> > used
>> > > >>>> to
>> > > >>>> > install the Hadoop ??
>> > > >>>> >
>> > > >>>> >
>> > > >>>> > On Tue, Apr 9, 2013 at 2:08 PM, maisnam ns <
>> maisnam...@gmail.com>
>> > > >>>> wrote:
>> > > >>>> >
>> > > >>>> >> @Mohammad Mustaqeem - Once you create the patch as given in the
>> > > 'How
>> > > >>>> to
>> > > >>>> >> contribute to Hadoop' and apply the patch. The  changes will be
>> > > >>>> reflected
>> > > >>>> >> in the files you have modified.
>> > > >>>> >>
>> > > >>>> >> 1.Now you trigger the build ant or maven. I tried with ant so
>> the
>> > > ant
>> > > >>>> >> command I gave is  ant clean compile bin-package. Don't forget
>> to
>> > > >>>> download
>> > > >>>> >> ivy.jar and copy into you ant home/ lib folder. Once the build
>> is
>> > > >>>> >> triggered
>> > > >>>> >> Hadoop should get built along with the changes you made.
>> > > >>>> >>
>> > > >>>> >> If , I am not mistaken , you modified some hadoop files say
>> > > >>>> >> BlockLocation.java, in your
>> > > >>>> >> Hadoopx.x\src\core\org\apache\hadoop\fs\BlockLocation.java.
>> > > >>>> >>
>> > > >>>> >> The jar will be in
>> Hadoopx.x\build\hadoop-0.20.3-dev-core.jar(In
>> > my
>> > > >>>> >> version)
>> > > >>>> >>
>> > > >>>> >> Hope this clears your doubt.
>> > > >>>> >>
>> > > >>>> >> Regards
>> > > >>>> >> Niranjan Singh
>> > > >>>> >>
>> > > >>>> >>
>> > > >>>> >> On Tue, Apr 9, 2013 at 1:38 PM, Mohammad Mustaqeem
>> > > >>>> >> <3m.mustaq...@gmail.com>wrote:
>> > > >>>> >>
>> > > >>>> >> > @Steve
>> > > >>>> >> > I am new to Hadoop developement.
>> > > >>>> >> > Can you please tell me, whats is the location of tar file??
>> > > >>>> >> >
>> > > >>>> >> >
>> > > >>>> >> > On Tue, Apr 9, 2013 at 12:09 AM, Steve Loughran <
>> > > >>>> ste...@hortonworks.com
>> > > >>>> >> > >wrote:
>> > > >>>> >> >
>> > > >>>> >> > > On 8 April 2013 16:08, Mohammad Mustaqeem <
>> > > >>>> 3m.mustaq...@gmail.com>
>> > > >>>> >> > wrote:
>> > > >>>> >> > >
>> > > >>>> >> > > > Please, tell what I am doing wrong??
>> > > >>>> >> > > > Whats the problem??
>> > > >>>> >> > > >
>> > > >>>> >> > >
>> > > >>>> >> > > a lot of these seem to be network-related tests. You can
>> turn
>> > > >>>> off all
>> > > >>>> >> the
>> > > >>>> >> > > tests; look in BUILDING.TXT at the root of the source tree
>> > for
>> > > >>>> the
>> > > >>>> >> > various
>> > > >>>> >> > > operations, then add -DskipTests to the end of every
>> command,
>> > > >>>> such as
>> > > >>>> >> > >
>> > > >>>> >> > > mvn package -Pdist -Dtar -DskipTests
>> > > >>>> >> > >
>> > > >>>> >> > > to build the .tar packages
>> > > >>>> >> > >
>> > > >>>> >> > >  mvn package -Pdist -Dtar -DskipTests
>> > -Dmaven.javadoc.skip=true
>> > > >>>> >> > > to turn off the javadoc creation too, for an even faster
>> > build
>> > > >>>> >> > >
>> > > >>>> >> >
>> > > >>>> >> >
>> > > >>>> >> >
>> > > >>>> >> > --
>> > > >>>> >> > *With regards ---*
>> > > >>>> >> > *Mohammad Mustaqeem*,
>> > > >>>> >> > M.Tech (CSE)
>> > > >>>> >> > MNNIT Allahabad
>> > > >>>> >> > 9026604270
>> > > >>>> >> >
>> > > >>>> >>
>> > > >>>> >
>> > > >>>> >
>> > > >>>> >
>> > > >>>> > --
>> > > >>>> > *With regards ---*
>> > > >>>> > *Mohammad Mustaqeem*,
>> > > >>>> >  M.Tech (CSE)
>> > > >>>> > MNNIT Allahabad
>> > > >>>> > 9026604270
>> > > >>>> >
>> > > >>>> >
>> > > >>>> >
>> > > >>>>
>> > > >>>>
>> > > >>>> --
>> > > >>>> *With regards ---*
>> > > >>>> *Mohammad Mustaqeem*,
>> > > >>>> M.Tech (CSE)
>> > > >>>> MNNIT Allahabad
>> > > >>>> 9026604270
>> > > >>>>
>> > > >>>
>> > > >>>
>> > > >>
>> > > >
>> > > >
>> > > > --
>> > > > *With regards ---*
>> > > > *Mohammad Mustaqeem*,
>> > > > M.Tech (CSE)
>> > > > MNNIT Allahabad
>> > > > 9026604270
>> > > >
>> > > >
>> > > >
>> > >
>> > >
>> > > --
>> > > http://www.lingcc.com
>> > >
>> >
>> >
>> >
>> > --
>> > *With regards ---*
>> > *Mohammad Mustaqeem*,
>> > M.Tech (CSE)
>> > MNNIT Allahabad
>> > 9026604270
>> >
>>
>>
>>
>> --
>> http://www.lingcc.com
>>
>
>
>
> --
> *With regards ---*
> *Mohammad Mustaqeem*,
> M.Tech (CSE)
> MNNIT Allahabad
> 9026604270



--
Harsh J


[jira] [Created] (HADOOP-9461) JobTracker and NameNode both grant delegation tokens to non-secure clients

2013-04-06 Thread Harsh J (JIRA)
Harsh J created HADOOP-9461:
---

 Summary: JobTracker and NameNode both grant delegation tokens to 
non-secure clients
 Key: HADOOP-9461
 URL: https://issues.apache.org/jira/browse/HADOOP-9461
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor


If one looks at the logic MAPREDUCE-1516 added to JobTracker.java's 
isAllowedDelegationTokenOp() method, and applies the non-secure state of 
UGI.isSecurityEnabled == false and authMethod == SIMPLE, the return result is 
true when the intention is false (due to the short-circuited conditionals).

This is allowing non-secure JobClients to easily request and use 
DelegationTokens and cause unwanted errors to be printed in the JobTracker when 
the renewer attempts to run. Ideally such clients ought to get an error if they 
request a DT in non-secure mode.

HDFS in both trunk and branch-1 has the same problem. Trunk MR (the 
HistoryServer) and YARN are, however, unaffected due to simpler, inlined 
logic instead of reuse of this faulty method.

Note that fixing this will break Oozie today, due to the merged logic of 
OOZIE-734. Oozie will require a fix as well if this is to be fixed in branch-1. 
As a result, I'm going to mark this as an Incompatible Change.
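
For illustration only, a simplified stand-in (not a verbatim copy of the 
JobTracker code) showing how a short-circuited guard of this shape falls 
through to "allow" when security is off:

{code}
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.UserGroupInformation.AuthenticationMethod;

public class DelegationTokenOpCheck {
  // With security disabled, isSecurityEnabled() is false, so the whole guard is
  // false and we fall through to "return true": tokens get granted anyway.
  static boolean isAllowedDelegationTokenOp(AuthenticationMethod authMethod) {
    if (UserGroupInformation.isSecurityEnabled()
        && authMethod != AuthenticationMethod.KERBEROS
        && authMethod != AuthenticationMethod.KERBEROS_SSL
        && authMethod != AuthenticationMethod.CERTIFICATE) {
      return false;
    }
    return true; // also reached for isSecurityEnabled() == false, authMethod == SIMPLE
  }
}
{code}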

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Hadoop Eclipse plug-in distribution

2013-04-03 Thread Harsh J
Hi Rafael,

We do have a new project developing the plugin separately. Please
check out and join the development efforts at
http://hdt.incubator.apache.org (Hadoop Development Tools).

On Thu, Apr 4, 2013 at 2:01 AM, Rafael Medeiros Teixeira
 wrote:
> Hi,
>
> First of all I'm not sure if this is the correct audience for my question,
> apologies if it is not.
>
> I have seen many accounts of Hadoop developers who had a hard time figuring
> out how to compile and install the Hadoop plug-in for Eclipse. My question
> is: are there any reasons, technical or otherwise, to distribute the
> plug-in as source code with Hadoop releases instead of as an Eclipse
> project? Having the plug-in as a separate Eclipse project would allow
> installling it via update site, which is the most common way of installing
> plug-ins. I also think it makes sense from a development standpoint, since
> contributors to the eclipse plugin are rarely the same ones that contribute
> to MapReduce.
>
> Regards,
> Rafael M.



-- 
Harsh J


Re: Error in first build : Cannot run program "protoc": CreateProcess error=2

2013-04-03 Thread Harsh J
So this is the JIRA for tracking the protoc 2.5 issue:
https://issues.apache.org/jira/browse/HADOOP-9346.

Regarding the msbuild trouble, your issue is simply that you perhaps
do not have Visual Studio 2010 Professional/Windows SDK installed?
Apache Hadoop trunk has Windows-specific build profiles in it since it
now supports the Windows platform, but those have their own build
requirements. Check out the section "Building on Windows" under your
checkout/clone's BUILDING.txt or
http://svn.apache.org/repos/asf/hadoop/common/trunk/BUILDING.txt.

I'm not currently sure how to turn the auto-detection off to make it
compile as it would on Linux minus natives (like how it used to,
before the feature drop), but perhaps there should be a way to allow
that.

On Wed, Apr 3, 2013 at 8:16 PM, Chandrashekhar Kotekar
 wrote:
> Thanks a lot for your help.
>
> You were right. Problem was with Protoc version 1.5 only. I downloaded and
> added protoc 1.4 version and now that error is gone. However now I am stuck
> at this new error. Now maven is not able to find "msbuild". Error is as
> follows :
>
> [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2:exec
> (compile-ms-winutils) on project hadoop-common: Command execution failed.
> Cannot run program "msbuild" (in directory
> "D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common"):
> CreateProcess error=2, The system cannot find the file specified -> [Help 1]
>
>
> Can you please help with this error? I googled for this kind of error but
> couldn't find anything related to this error.
>
> One more thing I would like to know, how to search into mailing list for
> this kind of errors?
>
> I think maybe many people before me must have faced these type of errors.
> If I could search into mailing list or some forum related to Hadoop then I
> do not need to disturb all the people in the mailing list for this kind of
> trivial errors and people can concentrate on more important stuff.
>
>
>
> Regards,
> Chandrash3khar K0tekar
> Mobile - 8884631122
>
>
> On Wed, Apr 3, 2013 at 7:21 PM, Harsh J  wrote:
>
>> I'm not sure if trunk works with 2.5.x protoc yet. Try again with
>> 2.4.1 please? I remember filing a JIRA for this, will hunt and send in
>> a bit.
>>
>> On Wed, Apr 3, 2013 at 6:47 PM, Chandrashekhar Kotekar
>>  wrote:
>> > Hi,
>> >
>> > Just now I have downloaded Hadoop source code. I have successfully run
>> "mvn
>> > clean" target but while trying "mvn install" target I am getting
>> following
>> > error :
>> >
>> > *[INFO] Executed tasks
>> > [INFO]
>> > [INFO] --- hadoop-maven-plugins:3.0.0-SNAPSHOT:protoc (compile-protoc) @
>> > hadoop-common ---
>> > [WARNING] [protoc,
>> >
>> --java_out=D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\target\generated-sources\java,
>> >
>> -ID:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto,
>> >
>> D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto\HAServiceProtocol.proto,
>> >
>> D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto\IpcConnectionContext.proto,
>> >
>> D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto\ProtobufRpcEngine.proto,
>> >
>> D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto\ProtocolInfo.proto,
>> >
>> D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto\RpcHeader.proto,
>> >
>> D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto\Security.proto,
>> >
>> D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto\ZKFCProtocol.proto]
>> >  failed: java.io.IOException: Cannot run program "protoc": CreateProcess
>> > error=2, The system cannot find the file specified
>> > [ERROR] protoc compiler error
>> > [INFO]
>> > 
>> > [INFO] Reactor Summary:*
>> >
>> > I have copied exe file of Protoc and added path to that exe file in PATH
>> > variable. When I run "echo %PATH%" I get output something like below:
>> >
>> > *C:\Users\shekhar>echo %PATH%*
>> > *C:\Program Files\TortoiseSVN\bin;C:\Program
>> >
>> Files\Java\jdk1.7.0_11\bin;D:\softwares\apache-maven-3.0.5\bin;D:\softwares\protoc-2.5.0-win32;D:\softwares\apache-ant-1.9.0\bin;
>> > *
>> >
>> > So I think I have successfully put protoc in my path.
>> >
>> > I would like to know why this error is coming. Request someone to please
>> > help.
>> >
>> >
>> > Thanks and Regards,
>> > Chandrash3khar K0tekar
>> > Mobile - 8884631122
>>
>>
>>
>> --
>> Harsh J
>>



-- 
Harsh J


Re: Error in first build : Cannot run program "protoc": CreateProcess error=2

2013-04-03 Thread Harsh J
I'm not sure if trunk works with 2.5.x protoc yet. Try again with
2.4.1 please? I remember filing a JIRA for this, will hunt and send in
a bit.

On Wed, Apr 3, 2013 at 6:47 PM, Chandrashekhar Kotekar
 wrote:
> Hi,
>
> Just now I have downloaded Hadoop source code. I have successfully run "mvn
> clean" target but while trying "mvn install" target I am getting following
> error :
>
> *[INFO] Executed tasks
> [INFO]
> [INFO] --- hadoop-maven-plugins:3.0.0-SNAPSHOT:protoc (compile-protoc) @
> hadoop-common ---
> [WARNING] [protoc,
> --java_out=D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\target\generated-sources\java,
> -ID:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto,
> D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto\HAServiceProtocol.proto,
> D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto\IpcConnectionContext.proto,
> D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto\ProtobufRpcEngine.proto,
> D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto\ProtocolInfo.proto,
> D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto\RpcHeader.proto,
> D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto\Security.proto,
> D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto\ZKFCProtocol.proto]
>  failed: java.io.IOException: Cannot run program "protoc": CreateProcess
> error=2, The system cannot find the file specified
> [ERROR] protoc compiler error
> [INFO]
> 
> [INFO] Reactor Summary:*
>
> I have copied exe file of Protoc and added path to that exe file in PATH
> variable. When I run "echo %PATH%" I get output something like below:
>
> *C:\Users\shekhar>echo %PATH%*
> *C:\Program Files\TortoiseSVN\bin;C:\Program
> Files\Java\jdk1.7.0_11\bin;D:\softwares\apache-maven-3.0.5\bin;D:\softwares\protoc-2.5.0-win32;D:\softwares\apache-ant-1.9.0\bin;
> *
>
> So I think I have successfully put protoc in my path.
>
> I would like to know why this error is coming. Request someone to please
> help.
>
>
> Thanks and Regards,
> Chandrash3khar K0tekar
> Mobile - 8884631122



-- 
Harsh J


Re: Reading partition for reducer

2013-04-01 Thread Harsh J
The question should be more specific here: Do you want to process a
map's sorted total output or do you want to pre-process a whole
partition (i.e. all data pertaining to one reducer)? The former would be
better handled inside MapTask.java, the latter in ReduceTask.java.

On Mon, Apr 1, 2013 at 5:36 PM, Vikas Jadhav  wrote:
> Hello
>
> I want to process output of mapper to processed before it is sent to
> reducer.
>
> @ what point i should hook in my code processing
>
>
> i guess it is ReduceTask.java file
>
> if anyone knows reagarding this please help me in this.
>
>
> Thank You.
>
>
> --
> *
> *
> *
>
> Thanx and Regards*
> * Vikas Jadhav*



-- 
Harsh J


Re: Rack Awareness

2013-03-25 Thread Harsh J
Have you tested the script locally? What do you mean by "not work"? Do
you not see it loaded, not see it sending back proper values, etc. -
what?

P.s. What version?

P.s. User questions are not to be sent to the developer/issue lists.
They should be sent just to u...@hadoop.apache.org. Thanks!

On Mon, Mar 25, 2013 at 3:50 PM, preethi ganeshan
 wrote:
> Hi,
> I wanted to known how to use a network topology script . I have set the
> net.topology.script.file.name to the topology script . But it does not
> work. What else must be done ?
>
> Thank you
> Regards ,
> Preethi



-- 
Harsh J


Re: shuffling one intermediate pair to more than one reducer

2013-03-24 Thread Harsh J
This one is rather easy; I'm not sure why you'd open a JIRA for such a request.
JIRA is to be used for feature requests and bug/improvement reports.

Also, wasn't this discussed some time back by you? See
http://search-hadoop.com/?q=%22pair+from+mapper+to+multiple+reducer%22
for the many replies with solutions you already received. If you
disregarded them all for some reason, please say so when following
up with the same question all over again.

P.s. Please do not cross post to all lists you know of. Use one list
per question, based on relevancy, and help avoid confusion.

On Mon, Mar 25, 2013 at 10:33 AM, Vikas Jadhav  wrote:
> Hello
>
> I have use case where i want to shuffle same pair to more than one reducer.
> is there anyone tried this or can give suggestion how to implement it.
>
>
> I have crated jira for same
> https://issues.apache.org/jira/browse/MAPREDUCE-5063
>
> Thank you.
> --
>
>
> Thanx and Regards
>  Vikas Jadhav



--
Harsh J


Re: the part of the intermediate output fed to a reducer

2013-03-23 Thread Harsh J
Hi,

On Sun, Mar 24, 2013 at 12:00 AM, preethi ganeshan
 wrote:
> Hey all,
> I am working on project that schedules data local reduce tasks.

Great, are you planning to contribute it upstream too? See
https://issues.apache.org/jira/browse/MAPREDUCE-199. I'm also hoping
you're working on trunk and not the maintenance branch branch-1, which
is very outdated compared to where MR is today.

> However , i wanted to know if there is a way using MapTask.java to keep track 
> of the
> inputs and size of the input to every reducer. In other words what code do
> i add to get the size of the intermediate output that is fed to a reduce
> task before a reduce task begins.

Change the thinking here a bit: a map does not feed a reduce (i.e. it's
not a push). A reduce consumes a map's output after its completion (the
map task JVM may terminate for all it cares). Upon a map's completion,
its counters are available centrally (i.e. at the ApplicationMaster),
which the reduce task can poll for sizes (it may already be doing
this).

--
Harsh J


[jira] [Resolved] (HADOOP-2781) Hadoop/Groovy integration

2013-03-22 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-2781.
-

Resolution: Won't Fix

Closing per the comment below; this has been inactive for a couple of years now:

bq. Grool was a dead end.

Possible alternatives (given FlumeJava's mention): Apache Crunch - 
http://crunch.apache.org and/or Cascading - http://cascading.org.

> Hadoop/Groovy integration
> -
>
> Key: HADOOP-2781
> URL: https://issues.apache.org/jira/browse/HADOOP-2781
> Project: Hadoop Common
>  Issue Type: New Feature
> Environment: Any
>Reporter: Ted Dunning
> Attachments: trunk.tgz
>
>
> This is a place-holder issue to hold initial release of the groovy 
> integration for hadoop.
> The goal is to be able to write very simple map-reduce programs in just a few 
> lines of code in a functional style.  Word count should be less than 5 lines 
> of code! 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Help on Hadoop wiki page

2013-03-21 Thread Harsh J
Thanks Robin, I've added your ID into the contributor's group. You should
be able to make the edits yourself now - please go ahead.


On Thu, Mar 21, 2013 at 6:41 PM, Robin Schumacher wrote:

> Harsh - thanks for getting back to me and pointing my request to the right
> place. My ID is RobinSchumacher.
>
> Regards,
>
> --Robin
>
>
> On Wed, Mar 20, 2013 at 8:33 PM, Harsh J  wrote:
>
>> Hi Robin,
>>
>> Moving this to common-dev as thats the right list to send this to. Can
>> you pass us your Apache Hadoop Wiki user ID so we can add you in as a
>> contributor and you can edit this in yourself (and make any other
>> contributions as well)?
>>
>> On Wed, Mar 20, 2013 at 8:06 PM, Robin Schumacher 
>> wrote:
>> > I apologize in advance if there is another address I should be sending
>> this
>> > to, but could someone please add DataStax to the commercial support
>> page on
>> > the wiki (http://wiki.apache.org/hadoop
>> > /Distributions%20and%20Commercial%20Support)?
>> >
>> > Below is the text we'd like used. Please let us know if you have any
>> > questions or need any of the text changed.
>> >
>> > Thanks in advance.
>> >
>> > Robin Schumacher
>> > VP Products, DataStax
>> >
>> >
>> > DataStax provides a distribution of Hadoop that is fully integrated
>> with Apache
>> > Cassandra<
>> https://datastax.com/what-we-offer/products-services/datastax-enterprise/apache-cassandra
>> >
>> >  and Apache Solr<
>> https://datastax.com/what-we-offer/products-services/datastax-enterprise/apache-solr
>> >
>> > in
>> > its DataStax Enterprise
>> > platform<
>> https://datastax.com/what-we-offer/products-services/datastax-enterprise
>> >.
>> > DataStax Enterprise is completely free to use for development
>> environments
>> > with no restrictions. In addition, DataStax supplies
>> > OpsCenter<
>> https://datastax.com/what-we-offer/products-services/datastax-opscenter>
>> > for
>> > visual management and monitoring, along with expert
>> > support<https://datastax.com/what-we-offer/products-services/support>
>> > , training <
>> https://datastax.com/what-we-offer/products-services/training>,
>> > andconsulting services<
>> https://datastax.com/what-we-offer/products-services/consulting>
>> >  for Hadoop, Cassandra, and Solr.
>>
>>
>>
>> --
>> Harsh J
>>
>
>
>
> --
> <http://www.datastax.com/events/cassandrasummit2013>
>



-- 
Harsh J


[jira] [Created] (HADOOP-9424) The "hadoop jar" invocation should include the passed jar on the classpath as a whole

2013-03-21 Thread Harsh J (JIRA)
Harsh J created HADOOP-9424:
---

 Summary: The "hadoop jar" invocation should include the passed jar 
on the classpath as a whole
 Key: HADOOP-9424
 URL: https://issues.apache.org/jira/browse/HADOOP-9424
 Project: Hadoop Common
  Issue Type: Bug
  Components: util
Affects Versions: 2.0.3-alpha
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor


When you have a case such as this:

{{X.jar -> Classes = Main, Foo}}
{{Y.jar -> Classes = Bar}}

With implementation details such as:

* Main references Bar and invokes a public, static method on it.
* Bar does a class lookup to find Foo (Class.forName("Foo")).

Then when you do a {{HADOOP_CLASSPATH=Y.jar hadoop jar X.jar Main}}, Bar's 
method fails with a ClassNotFoundException because of the way RunJar runs.

RunJar extracts the passed jar and includes its contents on the ClassLoader of 
its current thread, but the {{Class.forName(…)}} call from another class does 
not check that class loader and hence cannot find the class, as it's not on any 
classpath it is aware of.

The "hadoop jar" script should ideally add the passed jar argument to 
the CLASSPATH before RunJar is invoked, for the above case to pass.
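
For illustration, a sketch of a code-level workaround (using the hypothetical 
class names above) that resolves against the thread context class loader set 
up by RunJar, independent of the proposed script fix:

{code}
public class Bar {
  public static Object newFoo() throws ClassNotFoundException,
      InstantiationException, IllegalAccessException {
    // Class.forName("Foo") would consult Bar's own class loader and miss the
    // unpacked jar; the context class loader set by RunJar does see it.
    ClassLoader ctx = Thread.currentThread().getContextClassLoader();
    Class<?> foo = Class.forName("Foo", true, ctx);
    return foo.newInstance();
  }
}
{code}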

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Help on Hadoop wiki page

2013-03-20 Thread Harsh J
Hi Robin,

Moving this to common-dev as thats the right list to send this to. Can
you pass us your Apache Hadoop Wiki user ID so we can add you in as a
contributor and you can edit this in yourself (and make any other
contributions as well)?

On Wed, Mar 20, 2013 at 8:06 PM, Robin Schumacher  wrote:
> I apologize in advance if there is another address I should be sending this
> to, but could someone please add DataStax to the commercial support page on
> the wiki (http://wiki.apache.org/hadoop
> /Distributions%20and%20Commercial%20Support)?
>
> Below is the text we'd like used. Please let us know if you have any
> questions or need any of the text changed.
>
> Thanks in advance.
>
> Robin Schumacher
> VP Products, DataStax
>
>
> DataStax provides a distribution of Hadoop that is fully integrated with 
> Apache
> Cassandra<https://datastax.com/what-we-offer/products-services/datastax-enterprise/apache-cassandra>
>  and Apache 
> Solr<https://datastax.com/what-we-offer/products-services/datastax-enterprise/apache-solr>
> in
> its DataStax Enterprise
> platform<https://datastax.com/what-we-offer/products-services/datastax-enterprise>.
> DataStax Enterprise is completely free to use for development environments
> with no restrictions. In addition, DataStax supplies
> OpsCenter<https://datastax.com/what-we-offer/products-services/datastax-opscenter>
> for
> visual management and monitoring, along with expert
> support<https://datastax.com/what-we-offer/products-services/support>
> , training <https://datastax.com/what-we-offer/products-services/training>,
> andconsulting 
> services<https://datastax.com/what-we-offer/products-services/consulting>
>  for Hadoop, Cassandra, and Solr.



-- 
Harsh J


[jira] [Resolved] (HADOOP-6942) Ability for having user's classes take precedence over the system classes for tasks' classpath

2013-03-19 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-6942.
-

Resolution: Duplicate

Fixed via MAPREDUCE-1938. Closing as dupe.

> Ability for having user's classes take precedence over the system classes for 
> tasks' classpath
> --
>
> Key: HADOOP-6942
> URL: https://issues.apache.org/jira/browse/HADOOP-6942
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: scripts
>Affects Versions: 0.22.0
>Reporter: Krishna Ramachandran
> Attachments: HADOOP-6942.y20.patch, hadoop-common-6942.patch
>
>
> Fix bin/hadoop script to facilitate mapred-1938

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: HTTP/1.1 405 HTTP method PUT is not supported by this URL??

2013-03-17 Thread Harsh J
Hi,

Please use u...@hadoop.apache.org for usage specific questions. The
dev lists are for project developer discussions alone.

Can you share your full curl command, including specifically the URL you used?
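
For reference, a minimal sketch of what a working MKDIRS request looks like,
assuming webhdfs is enabled (dfs.webhdfs.enabled=true) and the default
NameNode HTTP port of 50070; the host, path and user below are placeholders:

{code}
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHdfsMkdir {
  public static void main(String[] args) throws Exception {
    // MKDIRS must be a PUT, and must hit the /webhdfs/v1/ prefix on the NameNode HTTP port.
    URL url = new URL(
        "http://namenode.example.com:50070/webhdfs/v1/user/levi/newdir?op=MKDIRS&user.name=levi");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("PUT");
    System.out.println(conn.getResponseCode() + " " + conn.getResponseMessage());
    conn.disconnect();
  }
}
{code}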

On Mon, Mar 18, 2013 at 9:33 AM, 小学园PHP  wrote:
> When i mkdir by curl using the WebHDFS.
> Hadoop return:
> HTTP/1.1 405 HTTP method PUT is not supported by this URL
>
>
> OK, Who know this why?
>
>
> TIA
> Levi



--
Harsh J


Re: Re: how to define new InputFormat with streaming?

2013-03-17 Thread Harsh J
It isn't as easy as changing that import line:

> package org.apache.hadoop.mapred.lib.input does not exist

The right package is org.apache.hadoop.mapred.
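
For illustration, a rough skeleton of that format rewritten against the old
API (trimmed to the essentials; treat it as a starting point rather than the
book's exact code):

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.*;

public class WholeFileInputFormat
    extends FileInputFormat<NullWritable, BytesWritable> {

  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false; // one whole file per record
  }

  @Override
  public RecordReader<NullWritable, BytesWritable> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    return new WholeFileRecordReader((FileSplit) split, job);
  }

  static class WholeFileRecordReader
      implements RecordReader<NullWritable, BytesWritable> {
    private final FileSplit split;
    private final JobConf conf;
    private boolean processed = false;

    WholeFileRecordReader(FileSplit split, JobConf conf) {
      this.split = split;
      this.conf = conf;
    }

    public boolean next(NullWritable key, BytesWritable value) throws IOException {
      if (processed) {
        return false;
      }
      byte[] contents = new byte[(int) split.getLength()];
      Path file = split.getPath();
      FileSystem fs = file.getFileSystem(conf);
      FSDataInputStream in = null;
      try {
        in = fs.open(file);
        IOUtils.readFully(in, contents, 0, contents.length);
        value.set(contents, 0, contents.length);
      } finally {
        IOUtils.closeStream(in);
      }
      processed = true;
      return true;
    }

    public NullWritable createKey() { return NullWritable.get(); }
    public BytesWritable createValue() { return new BytesWritable(); }
    public long getPos() { return processed ? split.getLength() : 0; }
    public float getProgress() { return processed ? 1.0f : 0.0f; }
    public void close() throws IOException { }
  }
}
{code}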

On Mon, Mar 18, 2013 at 7:22 AM, springring  wrote:
> thanks
> I modify the java file with old "mapred" API, but there is still error
>
>  javac -classpath 
> /usr/lib/hadoop/hadoop-core-0.20.2-cdh3u3.jar:/usr/lib/hadoop/lib/* -d class9 
> ./*.java
> ./WholeFileInputFormat.java:16: error: package 
> org.apache.hadoop.mapred.lib.input does not exist
> import org.apache.hadoop.mapred.lib.input.*;
>
> does it because hadoop-0.20.2-cdh3u3 not include "mapred" API?
>
>
>
>
>
>
> At 2013-03-17 14:22:43,"Harsh J"  wrote:
>>The issue is that Streaming expects the old/stable MR API
>>(org.apache.hadoop.mapred.InputFormat) as its input format class, but your
>>WholeFileInputFormat is using the new MR API
>>(org.apache.hadoop.mapreduce.lib.input.InputFormat). Using the older form
>>will let you pass.
>>
>>This has nothing to do with your version/distribution of Hadoop.
>>
>>
>>On Fri, Mar 15, 2013 at 4:28 PM, Steve Loughran wrote:
>>
>>> On 15 March 2013 09:18, springring  wrote:
>>>
>>> >  Hi,
>>> >
>>> >  my hadoop version is Hadoop 0.20.2-cdh3u3 and I want to define new
>>> > InputFormat in hadoop book , but there is error
>>> > "class org.apache.hadoop.streaming.WholeFileInputFormat not
>>> > org.apache.hadoop.mapred.InputFormat"
>>> >
>>> > Hadoop version is 0.20, but the streaming still depend on 0.10 mapred
>>> api?
>>> >
>>>
>>>
>>> 1. please don't spam all the lists
>>> 2. grab a later version of the apache releases if you want help on them on
>>> these mailing lists, or go to the cloudera lists, where they will probably
>>> say "upgrade to CDH 4.x" before asking questions.
>>>
>>> thanks
>>>
>>
>>
>>
>>--
>>Harsh J



--
Harsh J


Re: how to define new InputFormat with streaming?

2013-03-16 Thread Harsh J
The issue is that Streaming expects the old/stable MR API
(org.apache.hadoop.mapred.InputFormat) as its input format class, but your
WholeFileInputFormat is using the new MR API
(org.apache.hadoop.mapreduce.lib.input.InputFormat). Using the older form
will let you pass.

This has nothing to do with your version/distribution of Hadoop.


On Fri, Mar 15, 2013 at 4:28 PM, Steve Loughran wrote:

> On 15 March 2013 09:18, springring  wrote:
>
> >  Hi,
> >
> >  my hadoop version is Hadoop 0.20.2-cdh3u3 and I want to define new
> > InputFormat in hadoop book , but there is error
> > "class org.apache.hadoop.streaming.WholeFileInputFormat not
> > org.apache.hadoop.mapred.InputFormat"
> >
> > Hadoop version is 0.20, but the streaming still depend on 0.10 mapred
> api?
> >
>
>
> 1. please don't spam all the lists
> 2. grab a later version of the apache releases if you want help on them on
> these mailing lists, or go to the cloudera lists, where they will probably
> say "upgrade to CDH 4.x" before asking questions.
>
> thanks
>



-- 
Harsh J


Re: [VOTE] Plan to create release candidate Monday 3/18

2013-03-16 Thread Harsh J
+1


On Fri, Mar 15, 2013 at 9:53 PM, Robert Evans  wrote:

> +1
>
> On 3/10/13 10:38 PM, "Matt Foley"  wrote:
>
> >Hi all,
> >I have created branch-1.2 from branch-1, and propose to cut the first
> >release candidate for 1.2.0 on Monday 3/18 (a week from tomorrow), or as
> >soon thereafter as I can achieve a stable build.
> >
> >Between 1.1.2 and the current 1.2.0, there are 176 patches!!  Draft
> >release
> >notes are available at .../branch-1.2/src/docs/releasenotes.html in the
> >sources.
> >
> >Any non-destabilizing patches committed to branch-1.2 during the coming
> >week (and of course also committed to branch-1) will be included in the
> >RC.
> > However, at this point I request that any big new developments not yet in
> >branch-1.2 be targeted for 1.3.
> >
> >Release plans have to be voted on too, so please vote '+1' to approve this
> >plan.  Voting will close on Sunday 3/17 at 8:30pm PDT.
> >
> >Thanks,
> >--Matt
> >(release manager)
>
>


-- 
Harsh J


Re: [VOTE] Plan to create release candidate for 0.23.7

2013-03-16 Thread Harsh J
+1


On Sat, Mar 16, 2013 at 12:19 AM, Karthik Kambatla wrote:

> +1 (non-binding)
>
> On Fri, Mar 15, 2013 at 9:12 AM, Robert Evans  wrote:
>
> > +1
> >
> > On 3/13/13 11:31 AM, "Thomas Graves"  wrote:
> >
> > >Hello all,
> > >
> > >I think enough critical bug fixes have gone into branch-0.23 to
> > >warrant another release. I plan on creating a 0.23.7 release by the end
> > >of March.
> > >
> > >Please vote '+1' to approve this plan.  Voting will close on Wednesday
> > >3/20 at 10:00am PDT.
> > >
> > >Thanks,
> > >Tom Graves
> > >(release manager)
> >
> >
>



-- 
Harsh J


Re: Technical question on Capacity Scheduler.

2013-03-05 Thread Harsh J
The CS does support running jobs in parallel. Are you observing just
the UI, or are you also noticing FIFO behavior in the logs, where
assignments can be seen with timestamps?

On Wed, Mar 6, 2013 at 9:03 AM, Jagmohan Chauhan
 wrote:
> Hi All
>
> Can someone please reply to my queries?
>
> On Sun, Mar 3, 2013 at 5:47 PM, Jagmohan Chauhan > wrote:
>
>> Thanks Harsh.
>>
>> I have a few more questions.
>>
>> Q1: I found in my experiments using the CS that for any user, its next job
>> does not start until its current one is finished. Is this true, are there
>> any exceptions, and if true, why is it so? I did not find any such
>> condition in the implementation of the CS.
>>
>> Q2: The concept of reserved slots applies only if speculative execution
>> is on. Am I correct? If yes, then the code dealing with reserved slots won't
>> be executed if speculative execution is off?
>>
>> PS: I am working on MRv1.
>>
>>
>> On Sun, Mar 3, 2013 at 2:41 AM, Harsh J  wrote:
>>
>>> On Sun, Mar 3, 2013 at 1:41 PM, Jagmohan Chauhan <
>>> simplefundumn...@gmail.com
>>> > wrote:
>>>
>>> >  Hi
>>> >
>>> > I am going through the Capacity Scheduler implementation. There is one
>>> > thing I did not understand clearly.
>>> >
>>>
>>> Are you reading the YARN CapacityScheduler or the older, MRv1 one? I'd
>>> suggest reading the newer one for any implementation or research goals,
>>> for
>>> it to be more current and future-applicable.
>>>
>>>
>>> > 1. Does the off-switch task refer to a task in which data has to be
>>> > fetched over the network? That means it's not node-local?
>>> >
>>>
>>> Off-switch would imply off-rack, i.e. not node local, nor rack-local.
>>>
>>>
>>> > 2. Does an off-switch task include only the tasks for which map input
>>> > has to be fetched from a node on a different rack across the switch, or
>>> > does it also include tasks where data has to be fetched from another node
>>> > on the same rack on the same switch?
>>> >
>>>
>>> A task's input split is generally supposed to define all locations of
>>> available inputs. If the CS is unable to schedule to any of those
>>> locations, nor their racks, then it schedules an off-rack (see above) task
>>> which has to pull the input from a different rack.
>>>
>>>
>>> >
>>> > --
>>> > Thanks and Regards
>>> > Jagmohan Chauhan
>>> > MSc student,CS
>>> > Univ. of Saskatchewan
>>> > IEEE Graduate Student Member
>>> >
>>> > http://homepage.usask.ca/~jac735/
>>> >
>>>
>>> Feel free to post any further impl. related questions! :)
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>>
>> --
>> Thanks and Regards
>> Jagmohan Chauhan
>> MSc student,CS
>> Univ. of Saskatchewan
>> IEEE Graduate Student Member
>>
>> http://homepage.usask.ca/~jac735/
>>
>
>
>
> --
> Thanks and Regards
> Jagmohan Chauhan
> MSc student,CS
> Univ. of Saskatchewan
> IEEE Graduate Student Member
>
> http://homepage.usask.ca/~jac735/



--
Harsh J


Re: [Vote] Merge branch-trunk-win to trunk

2013-03-04 Thread Harsh J
Thanks Suresh. Regarding where: we can state it on
http://wiki.apache.org/hadoop/HowToContribute, in the test-patch
section perhaps.

+1 on the merge.

On Mon, Mar 4, 2013 at 11:39 PM, Suresh Srinivas  wrote:
> On Sun, Mar 3, 2013 at 8:50 PM, Harsh J  wrote:
>
>> Have we agreed (and stated it somewhere proper) that a -1 obtained for
>> a Windows CI build for a test-patch will not block the ongoing work
>> (unless it is Windows specific) and patches may still be committed to
>> trunk despite that?
>>
>
> This thread is long and possibly hard to follow. Yes, I and several others
> have
> stated that for now it is okay to commit even if Windows precommit build
> posts -1.
>
>>
>> I'm +1 if someone can assert and add the above into the formal
>> guidelines. I'd still prefer that Windows does its releases separately
>> as that ensures more quality for its audience and better testing
>> periods (and wouldn't block anything), but we can come to that iff we
>> are unable to maintain the currently proposed model.
>
>
> Which do you think is the right place to add this?
>
> At this time we are voting for merging into trunk. I prefer having a single
> release
> that supports both Linux and windows. Based on working on Windows support
> I think this is doable and should not hold up releases for Linux.



--
Harsh J


Re: [Vote] Merge branch-trunk-win to trunk

2013-03-03 Thread Harsh J
Have we agreed (and stated it somewhere proper) that a -1 obtained for
a Windows CI build for a test-patch will not block the ongoing work
(unless it is Windows specific) and patches may still be committed to
trunk despite that?

I'm +1 if someone can assert and add the above into the formal
guidelines. I'd still prefer that Windows does its releases separately
as that ensures more quality for its audience and better testing
periods (and wouldn't block anything), but we can come to that iff we
are unable to maintain the currently proposed model.

On Mon, Mar 4, 2013 at 7:39 AM, Tsuyoshi OZAWA  wrote:
> +1 (non-binding),
>
> Windows support is attractive for lots users.
> From point a view from Hadoop developer, Matt said that CI supports
> cross platform testing, and it's quite reasonable condition to merge.
>
> Thanks,
> Tsuyoshi



--
Harsh J


Re: Technical question on Capacity Scheduler.

2013-03-03 Thread Harsh J
On Sun, Mar 3, 2013 at 1:41 PM, Jagmohan Chauhan  wrote:

>  Hi
>
> I am going through the Capacity Scheduler implementation. There is one
> thing I did not understand clearly.
>

Are you reading the YARN CapacityScheduler or the older, MRv1 one? I'd
suggest reading the newer one for any implementation or research goals, for
it to be more current and future-applicable.


> 1. Does the off-switch task refer to a task in which data has to be
> fetched over the network? That means it's not node-local?
>

Off-switch would imply off-rack, i.e. not node local, nor rack-local.


> 2. Does an off-switch task include only the tasks for which map input has to
> be fetched from a node on a different rack across the switch, or does it also
> include tasks where data has to be fetched from another node on the same rack
> on the same switch?
>

A task's input split is generally supposed to define all locations of
available inputs. If the CS is unable to schedule to any of those
locations, nor their racks, then it schedules an off-rack (see above) task
which has to pull the input from a different rack.
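
To illustrate that fallback, here is a small self-contained sketch (not the
scheduler's code; the host and rack names are made up) of how an assignment's
locality is classified against the split's preferred locations:

{code}
// Hedged sketch of node-local -> rack-local -> off-switch classification.
import java.util.Arrays;
import java.util.List;

public class LocalityExample {
  enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH }

  // splitHosts/splitRacks come from the task's input split; trackerHost/trackerRack
  // describe the node the scheduler is considering running the task on.
  static Locality classify(List<String> splitHosts, List<String> splitRacks,
                           String trackerHost, String trackerRack) {
    if (splitHosts.contains(trackerHost)) {
      return Locality.NODE_LOCAL;   // an input replica lives on this very node
    }
    if (splitRacks.contains(trackerRack)) {
      return Locality.RACK_LOCAL;   // a replica lives elsewhere on the same rack
    }
    return Locality.OFF_SWITCH;     // the data must be pulled across the rack switch
  }

  public static void main(String[] args) {
    // Hypothetical hosts and racks, purely for illustration.
    List<String> hosts = Arrays.asList("node1", "node2", "node3");
    List<String> racks = Arrays.asList("/rack-a", "/rack-b");
    System.out.println(classify(hosts, racks, "node7", "/rack-b")); // RACK_LOCAL
    System.out.println(classify(hosts, racks, "node9", "/rack-c")); // OFF_SWITCH
  }
}
{code}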


>
> --
> Thanks and Regards
> Jagmohan Chauhan
> MSc student,CS
> Univ. of Saskatchewan
> IEEE Graduate Student Member
>
> http://homepage.usask.ca/~jac735/
>

Feel free to post any further impl. related questions! :)

-- 
Harsh J


[jira] [Created] (HADOOP-9346) Upgrading to protoc 2.5.0 fails the build

2013-02-28 Thread Harsh J (JIRA)
Harsh J created HADOOP-9346:
---

 Summary: Upgrading to protoc 2.5.0 fails the build
 Key: HADOOP-9346
 URL: https://issues.apache.org/jira/browse/HADOOP-9346
 Project: Hadoop Common
  Issue Type: Task
Reporter: Harsh J
Priority: Minor


Reported over the impala lists, one of the errors received is:

{code}
src/hadoop-common-project/hadoop-common/target/generated-sources/java/org/apache/hadoop/ha/proto/ZKFCProtocolProtos.java:[104,37]
 can not find symbol.
symbol: class Parser
location: package com.google.protobuf
{code}

Worth looking into as we'll eventually someday bump our protobuf deps.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [Vote] Merge branch-trunk-win to trunk

2013-02-27 Thread Harsh J
oop/common/branches/branch-1-win/CHANGES.
>>>branch-1-win.txt?view=markup
>>> >.
>>> This work has been ported to a branch, branch-trunk-win, based on trunk.
>>> Merge patch for this is available on
>>> HADOOP-8562<https://issues.apache.org/jira/browse/HADOOP-8562>
>>> .
>>>
>>> Highlights of the work done so far:
>>> 1. Necessary changes in Hadoop to run natively on Windows. These changes
>>> handle differences in platforms related to path names, process/task
>>> management etc.
>>> 2. Addition of winutils tools for managing file permissions and
>>>ownership,
>>> user group mapping, hardlinks, symbolic links, chmod, disk utilization,
>>>and
>>> process/task management.
>>> 3. Added cmd scripts equivalent to existing shell scripts
>>> hadoop-daemon.sh, start and stop scripts.
>>> 4. Addition of a block placement policy implementation to support cloud
>>> environments, more specifically Azure.
>>>
>>> We are very close to wrapping up the work in branch-trunk-win and
>>>getting
>>> ready for a merge. Currently the merge patch is passing close to 100% of
>>> unit tests on Linux. Soon I will call for a vote to merge this branch
>>>into
>>> trunk.
>>>
>>> Next steps:
>>> 1. Call for vote to merge branch-trunk-win to trunk, when the work
>>> completes and precommit build is clean.
>>> 2. Start a discussion on adding Jenkins precommit builds on windows and
>>> how to integrate that with the existing commit process.
>>>
>>> Let me know if you have any questions.
>>>
>>> Regards,
>>> Suresh
>>>
>>>
>>
>>
>>--
>>http://hortonworks.com/download/
>



--
Harsh J


Re: APIs to move data blocks within HDFS

2013-02-22 Thread Harsh J
There are no filesystem (i.e. client) level APIs to do this, but the
Balancer tool of HDFS does exactly this. Reading its sources should
let you understand what kind of calls you need to make to reuse the
balancer protocol and achieve what you need.

In trunk, the balancer is at
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java

HTH, and feel free to ask any relevant follow up questions.

On Fri, Feb 22, 2013 at 11:43 PM, Karthiek C  wrote:
> Hi,
>
> Is there any APIs to move data blocks in HDFS from one node to another *
> after* they have been added to HDFS? Also can we write some sort of
> pluggable module (like scheduler) that controls how data gets placed in
> hadoop cluster? I am working with hadoop-1.0.3 version and I couldn't find
> any filesystem APIs available to do that.
>
> PS: I am working on a research project where we want to investigate how to
> optimally place data in hadoop.
>
> Thanks,
> Karthiek



--
Harsh J


[jira] [Created] (HADOOP-9322) LdapGroupsMapping doesn't seem to set a timeout for its directory search

2013-02-21 Thread Harsh J (JIRA)
Harsh J created HADOOP-9322:
---

 Summary: LdapGroupsMapping doesn't seem to set a timeout for its 
directory search
 Key: HADOOP-9322
 URL: https://issues.apache.org/jira/browse/HADOOP-9322
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 2.0.3-alpha
Reporter: Harsh J
Priority: Minor


We don't appear to be setting a timeout via 
http://docs.oracle.com/javase/6/docs/api/javax/naming/directory/SearchControls.html#setTimeLimit(int)
 before we search with 
http://docs.oracle.com/javase/6/docs/api/javax/naming/directory/DirContext.html#search(javax.naming.Name,%20java.lang.String,%20javax.naming.directory.SearchControls).

This may occasionally lead to some unwanted NN pauses due to lock-holding on 
the operations that do group lookups. A timeout is better to define than to rely 
on "0" (infinite wait).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Compile and deploy source code for Hadoop 1.0.4

2013-02-09 Thread Harsh J
Hi Trupti,

Welcome! My responses inline.

On Sat, Feb 9, 2013 at 7:59 PM, Trupti Gaikwad  wrote:
> Hi,
>
> I want to work on release 1.0.4 source code. As per Hadoop
> wiki HowToContribute, I can download source code from trunk or from release
> 1.0.4 tag.

Although I do not know your goal here, note that the trunk is the best
place to do dev work if your goal is also to get your work accepted at
the end. We allow 1.x to continue receiving improvements but refuse
divergence in features compared to trunk and and the ongoing branch-2
releases. Just something to consider!

> 1. Source code from hadoop/common/trunk with revision 1397701 corresponding
> to release 1.0.4:
> I downloaded the source with svn revision 1397701 mentioned in release tag.
> My source code gets compiled, however tar file created by build does not
> contain start-mapred.sh file? It does contain start-yarn.sh. Even if source
> revision is old, why I am not getting start-mapred.sh. I really don't want
> to use resourcemanager or node manager to run my mapred job. How can I
> start jobtracker and tasktracker?

Unfortunately SVN revisions aren't exactly what you think they are.
What you need is to actually checkout a tag, not a revision. To get a
1.0.4 tag checked out from the Apache SVN repository, your command
could be:

$ svn checkout http://svn.apache.org/repos/asf/hadoop/common/tags/release-1.0.4/
hadoop-1.0.4
$ cd hadoop-1.0.4/

Likewise, if you want to work on the tip of the 1.x branch instead,
checkout the branch "branch-1":

$ svn checkout http://svn.apache.org/repos/asf/hadoop/common/branches/branch-1/
hadoop-1
$ cd hadoop-1/

> 2. Source code from tag release 1.0.4:
> Hadoop wiki also mentions that, If I want to work against any specific
> release then I will have to download release tag.
> I copied my code to src and tried to build it. However, my code is not
> getting compiled, because I developed it against the above hadoop-common
> project. I am getting compilation errors, as there are inconsistencies in
> the org.apache.hadoop.fs.FileSystem interface. Shall I develop my class by
> implementing the interfaces provided in release 1.0.4?

You're attempting to build trunk (accidentally, in your case). See
above for getting proper 1.x code.

However, if you still wish to build trunk, whose build system is
different from the older 1.x system, some simple notes for building
trunk can be found here:
http://wiki.apache.org/hadoop/QwertyManiac/BuildingHadoopTrunk

> So
> 1. How to get all projects from hadoop-common?
> 2. what is correct way to compile and deploy any changes in core for
> release 1.0.4?

I believe I've answered both questions in the above inlines. Do feel
free to post any further questions you have!

--
Harsh J


Re: pre-historic record IO stuff, is this used anywhere?

2013-02-08 Thread Harsh J
Hadoop streaming is also tied to recordio as it is today:
https://issues.apache.org/jira/browse/MAPREDUCE-3303, but it can be
removed per Klaas.

On Sat, Feb 9, 2013 at 6:48 AM, Alejandro Abdelnur  wrote:
> This seems to be used only in tests in common and in a standalone class in
> streaming tests.
>
> What is the purpose of these classes, as they don't seem to be used in
> any of the source that ends up in Hadoop?
>
> hadoop-common-project/hadoop-common/src/test/ddl/buffer.jr
> hadoop-common-project/hadoop-common/src/test/ddl/int.jr
> hadoop-common-project/hadoop-common/src/test/ddl/string.jr
> hadoop-common-project/hadoop-common/src/test/ddl/test.jr
> hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/record/FromCpp.java
> hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/record/RecordBench.java
> hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/record/TestBuffer.java
> hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/record/TestRecordIO.java
> hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/record/TestRecordVersioning.java
> hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/record/ToCpp.java
> hadoop-tools/hadoop-streaming/src/test/java/org/apache/hadoop/typedbytes/TestIO.java
>
>
> I've deleted the above classes, cleaned up the common POM (not to compile
> the JR files) and everything compiles fine.
>
> To me all this is dead code, if so, can we nuke them?
>
> Thx
>
> --
> Alejandro



--
Harsh J


Re: Help to setup latest Hadoop source code on Eclipse

2013-01-31 Thread Harsh J
Hi,

You should be able to do this (assuming your dev env. has all
side-deps like protoc 2.4.x+, etc. installed):

$ mvn clean install -DskipTests
$ mvn eclipse:eclipse

Then import all the necessary projects in from Eclipse.

On Fri, Feb 1, 2013 at 5:44 AM, Karthiek C  wrote:
> I am trying to get the latest hadoop source code running in Eclipse, but
> am facing a lot of build errors due to maven plugin dependencies. All the
> documentations I found online were for very old versions of hadoop. Is
> there any other documentation used by developer community to get the code
> up and running? Can someone please help me out?
>
> PS: I am working on a project where we want to develop a custom scheduler
> for hadoop.
>
> Thanks,
> Karthiek



-- 
Harsh J


[jira] [Reopened] (HADOOP-9241) DU refresh interval is not configurable

2013-01-29 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J reopened HADOOP-9241:
-


Thanks Nicholas; I have reverted HADOOP-9241 from trunk and branch-2. I will 
attach a proper patch now.

> DU refresh interval is not configurable
> ---
>
> Key: HADOOP-9241
> URL: https://issues.apache.org/jira/browse/HADOOP-9241
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 2.0.2-alpha
>    Reporter: Harsh J
>Assignee: Harsh J
>Priority: Trivial
> Fix For: 2.0.3-alpha
>
> Attachments: HADOOP-9241.patch
>
>
> While the {{DF}} class's refresh interval is configurable, the {{DU}}'s 
> isn't. We should ensure both be configurable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-9257) HADOOP-9241 changed DN's default DU interval to 1m instead of 10m accidentally

2013-01-28 Thread Harsh J (JIRA)
Harsh J created HADOOP-9257:
---

 Summary: HADOOP-9241 changed DN's default DU interval to 1m 
instead of 10m accidentally
 Key: HADOOP-9257
 URL: https://issues.apache.org/jira/browse/HADOOP-9257
 Project: Hadoop Common
  Issue Type: Bug
  Components: util
Affects Versions: 2.0.3-alpha
Reporter: Harsh J
Assignee: Harsh J


Suresh caught this on HADOOP-9241:

{quote}
Even for trivial jiras, I suggest getting the code review done before 
committing the code. Such changes are easy and quick to review.
In this patch, did DU interval become 1 minute instead of 10 minutes?
{code}
-this(path, 600000L);
-//10 minutes default refresh interval
+this(path, conf.getLong(CommonConfigurationKeys.FS_DU_INTERVAL_KEY,
+CommonConfigurationKeys.FS_DU_INTERVAL_DEFAULT));


+  /** See core-default.xml */
+  public static final String  FS_DU_INTERVAL_KEY = "fs.du.interval";
+  /** Default value for FS_DU_INTERVAL_KEY */
+  public static final long    FS_DU_INTERVAL_DEFAULT = 60000;
{code}
{quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-9243) Some improvements to the mailing lists webpage for lowering unrelated content rate

2013-01-24 Thread Harsh J (JIRA)
Harsh J created HADOOP-9243:
---

 Summary: Some improvements to the mailing lists webpage for 
lowering unrelated content rate
 Key: HADOOP-9243
 URL: https://issues.apache.org/jira/browse/HADOOP-9243
 Project: Hadoop Common
  Issue Type: Improvement
  Components: documentation
Reporter: Harsh J
Priority: Minor


From Steve on HADOOP-9329:

{quote}
* could you add a bit of text to say "user@" is not the place to discuss 
installation problems related to any third party products that install some 
variant of Hadoop on people's desktops and servers. You're the one who ends up 
having to bounce off all the CDH-related queries -it would help you too.
* For the new "Invalid JIRA" link to paste into JIRA issues about this, I point 
to the distributions and Commercial support page on the wiki -something similar 
on the mailing lists page would avoid having to put any specific vendor links 
into the mailing lists page, and support a higher/more open update process. See 
http://wiki.apache.org/hadoop/InvalidJiraIssues
{quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-9241) DU refresh interval is not configurable

2013-01-24 Thread Harsh J (JIRA)
Harsh J created HADOOP-9241:
---

 Summary: DU refresh interval is not configurable
 Key: HADOOP-9241
 URL: https://issues.apache.org/jira/browse/HADOOP-9241
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Priority: Trivial


While the {{DF}} class's refresh interval is configurable, the {{DU}}'s isn't. 
We should ensure both be configurable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-9239) Move the general@ description to the end of lists in the mailing lists web page

2013-01-23 Thread Harsh J (JIRA)
Harsh J created HADOOP-9239:
---

 Summary: Move the general@ description to the end of lists in the 
mailing lists web page
 Key: HADOOP-9239
 URL: https://issues.apache.org/jira/browse/HADOOP-9239
 Project: Hadoop Common
  Issue Type: Improvement
  Components: documentation
Reporter: Harsh J
Priority: Minor


We have users unnecessarily subscribing to and abusing the general@ list, mainly 
because of its presence as the first option in the page 
http://hadoop.apache.org/mailing_lists.html, and secondarily because of its name.

This is to at least address the first one, which is causing growing pain to its 
subscribers. Let's move it to the bottom of the presented list of lists.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Hadoop datajoin package

2013-01-15 Thread Harsh J
Ah, my bad. The two appear to be different things. I am not aware of any
work being done for that datajoin tool package; I am also not sure if it's really
used out there.


On Tue, Jan 15, 2013 at 4:51 PM, Hemanth Yamijala wrote:

> Thanks, Harsh.
>
> Where does this fit in then ?
>
>
> http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-tools/hadoop-datajoin/src/main/java/org/apache/hadoop/contrib/utils/join/
>
> Is it to be deprecated and removed ?
>
> Thanks
> Hemanth
>
>
> On Mon, Jan 14, 2013 at 8:08 PM, Harsh J  wrote:
>
> > Already done and available in trunk and 2.x releases today:
> >
> >
> http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/join/
> >
> >
> > On Mon, Jan 14, 2013 at 7:44 PM, Hemanth Yamijala  > >wrote:
> >
> > > On the user list, there was a question about the Hadoop datajoin
> package.
> > > Specifically, its dependency on the old API.
> > >
> > > Is this package still in use ? Should we file a JIRA to migrate it to
> the
> > > new API ?
> > >
> > > Thanks
> > > hemanth
> > >
> >
> >
> >
> > --
> > Harsh J
> >
>



-- 
Harsh J


Re: Hadoop datajoin package

2013-01-14 Thread Harsh J
Already done and available in trunk and 2.x releases today:
http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/join/


On Mon, Jan 14, 2013 at 7:44 PM, Hemanth Yamijala wrote:

> On the user list, there was a question about the Hadoop datajoin package.
> Specifically, its dependency on the old API.
>
> Is this package still in use ? Should we file a JIRA to migrate it to the
> new API ?
>
> Thanks
> hemanth
>



-- 
Harsh J


[jira] [Resolved] (HADOOP-8274) In pseudo or cluster model under Cygwin, tasktracker can not create a new job because of symlink problem.

2013-01-14 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-8274.
-

Resolution: Won't Fix

For Windows, since the mainstream branch does not support it actively, I am 
closing this as a Won't Fix.

I'm certain the same issue does not happen on the branch-1-win 1.x branch (or 
the branch-trunk-win branch), and I urge you to use that instead if you wish to 
continue using Windows for development or other usage. Find the 
Windows-optimized sources at 
http://svn.apache.org/repos/asf/hadoop/common/branches/branch-1-win/ or 
http://svn.apache.org/repos/asf/hadoop/common/branches/branch-trunk-win/.

> In pseudo or cluster model under Cygwin, tasktracker can not create a new job 
> because of symlink problem.
> -
>
> Key: HADOOP-8274
> URL: https://issues.apache.org/jira/browse/HADOOP-8274
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 0.20.205.0, 1.0.0, 1.0.1, 0.22.0
> Environment: windows7+cygwin 1.7.11-1+jdk1.6.0_31+hadoop 1.0.0
>Reporter: tim.wu
>
> The standalone model is OK. But, in pseudo or cluster mode, it always throws 
> errors, even if I just run the wordcount example.
> HDFS works fine, but the tasktracker cannot create threads (JVMs) for a new job. 
> It is empty under /logs/userlogs/job-/attempt-/.
> The reason appears to be that on Windows, Java cannot recognize a symlink to a 
> folder as a folder. 
> The detail description is as following,
> ==
> First, the error log of tasktracker is like:
> ==
> 12/03/28 14:35:13 INFO mapred.JvmManager: In JvmRunner constructed JVM ID: 
> jvm_201203280212_0005_m_-1386636958
> 12/03/28 14:35:13 INFO mapred.JvmManager: JVM Runner 
> jvm_201203280212_0005_m_-1386636958 spawned.
> 12/03/28 14:35:17 INFO mapred.JvmManager: JVM Not killed 
> jvm_201203280212_0005_m_-1386636958 but just removed
> 12/03/28 14:35:17 INFO mapred.JvmManager: JVM : 
> jvm_201203280212_0005_m_-1386636958 exited with exit code -1. Number of tasks 
> it ran: 0
> 12/03/28 14:35:17 WARN mapred.TaskRunner: 
> attempt_201203280212_0005_m_02_0 : Child Error
> java.io.IOException: Task process exit with nonzero status of -1.
> at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
> 12/03/28 14:35:21 INFO mapred.TaskTracker: addFreeSlot : current free slots : 
> 2
> 12/03/28 14:35:24 INFO mapred.TaskTracker: LaunchTaskAction (registerTask): 
> attempt_201203280212_0005_m_02_1 task's state:UNASSIGNED
> 12/03/28 14:35:24 INFO mapred.TaskTracker: Trying to launch : 
> attempt_201203280212_0005_m_02_1 which needs 1 slots
> 12/03/28 14:35:24 INFO mapred.TaskTracker: In TaskLauncher, current free 
> slots : 2 and trying to launch attempt_201203280212_0005_m_02_1 which 
> needs 1 slots
> 12/03/28 14:35:24 WARN mapred.TaskLog: Failed to retrieve stdout log for 
> task: attempt_201203280212_0005_m_02_0
> java.io.FileNotFoundException: 
> D:\cygwin\home\timwu\hadoop-1.0.0\logs\userlogs\job_201203280212_0005\attempt_201203280212_0005_m_02_0\log.index
>  (The system cannot find the path specified)
> at java.io.FileInputStream.open(Native Method)
> at java.io.FileInputStream.(FileInputStream.java:120)
> at 
> org.apache.hadoop.io.SecureIOUtils.openForRead(SecureIOUtils.java:102)
> at 
> org.apache.hadoop.mapred.TaskLog.getAllLogsFileDetails(TaskLog.java:188)
> at org.apache.hadoop.mapred.TaskLog$Reader.(TaskLog.java:423)
> at 
> org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:81)
> at 
> org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:296)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at 
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
> at 
> org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:835)
> at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at 
> org.mortbay.

[jira] [Resolved] (HADOOP-8845) When looking for parent paths info, globStatus must filter out non-directory elements to avoid an AccessControlException

2012-12-12 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-8845.
-

Resolution: Duplicate

The testing at HADOOP-8906 is sufficient. Resolving as dupe of HADOOP-8906 as 
it does address this.

> When looking for parent paths info, globStatus must filter out non-directory 
> elements to avoid an AccessControlException
> 
>
> Key: HADOOP-8845
> URL: https://issues.apache.org/jira/browse/HADOOP-8845
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.0.0-alpha
>Reporter: Harsh J
>Assignee: Harsh J
>  Labels: glob
> Attachments: HADOOP-8845.patch, HADOOP-8845.patch, HADOOP-8845.patch
>
>
> A brief description from my colleague Stephen Fritz who helped discover it:
> {code}
> [root@node1 ~]# su - hdfs
> -bash-4.1$ echo "My Test String">testfile <-- just a text file, for testing 
> below
> -bash-4.1$ hadoop dfs -mkdir /tmp/testdir <-- create a directory
> -bash-4.1$ hadoop dfs -mkdir /tmp/testdir/1 <-- create a subdirectory
> -bash-4.1$ hadoop dfs -put testfile /tmp/testdir/1/testfile <-- put the test 
> file in the subdirectory
> -bash-4.1$ hadoop dfs -put testfile /tmp/testdir/testfile <-- put the test 
> file in the directory
> -bash-4.1$ hadoop dfs -lsr /tmp/testdir
> drwxr-xr-x   - hdfs hadoop  0 2012-09-25 06:52 /tmp/testdir/1
> -rw-r--r--   3 hdfs hadoop 15 2012-09-25 06:52 /tmp/testdir/1/testfile
> -rw-r--r--   3 hdfs hadoop 15 2012-09-25 06:52 /tmp/testdir/testfile
> All files are where we expect them...OK, let's try reading
> -bash-4.1$ hadoop dfs -cat /tmp/testdir/testfile
> My Test String <-- success!
> -bash-4.1$ hadoop dfs -cat /tmp/testdir/1/testfile
> My Test String <-- success!
> -bash-4.1$ hadoop dfs -cat /tmp/testdir/*/testfile
> My Test String <-- success!  
> Note that we used an '*' in the cat command, and it correctly found the 
> subdirectory '/tmp/testdir/1', and ignore the regular file 
> '/tmp/testdir/testfile'
> -bash-4.1$ exit
> logout
> [root@node1 ~]# su - testuser <-- lets try it as a different user:
> [testuser@node1 ~]$ hadoop dfs -lsr /tmp/testdir
> drwxr-xr-x   - hdfs hadoop  0 2012-09-25 06:52 /tmp/testdir/1
> -rw-r--r--   3 hdfs hadoop 15 2012-09-25 06:52 /tmp/testdir/1/testfile
> -rw-r--r--   3 hdfs hadoop 15 2012-09-25 06:52 /tmp/testdir/testfile
> [testuser@node1 ~]$ hadoop dfs -cat /tmp/testdir/testfile
> My Test String <-- good
> [testuser@node1 ~]$ hadoop dfs -cat /tmp/testdir/1/testfile
> My Test String <-- so far so good
> [testuser@node1 ~]$ hadoop dfs -cat /tmp/testdir/*/testfile
> cat: org.apache.hadoop.security.AccessControlException: Permission denied: 
> user=testuser, access=EXECUTE, 
> inode="/tmp/testdir/testfile":hdfs:hadoop:-rw-r--r--
> {code}
> Essentially, we hit an ACE with access=EXECUTE on the file /tmp/testdir/testfile 
> because we tried to access /tmp/testdir/testfile/testfile as a path. This 
> shouldn't happen, as testfile is a file and not a parent path to be 
> looked up upon.
> {code}
> 2012-09-25 07:24:27,406 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 2 on 8020, call getFileInfo(/tmp/testdir/testfile/testfile)
> {code}
> Surprisingly, the superuser avoids hitting the error, as a result of 
> bypassing permissions; whether it is fine to leave it like that or not 
> can be looked at in another JIRA.
> This JIRA targets a client-side fix to avoid such /path/file/dir or 
> /path/file/file kinds of lookups.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HADOOP-7386) Support concatenated bzip2 files

2012-12-10 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-7386.
-

Resolution: Duplicate

Thanks for confirming! Resolving as dupe.

> Support concatenated bzip2 files
> 
>
> Key: HADOOP-7386
> URL: https://issues.apache.org/jira/browse/HADOOP-7386
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Allen Wittenauer
>Assignee: Karthik Kambatla
>
> HADOOP-6835 added the framework and direct support for concatenated gzip 
> files.  We should do the same for bzip files.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: SPEC files?

2012-12-04 Thread Harsh J
The right branch is branch-0.3 for Bigtop. You can get more
information upstream at Apache Bigtop itself
(http://bigtop.apache.org).

Branch 0.3 of the same URL Steve posted:
https://github.com/apache/bigtop/tree/branch-0.3/bigtop-packages/src/rpm/hadoop

On Tue, Dec 4, 2012 at 11:17 PM, Steve Loughran  wrote:
> The RPMs are being built with bigtop;
>
> grab it from here
> https://github.com/apache/bigtop/tree/master/bigtop-packages/src/rpm/hadoop
>
> I'm not sure which branch to use for hadoop-1.1.1; let me check that
>
> On 4 December 2012 17:24, Michael Johnson  wrote:
>
>> Hello All,
>>
>> I've browsed the common-dev list (the last six months of it anyway) and
>> haven't seen this request. So here it goes: Does anyone have an SRPM/SPEC
>> file for building the Hadoop 1.1.1 binaries? I found an old 0.20.0 SPEC on
>> the internet, and before I attempt to create one I thought I'd ask here.
>> Any help would be greatly appreciated.
>>
>> Sincerely,
>> Michael Johnson
>> m...@michaelpjohnson.com
>>



-- 
Harsh J


Re: Do we support concatenated/splittable bzip2 files in branch-1?

2012-12-03 Thread Harsh J
Thanks Yu, I will appreciate it if you can post your observations over
https://issues.apache.org/jira/browse/HADOOP-7386.

On Mon, Dec 3, 2012 at 9:22 PM, Yu Li  wrote:
> Hi Harsh,
>
> Thanks a lot for the information!
>
> My fault for not looking into HADOOP-4012 carefully; will try and verify
> whether HADOOP-7823 has resolved the issue on both the write and read sides, and
> report back.
>
> On 3 December 2012 19:42, Harsh J  wrote:
>
>> Hi Yu Li,
>>
>> The JIRA HADOOP-7823 backported support for splitting Bzip2 files plus
>> MR support for it, into branch-1, and it is already available in the
>> 1.1.x releases out currently.
>>
>> Concatenated Bzip2 files, i.e., HADOOP-7386, is not implemented yet
>> (AFAIK), but Chris over HADOOP-6335 suggests that HADOOP-4012 may have
>> fixed it - so can you try and report back?
>>
>> On Mon, Dec 3, 2012 at 3:19 PM, Yu Li  wrote:
>> > Dear all,
>> >
>> > About splitting support for bzip2, I checked on the JIRA list and found
>> > HADOOP-7386 marked as "Won't fix"; I also found some work done in
>> > branch-0.21(also in trunk), say HADOOP-4012 and MAPREDUCE-830, but not
>> > integrated/migrated into branch-1, so I guess we don't support
>> concatenated
>> > bzip2 in branch-1, correct? If so, is there any special reason? Many
>> thanks!
>> >
>> > --
>> > Best Regards,
>> > Li Yu
>>
>>
>>
>> --
>> Harsh J
>>
>
>
>
> --
> Best Regards,
> Li Yu



-- 
Harsh J


Re: Do we support concatenated/splittable bzip2 files in branch-1?

2012-12-03 Thread Harsh J
Hi Yu Li,

The JIRA HADOOP-7823 backported support for splitting Bzip2 files plus
MR support for it, into branch-1, and it is already available in the
1.1.x releases out currently.

Concatenated Bzip2 files, i.e., HADOOP-7386, is not implemented yet
(AFAIK), but Chris over HADOOP-6335 suggests that HADOOP-4012 may have
fixed it - so can you try and report back?

On Mon, Dec 3, 2012 at 3:19 PM, Yu Li  wrote:
> Dear all,
>
> About splitting support for bzip2, I checked on the JIRA list and found
> HADOOP-7386 marked as "Won't fix"; I also found some work done in
> branch-0.21(also in trunk), say HADOOP-4012 and MAPREDUCE-830, but not
> integrated/migrated into branch-1, so I guess we don't support concatenated
> bzip2 in branch-1, correct? If so, is there any special reason? Many thanks!
>
> --
> Best Regards,
> Li Yu



-- 
Harsh J


[jira] [Reopened] (HADOOP-7386) Support concatenated bzip2 files

2012-12-03 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J reopened HADOOP-7386:
-


Allen,

You'd closed this out without a reason as "Won't Fix", so I am reopening it. If 
there was a reason for the Won't Fix, please provide it, thanks!

> Support concatenated bzip2 files
> 
>
> Key: HADOOP-7386
> URL: https://issues.apache.org/jira/browse/HADOOP-7386
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Allen Wittenauer
>
> HADOOP-6835 added the framework and direct support for concatenated gzip 
> files.  We should do the same for bzip files.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

