[jira] [Commented] (HADOOP-16543) Cached DNS name resolution error

2019-09-04 Thread shanyu zhao (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922904#comment-16922904
 ] 

shanyu zhao commented on HADOOP-16543:
--

Hi [~ste...@apache.org], thanks for your suggestions. 

1) We've tried changing the DNS TTL with no luck.
2) The problem is that Hadoop's RMProxy caches the InetSocketAddress and then 
retries connecting to the stale IP address:
{code:java}
InetSocketAddress rmAddress = rmProxy.getRMAddress(conf, protocol); {code}
The fix is to create these additional FailoverProxyProvider implementations:

For the non-HA scenario:

- DefaultNoHaRMFailoverProxyProvider (no DNS re-resolution)
- AutoRefreshNoHaRMFailoverProxyProvider (re-resolves DNS during retries)

For the HA scenario:

- ConfiguredRMFailoverProxyProvider (no DNS re-resolution)
- AutoRefreshRMFailoverProxyProvider (re-resolves DNS during retries)

And add this configuration to cover the non-HA mode (in addition to 
yarn.client.failover-proxy-provider):

yarn.client.failover-no-ha-proxy-provider
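
As a rough sketch of the idea (illustrative names only, not the actual patch): the "auto refresh" providers rebuild the InetSocketAddress from the configured RM hostname on every retry/failover instead of reusing the cached one, so a changed IP is picked up.
{code:java}
import java.net.InetSocketAddress;

// Hedged sketch only; class and field names here are illustrative, not YARN APIs.
class RefreshingRmAddress {
  private final String rmHost;
  private final int rmPort;

  RefreshingRmAddress(String rmHost, int rmPort) {
    this.rmHost = rmHost;
    this.rmPort = rmPort;
  }

  InetSocketAddress resolve() {
    // Constructing a new InetSocketAddress performs DNS resolution, so calling
    // this on every retry avoids the stale IP held by a cached address.
    return new InetSocketAddress(rmHost, rmPort);
  }
}
{code}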

 

 

> Cached DNS name resolution error
> 
>
> Key: HADOOP-16543
> URL: https://issues.apache.org/jira/browse/HADOOP-16543
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: Roger Liu
>Priority: Major
>
> In Kubernetes, a node may go down and then come back later with a 
> different IP address. Yarn clients which are already running will be unable 
> to rediscover the node after it comes back up due to caching the original IP 
> address. This is problematic for cases such as Spark HA on Kubernetes, as the 
> node containing the resource manager may go down and come back up, meaning 
> existing node managers must then also be restarted.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15320) Remove customized getFileBlockLocations for hadoop-azure and hadoop-azure-datalake

2018-03-27 Thread shanyu zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416283#comment-16416283
 ] 

shanyu zhao commented on HADOOP-15320:
--

I also ran the following manual tests successfully:

1) A Hive TPC-H test with my change on WASB; it passed with the correct number 
of splits.

2) A Spark application converting a huge CSV file to Parquet.

> Remove customized getFileBlockLocations for hadoop-azure and 
> hadoop-azure-datalake
> --
>
> Key: HADOOP-15320
> URL: https://issues.apache.org/jira/browse/HADOOP-15320
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/adl, fs/azure
>Affects Versions: 2.7.3, 2.9.0, 3.0.0
>Reporter: shanyu zhao
>Assignee: shanyu zhao
>Priority: Major
> Attachments: HADOOP-15320.patch
>
>
> hadoop-azure and hadoop-azure-datalake have their own implementations of 
> getFileBlockLocations(), which fake a list of artificial blocks based on a 
> hard-coded block size, with each block reporting a single host named 
> "localhost". Take a look at this code:
> [https://github.com/apache/hadoop/blob/release-2.9.0-RC3/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/NativeAzureFileSystem.java#L3485]
> This is an unnecessary mock-up for a "remote" file system to mimic HDFS, and 
> the problem with it is that for large (~TB) files we generate lots of 
> artificial blocks, and FileInputFormat.getSplits() is slow in calculating 
> splits based on these blocks.
> We can safely remove this customized getFileBlockLocations() implementation 
> and fall back to the default FileSystem.getFileBlockLocations() 
> implementation, which returns 1 block for any file, with 1 host "localhost". 
> Note that this doesn't mean we will create many fewer splits, because the 
> split size is still limited by the blockSize in 
> FileInputFormat.computeSplitSize():
> {code:java}
> return Math.max(minSize, Math.min(goalSize, blockSize));{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15320) Remove customized getFileBlockLocations for hadoop-azure and hadoop-azure-datalake

2018-03-19 Thread shanyu zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405682#comment-16405682
 ] 

shanyu zhao commented on HADOOP-15320:
--

I've run a few Spark jobs on a very large input file (hundreds of TB); 
getSplits() on this file took a few seconds, vs. 1.5 hours without the change.

I'm in the middle of running Hive TPC-H tests.

Anything else we should run?

As [~chris.douglas] mentioned, since S3A is running fine, we should be good to 
go for this patch.

> Remove customized getFileBlockLocations for hadoop-azure and 
> hadoop-azure-datalake
> --
>
> Key: HADOOP-15320
> URL: https://issues.apache.org/jira/browse/HADOOP-15320
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/adl, fs/azure
>Affects Versions: 2.7.3, 2.9.0, 3.0.0
>Reporter: shanyu zhao
>Assignee: shanyu zhao
>Priority: Major
> Attachments: HADOOP-15320.patch
>
>
> hadoop-azure and hadoop-azure-datalake have their own implementations of 
> getFileBlockLocations(), which fake a list of artificial blocks based on a 
> hard-coded block size, with each block reporting a single host named 
> "localhost". Take a look at this code:
> [https://github.com/apache/hadoop/blob/release-2.9.0-RC3/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/NativeAzureFileSystem.java#L3485]
> This is an unnecessary mock-up for a "remote" file system to mimic HDFS, and 
> the problem with it is that for large (~TB) files we generate lots of 
> artificial blocks, and FileInputFormat.getSplits() is slow in calculating 
> splits based on these blocks.
> We can safely remove this customized getFileBlockLocations() implementation 
> and fall back to the default FileSystem.getFileBlockLocations() 
> implementation, which returns 1 block for any file, with 1 host "localhost". 
> Note that this doesn't mean we will create many fewer splits, because the 
> split size is still limited by the blockSize in 
> FileInputFormat.computeSplitSize():
> {code:java}
> return Math.max(minSize, Math.min(goalSize, blockSize));{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15320) Remove customized getFileBlockLocations for hadoop-azure and hadoop-azure-datalake

2018-03-16 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-15320:
-
Attachment: HADOOP-15320.patch

> Remove customized getFileBlockLocations for hadoop-azure and 
> hadoop-azure-datalake
> --
>
> Key: HADOOP-15320
> URL: https://issues.apache.org/jira/browse/HADOOP-15320
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/adl, fs/azure
>Affects Versions: 2.7.3, 2.9.0, 3.0.0
>Reporter: shanyu zhao
>Assignee: shanyu zhao
>Priority: Major
> Attachments: HADOOP-15320.patch
>
>
> hadoop-azure and hadoop-azure-datalake have their own implementations of 
> getFileBlockLocations(), which fake a list of artificial blocks based on a 
> hard-coded block size, with each block reporting a single host named 
> "localhost". Take a look at this code:
> [https://github.com/apache/hadoop/blob/release-2.9.0-RC3/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/NativeAzureFileSystem.java#L3485]
> This is an unnecessary mock-up for a "remote" file system to mimic HDFS, and 
> the problem with it is that for large (~TB) files we generate lots of 
> artificial blocks, and FileInputFormat.getSplits() is slow in calculating 
> splits based on these blocks.
> We can safely remove this customized getFileBlockLocations() implementation 
> and fall back to the default FileSystem.getFileBlockLocations() 
> implementation, which returns 1 block for any file, with 1 host "localhost". 
> Note that this doesn't mean we will create many fewer splits, because the 
> split size is still limited by the blockSize in 
> FileInputFormat.computeSplitSize():
> {code:java}
> return Math.max(minSize, Math.min(goalSize, blockSize));{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15320) Remove customized getFileBlockLocations for hadoop-azure and hadoop-azure-datalake

2018-03-16 Thread shanyu zhao (JIRA)
shanyu zhao created HADOOP-15320:


 Summary: Remove customized getFileBlockLocations for hadoop-azure 
and hadoop-azure-datalake
 Key: HADOOP-15320
 URL: https://issues.apache.org/jira/browse/HADOOP-15320
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/adl, fs/azure
Affects Versions: 3.0.0, 2.9.0, 2.7.3
Reporter: shanyu zhao
Assignee: shanyu zhao


hadoop-azure and hadoop-azure-datalake have their own implementations of 
getFileBlockLocations(), which fake a list of artificial blocks based on a 
hard-coded block size, with each block reporting a single host named 
"localhost". Take a look at this code:

[https://github.com/apache/hadoop/blob/release-2.9.0-RC3/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/NativeAzureFileSystem.java#L3485]

This is an unnecessary mock-up for a "remote" file system to mimic HDFS, and 
the problem with it is that for large (~TB) files we generate lots of 
artificial blocks, and FileInputFormat.getSplits() is slow in calculating 
splits based on these blocks.

We can safely remove this customized getFileBlockLocations() implementation and 
fall back to the default FileSystem.getFileBlockLocations() implementation, 
which returns 1 block for any file, with 1 host "localhost". Note that this 
doesn't mean we will create many fewer splits, because the split size is still 
limited by the blockSize in FileInputFormat.computeSplitSize():
{code:java}
return Math.max(minSize, Math.min(goalSize, blockSize));{code}
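
As a rough back-of-the-envelope illustration (the sizes below are assumptions, not numbers from this JIRA), even a single fake block still yields thousands of splits for a 1 TB file, because computeSplitSize() caps each split at blockSize:
{code:java}
// Hedged arithmetic sketch; the block size and file size are assumptions.
public class SplitCountSketch {
  public static void main(String[] args) {
    long minSize   = 1L;
    long blockSize = 128L * 1024 * 1024;          // assumed 128 MB block size
    long totalSize = 1024L * 1024 * 1024 * 1024;  // assumed 1 TB input file
    long goalSize  = totalSize;                   // e.g. one requested split
    long splitSize = Math.max(minSize, Math.min(goalSize, blockSize)); // 128 MB
    System.out.println("splits ~ " + (totalSize / splitSize));         // ~ 8192
  }
}
{code}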



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2018-02-02 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-13345:
-
Comment: was deleted

(was: Part of this patch added a "copy-dependencies" execution in 
hadoop-aws/pom.xml without a scope. This caused all test jars to be copied to 
the lib folder as well. 
  <execution>
    <phase>package</phase>
    <goals>
      <goal>copy-dependencies</goal>
    </goals>
    <configuration>
      <outputDirectory>${project.build.directory}/lib</outputDirectory>
    </configuration>
  </execution>
Should we limit the scope of the copy to runtime? E.g. add the following to 
the <configuration> section:
<includeScope>runtime</includeScope>)

> S3Guard: Improved Consistency for S3A
> -
>
> Key: HADOOP-13345
> URL: https://issues.apache.org/jira/browse/HADOOP-13345
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: HADOOP-13345.prototype1.patch, 
> S3C-ConsistentListingonS3-Design.pdf, S3GuardImprovedConsistencyforS3A.pdf, 
> S3GuardImprovedConsistencyforS3AV2.pdf, s3c.001.patch
>
>
> This issue proposes S3Guard, a new feature of S3A, to provide an option for a 
> stronger consistency model than what is currently offered.  The solution 
> coordinates with a strongly consistent external store to resolve 
> inconsistencies caused by the S3 eventual consistency model.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2018-02-01 Thread shanyu zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349673#comment-16349673
 ] 

shanyu zhao commented on HADOOP-13345:
--

Part of this patch added a "copy-dependencies" execution in hadoop-aws/pom.xml 
without a scope. This caused all test jars to be copied to the lib folder as 
well. 
  <execution>
    <phase>package</phase>
    <goals>
      <goal>copy-dependencies</goal>
    </goals>
    <configuration>
      <outputDirectory>${project.build.directory}/lib</outputDirectory>
    </configuration>
  </execution>
Should we limit the scope of the copy to runtime? E.g. add the following to 
the <configuration> section:
<includeScope>runtime</includeScope>

> S3Guard: Improved Consistency for S3A
> -
>
> Key: HADOOP-13345
> URL: https://issues.apache.org/jira/browse/HADOOP-13345
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: HADOOP-13345.prototype1.patch, 
> S3C-ConsistentListingonS3-Design.pdf, S3GuardImprovedConsistencyforS3A.pdf, 
> S3GuardImprovedConsistencyforS3AV2.pdf, s3c.001.patch
>
>
> This issue proposes S3Guard, a new feature of S3A, to provide an option for a 
> stronger consistency model than what is currently offered.  The solution 
> coordinates with a strongly consistent external store to resolve 
> inconsistencies caused by the S3 eventual consistency model.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-11629) WASB filesystem should not start BandwidthGaugeUpdater if fs.azure.skip.metrics set to true

2015-02-24 Thread shanyu zhao (JIRA)
shanyu zhao created HADOOP-11629:


 Summary: WASB filesystem should not start BandwidthGaugeUpdater if 
fs.azure.skip.metrics set to true
 Key: HADOOP-11629
 URL: https://issues.apache.org/jira/browse/HADOOP-11629
 Project: Hadoop Common
  Issue Type: Bug
  Components: tools
Affects Versions: 2.6.1
Reporter: shanyu zhao
Assignee: shanyu zhao


In HADOOP-11248 we added the configuration fs.azure.skip.metrics. If set to 
true, we do not register Azure FileSystem metrics with the metrics system. 
However, the BandwidthGaugeUpdater object is still created in 
AzureNativeFileSystemStore, resulting in unnecessary threads being spawned.

Under heavy load the system can be busy dealing with these threads, and GC has 
to work on removing the thread objects. E.g. when multiple WebHCat clients 
submit jobs to the WebHCat server, we observed that the WebHCat server spawns 
~400 daemon threads, which slows down the server and sometimes causes timeouts.
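
A minimal sketch of the kind of guard the fix implies (field and method names are illustrative, not the actual patch):
{code:java}
import org.apache.hadoop.conf.Configuration;

// Hedged sketch only; names are illustrative, not the committed code.
class AzureStoreInitSketch {
  private Object bandwidthGaugeUpdater;   // stands in for BandwidthGaugeUpdater

  void initialize(Configuration conf) {
    boolean skipMetrics = conf.getBoolean("fs.azure.skip.metrics", false);
    if (!skipMetrics) {
      // Only create the updater (and its background threads) when metrics are on.
      bandwidthGaugeUpdater = new Object();  // real code: new BandwidthGaugeUpdater(...)
    }
  }
}
{code}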




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11629) WASB filesystem should not start BandwidthGaugeUpdater if fs.azure.skip.metrics set to true

2015-02-24 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-11629:
-
Attachment: HADOOP-11629.patch

patch attached.

 WASB filesystem should not start BandwidthGaugeUpdater if 
 fs.azure.skip.metrics set to true
 ---

 Key: HADOOP-11629
 URL: https://issues.apache.org/jira/browse/HADOOP-11629
 Project: Hadoop Common
  Issue Type: Bug
  Components: tools
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-11629.patch


 In HADOOP-11248 we added the configuration fs.azure.skip.metrics. If set to 
 true, we do not register Azure FileSystem metrics with the metrics system. 
 However, the BandwidthGaugeUpdater object is still created in 
 AzureNativeFileSystemStore, resulting in unnecessary threads being spawned.
 Under heavy load the system can be busy dealing with these threads, and GC 
 has to work on removing the thread objects. E.g. when multiple WebHCat 
 clients submit jobs to the WebHCat server, we observed that the WebHCat 
 server spawns ~400 daemon threads, which slows down the server and sometimes 
 causes timeouts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11629) WASB filesystem should not start BandwidthGaugeUpdater if fs.azure.skip.metrics set to true

2015-02-24 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-11629:
-
Attachment: HADOOP-11629.1.patch

 WASB filesystem should not start BandwidthGaugeUpdater if 
 fs.azure.skip.metrics set to true
 ---

 Key: HADOOP-11629
 URL: https://issues.apache.org/jira/browse/HADOOP-11629
 Project: Hadoop Common
  Issue Type: Bug
  Components: tools
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-11629.1.patch, HADOOP-11629.patch


 In HADOOP-11248 we added the configuration fs.azure.skip.metrics. If set to 
 true, we do not register Azure FileSystem metrics with the metrics system. 
 However, the BandwidthGaugeUpdater object is still created in 
 AzureNativeFileSystemStore, resulting in unnecessary threads being spawned.
 Under heavy load the system can be busy dealing with these threads, and GC 
 has to work on removing the thread objects. E.g. when multiple WebHCat 
 clients submit jobs to the WebHCat server, we observed that the WebHCat 
 server spawns ~400 daemon threads, which slows down the server and sometimes 
 causes timeouts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11629) WASB filesystem should not start BandwidthGaugeUpdater if fs.azure.skip.metrics set to true

2015-02-24 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-11629:
-
Attachment: HADOOP-11629.1.patch

Thanks [~cnauroth]! New patch attached.

 WASB filesystem should not start BandwidthGaugeUpdater if 
 fs.azure.skip.metrics set to true
 ---

 Key: HADOOP-11629
 URL: https://issues.apache.org/jira/browse/HADOOP-11629
 Project: Hadoop Common
  Issue Type: Bug
  Components: tools
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-11629.1.patch, HADOOP-11629.patch


 In HADOOP-11248 we added the configuration fs.azure.skip.metrics. If set to 
 true, we do not register Azure FileSystem metrics with the metrics system. 
 However, the BandwidthGaugeUpdater object is still created in 
 AzureNativeFileSystemStore, resulting in unnecessary threads being spawned.
 Under heavy load the system can be busy dealing with these threads, and GC 
 has to work on removing the thread objects. E.g. when multiple WebHCat 
 clients submit jobs to the WebHCat server, we observed that the WebHCat 
 server spawns ~400 daemon threads, which slows down the server and sometimes 
 causes timeouts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11629) WASB filesystem should not start BandwidthGaugeUpdater if fs.azure.skip.metrics set to true

2015-02-24 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-11629:
-
Attachment: (was: HADOOP-11629.1.patch)

 WASB filesystem should not start BandwidthGaugeUpdater if 
 fs.azure.skip.metrics set to true
 ---

 Key: HADOOP-11629
 URL: https://issues.apache.org/jira/browse/HADOOP-11629
 Project: Hadoop Common
  Issue Type: Bug
  Components: tools
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-11629.patch


 In HADOOP-11248 we added the configuration fs.azure.skip.metrics. If set to 
 true, we do not register Azure FileSystem metrics with the metrics system. 
 However, the BandwidthGaugeUpdater object is still created in 
 AzureNativeFileSystemStore, resulting in unnecessary threads being spawned.
 Under heavy load the system can be busy dealing with these threads, and GC 
 has to work on removing the thread objects. E.g. when multiple WebHCat 
 clients submit jobs to the WebHCat server, we observed that the WebHCat 
 server spawns ~400 daemon threads, which slows down the server and sometimes 
 causes timeouts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11248) Add hadoop configuration to disable Azure Filesystem metrics collection

2014-10-30 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-11248:
-
Attachment: HADOOP-11248.1.patch

[~cnauroth] Please see the updated patch that includes the test case. Thx!

 Add hadoop configuration to disable Azure Filesystem metrics collection
 ---

 Key: HADOOP-11248
 URL: https://issues.apache.org/jira/browse/HADOOP-11248
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.4.1, 2.5.1
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-11248.1.patch, HADOOP-11248.patch


 Today, whenever the Azure filesystem is used, metrics collection is enabled 
 via the class AzureFileSystemMetricsSystem. The metrics being collected 
 include bytes transferred and throughput.
 In some situations we do not want to collect metrics for the Azure file 
 system, e.g. for the WebHCat server. We need to introduce a new configuration, 
 fs.azure.skip.metrics, to disable metrics collection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-10840) Fix OutOfMemoryError caused by metrics system in Azure File System

2014-10-29 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-10840:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

 Fix OutOfMemoryError caused by metrics system in Azure File System
 --

 Key: HADOOP-10840
 URL: https://issues.apache.org/jira/browse/HADOOP-10840
 Project: Hadoop Common
  Issue Type: Bug
  Components: metrics
Affects Versions: 2.4.1
Reporter: shanyu zhao
Assignee: shanyu zhao
 Fix For: 3.0.0

 Attachments: HADOOP-10840.1.patch, HADOOP-10840.2.patch, 
 HADOOP-10840.patch


 In Hadoop 2.x the Hadoop File System framework changed and no cache is 
 implemented (refer to HADOOP-6356). This means for every WASB access, a new 
 NativeAzureFileSystem is created, along with which a metrics source is created and 
 added to MetricsSystemImpl. Over time the sources accumulated, eating memory 
 and causing Java OutOfMemoryError.
 The fix is to utilize the unregisterSource() method added to MetricsSystem in 
 HADOOP-10839.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11248) Add hadoop configuration to disable Azure Filesystem metrics collection

2014-10-29 Thread shanyu zhao (JIRA)
shanyu zhao created HADOOP-11248:


 Summary: Add hadoop configuration to disable Azure Filesystem 
metrics collection
 Key: HADOOP-11248
 URL: https://issues.apache.org/jira/browse/HADOOP-11248
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.5.1, 2.4.1
Reporter: shanyu zhao
Assignee: shanyu zhao


Today, whenever the Azure filesystem is used, metrics collection is enabled via 
the class AzureFileSystemMetricsSystem. The metrics being collected include 
bytes transferred and throughput.

In some situations we do not want to collect metrics for the Azure file system, 
e.g. for the WebHCat server. We need to introduce a new configuration, 
fs.azure.skip.metrics, to disable metrics collection.
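
As an illustration of how a client would use the proposed flag (a sketch, not part of the patch):
{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch: a metrics-sensitive client (e.g. the WebHCat server) opting out of
// Azure FileSystem metrics via the proposed configuration key.
public class SkipAzureMetricsSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setBoolean("fs.azure.skip.metrics", true);
    System.out.println(conf.getBoolean("fs.azure.skip.metrics", false)); // true
  }
}
{code}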




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11248) Add hadoop configuration to disable Azure Filesystem metrics collection

2014-10-29 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-11248:
-
Attachment: HADOOP-11248.patch

patch attached.

 Add hadoop configuration to disable Azure Filesystem metrics collection
 ---

 Key: HADOOP-11248
 URL: https://issues.apache.org/jira/browse/HADOOP-11248
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.4.1, 2.5.1
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-11248.patch


 Today, whenever the Azure filesystem is used, metrics collection is enabled 
 via the class AzureFileSystemMetricsSystem. The metrics being collected 
 include bytes transferred and throughput.
 In some situations we do not want to collect metrics for the Azure file 
 system, e.g. for the WebHCat server. We need to introduce a new configuration, 
 fs.azure.skip.metrics, to disable metrics collection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-10840) Fix OutOfMemoryError caused by metrics system in Azure File System

2014-07-17 Thread shanyu zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064662#comment-14064662
 ] 

shanyu zhao commented on HADOOP-10840:
--

Thanks [~cnauroth]!

 Fix OutOfMemoryError caused by metrics system in Azure File System
 --

 Key: HADOOP-10840
 URL: https://issues.apache.org/jira/browse/HADOOP-10840
 Project: Hadoop Common
  Issue Type: Bug
  Components: metrics
Affects Versions: 2.4.1
Reporter: shanyu zhao
Assignee: shanyu zhao
 Fix For: 3.0.0

 Attachments: HADOOP-10840.1.patch, HADOOP-10840.2.patch, 
 HADOOP-10840.patch


 In Hadoop 2.x the Hadoop File System framework changed and no cache is 
 implemented (refer to HADOOP-6356). This means for every WASB access, a new 
 NativeAzureFileSystem is created, along with which a metrics source is created and 
 added to MetricsSystemImpl. Over time the sources accumulated, eating memory 
 and causing Java OutOfMemoryError.
 The fix is to utilize the unregisterSource() method added to MetricsSystem in 
 HADOOP-10839.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10840) Fix OutOfMemoryError caused by metrics system in Azure File System

2014-07-16 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-10840:
-

Attachment: HADOOP-10840.1.patch

[~cnauroth] thanks for the findings. I couldn't reproduce the specific failure 
you posted, but I think it is caused by NativeAzureFileSystem.close() being 
called multiple times. I verified that NativeAzureFileSystemStore.close() can 
be called multiple times, but NativeAzureFileSystem.close() cannot: since the 
metrics system keeps a reference count, we must not call its close() more than 
once, so I introduced a boolean isClosed flag to make close() idempotent. I 
added a new test case that calls NativeAzureFileSystem.close() twice to verify 
this scenario.

New patch attached.
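
A minimal sketch of the idempotent close() described above (names are illustrative, not the committed code):
{code:java}
// Hedged sketch only; the real patch unregisters the metrics source in close().
public class CloseTwiceSketch {
  private boolean isClosed = false;

  public synchronized void close() {
    if (isClosed) {
      return;               // second and later calls are no-ops
    }
    // Real code would unregister the metrics source exactly once here.
    isClosed = true;
  }

  public static void main(String[] args) {
    CloseTwiceSketch fs = new CloseTwiceSketch();
    fs.close();
    fs.close();             // the scenario the new test case verifies
  }
}
{code}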

 Fix OutOfMemoryError caused by metrics system in Azure File System
 --

 Key: HADOOP-10840
 URL: https://issues.apache.org/jira/browse/HADOOP-10840
 Project: Hadoop Common
  Issue Type: Bug
  Components: metrics
Affects Versions: 2.4.1
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10840.1.patch, HADOOP-10840.patch


 In Hadoop 2.x the Hadoop File System framework changed and no cache is 
 implemented (refer to HADOOP-6356). This means for every WASB access, a new 
 NativeAzureFileSystem is created, along with which a metrics source is created and 
 added to MetricsSystemImpl. Over time the sources accumulated, eating memory 
 and causing Java OutOfMemoryError.
 The fix is to utilize the unregisterSource() method added to MetricsSystem in 
 HADOOP-10839.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10840) Fix OutOfMemoryError caused by metrics system in Azure File System

2014-07-16 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-10840:
-

Attachment: (was: HADOOP-10840.1.patch)

 Fix OutOfMemoryError caused by metrics system in Azure File System
 --

 Key: HADOOP-10840
 URL: https://issues.apache.org/jira/browse/HADOOP-10840
 Project: Hadoop Common
  Issue Type: Bug
  Components: metrics
Affects Versions: 2.4.1
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10840.1.patch, HADOOP-10840.patch


 In Hadoop 2.x the Hadoop File System framework changed and no cache is 
 implemented (refer to HADOOP-6356). This means for every WASB access, a new 
 NativeAzureFileSystem is created, along with which a metrics source is created and 
 added to MetricsSystemImpl. Over time the sources accumulated, eating memory 
 and causing Java OutOfMemoryError.
 The fix is to utilize the unregisterSource() method added to MetricsSystem in 
 HADOOP-10839.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10840) Fix OutOfMemoryError caused by metrics system in Azure File System

2014-07-16 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-10840:
-

Attachment: HADOOP-10840.1.patch

 Fix OutOfMemoryError caused by metrics system in Azure File System
 --

 Key: HADOOP-10840
 URL: https://issues.apache.org/jira/browse/HADOOP-10840
 Project: Hadoop Common
  Issue Type: Bug
  Components: metrics
Affects Versions: 2.4.1
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10840.1.patch, HADOOP-10840.patch


 In Hadoop 2.x the Hadoop File System framework changed and no cache is 
 implemented (refer to HADOOP-6356). This means for every WASB access, a new 
 NativeAzureFileSystem is created, along with which a metrics source is created and 
 added to MetricsSystemImpl. Over time the sources accumulated, eating memory 
 and causing Java OutOfMemoryError.
 The fix is to utilize the unregisterSource() method added to MetricsSystem in 
 HADOOP-10839.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10840) Fix OutOfMemoryError caused by metrics system in Azure File System

2014-07-16 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-10840:
-

Attachment: HADOOP-10840.2.patch

Hi [~cnauroth], sorry I missed the live test scenarios, for which I have to 
manually configure a storage account. The problem with 
TestAzureConcurrentOutOfBandIo is that I removed the registration code with 
DefaultFileSystem in the production code but not in the unit test files. I 
fixed that in the new patch (v2).

Regarding the two Windows test failures, they look unrelated to my changes; 
they have something to do with the system clock. Do they always fail, or only 
transiently? I cannot reproduce them on my Ubuntu box. Let me try on a Windows 
box.

I removed publishMetricsNow() from fileSystemClosed() because I moved it into 
unregisterSource(). The expectation is that the caller calls unregisterSource() 
before calling fileSystemClosed(), which pushes the last metrics out. To 
prevent confusion when someone calls fileSystemClosed() without first calling 
unregisterSource(), I added publishMetricsNow() to fileSystemClosed() under the 
condition numFileSystems == 1. This way, right before we shut down the metrics 
system, we do one more safety push; usually the metrics sources are already 
empty at that point.
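
A hedged sketch of the shutdown path described above (field names follow the description; the real AzureFileSystemMetricsSystem may differ):
{code:java}
import org.apache.hadoop.metrics2.impl.MetricsSystemImpl;

// Hedged sketch only, not the committed code.
class MetricsShutdownSketch {
  private static MetricsSystemImpl instance;
  private static int numFileSystems;

  public static synchronized void fileSystemClosed() {
    if (instance != null && numFileSystems == 1) {
      instance.publishMetricsNow();   // safety push before the final shutdown
    }
    numFileSystems--;
    if (numFileSystems == 0 && instance != null) {
      instance.stop();
      instance.shutdown();
      instance = null;
    }
  }
}
{code}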



 Fix OutOfMemoryError caused by metrics system in Azure File System
 --

 Key: HADOOP-10840
 URL: https://issues.apache.org/jira/browse/HADOOP-10840
 Project: Hadoop Common
  Issue Type: Bug
  Components: metrics
Affects Versions: 2.4.1
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10840.1.patch, HADOOP-10840.2.patch, 
 HADOOP-10840.patch


 In Hadoop 2.x the Hadoop File System framework changed and no cache is 
 implemented (refer to HADOOP-6356). This means for every WASB access, a new 
 NativeAzureFileSystem is created, along with which a metrics source is created and 
 added to MetricsSystemImpl. Over time the sources accumulated, eating memory 
 and causing Java OutOfMemoryError.
 The fix is to utilize the unregisterSource() method added to MetricsSystem in 
 HADOOP-10839.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10840) Fix OutOfMemoryError caused by metrics system in Azure File System

2014-07-16 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-10840:
-

Attachment: HADOOP-10840.2.patch

 Fix OutOfMemoryError caused by metrics system in Azure File System
 --

 Key: HADOOP-10840
 URL: https://issues.apache.org/jira/browse/HADOOP-10840
 Project: Hadoop Common
  Issue Type: Bug
  Components: metrics
Affects Versions: 2.4.1
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10840.1.patch, HADOOP-10840.2.patch, 
 HADOOP-10840.patch


 In Hadoop 2.x the Hadoop File System framework changed and no cache is 
 implemented (refer to HADOOP-6356). This means for every WASB access, a new 
 NativeAzureFileSystem is created, along with which a metrics source is created and 
 added to MetricsSystemImpl. Over time the sources accumulated, eating memory 
 and causing Java OutOfMemoryError.
 The fix is to utilize the unregisterSource() method added to MetricsSystem in 
 HADOOP-10839.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10840) Fix OutOfMemoryError caused by metrics system in Azure File System

2014-07-16 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-10840:
-

Attachment: (was: HADOOP-10840.2.patch)

 Fix OutOfMemoryError caused by metrics system in Azure File System
 --

 Key: HADOOP-10840
 URL: https://issues.apache.org/jira/browse/HADOOP-10840
 Project: Hadoop Common
  Issue Type: Bug
  Components: metrics
Affects Versions: 2.4.1
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10840.1.patch, HADOOP-10840.2.patch, 
 HADOOP-10840.patch


 In Hadoop 2.x the Hadoop File System framework changed and no cache is 
 implemented (refer to HADOOP-6356). This means for every WASB access, a new 
 NativeAzureFileSystem is created, along with which a metrics source is created and 
 added to MetricsSystemImpl. Over time the sources accumulated, eating memory 
 and causing Java OutOfMemoryError.
 The fix is to utilize the unregisterSource() method added to MetricsSystem in 
 HADOOP-10839.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10839) Add unregisterSource() to MetricsSystem API

2014-07-15 Thread shanyu zhao (JIRA)
shanyu zhao created HADOOP-10839:


 Summary: Add unregisterSource() to MetricsSystem API
 Key: HADOOP-10839
 URL: https://issues.apache.org/jira/browse/HADOOP-10839
 Project: Hadoop Common
  Issue Type: Bug
  Components: metrics
Affects Versions: 2.4.1
Reporter: shanyu zhao
Assignee: shanyu zhao


Currently the MetricsSystem API has a register() method to register a 
MetricsSource but doesn't have an unregister() method. This means that once a 
MetricsSource is registered with the MetricsSystem, it will be there forever 
until the MetricsSystem is shut down. This in some cases can cause Java 
OutOfMemoryError.

One such case is the file system metrics implementation. The new 
AbstractFileSystem/FileContext framework does not implement a cache, so every 
file system access can lead to the creation of a NativeFileSystem instance 
(refer to HADOOP-6356). All these NativeFileSystem instances need to share the 
same instance of MetricsSystemImpl, which means we cannot shut down the 
MetricsSystem to clean up all the MetricsSources that have been registered but 
are no longer active. Over time the MetricsSource instances accumulate and 
eventually we see OutOfMemoryError.
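
A minimal sketch of the proposed API addition and how a per-instance file system would use it (the exact signature in the committed patch may differ):
{code:java}
// Sketch of the proposed addition; the shipped signature may differ.
abstract class MetricsSystemSketch {
  /** Existing style of registration: a source is kept until shutdown. */
  public abstract <T> T register(String name, String desc, T source);

  /** Proposed: remove a previously registered source so it can be collected. */
  public abstract void unregisterSource(String name);
}

// Usage sketch: each file system instance unregisters its source on close().
class MetricsSourceOwner {
  private final MetricsSystemSketch metrics;
  private final String sourceName;

  MetricsSourceOwner(MetricsSystemSketch metrics, String sourceName) {
    this.metrics = metrics;
    this.sourceName = sourceName;
  }

  void close() {
    metrics.unregisterSource(sourceName);  // free the source instead of leaking it
  }
}
{code}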



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10839) Add unregisterSource() to MetricsSystem API

2014-07-15 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-10839:
-

Attachment: HADOOP-10839.patch

Patch attached.

[~cnauroth] Could you please take a look at this patch? Thx.

 Add unregisterSource() to MetricsSystem API
 ---

 Key: HADOOP-10839
 URL: https://issues.apache.org/jira/browse/HADOOP-10839
 Project: Hadoop Common
  Issue Type: Bug
  Components: metrics
Affects Versions: 2.4.1
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10839.patch


 Currently the MetricsSystem API has a register() method to register a 
 MetricsSource but doesn't have an unregister() method. This means that once a 
 MetricsSource is registered with the MetricsSystem, it will be there forever 
 until the MetricsSystem is shut down. This in some cases can cause Java 
 OutOfMemoryError.
 One such case is the file system metrics implementation. The new 
 AbstractFileSystem/FileContext framework does not implement a cache, so every 
 file system access can lead to the creation of a NativeFileSystem instance 
 (refer to HADOOP-6356). All these NativeFileSystem instances need to share the 
 same instance of MetricsSystemImpl, which means we cannot shut down the 
 MetricsSystem to clean up all the MetricsSources that have been registered but 
 are no longer active. Over time the MetricsSource instances accumulate and 
 eventually we see OutOfMemoryError.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10840) Fix OutOfMemoryError caused by metrics system in Azure File System

2014-07-15 Thread shanyu zhao (JIRA)
shanyu zhao created HADOOP-10840:


 Summary: Fix OutOfMemoryError caused by metrics system in Azure 
File System
 Key: HADOOP-10840
 URL: https://issues.apache.org/jira/browse/HADOOP-10840
 Project: Hadoop Common
  Issue Type: Bug
  Components: metrics
Affects Versions: 2.4.1
Reporter: shanyu zhao
Assignee: shanyu zhao


In Hadoop 2.x the Hadoop File System framework changed and no cache is 
implemented (refer to HADOOP-6356). This means for every WASB access, a new 
NativeAzureFileSystem is created, along with which a metrics source is created and 
added to MetricsSystemImpl. Over time the sources accumulated, eating memory 
and causing Java OutOfMemoryError.

The fix is to utilize the unregisterSource() method added to MetricsSystem in 
HADOOP-10839.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10840) Fix OutOfMemoryError caused by metrics system in Azure File System

2014-07-15 Thread shanyu zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062663#comment-14062663
 ] 

shanyu zhao commented on HADOOP-10840:
--

[~cnauroth] Could you please review this patch? Thx!

 Fix OutOfMemoryError caused by metrics system in Azure File System
 --

 Key: HADOOP-10840
 URL: https://issues.apache.org/jira/browse/HADOOP-10840
 Project: Hadoop Common
  Issue Type: Bug
  Components: metrics
Affects Versions: 2.4.1
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10840.patch


 In Hadoop 2.x the Hadoop File System framework changed and no cache is 
 implemented (refer to HADOOP-6356). This means for every WASB access, a new 
 NativeAzureFileSystem is created, along with which a metrics source is created and 
 added to MetricsSystemImpl. Over time the sources accumulated, eating memory 
 and causing Java OutOfMemoryError.
 The fix is to utilize the unregisterSource() method added to MetricsSystem in 
 HADOOP-10839.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10840) Fix OutOfMemoryError caused by metrics system in Azure File System

2014-07-15 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-10840:
-

Attachment: HADOOP-10840.patch

Patch attached. 

This patch added a finalize() method to NativeAzureFileSystem to force a call 
to close(), which calls unregisterSource() on the metrics system impl. Note 
that close() can be called multiple times without negative impact, so that in 
the future, when we introduce close() on FileContext to let clients call 
close() manually, no changes will be needed in this area. I also removed the 
registration code for DefaultMetricsSystem, because we only need to register 
with one metrics system implementation, which is AzureFileSystemMetricsSystem.
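
A minimal sketch of the finalize-as-safety-net idea (illustrative only; close() is assumed to be idempotent, as described above):
{code:java}
// Hedged sketch only, not the committed code.
class FinalizerSafetyNetSketch {
  private boolean isClosed = false;

  public synchronized void close() {
    if (!isClosed) {
      // Real code would unregister the metrics source exactly once here.
      isClosed = true;
    }
  }

  @Override
  protected void finalize() throws Throwable {
    try {
      close();   // ensures the source is unregistered even without an explicit close()
    } finally {
      super.finalize();
    }
  }
}
{code}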



 Fix OutOfMemoryError caused by metrics system in Azure File System
 --

 Key: HADOOP-10840
 URL: https://issues.apache.org/jira/browse/HADOOP-10840
 Project: Hadoop Common
  Issue Type: Bug
  Components: metrics
Affects Versions: 2.4.1
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10840.patch


 In Hadoop 2.x the Hadoop File System framework changed and no cache is 
 implemented (refer to HADOOP-6356). This means for every WASB access, a new 
 NativeAzureFileSystem is created, along with which a metrics source is created and 
 added to MetricsSystemImpl. Over time the sources accumulated, eating memory 
 and causing Java OutOfMemoryError.
 The fix is to utilize the unregisterSource() method added to MetricsSystem in 
 HADOOP-10839.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10839) Add unregisterSource() to MetricsSystem API

2014-07-15 Thread shanyu zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062748#comment-14062748
 ] 

shanyu zhao commented on HADOOP-10839:
--

Hi [~cnauroth], thanks for the review!

I pulled from trunk on GitHub before I generated the patch, so it should be 
against the current trunk...

Regarding the unusual indentation, I was just trying to use the same 
indentation already used in that file. If you look at the other methods in the 
same file, they are all indented the way I wrote it in the patch. Do you think 
I need to change the indentation to what you proposed above? That is the more 
conventional style, but it would look odd in that source file.

 Add unregisterSource() to MetricsSystem API
 ---

 Key: HADOOP-10839
 URL: https://issues.apache.org/jira/browse/HADOOP-10839
 Project: Hadoop Common
  Issue Type: Bug
  Components: metrics
Affects Versions: 2.4.1
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10839.patch


 Currently the MetricsSystem API has a register() method to register a 
 MetricsSource but doesn't have an unregister() method. This means that once a 
 MetricsSource is registered with the MetricsSystem, it will be there forever 
 until the MetricsSystem is shut down. This in some cases can cause Java 
 OutOfMemoryError.
 One such case is the file system metrics implementation. The new 
 AbstractFileSystem/FileContext framework does not implement a cache, so every 
 file system access can lead to the creation of a NativeFileSystem instance 
 (refer to HADOOP-6356). All these NativeFileSystem instances need to share the 
 same instance of MetricsSystemImpl, which means we cannot shut down the 
 MetricsSystem to clean up all the MetricsSources that have been registered but 
 are no longer active. Over time the MetricsSource instances accumulate and 
 eventually we see OutOfMemoryError.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10245) Hadoop command line always appends -Xmx option twice

2014-01-21 Thread shanyu zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1384#comment-1384
 ] 

shanyu zhao commented on HADOOP-10245:
--

[~drankye], [~qwertymaniac] would you please help review this patch?

 Hadoop command line always appends -Xmx option twice
 --

 Key: HADOOP-10245
 URL: https://issues.apache.org/jira/browse/HADOOP-10245
 Project: Hadoop Common
  Issue Type: Bug
  Components: bin
Affects Versions: 2.2.0
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10245.patch


 The Hadoop command line scripts (hadoop.sh or hadoop.cmd) will call java with 
 -Xmx options twice. The impact is that any user defined HADOOP_HEAP_SIZE 
 env variable will have no effect because it is overwritten by the second 
 -Xmx option.
 For example, here is the java cmd generated for command hadoop fs -ls /, 
 Notice that there are two -Xmx options: -Xmx1000m and -Xmx512m in the 
 command line:
 java -Xmx1000m  -Dhadoop.log.dir=C:\tmp\logs -Dhadoop.log.file=hadoop.log 
 -Dhadoop.root.logger=INFO,c
 onsole,DRFA -Xmx512m  -Dhadoop.security.logger=INFO,RFAS -classpath XXX 
 org.apache.hadoop.fs.FsShell -ls /
 Here is the root cause:
 The call flow is: hadoop.sh calls hadoop_config.sh, which in turn calls 
 hadoop-env.sh. 
 In hadoop.sh, the command line is generated by the following pseudo code:
 java $JAVA_HEAP_MAX $HADOOP_CLIENT_OPTS -classpath ...
 In hadoop-config.sh, $JAVA_HEAP_MAX is initialized as -Xmx1000m if user 
 didn't set $HADOOP_HEAP_SIZE env variable.
 In hadoop-env.sh, $HADOOP_CLIENT_OPTS is set as this:
 export HADOOP_CLIENT_OPTS=-Xmx512m $HADOOP_CLIENT_OPTS
 To fix this problem, we should remove the -Xmx512m from HADOOP_CLIENT_OPTS. 
 If we really want to change the memory settings we need to use 
 $HADOOP_HEAP_SIZE env variable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HADOOP-10245) Hadoop command line always appends -Xmx option twice

2014-01-21 Thread shanyu zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877905#comment-13877905
 ] 

shanyu zhao commented on HADOOP-10245:
--

[~ywskycn] Thank you for your comment! If we remove -Xmx512m from 
HADOOP_CLIENT_OPTS in hadoop-env.cmd, there will be one and only one -Xmx, 
which is the $JAVA_HEAP_MAX in bin/hadoop. 

HADOOP-9870 may have solved the problem for you, but I think the fix in 
HADOOP-9870 might be too complicated and hard to maintain. For example, what if 
the user puts -Xmx in HADOOP_OPTS instead of HADOOP_CLIENT_OPTS? I think we 
should avoid using HADOOP_CLIENT_OPTS or HADOOP_OPTS to specify memory, because 
defining HADOOP_HEAPSIZE and then not using it for the memory setting is 
confusing. If you want to change the heap size, just change HADOOP_HEAPSIZE; I 
think that is simple and clear. Thoughts?

 Hadoop command line always appends -Xmx option twice
 --

 Key: HADOOP-10245
 URL: https://issues.apache.org/jira/browse/HADOOP-10245
 Project: Hadoop Common
  Issue Type: Bug
  Components: bin
Affects Versions: 2.2.0
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10245.patch


 The Hadoop command line scripts (hadoop.sh or hadoop.cmd) will call java with 
 -Xmx options twice. The impact is that any user defined HADOOP_HEAP_SIZE 
 env variable will have no effect because it is overwritten by the second 
 -Xmx option.
 For example, here is the java cmd generated for command hadoop fs -ls /, 
 Notice that there are two -Xmx options: -Xmx1000m and -Xmx512m in the 
 command line:
 java -Xmx1000m  -Dhadoop.log.dir=C:\tmp\logs -Dhadoop.log.file=hadoop.log 
 -Dhadoop.root.logger=INFO,c
 onsole,DRFA -Xmx512m  -Dhadoop.security.logger=INFO,RFAS -classpath XXX 
 org.apache.hadoop.fs.FsShell -ls /
 Here is the root cause:
 The call flow is: hadoop.sh calls hadoop_config.sh, which in turn calls 
 hadoop-env.sh. 
 In hadoop.sh, the command line is generated by the following pseudo code:
 java $JAVA_HEAP_MAX $HADOOP_CLIENT_OPTS -classpath ...
 In hadoop-config.sh, $JAVA_HEAP_MAX is initialized as -Xmx1000m if user 
 didn't set $HADOOP_HEAP_SIZE env variable.
 In hadoop-env.sh, $HADOOP_CLIENT_OPTS is set as this:
 export HADOOP_CLIENT_OPTS=-Xmx512m $HADOOP_CLIENT_OPTS
 To fix this problem, we should remove the -Xmx512m from HADOOP_CLIENT_OPTS. 
 If we really want to change the memory settings we need to use 
 $HADOOP_HEAP_SIZE env variable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HADOOP-10245) Hadoop command line always appends -Xmx option twice

2014-01-20 Thread shanyu zhao (JIRA)
shanyu zhao created HADOOP-10245:


 Summary: Hadoop command line always appends -Xmx option twice
 Key: HADOOP-10245
 URL: https://issues.apache.org/jira/browse/HADOOP-10245
 Project: Hadoop Common
  Issue Type: Bug
  Components: bin
Affects Versions: 2.2.0
Reporter: shanyu zhao
Assignee: shanyu zhao


The Hadoop command line scripts (hadoop.sh or hadoop.cmd) will call java with 
-Xmx options twice. The impact is that any user defined HADOOP_HEAP_SIZE env 
variable will have no effect because it is overwritten by the second -Xmx 
option.

For example, here is the java cmd generated for command hadoop fs -ls /, 
Notice that there are two -Xmx options: -Xmx1000m and -Xmx512m in the 
command line:

java -Xmx1000m  -Dhadoop.log.dir=C:\tmp\logs -Dhadoop.log.file=hadoop.log 
-Dhadoop.root.logger=INFO,c
onsole,DRFA -Xmx512m  -Dhadoop.security.logger=INFO,RFAS -classpath XXX 
org.apache.hadoop.fs.FsShell -ls /

Here is the root cause:
The call flow is: hadoop.sh calls hadoop_config.sh, which in turn calls 
hadoop-env.sh. 
In hadoop.sh, the command line is generated by the following pseudo code:
java $JAVA_HEAP_MAX $HADOOP_CLIENT_OPTS -classpath ...

In hadoop-config.sh, $JAVA_HEAP_MAX is initialized as -Xmx1000m if user 
didn't set $HADOOP_HEAP_SIZE env variable.

In hadoop-env.sh, $HADOOP_CLIENT_OPTS is set as this:
export HADOOP_CLIENT_OPTS=-Xmx512m $HADOOP_CLIENT_OPTS

To fix this problem, we should remove the -Xmx512m from HADOOP_CLIENT_OPTS. 
If we really want to change the memory settings we need to use 
$HADOOP_HEAP_SIZE env variable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HADOOP-10245) Hadoop command line always appends -Xmx option twice

2014-01-20 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-10245:
-

Attachment: HADOOP-10245.patch

 Hadoop command line always appends -Xmx option twice
 --

 Key: HADOOP-10245
 URL: https://issues.apache.org/jira/browse/HADOOP-10245
 Project: Hadoop Common
  Issue Type: Bug
  Components: bin
Affects Versions: 2.2.0
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10245.patch


 The Hadoop command line scripts (hadoop.sh or hadoop.cmd) will call java with 
 -Xmx options twice. The impact is that any user defined HADOOP_HEAP_SIZE 
 env variable will have no effect because it is overwritten by the second 
 -Xmx option.
 For example, here is the java cmd generated for command hadoop fs -ls /, 
 Notice that there are two -Xmx options: -Xmx1000m and -Xmx512m in the 
 command line:
 java -Xmx1000m  -Dhadoop.log.dir=C:\tmp\logs -Dhadoop.log.file=hadoop.log 
 -Dhadoop.root.logger=INFO,c
 onsole,DRFA -Xmx512m  -Dhadoop.security.logger=INFO,RFAS -classpath XXX 
 org.apache.hadoop.fs.FsShell -ls /
 Here is the root cause:
 The call flow is: hadoop.sh calls hadoop_config.sh, which in turn calls 
 hadoop-env.sh. 
 In hadoop.sh, the command line is generated by the following pseudo code:
 java $JAVA_HEAP_MAX $HADOOP_CLIENT_OPTS -classpath ...
 In hadoop-config.sh, $JAVA_HEAP_MAX is initialized as -Xmx1000m if user 
 didn't set $HADOOP_HEAP_SIZE env variable.
 In hadoop-env.sh, $HADOOP_CLIENT_OPTS is set as this:
 export HADOOP_CLIENT_OPTS=-Xmx512m $HADOOP_CLIENT_OPTS
 To fix this problem, we should remove the -Xmx512m from HADOOP_CLIENT_OPTS. 
 If we really want to change the memory settings we need to use 
 $HADOOP_HEAP_SIZE env variable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HADOOP-10245) Hadoop command line always appends -Xmx option twice

2014-01-20 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-10245:
-

Status: Patch Available  (was: Open)

 Hadoop command line always appends -Xmx option twice
 --

 Key: HADOOP-10245
 URL: https://issues.apache.org/jira/browse/HADOOP-10245
 Project: Hadoop Common
  Issue Type: Bug
  Components: bin
Affects Versions: 2.2.0
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10245.patch


 The Hadoop command line scripts (hadoop.sh or hadoop.cmd) will call java with 
 -Xmx options twice. The impact is that any user defined HADOOP_HEAP_SIZE 
 env variable will have no effect because it is overwritten by the second 
 -Xmx option.
 For example, here is the java cmd generated for command hadoop fs -ls /, 
 Notice that there are two -Xmx options: -Xmx1000m and -Xmx512m in the 
 command line:
 java -Xmx1000m  -Dhadoop.log.dir=C:\tmp\logs -Dhadoop.log.file=hadoop.log 
 -Dhadoop.root.logger=INFO,c
 onsole,DRFA -Xmx512m  -Dhadoop.security.logger=INFO,RFAS -classpath XXX 
 org.apache.hadoop.fs.FsShell -ls /
 Here is the root cause:
 The call flow is: hadoop.sh calls hadoop_config.sh, which in turn calls 
 hadoop-env.sh. 
 In hadoop.sh, the command line is generated by the following pseudo code:
 java $JAVA_HEAP_MAX $HADOOP_CLIENT_OPTS -classpath ...
 In hadoop-config.sh, $JAVA_HEAP_MAX is initialized as -Xmx1000m if user 
 didn't set $HADOOP_HEAP_SIZE env variable.
 In hadoop-env.sh, $HADOOP_CLIENT_OPTS is set as this:
 export HADOOP_CLIENT_OPTS=-Xmx512m $HADOOP_CLIENT_OPTS
 To fix this problem, we should remove the -Xmx512m from HADOOP_CLIENT_OPTS. 
 If we really want to change the memory settings we need to use 
 $HADOOP_HEAP_SIZE env variable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HADOOP-10245) Hadoop command line always appends -Xmx option twice

2014-01-20 Thread shanyu zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877068#comment-13877068
 ] 

shanyu zhao commented on HADOOP-10245:
--

[~ywskycn] yes, it is the same issue. Sorry, I didn't see HADOOP-9870 before I 
submitted this one. I also found the similar JIRAs HADOOP-9211 and HDFS-5087.

I went through these JIRAs and here are my thoughts:
We should rely only on $HADOOP_HEAPSIZE to control the Java heap size, instead 
of $HADOOP_CLIENT_OPTS. Otherwise it is very confusing and hard to debug 
issues, and I've seen many real-world issues caused by this confusion.

There are arguments that $HADOOP_HEAPSIZE is only for services, and that the 
client should have its own setting. Well, we could create HADOOP_CLIENT_HEAPSIZE, 
initialized to 512m and used in hadoop.sh, but personally I don't think it is 
worth adding this new env variable. The client can simply use $HADOOP_HEAPSIZE, 
which defaults to 1000m. Also, there are scenarios where a Java class executed 
by the hadoop jar command has large memory requirements. A real-world example: 
Hive's MapredLocalTask calls hadoop jar to build a local hash table.

Also, if there's a need to change the heap size, one can always set the env 
variable $HADOOP_HEAPSIZE.

 Hadoop command line always appends -Xmx option twice
 --

 Key: HADOOP-10245
 URL: https://issues.apache.org/jira/browse/HADOOP-10245
 Project: Hadoop Common
  Issue Type: Bug
  Components: bin
Affects Versions: 2.2.0
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10245.patch


 The Hadoop command line scripts (hadoop.sh or hadoop.cmd) will call java with 
 -Xmx options twice. The impact is that any user defined HADOOP_HEAP_SIZE 
 env variable will have no effect because it is overwritten by the second 
 -Xmx option.
 For example, here is the java cmd generated for command hadoop fs -ls /, 
 Notice that there are two -Xmx options: -Xmx1000m and -Xmx512m in the 
 command line:
 java -Xmx1000m  -Dhadoop.log.dir=C:\tmp\logs -Dhadoop.log.file=hadoop.log 
 -Dhadoop.root.logger=INFO,c
 onsole,DRFA -Xmx512m  -Dhadoop.security.logger=INFO,RFAS -classpath XXX 
 org.apache.hadoop.fs.FsShell -ls /
 Here is the root cause:
 The call flow is: hadoop.sh calls hadoop_config.sh, which in turn calls 
 hadoop-env.sh. 
 In hadoop.sh, the command line is generated by the following pseudo code:
 java $JAVA_HEAP_MAX $HADOOP_CLIENT_OPTS -classpath ...
 In hadoop-config.sh, $JAVA_HEAP_MAX is initialized as -Xmx1000m if user 
 didn't set $HADOOP_HEAP_SIZE env variable.
 In hadoop-env.sh, $HADOOP_CLIENT_OPTS is set as this:
 export HADOOP_CLIENT_OPTS=-Xmx512m $HADOOP_CLIENT_OPTS
 To fix this problem, we should remove the -Xmx512m from HADOOP_CLIENT_OPTS. 
 If we really want to change the memory settings we need to use 
 $HADOOP_HEAP_SIZE env variable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HADOOP-10234) hadoop.cmd jar does not propagate exit code.

2014-01-14 Thread shanyu zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13871398#comment-13871398
 ] 

shanyu zhao commented on HADOOP-10234:
--

+1

This patch fixes the problem where a Java program that starts a process with the 
command line hadoop.cmd ... always reads 0 as the exit status even if the process 
failed. Thanks for the fix, Chris!

 hadoop.cmd jar does not propagate exit code.
 --

 Key: HADOOP-10234
 URL: https://issues.apache.org/jira/browse/HADOOP-10234
 Project: Hadoop Common
  Issue Type: Bug
  Components: scripts
Affects Versions: 2.2.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: 2.3.0

 Attachments: HADOOP-10234.1.patch


 Running hadoop.cmd jar does not always propagate the exit code to the 
 caller.  In interactive use, it works fine.  However, in some usages (notably 
 Hive), it gets called through {{Shell#getRunScriptCommand}}, which needs to 
 do an intermediate cmd /c to execute the script.  In that case, the last 
 exit code is getting dropped, so Hive can't detect job failures.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HADOOP-10178) Configuration deprecation always emit deprecated warnings when a new key is used

2014-01-09 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-10178:
-

Attachment: HADOOP-10178-v5.patch

New patch attached that fixes the problem that unset() should also unset 
alternative non-deprecated keys, just as set() sets alternative 
non-deprecated keys. Unit tests added.

 Configuration deprecation always emit deprecated warnings when a new key is 
 used
 --

 Key: HADOOP-10178
 URL: https://issues.apache.org/jira/browse/HADOOP-10178
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.2.0
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10178-v2.patch, HADOOP-10178-v3.patch, 
 HADOOP-10178-v4.patch, HADOOP-10178-v5.patch, HADOOP-10178.patch


 Even if you use any new configuration properties, you still find deprecated 
 warnings in your logs. E.g.:
 13/12/14 01:00:51 INFO Configuration.deprecation: mapred.input.dir.recursive 
 is deprecated. Instead, use 
 mapreduce.input.fileinputformat.input.dir.recursive



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HADOOP-10178) Configuration deprecation always emit deprecated warnings when a new key is used

2014-01-09 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-10178:
-

Attachment: (was: HADOOP-10178-v5.patch)

 Configuration deprecation always emit deprecated warnings when a new key is 
 used
 --

 Key: HADOOP-10178
 URL: https://issues.apache.org/jira/browse/HADOOP-10178
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.2.0
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10178-v2.patch, HADOOP-10178-v3.patch, 
 HADOOP-10178-v4.patch, HADOOP-10178-v5.patch, HADOOP-10178.patch


 Even if you use any new configuration properties, you still find deprecated 
 warnings in your logs. E.g.:
 13/12/14 01:00:51 INFO Configuration.deprecation: mapred.input.dir.recursive 
 is deprecated. Instead, use 
 mapreduce.input.fileinputformat.input.dir.recursive



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HADOOP-10178) Configuration deprecation always emit deprecated warnings when a new key is used

2014-01-09 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-10178:
-

Attachment: HADOOP-10178-v5.patch

 Configuration deprecation always emit deprecated warnings when a new key is 
 used
 --

 Key: HADOOP-10178
 URL: https://issues.apache.org/jira/browse/HADOOP-10178
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.2.0
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10178-v2.patch, HADOOP-10178-v3.patch, 
 HADOOP-10178-v4.patch, HADOOP-10178-v5.patch, HADOOP-10178.patch


 Even if you use any new configuration properties, you still find deprecated 
 warnings in your logs. E.g.:
 13/12/14 01:00:51 INFO Configuration.deprecation: mapred.input.dir.recursive 
 is deprecated. Instead, use 
 mapreduce.input.fileinputformat.input.dir.recursive



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HADOOP-10178) Configuration deprecation always emit deprecated warnings when a new key is used

2014-01-09 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-10178:
-

Attachment: HADOOP-10178-v6.patch

 Configuration deprecation always emit deprecated warnings when a new key is 
 used
 --

 Key: HADOOP-10178
 URL: https://issues.apache.org/jira/browse/HADOOP-10178
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.2.0
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10178-v2.patch, HADOOP-10178-v3.patch, 
 HADOOP-10178-v4.patch, HADOOP-10178-v5.patch, HADOOP-10178-v6.patch, 
 HADOOP-10178.patch


 Even if you use any new configuration properties, you still find deprecated 
 warnings in your logs. E.g.:
 13/12/14 01:00:51 INFO Configuration.deprecation: mapred.input.dir.recursive 
 is deprecated. Instead, use 
 mapreduce.input.fileinputformat.input.dir.recursive



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HADOOP-10178) Configuration deprecation always emit deprecated warnings when a new key is used

2014-01-09 Thread shanyu zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867517#comment-13867517
 ] 

shanyu zhao commented on HADOOP-10178:
--

There is still a problem with the current implementation. The iterator() 
returns all deprecated keys even though they are not set explicitly. This 
causes pig to print false warnings about deprecated keys. 

I think we should add a key to properties no matter whether it is deprecated or 
not, and iterator() should only return what has been set in properties. This 
solves the original problem presented in HADOOP-8167 (do a set(dK, V) and expect, 
when iterating over the configuration, to find (dK, V)) without introducing false 
deprecation warnings.

To summarize, these are the expected behaviors with this change (a minimal sketch 
follows the list), supposing dK is deprecated by nK:
1) set(dK): will set both dK and nK to properties;
2) set(nK): will set nK, and if dK is present in properties, update it;
3) get(dK): because dK is deprecated, will call get(nK) instead;
4) get(nK): get nK from properties
5) unset(dK): unset both dK and nK
6) unset(nK): unset nK, and if dK was previously set, remove it from properties
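
A minimal sketch of those behaviors, using hypothetical key names ("old.key" for dK, 
"new.key" for nK) against the public Configuration API; this only illustrates the 
intended semantics, it is not the patch itself:
{code:java}
import org.apache.hadoop.conf.Configuration;

public class DeprecationBehaviorSketch {
  public static void main(String[] args) {
    // Hypothetical keys: "old.key" (dK) is deprecated by "new.key" (nK).
    Configuration.addDeprecation("old.key", "new.key");

    Configuration conf = new Configuration(false);
    conf.set("old.key", "v1");                 // 1) sets both dK and nK
    conf.set("new.key", "v2");                 // 2) sets nK and, since dK is present, updates it
    System.out.println(conf.get("old.key"));   // 3) dK is deprecated, so this reads nK -> "v2"
    System.out.println(conf.get("new.key"));   // 4) reads nK from properties -> "v2"
    conf.unset("old.key");                     // 5) unsets both dK and nK
    conf.set("new.key", "v3");
    conf.unset("new.key");                     // 6) unsets nK, and dK too had it been set before
  }
}
{code}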

 Configuration deprecation always emit deprecated warnings when a new key is 
 used
 --

 Key: HADOOP-10178
 URL: https://issues.apache.org/jira/browse/HADOOP-10178
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.2.0
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10178-v2.patch, HADOOP-10178-v3.patch, 
 HADOOP-10178-v4.patch, HADOOP-10178-v5.patch, HADOOP-10178-v6.patch, 
 HADOOP-10178.patch


 Even if you use any new configuration properties, you still find deprecated 
 warnings in your logs. E.g.:
 13/12/14 01:00:51 INFO Configuration.deprecation: mapred.input.dir.recursive 
 is deprecated. Instead, use 
 mapreduce.input.fileinputformat.input.dir.recursive



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HADOOP-10178) Configuration deprecation always emit deprecated warnings when a new key is used

2014-01-09 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-10178:
-

Attachment: HADOOP-10178-v7.patch

New patch attached. 

I modified getAlternativeNames() to include the deprecated key if it was explicitly 
set in properties before. With this, set(nK) will update dK if dK was previously 
set, and dK will be removed upon unset(nK).

Also, for set(k) we'll set k in properties no matter whether k is deprecated or not.

 Configuration deprecation always emit deprecated warnings when a new key is 
 used
 --

 Key: HADOOP-10178
 URL: https://issues.apache.org/jira/browse/HADOOP-10178
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.2.0
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10178-v2.patch, HADOOP-10178-v3.patch, 
 HADOOP-10178-v4.patch, HADOOP-10178-v5.patch, HADOOP-10178-v6.patch, 
 HADOOP-10178-v7.patch, HADOOP-10178.patch


 Even if you use any new configuration properties, you still find deprecated 
 warnings in your logs. E.g.:
 13/12/14 01:00:51 INFO Configuration.deprecation: mapred.input.dir.recursive 
 is deprecated. Instead, use 
 mapreduce.input.fileinputformat.input.dir.recursive



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HADOOP-10178) Configuration deprecation always emit deprecated warnings when a new key is used

2014-01-08 Thread shanyu zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865884#comment-13865884
 ] 

shanyu zhao commented on HADOOP-10178:
--

Not sure why v4 introduces findbugs warnings; those warnings seem unrelated 
to what I modified.

 Configuration deprecation always emit deprecated warnings when a new key is 
 used
 --

 Key: HADOOP-10178
 URL: https://issues.apache.org/jira/browse/HADOOP-10178
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.2.0
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10178-v2.patch, HADOOP-10178-v3.patch, 
 HADOOP-10178-v4.patch, HADOOP-10178.patch


 Even if you use any new configuration properties, you still find deprecated 
 warnings in your logs. E.g.:
 13/12/14 01:00:51 INFO Configuration.deprecation: mapred.input.dir.recursive 
 is deprecated. Instead, use 
 mapreduce.input.fileinputformat.input.dir.recursive



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HADOOP-10178) Configuration deprecation always emit deprecated warnings when a new key is used

2014-01-08 Thread shanyu zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866126#comment-13866126
 ] 

shanyu zhao commented on HADOOP-10178:
--

Thanks [~tucu00] for reviewing! As for the renaming of the source message, I 
think you don't need to call out "using BB", because the property name is BB, so 
using BB is implied. It means the source that set the BB property in the 
configuration did so because CC is deprecated.

 Configuration deprecation always emit deprecated warnings when a new key is 
 used
 --

 Key: HADOOP-10178
 URL: https://issues.apache.org/jira/browse/HADOOP-10178
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.2.0
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10178-v2.patch, HADOOP-10178-v3.patch, 
 HADOOP-10178-v4.patch, HADOOP-10178.patch


 Even if you use any new configuration properties, you still find deprecated 
 warnings in your logs. E.g.:
 13/12/14 01:00:51 INFO Configuration.deprecation: mapred.input.dir.recursive 
 is deprecated. Instead, use 
 mapreduce.input.fileinputformat.input.dir.recursive



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HADOOP-10178) Configuration deprecation always emit deprecated warnings when a new key is used

2014-01-07 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-10178:
-

Attachment: HADOOP-10178-v4.patch

New patch attached following [~cnauroth]'s advice.

Thanks for reviewing!

 Configuration deprecation always emit deprecated warnings when a new key is 
 used
 --

 Key: HADOOP-10178
 URL: https://issues.apache.org/jira/browse/HADOOP-10178
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.2.0
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10178-v2.patch, HADOOP-10178-v3.patch, 
 HADOOP-10178-v4.patch, HADOOP-10178.patch


 Even if you use any new configuration properties, you still find deprecated 
 warnings in your logs. E.g.:
 13/12/14 01:00:51 INFO Configuration.deprecation: mapred.input.dir.recursive 
 is deprecated. Instead, use 
 mapreduce.input.fileinputformat.input.dir.recursive



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HADOOP-10178) Configuration deprecation always emit deprecated warnings when a new key is used

2013-12-26 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-10178:
-

Attachment: HADOOP-10178-v3.patch

New patch attached. Fix warning.

 Configuration deprecation always emit deprecated warnings when a new key is 
 used
 --

 Key: HADOOP-10178
 URL: https://issues.apache.org/jira/browse/HADOOP-10178
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.2.0
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10178-v2.patch, HADOOP-10178-v3.patch, 
 HADOOP-10178.patch


 Even if you use any new configuration properties, you still find deprecated 
 warnings in your logs. E.g.:
 13/12/14 01:00:51 INFO Configuration.deprecation: mapred.input.dir.recursive 
 is deprecated. Instead, use 
 mapreduce.input.fileinputformat.input.dir.recursive



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HADOOP-10178) Configuration deprecation always emit deprecated warnings when a new key is used

2013-12-24 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-10178:
-

Attachment: (was: HADOOP-10178-v2.patch)

 Configuration deprecation always emit deprecated warnings when a new key is 
 used
 --

 Key: HADOOP-10178
 URL: https://issues.apache.org/jira/browse/HADOOP-10178
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.2.0
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10178-v2.patch, HADOOP-10178.patch


 Even if you use any new configuration properties, you still find deprecated 
 warnings in your logs. E.g.:
 13/12/14 01:00:51 INFO Configuration.deprecation: mapred.input.dir.recursive 
 is deprecated. Instead, use 
 mapreduce.input.fileinputformat.input.dir.recursive



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HADOOP-10178) Configuration deprecation always emit deprecated warnings when a new key is used

2013-12-24 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-10178:
-

Attachment: HADOOP-10178-v2.patch

Added a getAlternativeNames() method to get the non-deprecated aliases of a given 
non-deprecated key. When setting a non-deprecated key, we need to check whether it 
has any alias non-deprecated keys (ones that deprecate the same key) and set those 
properties as well.
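
As an illustration of this alias situation (hypothetical key names, not the actual 
patch): two new keys that both deprecate the same old key should stay in sync when 
either of them is set.
{code:java}
import org.apache.hadoop.conf.Configuration;

public class AliasKeysSketch {
  public static void main(String[] args) {
    // Hypothetical keys: "new.key.a" and "new.key.b" both deprecate "old.key",
    // so they are aliases of each other through the shared deprecated key.
    Configuration.addDeprecation("old.key", new String[] {"new.key.a", "new.key.b"});

    Configuration conf = new Configuration(false);
    conf.set("new.key.a", "value");
    // With the alias handling described above, "new.key.b" is expected to see
    // the same value, because it deprecates the same old key.
    System.out.println(conf.get("new.key.b"));
  }
}
{code}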

 Configuration deprecation always emit deprecated warnings when a new key is 
 used
 --

 Key: HADOOP-10178
 URL: https://issues.apache.org/jira/browse/HADOOP-10178
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.2.0
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10178-v2.patch, HADOOP-10178.patch


 Even if you use any new configuration properties, you still find deprecated 
 warnings in your logs. E.g.:
 13/12/14 01:00:51 INFO Configuration.deprecation: mapred.input.dir.recursive 
 is deprecated. Instead, use 
 mapreduce.input.fileinputformat.input.dir.recursive



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HADOOP-10178) Configuration deprecation always emit deprecated warnings when a new key is used

2013-12-24 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-10178:
-

Attachment: HADOOP-10178-v2.patch

 Configuration deprecation always emit deprecated warnings when a new key is 
 used
 --

 Key: HADOOP-10178
 URL: https://issues.apache.org/jira/browse/HADOOP-10178
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.2.0
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10178-v2.patch, HADOOP-10178.patch


 Even if you use any new configuration properties, you still find deprecated 
 warnings in your logs. E.g.:
 13/12/14 01:00:51 INFO Configuration.deprecation: mapred.input.dir.recursive 
 is deprecated. Instead, use 
 mapreduce.input.fileinputformat.input.dir.recursive



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HADOOP-10178) Configuration deprecation always emit deprecated warnings when a new key is used

2013-12-23 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-10178:
-

Status: Patch Available  (was: Open)

 Configuration deprecation always emit deprecated warnings when a new key is 
 used
 --

 Key: HADOOP-10178
 URL: https://issues.apache.org/jira/browse/HADOOP-10178
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.2.0
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10178.patch


 Even if you use any new configuration properties, you still find deprecated 
 warnings in your logs. E.g.:
 13/12/14 01:00:51 INFO Configuration.deprecation: mapred.input.dir.recursive 
 is deprecated. Instead, use 
 mapreduce.input.fileinputformat.input.dir.recursive



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HADOOP-10178) Configuration deprecation always emit deprecated warnings when a new key is used

2013-12-21 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-10178:
-

Attachment: HADOOP-10178.patch

Patch attached.

The root cause of this problem is that the fix for HADOOP-8167 put deprecated 
keys (even when you're adding a new key) into the properties variable; this 
causes a deprecation warning when handleDeprecation() is called. Note that 
handleDeprecation() has to be called, otherwise it won't catch deprecated keys 
in configuration files.

This fix basically reverses HADOOP-8167 and fixes it in a different 
way: the iterator() implementation is modified to include all deprecated keys. 

 Configuration deprecation always emit deprecated warnings when a new key is 
 used
 --

 Key: HADOOP-10178
 URL: https://issues.apache.org/jira/browse/HADOOP-10178
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.2.0
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10178.patch


 Even if you use any new configuration properties, you still find deprecated 
 warnings in your logs. E.g.:
 13/12/14 01:00:51 INFO Configuration.deprecation: mapred.input.dir.recursive 
 is deprecated. Instead, use 
 mapreduce.input.fileinputformat.input.dir.recursive



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HADOOP-10178) Configuration deprecation always emit deprecated warnings when a new key is used

2013-12-20 Thread shanyu zhao (JIRA)
shanyu zhao created HADOOP-10178:


 Summary: Configuration deprecation always emit deprecated 
warnings when a new key is used
 Key: HADOOP-10178
 URL: https://issues.apache.org/jira/browse/HADOOP-10178
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.2.0
Reporter: shanyu zhao
Assignee: shanyu zhao


Even if you use any new configuration properties, you still find deprecated 
warnings in your logs. E.g.:
13/12/14 01:00:51 INFO Configuration.deprecation: mapred.input.dir.recursive is 
deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (HADOOP-10093) hadoop.cmd fs -copyFromLocal fails with large files on WASB

2013-11-12 Thread shanyu zhao (JIRA)
shanyu zhao created HADOOP-10093:


 Summary: hadoop.cmd fs -copyFromLocal fails with large files on 
WASB
 Key: HADOOP-10093
 URL: https://issues.apache.org/jira/browse/HADOOP-10093
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.2.0
Reporter: shanyu zhao
Assignee: shanyu zhao


When WASB is configured as the default file system, if you run this:
 hadoop fs -copyFromLocal largefile(150MB) /test

You'll see this error message:
 Exception in thread main java.lang.OutOfMemoryError: Java heap space
 at java.util.Arrays.copyOf(Arrays.java:2271)
 at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
 at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
 at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
 at com.microsoft.windowsazure.services.blob.client.BlobOutputStream.writeInternal(BlobOutputStream.java:618)
 at com.microsoft.windowsazure.services.blob.client.BlobOutputStream.write(BlobOutputStream.java:545)
 at java.io.DataOutputStream.write(DataOutputStream.java:107)
 at org.apache.hadoop.fs.azurenative.NativeAzureFileSystem$NativeAzureFsOutputStream.write(NativeAzureFileSystem.java:307)
 at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:59)
 at java.io.DataOutputStream.write(DataOutputStream.java:107)
 at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:80)
 at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:52)
 at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112)
 at org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:299)
 at org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:281)
 at org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:245)
 at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:188)
 at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:173)
 at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:306)
 at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278)
 at org.apache.hadoop.fs.shell.CommandWithDestination.processPathArgument(CommandWithDestination.java:168)
 at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260)
 at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244)
 at org.apache.hadoop.fs.shell.CommandWithDestination.processArguments(CommandWithDestination.java:145)
 at org.apache.hadoop.fs.shell.CopyCommands$Put.processArguments(CopyCommands.java:229)
 at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:190)
 at org.apache.hadoop.fs.shell.Command.run(Command.java:154)
 at org.apache.hadoop.fs.FsShell.run(FsShell.java:255)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
 at org.apache.hadoop.fs.FsShell.main(FsShell.java:305)




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HADOOP-10093) hadoop.cmd fs -copyFromLocal fails with large files on WASB

2013-11-12 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-10093:
-

Attachment: HADOOP-10093.patch

Patch attached. It changes HADOOP_CLIENT_OPTS to use 512m as the heap size, aligned 
with HADOOP-9211.

 hadoop.cmd fs -copyFromLocal fails with large files on WASB
 ---

 Key: HADOOP-10093
 URL: https://issues.apache.org/jira/browse/HADOOP-10093
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.2.0
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-10093.patch


 When WASB is configured as default file system, if you run this:
  Hadoop fs -copyFromLocal largefile(150MB) /test
 You'll see this error message:
  Exception in thread main java.lang.OutOfMemoryError: Java heap space
  at java.util.Arrays.copyOf(Arrays.java:2271)
  at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
  at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
  at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
  at com.microsoft.windowsazure.services.blob.client.BlobOutputStream.writeInternal(BlobOutputStream.java:618)
  at com.microsoft.windowsazure.services.blob.client.BlobOutputStream.write(BlobOutputStream.java:545)
  at java.io.DataOutputStream.write(DataOutputStream.java:107)
  at org.apache.hadoop.fs.azurenative.NativeAzureFileSystem$NativeAzureFsOutputStream.write(NativeAzureFileSystem.java:307)
  at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:59)
  at java.io.DataOutputStream.write(DataOutputStream.java:107)
  at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:80)
  at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:52)
  at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112)
  at org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:299)
  at org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:281)
  at org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:245)
  at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:188)
  at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:173)
  at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:306)
  at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278)
  at org.apache.hadoop.fs.shell.CommandWithDestination.processPathArgument(CommandWithDestination.java:168)
  at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260)
  at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244)
  at org.apache.hadoop.fs.shell.CommandWithDestination.processArguments(CommandWithDestination.java:145)
  at org.apache.hadoop.fs.shell.CopyCommands$Put.processArguments(CopyCommands.java:229)
  at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:190)
  at org.apache.hadoop.fs.shell.Command.run(Command.java:154)
  at org.apache.hadoop.fs.FsShell.run(FsShell.java:255)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
  at org.apache.hadoop.fs.FsShell.main(FsShell.java:305)



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HADOOP-9776) HarFileSystem.listStatus() returns har://scheme-localhost:/... if port number is empty

2013-09-19 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-9776:


Attachment: HADOOP-9776-3.patch

V3 of the patch to address the indentation problem. Thanks Chuan and Ivan for 
reviewing the patch!

 HarFileSystem.listStatus() returns har://scheme-localhost:/... if port 
 number is empty
 --

 Key: HADOOP-9776
 URL: https://issues.apache.org/jira/browse/HADOOP-9776
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-9776-2.patch, HADOOP-9776-3.patch, 
 HADOOP-9776.patch


 If the given har URI is har://scheme-localhost/usr/my.har/a, the result 
 of HarFileSystem.listStatus() will have a : appended after localhost, like 
 this: har://scheme-localhost:/usr/my.har/a. It should return 
 har://scheme-localhost/usr/my.har/a instead.
 This creates a problem when running the hive unit test TestCliDriver 
 (archive_excludeHadoop20.q), generating the following error:
   java.io.IOException: cannot find dir = 
 har://pfile-localhost:/GitHub/hive-monarch/build/ql/test/data/warehouse/tstsrcpart/ds=2008-04-08/hr=12/data.har/00_0
  in pathToPartitionInfo: 
 [pfile:/GitHub/hive-monarch/build/ql/test/data/warehouse/tstsrcpart/ds=2008-04-08/hr=11,
  
 har://pfile-localhost/GitHub/hive-monarch/build/ql/test/data/warehouse/tstsrcpart/ds=2008-04-08/hr=12/data.har]
   [junit] at 
 org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:298)
   [junit] at 
 org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:260)
   [junit] at 
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.init(CombineHiveInputFormat.java:104)
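
For illustration only (not the actual patch), the har authority should only include 
a ":" when the underlying URI actually carries a port; a minimal sketch of that 
check, assuming java.net.URI:
{code:java}
import java.net.URI;

public class HarAuthoritySketch {
  public static void main(String[] args) {
    // URI.getPort() returns -1 when no port is present, so only append ":" + port
    // when there actually is one.
    URI underlying = URI.create("pfile://localhost/usr/my.har");
    String authority = underlying.getPort() == -1
        ? underlying.getHost()
        : underlying.getHost() + ":" + underlying.getPort();
    // Prints har://pfile-localhost/usr/my.har/a (no stray ":" after localhost)
    System.out.println("har://" + underlying.getScheme() + "-" + authority + "/usr/my.har/a");
  }
}
{code}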

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-9924) FileUtil.createJarWithClassPath() does not generate relative classpath correctly

2013-09-06 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-9924:


Attachment: HADOOP-9924.addendum.patch

There is a corner case the previous patch does not handle correctly. When 
there's an empty string included in the classpath, new Path(classPathEntry) will 
throw an exception. 

The manifestation of this problem is that when you try to run a pig job, the job 
will fail because the application master fails to start.

The addendum patch handles the empty string scenario. Unit test cases are 
updated to test this scenario as well.
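
For illustration only (not the actual patch), the two points above amount to 
skipping empty classpath entries and resolving relative entries against the given 
pwd; a minimal sketch, assuming a hypothetical helper named resolveEntry:
{code:java}
import org.apache.hadoop.fs.Path;

public class ClasspathEntrySketch {
  // Hypothetical helper: returns null for empty entries, otherwise resolves the
  // entry against pwd when it is relative.
  static Path resolveEntry(String classPathEntry, Path pwd) {
    if (classPathEntry == null || classPathEntry.isEmpty()) {
      return null; // skip empty entries; new Path("") would throw
    }
    Path p = new Path(classPathEntry);
    // Relative entries are resolved against the given pwd (the container's working
    // directory), not against whatever directory the JVM was launched from.
    return p.isAbsolute() ? p : new Path(pwd, classPathEntry);
  }
}
{code}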

 FileUtil.createJarWithClassPath() does not generate relative classpath 
 correctly
 

 Key: HADOOP-9924
 URL: https://issues.apache.org/jira/browse/HADOOP-9924
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.1.0-beta, 0.23.9
Reporter: shanyu zhao
Assignee: shanyu zhao
 Fix For: 2.1.1-beta

 Attachments: HADOOP-9924-2.patch, HADOOP-9924-3.patch, 
 HADOOP-9924-4.patch, HADOOP-9924.addendum.patch, HADOOP-9924.patch


 On Windows, FileUtil.createJarWithClassPath() is called to generate a 
 manifest jar file to pack classpath - to avoid the problem of classpath being 
 too long.
 However, the relative classpath is not handled correctly. It relies on Java's 
 File(relativePath) to resolve the relative path. But it really should be 
 using the given pwd parameter to resolve the relative path.
 To reproduce this bug, you can try some pig job on Windows, it will fail and 
 the pig log on the application master will look like this:
 2013-08-29 23:25:55,498 INFO [main] 
 org.apache.hadoop.service.AbstractService: Service 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED; cause: 
 org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
 java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat 
 not found
 org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
 java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat 
 not found
 This is because the PigOutputFormat class is in the job.jar file but the 
 classpath manifest has:
 file:/c:/apps/dist/hadoop-2.1.0-beta/bin/job.jar/job.jar
 When it really should be:
 file:/job container folder/job.jar/job.jar

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-9924) FileUtil.createJarWithClassPath() does not generate relative classpath correctly

2013-09-03 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-9924:


Attachment: HADOOP-9924-2.patch

Thanks Ivan for your comments! Suggestion taken and a new patch is attached.

 FileUtil.createJarWithClassPath() does not generate relative classpath 
 correctly
 

 Key: HADOOP-9924
 URL: https://issues.apache.org/jira/browse/HADOOP-9924
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.1.0-beta, 0.23.9
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-9924-2.patch, HADOOP-9924.patch


 On Windows, FileUtil.createJarWithClassPath() is called to generate a 
 manifest jar file to pack classpath - to avoid the problem of classpath being 
 too long.
 However, the relative classpath is not handled correctly. It relies on Java's 
 File(relativePath) to resolve the relative path. But it really should be 
 using the given pwd parameter to resolve the relative path.
 To reproduce this bug, you can try some pig job on Windows, it will fail and 
 the pig log on the application master will look like this:
 2013-08-29 23:25:55,498 INFO [main] 
 org.apache.hadoop.service.AbstractService: Service 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED; cause: 
 org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
 java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat 
 not found
 org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
 java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat 
 not found
 This is because the PigOutputFormat class is in the job.jar file but the 
 classpath manifest has:
 file:/c:/apps/dist/hadoop-2.1.0-beta/bin/job.jar/job.jar
 When it really should be:
 file:/job container folder/job.jar/job.jar

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-9924) FileUtil.createJarWithClassPath() does not generate relative classpath correctly

2013-09-03 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-9924:


Status: Patch Available  (was: Open)

 FileUtil.createJarWithClassPath() does not generate relative classpath 
 correctly
 

 Key: HADOOP-9924
 URL: https://issues.apache.org/jira/browse/HADOOP-9924
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 0.23.9, 2.1.0-beta
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-9924-2.patch, HADOOP-9924.patch


 On Windows, FileUtil.createJarWithClassPath() is called to generate a 
 manifest jar file to pack classpath - to avoid the problem of classpath being 
 too long.
 However, the relative classpath is not handled correctly. It relies on Java's 
 File(relativePath) to resolve the relative path. But it really should be 
 using the given pwd parameter to resolve the relative path.
 To reproduce this bug, you can try some pig job on Windows, it will fail and 
 the pig log on the application master will look like this:
 2013-08-29 23:25:55,498 INFO [main] 
 org.apache.hadoop.service.AbstractService: Service 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED; cause: 
 org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
 java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat 
 not found
 org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
 java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat 
 not found
 This is because the PigOutputFormat class is in the job.jar file but the 
 classpath manifest has:
 file:/c:/apps/dist/hadoop-2.1.0-beta/bin/job.jar/job.jar
 When it really should be:
 file:/job container folder/job.jar/job.jar

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-9924) FileUtil.createJarWithClassPath() does not generate relative classpath correctly

2013-09-03 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-9924:


Attachment: HADOOP-9924-3.patch

The previous patch was against 2.1.0-beta. This one is against trunk.

 FileUtil.createJarWithClassPath() does not generate relative classpath 
 correctly
 

 Key: HADOOP-9924
 URL: https://issues.apache.org/jira/browse/HADOOP-9924
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.1.0-beta, 0.23.9
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-9924-2.patch, HADOOP-9924-3.patch, 
 HADOOP-9924.patch


 On Windows, FileUtil.createJarWithClassPath() is called to generate a 
 manifest jar file to pack classpath - to avoid the problem of classpath being 
 too long.
 However, the relative classpath is not handled correctly. It relies on Java's 
 File(relativePath) to resolve the relative path. But it really should be 
 using the given pwd parameter to resolve the relative path.
 To reproduce this bug, you can try some pig job on Windows, it will fail and 
 the pig log on the application master will look like this:
 2013-08-29 23:25:55,498 INFO [main] 
 org.apache.hadoop.service.AbstractService: Service 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED; cause: 
 org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
 java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat 
 not found
 org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
 java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat 
 not found
 This is because the PigOutputFormat class is in the job.jar file but the 
 classpath manifest has:
 file:/c:/apps/dist/hadoop-2.1.0-beta/bin/job.jar/job.jar
 When it really should be:
 file:/job container folder/job.jar/job.jar

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-9924) FileUtil.createJarWithClassPath() does not generate relative classpath correctly

2013-09-03 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-9924:


Attachment: HADOOP-9924-4.patch

Fixed the patch.

 FileUtil.createJarWithClassPath() does not generate relative classpath 
 correctly
 

 Key: HADOOP-9924
 URL: https://issues.apache.org/jira/browse/HADOOP-9924
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.1.0-beta, 0.23.9
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-9924-2.patch, HADOOP-9924-3.patch, 
 HADOOP-9924-4.patch, HADOOP-9924.patch


 On Windows, FileUtil.createJarWithClassPath() is called to generate a 
 manifest jar file to pack classpath - to avoid the problem of classpath being 
 too long.
 However, the relative classpath is not handled correctly. It relies on Java's 
 File(relativePath) to resolve the relative path. But it really should be 
 using the given pwd parameter to resolve the relative path.
 To reproduce this bug, you can try some pig job on Windows, it will fail and 
 the pig log on the application master will look like this:
 2013-08-29 23:25:55,498 INFO [main] 
 org.apache.hadoop.service.AbstractService: Service 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED; cause: 
 org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
 java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat 
 not found
 org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
 java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat 
 not found
 This is because the PigOutputFormat class is in the job.jar file but the 
 classpath manifest has:
 file:/c:/apps/dist/hadoop-2.1.0-beta/bin/job.jar/job.jar
 When it really should be:
 file:/job container folder/job.jar/job.jar

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-9896) TestIPC fail on trunk with error VM crash or System.exit

2013-08-30 Thread shanyu zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13755199#comment-13755199
 ] 

shanyu zhao commented on HADOOP-9896:
-

+1

Thanks Chuan! I tried the patch and it works fine!

Not sure why a lingering client thread is causing this problem, though. It looks 
like the server response was sometimes consumed by another client, so the 
testRetryProxy() client was waiting forever. But anyway, this patch fixes the 
TestIPC test case and I think it is a good fix.



 TestIPC fail on trunk with error VM crash or System.exit
 

 Key: HADOOP-9896
 URL: https://issues.apache.org/jira/browse/HADOOP-9896
 Project: Hadoop Common
  Issue Type: Bug
  Components: ipc
Affects Versions: 3.0.0, 2.3.0
Reporter: shanyu zhao
Assignee: Chuan Liu
 Attachments: HADOOP-9896.patch, 
 org.apache.hadoop.ipc.TestIPC-output.txt


 I'm running hadoop unit tests on a Ubuntu 12.04 64 bit virtual machine, every 
 time I try to run all unit tests with command mvn test, the TestIPC unit 
 test will fail, the console will show The forked VM terminated without 
 saying properly goodbye. VM crash or System.exit called?
 To reproduce:
 $cd hadoop-common-project/hadoop-common
 $mvn clean install -Pdist -DskipTests
 $mvn test -Pdist -Dtest=TestIPC

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-9924) FileUtil.createJarWithClassPath() does not generate relative classpath correctly

2013-08-30 Thread shanyu zhao (JIRA)
shanyu zhao created HADOOP-9924:
---

 Summary: FileUtil.createJarWithClassPath() does not generate 
relative classpath correctly
 Key: HADOOP-9924
 URL: https://issues.apache.org/jira/browse/HADOOP-9924
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 0.23.9, 2.1.0-beta
Reporter: shanyu zhao
Assignee: shanyu zhao


On Windows, FileUtil.createJarWithClassPath() is called to generate a manifest 
jar file to pack classpath - to avoid the problem of classpath being too long.
However, the relative classpath is not handled correctly. It relies on Java's 
File(relativePath) to resolve the relative path. But it really should be using 
the given pwd parameter to resolve the relative path.

To reproduce this bug, you can try some pig job on Windows, it will fail and 
the pig log on the application master will look like this:

2013-08-29 23:25:55,498 INFO [main] org.apache.hadoop.service.AbstractService: 
Service org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED; 
cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat 
not found
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat 
not found

This is because the PigOutputFormat class is in the job.jar file but the 
classpath manifest has:
file:/c:/apps/dist/hadoop-2.1.0-beta/bin/job.jar/job.jar
When it really should be:
file:/job container folder/job.jar/job.jar

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-9924) FileUtil.createJarWithClassPath() does not generate relative classpath correctly

2013-08-30 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-9924:


Attachment: HADOOP-9924.patch

 FileUtil.createJarWithClassPath() does not generate relative classpath 
 correctly
 

 Key: HADOOP-9924
 URL: https://issues.apache.org/jira/browse/HADOOP-9924
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.1.0-beta, 0.23.9
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-9924.patch


 On Windows, FileUtil.createJarWithClassPath() is called to generate a 
 manifest jar file to pack classpath - to avoid the problem of classpath being 
 too long.
 However, the relative classpath is not handled correctly. It relies on Java's 
 File(relativePath) to resolve the relative path. But it really should be 
 using the given pwd parameter to resolve the relative path.
 To reproduce this bug, you can try some pig job on Windows, it will fail and 
 the pig log on the application master will look like this:
 2013-08-29 23:25:55,498 INFO [main] 
 org.apache.hadoop.service.AbstractService: Service 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED; cause: 
 org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
 java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat 
 not found
 org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
 java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat 
 not found
 This is because the PigOutputFormat class is in the job.jar file but the 
 classpath manifest has:
 file:/c:/apps/dist/hadoop-2.1.0-beta/bin/job.jar/job.jar
 When it really should be:
 file:/job container folder/job.jar/job.jar

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-9774) RawLocalFileSystem.listStatus() return absolute paths when input path is relative on Windows

2013-08-28 Thread shanyu zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752687#comment-13752687
 ] 

shanyu zhao commented on HADOOP-9774:
-

Thank you Ivan. I was actually able to run all unit tests on hadoop trunk on 
Linux. I didn't observe any negative impact from this patch. Would you please 
commit this patch?

 RawLocalFileSystem.listStatus() return absolute paths when input path is 
 relative on Windows
 

 Key: HADOOP-9774
 URL: https://issues.apache.org/jira/browse/HADOOP-9774
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-9774-2.patch, HADOOP-9774-3.patch, 
 HADOOP-9774-4.patch, HADOOP-9774-5.patch, HADOOP-9774.patch


 On Windows, when using RawLocalFileSystem.listStatus() to enumerate a 
 relative path (without drive spec), e.g., file:///mydata, the resulting 
 paths become absolute paths, e.g., [file://E:/mydata/t1.txt, 
 file://E:/mydata/t2.txt...].
 Note that if we use it to enumerate an absolute path, e.g., 
 file://E:/mydata, then we get the same results as above.
 This breaks some hive unit tests which use the local file system to simulate 
 HDFS when testing, and therefore the drive spec is removed. Then after 
 listStatus() the path is changed to an absolute path, and hive fails to find the 
 path in its map reduce job.
 You'll see the following exception:
 [junit] java.io.IOException: cannot find dir = 
 pfile:/E:/GitHub/hive-monarch/build/ql/test/data/warehouse/src/kv1.txt in 
 pathToPartitionInfo: 
 [pfile:/GitHub/hive-monarch/build/ql/test/data/warehouse/src]
 [junit]   at 
 org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:298)
 This problem is introduced by this JIRA:
 HADOOP-8962
 Prior to the fix for HADOOP-8962 (merged in 0.23.5), the resulting paths are 
 relative paths if the parent paths are relative, e.g., 
 [file:///mydata/t1.txt, file:///mydata/t2.txt...]
 This behavior change is a side effect of the fix in HADOOP-8962, not an 
 intended change. The resulting behavior, even though it is legitimate from a 
 functional point of view, breaks consistency from the caller's point of view. 
 When the caller uses a relative path (without drive spec) to do listStatus(), 
 the resulting paths should be relative. Therefore, I think this should be 
 fixed.
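
For illustration only, a minimal sketch of the expected behavior on Windows 
(hypothetical paths; listing a drive-relative local path should keep the results 
drive-relative):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RelativeListStatusSketch {
  public static void main(String[] args) throws Exception {
    FileSystem local = FileSystem.getLocal(new Configuration());
    for (FileStatus status : local.listStatus(new Path("file:///mydata"))) {
      // Expected: file:///mydata/t1.txt, file:///mydata/t2.txt, ...
      // Not:      file://E:/mydata/t1.txt (absolute, with the drive spec added back)
      System.out.println(status.getPath());
    }
  }
}
{code}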

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-9896) TestIPC fail with VM crash or System.exit

2013-08-27 Thread shanyu zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13751996#comment-13751996
 ] 

shanyu zhao commented on HADOOP-9896:
-

The freeze is actually caused by the test case TestIPC#testRetryProxy. If I just 
run this one test case, it passes, but running all TestIPC tests results 
in the main thread getting stuck in call.wait() (Client.java, line 1390). If I 
exclude this test case, or set a timeout, then TestIPC can pass.

This is caused by a timing issue. Sometimes the tests randomly pass for me, but 
they fail most of the time. And if I attach a debugger to the test process and 
set a breakpoint at this test case (testRetryProxy), the tests pass.

 TestIPC fail with VM crash or System.exit
 -

 Key: HADOOP-9896
 URL: https://issues.apache.org/jira/browse/HADOOP-9896
 Project: Hadoop Common
  Issue Type: Bug
  Components: ipc
Affects Versions: 2.0.5-alpha
Reporter: shanyu zhao
 Attachments: org.apache.hadoop.ipc.TestIPC-output.txt


 I'm running hadoop unit tests on a Ubuntu 12.04 virtual machine, every time I 
 try to run all unit tests with command mvn test, the TestIPC unit test will 
 fail, the console will show The forked VM terminated without saying properly 
 goodbye. VM crash or System.exit called?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-9896) TestIPC fail on trunk with error VM crash or System.exit

2013-08-27 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-9896:


Summary: TestIPC fail on trunk with error VM crash or System.exit  (was: 
TestIPC fail with VM crash or System.exit)

 TestIPC fail on trunk with error VM crash or System.exit
 

 Key: HADOOP-9896
 URL: https://issues.apache.org/jira/browse/HADOOP-9896
 Project: Hadoop Common
  Issue Type: Bug
  Components: ipc
Affects Versions: 2.0.5-alpha
Reporter: shanyu zhao
 Attachments: org.apache.hadoop.ipc.TestIPC-output.txt


 I'm running hadoop unit tests on a Ubuntu 12.04 virtual machine, every time I 
 try to run all unit tests with command mvn test, the TestIPC unit test will 
 fail, the console will show The forked VM terminated without saying properly 
 goodbye. VM crash or System.exit called?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-9896) TestIPC fail on trunk with error VM crash or System.exit

2013-08-27 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-9896:


Description: 
I'm running hadoop unit tests on a Ubuntu 12.04 64 bit virtual machine, every 
time I try to run all unit tests with command mvn test, the TestIPC unit test 
will fail, the console will show The forked VM terminated without saying 
properly goodbye. VM crash or System.exit called?

To reproduce:
$cd hadoop-common-project/hadoop-common
$mvn clean install -Pdist -DskipTests
$mvn test -Pdist -Dtest=TestIPC


  was:
I'm running hadoop unit tests on a Ubuntu 12.04 virtual machine, every time I 
try to run all unit tests with command mvn test, the TestIPC unit test will 
fail, the console will show The forked VM terminated without saying properly 
goodbye. VM crash or System.exit called?




 TestIPC fail on trunk with error VM crash or System.exit
 

 Key: HADOOP-9896
 URL: https://issues.apache.org/jira/browse/HADOOP-9896
 Project: Hadoop Common
  Issue Type: Bug
  Components: ipc
Affects Versions: 2.0.5-alpha
Reporter: shanyu zhao
 Attachments: org.apache.hadoop.ipc.TestIPC-output.txt


 I'm running hadoop unit tests on a Ubuntu 12.04 64 bit virtual machine, every 
 time I try to run all unit tests with command mvn test, the TestIPC unit 
 test will fail, the console will show The forked VM terminated without 
 saying properly goodbye. VM crash or System.exit called?
 To reproduce:
 $cd hadoop-common-project/hadoop-common
 $mvn clean install -Pdist -DskipTests
 $mvn test -Pdist -Dtest=TestIPC

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-9896) TestIPC fail with VM crash or System.exit

2013-08-23 Thread shanyu zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749058#comment-13749058
 ] 

shanyu zhao commented on HADOOP-9896:
-

More information: I was using Hyper-V on Windows 8 running an Ubuntu 12.04 VM. I 
found out that it was due to TestIPC#testSocketLeaks getting stuck. However, if I 
just run that one test case, it passes for me. But if I run all TestIPC unit 
tests, it gets stuck.

 TestIPC fail with VM crash or System.exit
 -

 Key: HADOOP-9896
 URL: https://issues.apache.org/jira/browse/HADOOP-9896
 Project: Hadoop Common
  Issue Type: Bug
  Components: ipc
Affects Versions: 2.0.5-alpha
Reporter: shanyu zhao
 Attachments: org.apache.hadoop.ipc.TestIPC-output.txt


 I'm running hadoop unit tests on a Ubuntu 12.04 virtual machine, every time I 
 try to run all unit tests with command mvn test, the TestIPC unit test will 
 fail, the console will show The forked VM terminated without saying properly 
 goodbye. VM crash or System.exit called?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-9774) RawLocalFileSystem.listStatus() return absolute paths when input path is relative on Windows

2013-08-23 Thread shanyu zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749060#comment-13749060
 ] 

shanyu zhao commented on HADOOP-9774:
-

Hi Ivan, would you please run that patch against hdfs, mapreduce and a few 
other projects? 

I'm still not able to run the unit tests even WITHOUT this patch - I keep getting 
the VM crash error here and there, so I can't even finish a single project. This is 
completely unrelated to this bug; I filed a separate issue for the VM crash - 
HADOOP-9896.

 RawLocalFileSystem.listStatus() return absolute paths when input path is 
 relative on Windows
 

 Key: HADOOP-9774
 URL: https://issues.apache.org/jira/browse/HADOOP-9774
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-9774-2.patch, HADOOP-9774-3.patch, 
 HADOOP-9774-4.patch, HADOOP-9774-5.patch, HADOOP-9774.patch


 On Windows, when using RawLocalFileSystem.listStatus() to enumerate a 
 relative path (without drive spec), e.g., file:///mydata, the resulting 
 paths become absolute paths, e.g., [file://E:/mydata/t1.txt, 
 file://E:/mydata/t2.txt...].
 Note that if we use it to enumerate an absolute path, e.g., 
 file://E:/mydata then the we get the same results as above.
 This breaks some hive unit tests which uses local file system to simulate 
 HDFS when testing, therefore the drive spec is removed. Then after 
 listStatus() the path is changed to absolute path, hive failed to find the 
 path in its map reduce job.
 You'll see the following exception:
 [junit] java.io.IOException: cannot find dir = 
 pfile:/E:/GitHub/hive-monarch/build/ql/test/data/warehouse/src/kv1.txt in 
 pathToPartitionInfo: 
 [pfile:/GitHub/hive-monarch/build/ql/test/data/warehouse/src]
 [junit]   at 
 org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:298)
 This problem is introduced by this JIRA:
 HADOOP-8962
 Prior to the fix for HADOOP-8962 (merged in 0.23.5), the resulting paths are 
 relative paths if the parent paths are relative, e.g., 
 [file:///mydata/t1.txt, file:///mydata/t2.txt...]
 This behavior change is a side effect of the fix in HADOOP-8962, not an 
 intended change. The resulting behavior, even though is legitimate from a 
 function point of view, break consistency from the caller's point of view. 
 When the caller use a relative path (without drive spec) to do listStatus() 
 the resulting path should be relative. Therefore, I think this should be 
 fixed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-9896) TestIPC fail with VM crash or System.exit

2013-08-21 Thread shanyu zhao (JIRA)
shanyu zhao created HADOOP-9896:
---

 Summary: TestIPC fail with VM crash or System.exit
 Key: HADOOP-9896
 URL: https://issues.apache.org/jira/browse/HADOOP-9896
 Project: Hadoop Common
  Issue Type: Bug
  Components: ipc
Affects Versions: 2.0.5-alpha
Reporter: shanyu zhao
 Attachments: org.apache.hadoop.ipc.TestIPC-output.txt

I'm running hadoop unit tests on a Ubuntu 12.04 virtual machine, every time I 
try to run all unit tests with command mvn test, the TestIPC unit test will 
fail, the console will show The forked VM terminated without saying properly 
goodbye. VM crash or System.exit called?



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-9896) TestIPC fail with VM crash or System.exit

2013-08-21 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-9896:


Attachment: org.apache.hadoop.ipc.TestIPC-output.txt

Surefire report output attached.

 TestIPC fail with VM crash or System.exit
 -

 Key: HADOOP-9896
 URL: https://issues.apache.org/jira/browse/HADOOP-9896
 Project: Hadoop Common
  Issue Type: Bug
  Components: ipc
Affects Versions: 2.0.5-alpha
Reporter: shanyu zhao
 Attachments: org.apache.hadoop.ipc.TestIPC-output.txt


 I'm running hadoop unit tests on a Ubuntu 12.04 virtual machine, every time I 
 try to run all unit tests with command mvn test, the TestIPC unit test will 
 fail, the console will show The forked VM terminated without saying properly 
 goodbye. VM crash or System.exit called?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-9774) RawLocalFileSystem.listStatus() return absolute paths when input path is relative on Windows

2013-08-15 Thread shanyu zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13741212#comment-13741212
 ] 

shanyu zhao commented on HADOOP-9774:
-

Sure. I have actually run all the YARN unit tests on patch v4; I can run another 
pass on v5.

 RawLocalFileSystem.listStatus() return absolute paths when input path is 
 relative on Windows
 

 Key: HADOOP-9774
 URL: https://issues.apache.org/jira/browse/HADOOP-9774
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HADOOP-9774-2.patch, HADOOP-9774-3.patch, 
 HADOOP-9774-4.patch, HADOOP-9774-5.patch, HADOOP-9774.patch


 On Windows, when using RawLocalFileSystem.listStatus() to enumerate a 
 relative path (without drive spec), e.g., file:///mydata, the resulting 
 paths become absolute paths, e.g., [file://E:/mydata/t1.txt, 
 file://E:/mydata/t2.txt...].
 Note that if we use it to enumerate an absolute path, e.g., 
 file://E:/mydata then the we get the same results as above.
 This breaks some hive unit tests which uses local file system to simulate 
 HDFS when testing, therefore the drive spec is removed. Then after 
 listStatus() the path is changed to absolute path, hive failed to find the 
 path in its map reduce job.
 You'll see the following exception:
 [junit] java.io.IOException: cannot find dir = 
 pfile:/E:/GitHub/hive-monarch/build/ql/test/data/warehouse/src/kv1.txt in 
 pathToPartitionInfo: 
 [pfile:/GitHub/hive-monarch/build/ql/test/data/warehouse/src]
 [junit]   at 
 org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:298)
 This problem is introduced by this JIRA:
 HADOOP-8962
 Prior to the fix for HADOOP-8962 (merged in 0.23.5), the resulting paths are 
 relative paths if the parent paths are relative, e.g., 
 [file:///mydata/t1.txt, file:///mydata/t2.txt...]
 This behavior change is a side effect of the fix in HADOOP-8962, not an 
 intended change. The resulting behavior, even though is legitimate from a 
 function point of view, break consistency from the caller's point of view. 
 When the caller use a relative path (without drive spec) to do listStatus() 
 the resulting path should be relative. Therefore, I think this should be 
 fixed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-9774) RawLocalFileSystem.listStatus() return absolute paths when input path is relative on Windows

2013-07-30 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-9774:


Attachment: HADOOP-9774-4.patch

Thanks Ivan for your suggestions. You are right, the Path(String parent, String 
child) constructor is not designed to accept ONLY a relative path in child. There 
are some unit test cases where child is an absolute path such as file:/mydata, and 
my previous patch failed those cases.

So basically we cannot support file names containing a colon in the child parameter 
of the constructor Path(String parent, String child), because a string like 
"file:/" can be interpreted either as a scheme or as a file named "file:".

That being said, we can still use the trick Ivan mentioned - the Path(String 
scheme, String authority, String path) constructor. If we expect the path to 
contain a colon, we should use this explicit constructor to avoid ambiguity, and 
we can prepend "./" to the path string in this constructor to deal with the colon 
character.

Attached is a new patch implementing this idea.
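
For illustration only, here is a minimal sketch of this trick. The class name 
ColonPathExample and the helper resolveChild are made up for this example and are 
not part of the patch:

{code:java}
import org.apache.hadoop.fs.Path;

public class ColonPathExample {
  // Hypothetical helper: build a child Path whose name may contain a colon.
  static Path resolveChild(Path parent, String childName) {
    // Prepending "./" keeps a leading "name:" segment from being parsed as a URI scheme.
    Path child = new Path(null, null, "./" + childName);
    return new Path(parent, child);
  }

  public static void main(String[] args) {
    Path parent = new Path("file:///mydata");
    // Resolves "a:b.txt" under the parent instead of treating "a" as a scheme.
    System.out.println(resolveChild(parent, "a:b.txt"));
  }
}
{code}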

 RawLocalFileSystem.listStatus() return absolute paths when input path is 
 relative on Windows
 

 Key: HADOOP-9774
 URL: https://issues.apache.org/jira/browse/HADOOP-9774
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: shanyu zhao
 Attachments: HADOOP-9774-2.patch, HADOOP-9774-3.patch, 
 HADOOP-9774-4.patch, HADOOP-9774.patch


 On Windows, when using RawLocalFileSystem.listStatus() to enumerate a 
 relative path (without drive spec), e.g., file:///mydata, the resulting 
 paths become absolute paths, e.g., [file://E:/mydata/t1.txt, 
 file://E:/mydata/t2.txt...].
 Note that if we use it to enumerate an absolute path, e.g., 
 file://E:/mydata then the we get the same results as above.
 This breaks some hive unit tests which uses local file system to simulate 
 HDFS when testing, therefore the drive spec is removed. Then after 
 listStatus() the path is changed to absolute path, hive failed to find the 
 path in its map reduce job.
 You'll see the following exception:
 [junit] java.io.IOException: cannot find dir = 
 pfile:/E:/GitHub/hive-monarch/build/ql/test/data/warehouse/src/kv1.txt in 
 pathToPartitionInfo: 
 [pfile:/GitHub/hive-monarch/build/ql/test/data/warehouse/src]
 [junit]   at 
 org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:298)
 This problem is introduced by this JIRA:
 HADOOP-8962
 Prior to the fix for HADOOP-8962 (merged in 0.23.5), the resulting paths are 
 relative paths if the parent paths are relative, e.g., 
 [file:///mydata/t1.txt, file:///mydata/t2.txt...]
 This behavior change is a side effect of the fix in HADOOP-8962, not an 
 intended change. The resulting behavior, even though is legitimate from a 
 function point of view, break consistency from the caller's point of view. 
 When the caller use a relative path (without drive spec) to do listStatus() 
 the resulting path should be relative. Therefore, I think this should be 
 fixed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-9774) RawLocalFileSystem.listStatus() return absolution paths when input path is relative on Windows

2013-07-29 Thread shanyu zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722814#comment-13722814
 ] 

shanyu zhao commented on HADOOP-9774:
-

Thanks Ivan for chiming in. Actually I've tried your suggestions but they don't 
work exactly because of the gotcha you mentioned:

bq. One gotcha to keep in mind is the URI encoding/decoding of escape chars.

In the code you posted: 

{code}
results[j] = getFileStatus(new Path(f, new Path(null, null, names[i])));
{code}

the names[i] on Windows could be E:\mydata, which will throw an exception in new 
Path(null, null, names[i]) because it is not escaped. So it must be handled 
specially with an OS-specific condition, like the following code snippet:

{code}
// add a slash in front of paths with Windows drive letters
if (hasWindowsDrive(pathString) && pathString.charAt(0) != '/') {
  pathString = "/" + pathString;
}
{code}

This is why I chose not to touch Path.java at the beginning. But if we want a 
clean (though riskier) fix, I can post a patch for review.

 RawLocalFileSystem.listStatus() return absolution paths when input path is 
 relative on Windows
 --

 Key: HADOOP-9774
 URL: https://issues.apache.org/jira/browse/HADOOP-9774
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: shanyu zhao
 Attachments: HADOOP-9774.patch


 On Windows, when using RawLocalFileSystem.listStatus() to enumerate a 
 relative path (without drive spec), e.g., file:///mydata, the resulting 
 paths become absolute paths, e.g., [file://E:/mydata/t1.txt, 
 file://E:/mydata/t2.txt...].
 Note that if we use it to enumerate an absolute path, e.g., 
 file://E:/mydata then the we get the same results as above.
 This breaks some hive unit tests which uses local file system to simulate 
 HDFS when testing, therefore the drive spec is removed. Then after 
 listStatus() the path is changed to absolute path, hive failed to find the 
 path in its map reduce job.
 You'll see the following exception:
 [junit] java.io.IOException: cannot find dir = 
 pfile:/E:/GitHub/hive-monarch/build/ql/test/data/warehouse/src/kv1.txt in 
 pathToPartitionInfo: 
 [pfile:/GitHub/hive-monarch/build/ql/test/data/warehouse/src]
 [junit]   at 
 org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:298)
 This problem is introduced by this JIRA:
 HADOOP-8962
 Prior to the fix for HADOOP-8962 (merged in 0.23.5), the resulting paths are 
 relative paths if the parent paths are relative, e.g., 
 [file:///mydata/t1.txt, file:///mydata/t2.txt...]
 This behavior change is a side effect of the fix in HADOOP-8962, not an 
 intended change. The resulting behavior, even though is legitimate from a 
 function point of view, break consistency from the caller's point of view. 
 When the caller use a relative path (without drive spec) to do listStatus() 
 the resulting path should be relative. Therefore, I think this should be 
 fixed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-9774) RawLocalFileSystem.listStatus() return absolute paths when input path is relative on Windows

2013-07-29 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-9774:


Attachment: HADOOP-9774-2.patch

Attached is a patch to fix the Hadoop path resolution issues (HADOOP-8962 and this 
JIRA) at their root.

The fundamental cause of issue HADOOP-8962 is that the constructor:

{code}
public Path(String parent, String child)
{code}

does not always work as intended in all scenarios. A simple example is that the 
relative path represented by child could contain a colon in the file name, e.g. 
a:b/t1.txt, which causes the Path constructor to misinterpret the path. 

One way to fix this is to prepend "./" to the beginning of the child string if it 
is not an absolute path (i.e., does not start with "/"). Also, on Windows, we need 
to prepend a slash if the child string starts with a drive spec, e.g., E:\data 
becomes /E:\data.

I added a few Path-related test cases, and a new test case to make sure that on 
Windows RawLocalFileSystem.listStatus() returns consistent paths.
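
To make the idea concrete, here is a rough, standalone sketch of the normalization 
described above. It is only an illustration of my description, not the code in the 
attached patch, and the class and method names are made up:

{code:java}
public class ChildPathNormalizeExample {
  private static final boolean WINDOWS =
      System.getProperty("os.name").startsWith("Windows");

  // Hypothetical sketch of the child-string normalization, not the attached patch.
  static String normalizeChild(String child) {
    boolean hasDrive = child.length() >= 2
        && Character.isLetter(child.charAt(0)) && child.charAt(1) == ':';
    if (WINDOWS && hasDrive) {
      // E:\data -> /E:\data so the drive letter is not mistaken for a URI scheme
      return "/" + child;
    }
    if (!child.startsWith("/")) {
      // a:b/t1.txt -> ./a:b/t1.txt so "a" is not parsed as a scheme
      return "./" + child;
    }
    return child;
  }

  public static void main(String[] args) {
    System.out.println(normalizeChild("a:b/t1.txt"));
    System.out.println(normalizeChild("E:\\data"));
    System.out.println(normalizeChild("/already/absolute"));
  }
}
{code}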

 RawLocalFileSystem.listStatus() return absolute paths when input path is 
 relative on Windows
 

 Key: HADOOP-9774
 URL: https://issues.apache.org/jira/browse/HADOOP-9774
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: shanyu zhao
 Attachments: HADOOP-9774-2.patch, HADOOP-9774.patch


 On Windows, when using RawLocalFileSystem.listStatus() to enumerate a 
 relative path (without drive spec), e.g., file:///mydata, the resulting 
 paths become absolute paths, e.g., [file://E:/mydata/t1.txt, 
 file://E:/mydata/t2.txt...].
 Note that if we use it to enumerate an absolute path, e.g., 
 file://E:/mydata then the we get the same results as above.
 This breaks some hive unit tests which uses local file system to simulate 
 HDFS when testing, therefore the drive spec is removed. Then after 
 listStatus() the path is changed to absolute path, hive failed to find the 
 path in its map reduce job.
 You'll see the following exception:
 [junit] java.io.IOException: cannot find dir = 
 pfile:/E:/GitHub/hive-monarch/build/ql/test/data/warehouse/src/kv1.txt in 
 pathToPartitionInfo: 
 [pfile:/GitHub/hive-monarch/build/ql/test/data/warehouse/src]
 [junit]   at 
 org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:298)
 This problem is introduced by this JIRA:
 HADOOP-8962
 Prior to the fix for HADOOP-8962 (merged in 0.23.5), the resulting paths are 
 relative paths if the parent paths are relative, e.g., 
 [file:///mydata/t1.txt, file:///mydata/t2.txt...]
 This behavior change is a side effect of the fix in HADOOP-8962, not an 
 intended change. The resulting behavior, even though is legitimate from a 
 function point of view, break consistency from the caller's point of view. 
 When the caller use a relative path (without drive spec) to do listStatus() 
 the resulting path should be relative. Therefore, I think this should be 
 fixed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-9774) RawLocalFileSystem.listStatus() return absolute paths when input path is relative on Windows

2013-07-29 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-9774:


Attachment: HADOOP-9774-3.patch

This is v3 of the patch. I created a new constructor with an isRelative parameter 
so that we don't change the behavior of the old Path(String) constructor.
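
Roughly, the idea can be sketched like this. I am using a hypothetical static 
helper here instead of showing the actual constructor added in the patch; the names 
are made up for illustration:

{code:java}
import org.apache.hadoop.fs.Path;

public class RelativePathExample {
  // Hypothetical stand-in for the new constructor described above: when
  // isRelative is true, prepend "./" so a leading "name:" segment is not
  // parsed as a URI scheme.
  static Path makePath(String pathString, boolean isRelative) {
    if (isRelative && !pathString.startsWith("/")) {
      return new Path("./" + pathString);
    }
    return new Path(pathString);
  }

  public static void main(String[] args) {
    System.out.println(makePath("a:b.txt", true));    // stays a relative path
    System.out.println(makePath("/tmp/data", false)); // unchanged
  }
}
{code}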

 RawLocalFileSystem.listStatus() return absolute paths when input path is 
 relative on Windows
 

 Key: HADOOP-9774
 URL: https://issues.apache.org/jira/browse/HADOOP-9774
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: shanyu zhao
 Attachments: HADOOP-9774-2.patch, HADOOP-9774-3.patch, 
 HADOOP-9774.patch


 On Windows, when using RawLocalFileSystem.listStatus() to enumerate a 
 relative path (without drive spec), e.g., file:///mydata, the resulting 
 paths become absolute paths, e.g., [file://E:/mydata/t1.txt, 
 file://E:/mydata/t2.txt...].
 Note that if we use it to enumerate an absolute path, e.g., 
 file://E:/mydata then the we get the same results as above.
 This breaks some hive unit tests which uses local file system to simulate 
 HDFS when testing, therefore the drive spec is removed. Then after 
 listStatus() the path is changed to absolute path, hive failed to find the 
 path in its map reduce job.
 You'll see the following exception:
 [junit] java.io.IOException: cannot find dir = 
 pfile:/E:/GitHub/hive-monarch/build/ql/test/data/warehouse/src/kv1.txt in 
 pathToPartitionInfo: 
 [pfile:/GitHub/hive-monarch/build/ql/test/data/warehouse/src]
 [junit]   at 
 org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:298)
 This problem is introduced by this JIRA:
 HADOOP-8962
 Prior to the fix for HADOOP-8962 (merged in 0.23.5), the resulting paths are 
 relative paths if the parent paths are relative, e.g., 
 [file:///mydata/t1.txt, file:///mydata/t2.txt...]
 This behavior change is a side effect of the fix in HADOOP-8962, not an 
 intended change. The resulting behavior, even though is legitimate from a 
 function point of view, break consistency from the caller's point of view. 
 When the caller use a relative path (without drive spec) to do listStatus() 
 the resulting path should be relative. Therefore, I think this should be 
 fixed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-9776) HarFileSystem.listStatus() returns har://scheme-localhost:/... if port number is empty

2013-07-26 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-9776:


Attachment: HADOOP-9776-2.patch

Adding a unit test case.

 HarFileSystem.listStatus() returns har://scheme-localhost:/... if port 
 number is empty
 --

 Key: HADOOP-9776
 URL: https://issues.apache.org/jira/browse/HADOOP-9776
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: shanyu zhao
 Attachments: HADOOP-9776-2.patch, HADOOP-9776.patch


 If the given har URI is har://scheme-localhost/usr/my.har/a, the result 
 of HarFileSystem.listStatus() will have a : appended after localhost, like 
 this: har://scheme-localhost:/usr/my.har/a. it should return 
 har://scheme-localhost/usr/my.bar/a instead.
 This creates problem when running a hive unit test TestCliDriver 
 (archive_excludeHadoop20.q), generating the following error:
   java.io.IOException: cannot find dir = 
 har://pfile-localhost:/GitHub/hive-monarch/build/ql/test/data/warehouse/tstsrcpart/ds=2008-04-08/hr=12/data.har/00_0
  in pathToPartitionInfo: 
 [pfile:/GitHub/hive-monarch/build/ql/test/data/warehouse/tstsrcpart/ds=2008-04-08/hr=11,
  
 har://pfile-localhost/GitHub/hive-monarch/build/ql/test/data/warehouse/tstsrcpart/ds=2008-04-08/hr=12/data.har]
   [junit] at 
 org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:298)
   [junit] at 
 org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:260)
   [junit] at 
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.init(CombineHiveInputFormat.java:104)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-9774) RawLocalFileSystem.listStatus() return absolution paths when input path is relative on Windows

2013-07-26 Thread shanyu zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721463#comment-13721463
 ] 

shanyu zhao commented on HADOOP-9774:
-

The reason I didn't add a unit test is that this behavior cannot be tested on 
Linux. I'm reusing HADOOP-8962's unit test case to verify that this change doesn't 
break on Linux.

In this specific context, by relative path I mean a path without a drive spec on 
Windows, e.g. file:///mydata (instead of file:///E:/mydata). I do not mean a 
relative path in the sense of ../mypath/. Maybe I should say "Windows relative 
path".

 RawLocalFileSystem.listStatus() return absolution paths when input path is 
 relative on Windows
 --

 Key: HADOOP-9774
 URL: https://issues.apache.org/jira/browse/HADOOP-9774
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: shanyu zhao
 Attachments: HADOOP-9774.patch


 On Windows, when using RawLocalFileSystem.listStatus() to enumerate a 
 relative path (without drive spec), e.g., file:///mydata, the resulting 
 paths become absolute paths, e.g., [file://E:/mydata/t1.txt, 
 file://E:/mydata/t2.txt...].
 Note that if we use it to enumerate an absolute path, e.g., 
 file://E:/mydata then the we get the same results as above.
 This breaks some hive unit tests which uses local file system to simulate 
 HDFS when testing, therefore the drive spec is removed. Then after 
 listStatus() the path is changed to absolute path, hive failed to find the 
 path in its map reduce job.
 You'll see the following exception:
 [junit] java.io.IOException: cannot find dir = 
 pfile:/E:/GitHub/hive-monarch/build/ql/test/data/warehouse/src/kv1.txt in 
 pathToPartitionInfo: 
 [pfile:/GitHub/hive-monarch/build/ql/test/data/warehouse/src]
 [junit]   at 
 org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:298)
 This problem is introduced by this JIRA:
 HADOOP-8962
 Prior to the fix for HADOOP-8962 (merged in 0.23.5), the resulting paths are 
 relative paths if the parent paths are relative, e.g., 
 [file:///mydata/t1.txt, file:///mydata/t2.txt...]
 This behavior change is a side effect of the fix in HADOOP-8962, not an 
 intended change. The resulting behavior, even though is legitimate from a 
 function point of view, break consistency from the caller's point of view. 
 When the caller use a relative path (without drive spec) to do listStatus() 
 the resulting path should be relative. Therefore, I think this should be 
 fixed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-9774) RawLocalFileSystem.listStatus() return absolution paths when input path is relative on Windows

2013-07-25 Thread shanyu zhao (JIRA)
shanyu zhao created HADOOP-9774:
---

 Summary: RawLocalFileSystem.listStatus() return absolution paths 
when input path is relative on Windows
 Key: HADOOP-9774
 URL: https://issues.apache.org/jira/browse/HADOOP-9774
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 0.23.9, 0.23.8, 0.23.7, 0.23.6, 0.23.5
Reporter: shanyu zhao


On Windows, when using RawLocalFileSystem.listStatus() to enumerate a relative 
path (without drive spec), e.g., file:///mydata, the resulting paths become 
absolute paths, e.g., [file://E:/mydata/t1.txt, file://E:/mydata/t2.txt...].
Note that if we use it to enumerate an absolute path, e.g., file://E:/mydata, 
then we get the same results as above.

This breaks some Hive unit tests, which use the local file system to simulate HDFS 
during testing and therefore remove the drive spec. After listStatus() the paths 
become absolute, and Hive fails to find the path in its map-reduce job.

You'll see the following exception:
[junit] java.io.IOException: cannot find dir = 
pfile:/E:/GitHub/hive-monarch/build/ql/test/data/warehouse/src/kv1.txt in 
pathToPartitionInfo: 
[pfile:/GitHub/hive-monarch/build/ql/test/data/warehouse/src]
[junit] at 
org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:298)


This problem is introduced by this JIRA:
HADOOP-8962

Prior to the fix for HADOOP-8962 (merged in 0.23.5), the resulting paths are 
relative paths if the parent paths are relative, e.g., 
[file:///mydata/t1.txt, file:///mydata/t2.txt...]

This behavior change is a side effect of the fix in HADOOP-8962, not an intended 
change. The resulting behavior, even though legitimate from a functional point of 
view, breaks consistency from the caller's point of view. When the caller uses a 
relative path (without a drive spec) for listStatus(), the resulting paths should 
be relative. Therefore, I think this should be fixed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-9776) HarFileSystem.listStatus() returns har://scheme-localhost:/... if port number is empty

2013-07-25 Thread shanyu zhao (JIRA)
shanyu zhao created HADOOP-9776:
---

 Summary: HarFileSystem.listStatus() returns 
har://scheme-localhost:/... if port number is empty
 Key: HADOOP-9776
 URL: https://issues.apache.org/jira/browse/HADOOP-9776
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 0.23.9
Reporter: shanyu zhao


If the given har URI is har://scheme-localhost/usr/my.har/a, the result of 
HarFileSystem.listStatus() will have a ":" appended after localhost, like this: 
har://scheme-localhost:/usr/my.har/a. It should return 
har://scheme-localhost/usr/my.har/a instead.

This creates a problem when running the Hive unit test TestCliDriver 
(archive_excludeHadoop20.q), which generates the following error:

java.io.IOException: cannot find dir = 
har://pfile-localhost:/GitHub/hive-monarch/build/ql/test/data/warehouse/tstsrcpart/ds=2008-04-08/hr=12/data.har/00_0
 in pathToPartitionInfo: 
[pfile:/GitHub/hive-monarch/build/ql/test/data/warehouse/tstsrcpart/ds=2008-04-08/hr=11,
 
har://pfile-localhost/GitHub/hive-monarch/build/ql/test/data/warehouse/tstsrcpart/ds=2008-04-08/hr=12/data.har]
[junit] at 
org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:298)
[junit] at 
org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:260)
[junit] at 
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.init(CombineHiveInputFormat.java:104)
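
A possible fix might look like the sketch below, assuming the spurious colon comes 
from unconditionally appending ":" + port when building the har authority. This is 
only an illustration with made-up names, not an actual patch:

{code:java}
import java.net.URI;

public class HarAuthorityExample {
  // Hypothetical helper: build the "scheme-host[:port]" authority used in
  // har:// URIs, omitting the ":" when no port is set (getPort() == -1).
  static String harAuthority(URI underlyingUri) {
    String auth = underlyingUri.getScheme() + "-" + underlyingUri.getHost();
    if (underlyingUri.getPort() != -1) {
      auth += ":" + underlyingUri.getPort();
    }
    return auth;
  }

  public static void main(String[] args) {
    System.out.println(harAuthority(URI.create("pfile://localhost/data")));     // pfile-localhost
    System.out.println(harAuthority(URI.create("hdfs://localhost:9000/data"))); // hdfs-localhost:9000
  }
}
{code}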


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-9774) RawLocalFileSystem.listStatus() return absolution paths when input path is relative on Windows

2013-07-25 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-9774:


Status: Open  (was: Patch Available)

 RawLocalFileSystem.listStatus() return absolution paths when input path is 
 relative on Windows
 --

 Key: HADOOP-9774
 URL: https://issues.apache.org/jira/browse/HADOOP-9774
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 0.23.9, 0.23.8, 0.23.7, 0.23.6, 0.23.5
Reporter: shanyu zhao

 On Windows, when using RawLocalFileSystem.listStatus() to enumerate a 
 relative path (without drive spec), e.g., file:///mydata, the resulting 
 paths become absolute paths, e.g., [file://E:/mydata/t1.txt, 
 file://E:/mydata/t2.txt...].
 Note that if we use it to enumerate an absolute path, e.g., 
 file://E:/mydata then the we get the same results as above.
 This breaks some hive unit tests which uses local file system to simulate 
 HDFS when testing, therefore the drive spec is removed. Then after 
 listStatus() the path is changed to absolute path, hive failed to find the 
 path in its map reduce job.
 You'll see the following exception:
 [junit] java.io.IOException: cannot find dir = 
 pfile:/E:/GitHub/hive-monarch/build/ql/test/data/warehouse/src/kv1.txt in 
 pathToPartitionInfo: 
 [pfile:/GitHub/hive-monarch/build/ql/test/data/warehouse/src]
 [junit]   at 
 org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:298)
 This problem is introduced by this JIRA:
 HADOOP-8962
 Prior to the fix for HADOOP-8962 (merged in 0.23.5), the resulting paths are 
 relative paths if the parent paths are relative, e.g., 
 [file:///mydata/t1.txt, file:///mydata/t2.txt...]
 This behavior change is a side effect of the fix in HADOOP-8962, not an 
 intended change. The resulting behavior, even though is legitimate from a 
 function point of view, break consistency from the caller's point of view. 
 When the caller use a relative path (without drive spec) to do listStatus() 
 the resulting path should be relative. Therefore, I think this should be 
 fixed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-9774) RawLocalFileSystem.listStatus() return absolution paths when input path is relative on Windows

2013-07-25 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-9774:


Status: Patch Available  (was: Open)

 RawLocalFileSystem.listStatus() return absolution paths when input path is 
 relative on Windows
 --

 Key: HADOOP-9774
 URL: https://issues.apache.org/jira/browse/HADOOP-9774
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 0.23.9, 0.23.8, 0.23.7, 0.23.6, 0.23.5
Reporter: shanyu zhao

 On Windows, when using RawLocalFileSystem.listStatus() to enumerate a 
 relative path (without drive spec), e.g., file:///mydata, the resulting 
 paths become absolute paths, e.g., [file://E:/mydata/t1.txt, 
 file://E:/mydata/t2.txt...].
 Note that if we use it to enumerate an absolute path, e.g., 
 file://E:/mydata then the we get the same results as above.
 This breaks some hive unit tests which uses local file system to simulate 
 HDFS when testing, therefore the drive spec is removed. Then after 
 listStatus() the path is changed to absolute path, hive failed to find the 
 path in its map reduce job.
 You'll see the following exception:
 [junit] java.io.IOException: cannot find dir = 
 pfile:/E:/GitHub/hive-monarch/build/ql/test/data/warehouse/src/kv1.txt in 
 pathToPartitionInfo: 
 [pfile:/GitHub/hive-monarch/build/ql/test/data/warehouse/src]
 [junit]   at 
 org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:298)
 This problem is introduced by this JIRA:
 HADOOP-8962
 Prior to the fix for HADOOP-8962 (merged in 0.23.5), the resulting paths are 
 relative paths if the parent paths are relative, e.g., 
 [file:///mydata/t1.txt, file:///mydata/t2.txt...]
 This behavior change is a side effect of the fix in HADOOP-8962, not an 
 intended change. The resulting behavior, even though is legitimate from a 
 function point of view, break consistency from the caller's point of view. 
 When the caller use a relative path (without drive spec) to do listStatus() 
 the resulting path should be relative. Therefore, I think this should be 
 fixed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-9774) RawLocalFileSystem.listStatus() return absolution paths when input path is relative on Windows

2013-07-25 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-9774:


Affects Version/s: (was: 0.23.9)
   (was: 0.23.8)
   (was: 0.23.7)
   (was: 0.23.6)
   (was: 0.23.5)
   2.1.0-beta
   3.0.0

 RawLocalFileSystem.listStatus() return absolution paths when input path is 
 relative on Windows
 --

 Key: HADOOP-9774
 URL: https://issues.apache.org/jira/browse/HADOOP-9774
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: shanyu zhao

 On Windows, when using RawLocalFileSystem.listStatus() to enumerate a 
 relative path (without drive spec), e.g., file:///mydata, the resulting 
 paths become absolute paths, e.g., [file://E:/mydata/t1.txt, 
 file://E:/mydata/t2.txt...].
 Note that if we use it to enumerate an absolute path, e.g., 
 file://E:/mydata then the we get the same results as above.
 This breaks some hive unit tests which uses local file system to simulate 
 HDFS when testing, therefore the drive spec is removed. Then after 
 listStatus() the path is changed to absolute path, hive failed to find the 
 path in its map reduce job.
 You'll see the following exception:
 [junit] java.io.IOException: cannot find dir = 
 pfile:/E:/GitHub/hive-monarch/build/ql/test/data/warehouse/src/kv1.txt in 
 pathToPartitionInfo: 
 [pfile:/GitHub/hive-monarch/build/ql/test/data/warehouse/src]
 [junit]   at 
 org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:298)
 This problem is introduced by this JIRA:
 HADOOP-8962
 Prior to the fix for HADOOP-8962 (merged in 0.23.5), the resulting paths are 
 relative paths if the parent paths are relative, e.g., 
 [file:///mydata/t1.txt, file:///mydata/t2.txt...]
 This behavior change is a side effect of the fix in HADOOP-8962, not an 
 intended change. The resulting behavior, even though is legitimate from a 
 function point of view, break consistency from the caller's point of view. 
 When the caller use a relative path (without drive spec) to do listStatus() 
 the resulting path should be relative. Therefore, I think this should be 
 fixed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-9774) RawLocalFileSystem.listStatus() return absolution paths when input path is relative on Windows

2013-07-25 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-9774:


Attachment: HADOOP-9774.patch

Patch attached. Basically, on Windows we can resolve the file names against their 
parent folder, which results in consistent paths being returned. Since ":" is not 
allowed in file names on Windows, the problem stated in HADOOP-8962 doesn't exist 
there.

I tried to come up with a more elegant way by introducing a new Path constructor 
or reusing other Path constructors, but none of them worked in all situations. The 
Path class is risky to modify and already confusing to readers, so I decided to 
leave it alone.
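
For illustration, a minimal standalone sketch of this approach. The class and 
helper names below are made up for the example; the real change lives inside 
RawLocalFileSystem.listStatus():

{code:java}
import org.apache.hadoop.fs.Path;

public class ListStatusPathExample {
  private static final boolean WINDOWS =
      System.getProperty("os.name").startsWith("Windows");

  // Hypothetical helper mirroring the idea above: on Windows, resolve each
  // child name directly against the parent folder; ":" cannot appear in
  // Windows file names, so the HADOOP-8962 colon problem does not arise.
  static Path childPath(Path parent, String name) {
    if (WINDOWS) {
      return new Path(parent, name);
    }
    // Elsewhere keep the escaping-safe construction suggested earlier.
    return new Path(parent, new Path(null, null, name));
  }

  public static void main(String[] args) {
    Path parent = new Path("file:///mydata");
    for (String name : new String[] {"t1.txt", "t2.txt"}) {
      System.out.println(childPath(parent, name));
    }
  }
}
{code}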

 RawLocalFileSystem.listStatus() return absolution paths when input path is 
 relative on Windows
 --

 Key: HADOOP-9774
 URL: https://issues.apache.org/jira/browse/HADOOP-9774
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: shanyu zhao
 Attachments: HADOOP-9774.patch


 On Windows, when using RawLocalFileSystem.listStatus() to enumerate a 
 relative path (without drive spec), e.g., file:///mydata, the resulting 
 paths become absolute paths, e.g., [file://E:/mydata/t1.txt, 
 file://E:/mydata/t2.txt...].
 Note that if we use it to enumerate an absolute path, e.g., 
 file://E:/mydata then the we get the same results as above.
 This breaks some hive unit tests which uses local file system to simulate 
 HDFS when testing, therefore the drive spec is removed. Then after 
 listStatus() the path is changed to absolute path, hive failed to find the 
 path in its map reduce job.
 You'll see the following exception:
 [junit] java.io.IOException: cannot find dir = 
 pfile:/E:/GitHub/hive-monarch/build/ql/test/data/warehouse/src/kv1.txt in 
 pathToPartitionInfo: 
 [pfile:/GitHub/hive-monarch/build/ql/test/data/warehouse/src]
 [junit]   at 
 org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:298)
 This problem is introduced by this JIRA:
 HADOOP-8962
 Prior to the fix for HADOOP-8962 (merged in 0.23.5), the resulting paths are 
 relative paths if the parent paths are relative, e.g., 
 [file:///mydata/t1.txt, file:///mydata/t2.txt...]
 This behavior change is a side effect of the fix in HADOOP-8962, not an 
 intended change. The resulting behavior, even though is legitimate from a 
 function point of view, break consistency from the caller's point of view. 
 When the caller use a relative path (without drive spec) to do listStatus() 
 the resulting path should be relative. Therefore, I think this should be 
 fixed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-9774) RawLocalFileSystem.listStatus() return absolution paths when input path is relative on Windows

2013-07-25 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-9774:


Status: Patch Available  (was: Open)

 RawLocalFileSystem.listStatus() return absolution paths when input path is 
 relative on Windows
 --

 Key: HADOOP-9774
 URL: https://issues.apache.org/jira/browse/HADOOP-9774
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: shanyu zhao
 Attachments: HADOOP-9774.patch


 On Windows, when using RawLocalFileSystem.listStatus() to enumerate a 
 relative path (without drive spec), e.g., file:///mydata, the resulting 
 paths become absolute paths, e.g., [file://E:/mydata/t1.txt, 
 file://E:/mydata/t2.txt...].
 Note that if we use it to enumerate an absolute path, e.g., 
 file://E:/mydata then the we get the same results as above.
 This breaks some hive unit tests which uses local file system to simulate 
 HDFS when testing, therefore the drive spec is removed. Then after 
 listStatus() the path is changed to absolute path, hive failed to find the 
 path in its map reduce job.
 You'll see the following exception:
 [junit] java.io.IOException: cannot find dir = 
 pfile:/E:/GitHub/hive-monarch/build/ql/test/data/warehouse/src/kv1.txt in 
 pathToPartitionInfo: 
 [pfile:/GitHub/hive-monarch/build/ql/test/data/warehouse/src]
 [junit]   at 
 org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:298)
 This problem is introduced by this JIRA:
 HADOOP-8962
 Prior to the fix for HADOOP-8962 (merged in 0.23.5), the resulting paths are 
 relative paths if the parent paths are relative, e.g., 
 [file:///mydata/t1.txt, file:///mydata/t2.txt...]
 This behavior change is a side effect of the fix in HADOOP-8962, not an 
 intended change. The resulting behavior, even though is legitimate from a 
 function point of view, break consistency from the caller's point of view. 
 When the caller use a relative path (without drive spec) to do listStatus() 
 the resulting path should be relative. Therefore, I think this should be 
 fixed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-9776) HarFileSystem.listStatus() returns har://scheme-localhost:/... if port number is empty

2013-07-25 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-9776:


Attachment: HADOOP-9776.patch

 HarFileSystem.listStatus() returns har://scheme-localhost:/... if port 
 number is empty
 --

 Key: HADOOP-9776
 URL: https://issues.apache.org/jira/browse/HADOOP-9776
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: shanyu zhao
 Attachments: HADOOP-9776.patch


 If the given har URI is har://scheme-localhost/usr/my.har/a, the result 
 of HarFileSystem.listStatus() will have a : appended after localhost, like 
 this: har://scheme-localhost:/usr/my.har/a. it should return 
 har://scheme-localhost/usr/my.bar/a instead.
 This creates problem when running a hive unit test TestCliDriver 
 (archive_excludeHadoop20.q), generating the following error:
   java.io.IOException: cannot find dir = 
 har://pfile-localhost:/GitHub/hive-monarch/build/ql/test/data/warehouse/tstsrcpart/ds=2008-04-08/hr=12/data.har/00_0
  in pathToPartitionInfo: 
 [pfile:/GitHub/hive-monarch/build/ql/test/data/warehouse/tstsrcpart/ds=2008-04-08/hr=11,
  
 har://pfile-localhost/GitHub/hive-monarch/build/ql/test/data/warehouse/tstsrcpart/ds=2008-04-08/hr=12/data.har]
   [junit] at 
 org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:298)
   [junit] at 
 org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:260)
   [junit] at 
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.init(CombineHiveInputFormat.java:104)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-9776) HarFileSystem.listStatus() returns har://scheme-localhost:/... if port number is empty

2013-07-25 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HADOOP-9776:


Affects Version/s: (was: 0.23.9)
   2.1.0-beta
   3.0.0

 HarFileSystem.listStatus() returns har://scheme-localhost:/... if port 
 number is empty
 --

 Key: HADOOP-9776
 URL: https://issues.apache.org/jira/browse/HADOOP-9776
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: shanyu zhao
 Attachments: HADOOP-9776.patch


 If the given har URI is har://scheme-localhost/usr/my.har/a, the result 
 of HarFileSystem.listStatus() will have a : appended after localhost, like 
 this: har://scheme-localhost:/usr/my.har/a. it should return 
 har://scheme-localhost/usr/my.bar/a instead.
 This creates problem when running a hive unit test TestCliDriver 
 (archive_excludeHadoop20.q), generating the following error:
   java.io.IOException: cannot find dir = 
 har://pfile-localhost:/GitHub/hive-monarch/build/ql/test/data/warehouse/tstsrcpart/ds=2008-04-08/hr=12/data.har/00_0
  in pathToPartitionInfo: 
 [pfile:/GitHub/hive-monarch/build/ql/test/data/warehouse/tstsrcpart/ds=2008-04-08/hr=11,
  
 har://pfile-localhost/GitHub/hive-monarch/build/ql/test/data/warehouse/tstsrcpart/ds=2008-04-08/hr=12/data.har]
   [junit] at 
 org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:298)
   [junit] at 
 org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:260)
   [junit] at 
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.init(CombineHiveInputFormat.java:104)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira