[jira] [Work started] (HADOOP-10809) hadoop-azure: page blob support

2014-07-09 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HADOOP-10809 started by Mike Liddell.

> hadoop-azure: page blob support
> ---
>
> Key: HADOOP-10809
> URL: https://issues.apache.org/jira/browse/HADOOP-10809
> Project: Hadoop Common
> Issue Type: Improvement
> Components: tools
> Reporter: Mike Liddell
> Assignee: Mike Liddell
> Attachments: HADOOP-10809.1.patch
>
>
> Azure Blob Storage provides two flavors of blob: block-blobs and page-blobs.
> Block-blobs are the general-purpose kind; they support convenient APIs and are
> the basis for the Azure Filesystem for Hadoop (see HADOOP-9629).
> Page-blobs use the same namespace as block-blobs but provide a different,
> low-level feature set.  Most importantly, page-blobs can cope with an
> effectively unlimited number of small accesses, whereas block-blobs tolerate
> only 50K appends before the data has to be rewritten, a relatively manual
> process.  A simple analogy is that a page-blob is like a regular disk and
> its basic API is like a low-level device driver.
> See http://msdn.microsoft.com/en-us/library/azure/ee691964.aspx for some 
> introductory material.
> The primary driving scenario for page-blob support is HBase transaction
> log files, which require an access pattern of many small writes.  Additional
> scenarios can also be supported.
> Configuration:
> The Hadoop Filesystem abstraction needs a mechanism so that file-create can
> determine whether to create a block-blob or a page-blob.  To permit scenarios
> where application code doesn't know about the details of Azure storage, we
> would like the configuration to be aspect-style, i.e. configured by the
> administrator and transparent to the application.  The current solution is to
> use Hadoop configuration to declare a list of page-blob folders; Azure
> Filesystem for Hadoop will create files in these folders using the page-blob
> flavor.  The configuration key is "fs.azure.page.blob.dir", and its
> description can be found in AzureNativeFileSystemStore.java.
> Code changes:
> - refactor of basic Azure Filesystem code to use a general BlobWrapper and 
> specialized BlockBlobWrapper vs PageBlobWrapper
> - introduction of PageBlob support (read, write, etc)
> - miscellaneous changes such as umask handling, implementation of 
> createNonRecursive(), flush/hflush/hsync.
> - new unit tests.
> Credit for the primary patch: Dexter Bradshaw, Mostafa Elhemali, Eric Hanson, 
> Mike Liddell.
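
For illustration, the folder-based selection described under "Configuration"
could be sketched roughly as follows. This is a minimal sketch in plain Java,
not the code from the patch: the class PageBlobDirMatcher and its methods are
hypothetical, and the comma-separated list format is an assumption. Only the
key name fs.azure.page.blob.dir comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not from the patch): decides the blob flavor for a path. */
final class PageBlobDirMatcher {
    private final List<String> pageBlobDirs = new ArrayList<>();

    /**
     * @param configured value of "fs.azure.page.blob.dir"; assumed here to be
     *                   a comma-separated list of folders
     */
    PageBlobDirMatcher(String configured) {
        if (configured == null) {
            return;
        }
        for (String dir : configured.split(",")) {
            String normalized = normalize(dir);
            if (!normalized.isEmpty()) {
                pageBlobDirs.add(normalized);
            }
        }
    }

    /** Trims whitespace and trailing slashes so "/a/b/" and "/a/b" compare equal. */
    private static String normalize(String path) {
        String p = path.trim();
        while (p.length() > 1 && p.endsWith("/")) {
            p = p.substring(0, p.length() - 1);
        }
        return p;
    }

    /** Returns true if the file lives under one of the configured folders. */
    boolean isPageBlobPath(String filePath) {
        String p = normalize(filePath);
        for (String dir : pageBlobDirs) {
            // Compare against "dir/" so that "/hbase/WALs-old" does not match "/hbase/WALs".
            if (dir.equals("/") || p.startsWith(dir + "/")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        PageBlobDirMatcher matcher = new PageBlobDirMatcher("/hbase/WALs, /hbase/oldWALs/");
        System.out.println(matcher.isPageBlobPath("/hbase/WALs/server1/log.1"));
        System.out.println(matcher.isPageBlobPath("/data/part-0000"));
    }
}
```

Because the decision depends only on the path and the administrator's
configuration, application code calling file-create needs no knowledge of the
blob flavor.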



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HADOOP-10809) hadoop-azure: page blob support

2014-07-09 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell reassigned HADOOP-10809:
-

Assignee: Mike Liddell



[jira] [Updated] (HADOOP-10809) hadoop-azure: page blob support

2014-07-09 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-10809:
--

Status: Patch Available  (was: In Progress)



[jira] [Updated] (HADOOP-10809) hadoop-azure: page blob support

2014-07-09 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-10809:
--

Attachment: HADOOP-10809.1.patch

> hadoop-azure: page blob support
> ---
>
> Key: HADOOP-10809
> URL: https://issues.apache.org/jira/browse/HADOOP-10809
> Project: Hadoop Common
> Issue Type: Improvement
> Components: tools
> Reporter: Mike Liddell
> Attachments: HADOOP-10809.1.patch


[jira] [Updated] (HADOOP-10809) hadoop-azure: page blob support

2014-07-09 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-10809:
--

Description: 
Azure Blob Storage provides two flavors: block-blobs and page-blobs.  
Block-blobs are the general purpose kind that support convenient APIs and are 
the basis for the Azure Filesystem for Hadoop (see HADOOP-9629).

Page-blobs use the same namespace as block-blobs but provide a different 
low-level feature set.  Most importantly, page-blobs can cope with an 
effectively infinite number of small accesses whereas block-blobs can only 
tolerate 50K appends before relatively manual rewriting of the data is 
necessary.  A simple analogy is that page-blobs are like a regular disk and the 
basic API is like a low-level device driver.

See http://msdn.microsoft.com/en-us/library/azure/ee691964.aspx for some 
introductory material.

The primary driving scenario for page-blob support is for HBase transaction log 
files which require an access pattern of many small writes.  Additional 
scenarios can also be supported.

Configuration:
The Hadoop Filesystem abstraction needs a mechanism so that file-create can 
determine whether to create a block- or page-blob.  To permit scenarios where 
application code doesn't know about the details of azure storage we would like 
the configuration to be Aspect-style, ie configured by the Administrator and 
transparent to the application. The current solution is to use hadoop 
configuration to declare a list of page-blob folders -- Azure Filesystem for 
Hadoop will create files in these folders using page-blob flavor.  The 
configuration key is "fs.azure.page.blob.dir", and description can be found in 
AzureNativeFileSystemStore.java.

Code changes:
- refactor of basic Azure Filesystem code to use a general BlobWrapper and 
specialized BlockBlobWrapper vs PageBlobWrapper
- introduction of PageBlob support (read, write, etc)
- miscellaneous changes such as umask handling, implementation of 
createNonRecursive(), flush/hflush/hsync.
- new unit tests.

Credit for the primary patch: Dexter Bradshaw, Mostafa Elhemali, Eric Hanson, 
Mike Liddell.

  was:
Azure Blob Storage provides two flavors: block-blobs and page-blobs.  
Block-blobs are the general purpose kind that support convenient APIs and are 
the basis for the Azure Filesystem for Hadoop (see HADOOP-9629).

Page-blobs use the same namespace as block-blobs but provide a different 
low-level feature set.  Most importantly, page-blobs can cope with an 
effectively infinite number of small accesses whereas block-blobs can only 
tolerate 50K appends before relatively manual rewriting of the data is 
necessary.  The simplest analogy is that page-blobs are like a normal 
filesystem (eg FAT) and the API is like a low-level device driver.

See http://msdn.microsoft.com/en-us/library/azure/ee691964.aspx for some 
introductory material.

The primary driving scenario for page-blob support is for HBase transaction log 
files which require an access pattern of many small writes.  Additional 
scenarios can also be supported.

Configuration:
The Hadoop Filesystem abstraction needs a mechanism so that file-create can 
determine whether to create a block- or page-blob.  To permit scenarios where 
application code doesn't know about the details of azure storage we would like 
the configuration to be Aspect-style, ie configured by the Administrator and 
transparent to the application. The current solution is to use hadoop 
configuration to declare a list of page-blob folders -- Azure Filesystem for 
Hadoop will create files in these folders using page-blob flavor.  The 
configuration key is "fs.azure.page.blob.dir", and description can be found in 
AzureNativeFileSystemStore.java.

Code changes:
- refactor of basic Azure Filesystem code to use a general BlobWrapper and 
specialized BlockBlobWrapper vs PageBlobWrapper
- introduction of PageBlob support (read, write, etc)
- miscellaneous changes such as umask handling, implementation of 
createNonRecursive(), flush/hflush/hsync.
- new unit tests.

Credit for the primary patch: Dexter Bradshaw, Mostafa Elhemali, Eric Hanson, 
Mike Liddell.


> hadoop-azure: page blob support
> ---
>
> Key: HADOOP-10809
> URL: https://issues.apache.org/jira/browse/HADOOP-10809
> Project: Hadoop Common
> Issue Type: Improvement
> Components: tools
> Reporter: Mike Liddell

[jira] [Updated] (HADOOP-10809) hadoop-azure: page blob support

2014-07-09 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-10809:
--

Description: 
Azure Blob Storage provides two flavors: block-blobs and page-blobs.  
Block-blobs are the general purpose kind that support convenient APIs and are 
the basis for the Azure Filesystem for Hadoop (see HADOOP-9629).

Page-blobs use the same namespace as block-blobs but provide a different 
low-level feature set.  Most importantly, page-blobs can cope with an 
effectively infinite number of small accesses whereas block-blobs can only 
tolerate 50K appends before relatively manual rewriting of the data is 
necessary.  The simplest analogy is that page-blobs are like a normal 
filesystem (eg FAT) and the API is like a low-level device driver.

See http://msdn.microsoft.com/en-us/library/azure/ee691964.aspx for some 
introductory material.

The primary driving scenario for page-blob support is for HBase transaction log 
files which require an access pattern of many small writes.  Additional 
scenarios can also be supported.

Configuration:
The Hadoop Filesystem abstraction needs a mechanism so that file-create can 
determine whether to create a block- or page-blob.  To permit scenarios where 
application code doesn't know about the details of azure storage we would like 
the configuration to be Aspect-style, ie configured by the Administrator and 
transparent to the application. The current solution is to use hadoop 
configuration to declare a list of page-blob folders -- Azure Filesystem for 
Hadoop will create files in these folders using page-blob flavor.  The 
configuration key is "fs.azure.page.blob.dir", and description can be found in 
AzureNativeFileSystemStore.java.

Code changes:
- refactor of basic Azure Filesystem code to use a general BlobWrapper and 
specialized BlockBlobWrapper vs PageBlobWrapper
- introduction of PageBlob support (read, write, etc)
- miscellaneous changes such as umask handling, implementation of 
createNonRecursive(), flush/hflush/hsync.
- new unit tests.

Credit for the primary patch: Dexter Bradshaw, Mostafa Elhemali, Eric Hanson, 
Mike Liddell.

  was:
Azure Blob Storage provides two flavors: block-blobs and page-blobs.  
Block-blobs are the general purpose kind that support convenient APIs and are 
the basis for the Azure Filesystem for Hadoop (see HADOOP-9629).

Page-blobs are more difficult to use but provide a different feature set.  Most 
importantly, page-blobs can cope with an effectively infinite number of small 
accesses whereas block-blobs can only tolerate 50K appends before relatively 
manual rewriting of the data is necessary.  The simplest analogy is that 
page-blobs are like a normal filesystem (eg FAT) and the API is like a 
low-level device driver.

See http://msdn.microsoft.com/en-us/library/azure/ee691964.aspx for some 
introductory material.

The primary driving scenario for page-blob support is for HBase transaction log 
files which require an access pattern of many small writes.  Additional 
scenarios can also be supported.

Configuration:
The Hadoop Filesystem abstraction needs a mechanism so that file-create can 
determine whether to create a block- or page-blob.  To permit scenarios where 
application code doesn't know about the details of azure storage we would like 
the configuration to be Aspect-style, ie configured by the Administrator and 
transparent to the application. The current solution is to use hadoop 
configuration to declare a list of page-blob folders -- Azure Filesystem for 
Hadoop will create files in these folders using page-blob flavor.  The 
configuration key is "fs.azure.page.blob.dir", and description can be found in 
AzureNativeFileSystemStore.java.

Code changes:
- refactor of basic Azure Filesystem code to use a general BlobWrapper and 
specialized BlockBlobWrapper vs PageBlobWrapper
- introduction of PageBlob support (read, write, etc)
- miscellaneous changes such as umask handling, implementation of 
createNonRecursive(), flush/hflush/hsync.
- new unit tests.

Credit for the primary patch: Dexter Bradshaw, Mostafa Elhemali, Eric Hanson, 
Mike Liddell.


> hadoop-azure: page blob support
> ---
>
> Key: HADOOP-10809
> URL: https://issues.apache.org/jira/browse/HADOOP-10809
> Project: Hadoop Common
> Issue Type: Improvement
> Components: tools
> Reporter: Mike Liddell

[jira] [Created] (HADOOP-10809) hadoop-azure: page blob support

2014-07-09 Thread Mike Liddell (JIRA)
Mike Liddell created HADOOP-10809:
-

 Summary: hadoop-azure: page blob support
 Key: HADOOP-10809
 URL: https://issues.apache.org/jira/browse/HADOOP-10809
 Project: Hadoop Common
 Issue Type: Improvement
 Components: tools
 Reporter: Mike Liddell


Azure Blob Storage provides two flavors: block-blobs and page-blobs.  
Block-blobs are the general purpose kind that support convenient APIs and are 
the basis for the Azure Filesystem for Hadoop (see HADOOP-9629).

Page-blobs are more difficult to use but provide a different feature set.  Most 
importantly, page-blobs can cope with an effectively infinite number of small 
accesses whereas block-blobs can only tolerate 50K appends before relatively 
manual rewriting of the data is necessary.  The simplest analogy is that 
page-blobs are like a normal filesystem (eg FAT) and the API is like a 
low-level device driver.

See http://msdn.microsoft.com/en-us/library/azure/ee691964.aspx for some 
introductory material.

The primary driving scenario for page-blob support is for HBase transaction log 
files which require an access pattern of many small writes.  Additional 
scenarios can also be supported.

Configuration:
The Hadoop Filesystem abstraction needs a mechanism so that file-create can 
determine whether to create a block- or page-blob.  To permit scenarios where 
application code doesn't know about the details of azure storage we would like 
the configuration to be Aspect-style, ie configured by the Administrator and 
transparent to the application. The current solution is to use hadoop 
configuration to declare a list of page-blob folders -- Azure Filesystem for 
Hadoop will create files in these folders using page-blob flavor.  The 
configuration key is "fs.azure.page.blob.dir", and description can be found in 
AzureNativeFileSystemStore.java.

Code changes:
- refactor of basic Azure Filesystem code to use a general BlobWrapper and 
specialized BlockBlobWrapper vs PageBlobWrapper
- introduction of PageBlob support (read, write, etc)
- miscellaneous changes such as umask handling, implementation of 
createNonRecursive(), flush/hflush/hsync.
- new unit tests.

Credit for the primary patch: Dexter Bradshaw, Mostafa Elhemali, Eric Hanson, 
Mike Liddell.
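
To make the "low-level device driver" analogy concrete: Azure page blobs are
addressed in fixed 512-byte pages, so a driver has to widen each small write
to page boundaries. That page size comes from the Azure documentation, not
from this issue, and the PageRange class below is a hypothetical sketch of
the alignment arithmetic only, not the code from the patch.

```java
/** Hypothetical sketch: the page-aligned byte range that covers a small write. */
final class PageRange {
    /** Page size of Azure page blobs (an external fact, not stated in this issue). */
    static final long PAGE_SIZE = 512;

    final long start;        // inclusive, multiple of PAGE_SIZE
    final long endExclusive; // exclusive, multiple of PAGE_SIZE

    private PageRange(long start, long endExclusive) {
        this.start = start;
        this.endExclusive = endExclusive;
    }

    /** Rounds the start down and the end up to the nearest page boundary. */
    static PageRange covering(long offset, long length) {
        if (offset < 0 || length <= 0) {
            throw new IllegalArgumentException("offset must be >= 0 and length > 0");
        }
        long start = (offset / PAGE_SIZE) * PAGE_SIZE;
        long end = ((offset + length + PAGE_SIZE - 1) / PAGE_SIZE) * PAGE_SIZE;
        return new PageRange(start, end);
    }

    public static void main(String[] args) {
        // A 100-byte write at offset 700 touches only the second page.
        PageRange r = PageRange.covering(700, 100);
        System.out.println(r.start + ".." + r.endExclusive);
    }
}
```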





[jira] [Updated] (HADOOP-10728) Metrics system for Windows Azure Storage Filesystem

2014-06-24 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-10728:
--

Attachment: HADOOP-10728.3.patch

new patch addressing:
- license header
- simplification of pom.xml  (use default behavior for src\test\resources\)

> Metrics system for Windows Azure Storage Filesystem
> ---
>
> Key: HADOOP-10728
> URL: https://issues.apache.org/jira/browse/HADOOP-10728
> Project: Hadoop Common
> Issue Type: New Feature
> Components: tools
> Reporter: Mike Liddell
> Assignee: Mike Liddell
> Attachments: HADOOP-10728.2.patch, HADOOP-10728.3.patch
>
>
> Add a metrics2 source for the Windows Azure Filesystem driver that was 
> introduced with HADOOP-9629.
> AzureFileSystemInstrumentation is the new MetricsSource.  
> AzureNativeFileSystemStore and NativeAzureFileSystem have been modified to
> record statistics, and some machinery has been added to accumulate
> 'rolling average' statistics.
> The primary new code appears in the org.apache.hadoop.fs.azure.metrics package.
> h2. Credits and history
> Credit for this work goes to the early team: [~minwei], [~davidlao], 
> [~lengningliu] and [~stojanovic] as well as multiple people who have taken 
> over this work since then (hope I don't forget anyone): [~dexterb], Johannes 
> Klein, [~ivanmi], Michael Rys, [~mostafae], [~brian_swan], [~mikelid], 
> [~xifang], and [~chuanliu].
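
As a rough illustration of what accumulating a 'rolling average' statistic can
look like, here is a minimal sketch in plain Java. The class
RollingWindowAverage, its time-window design, and its method names are
assumptions made for this example; the machinery in the patch may be
structured quite differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: average of the samples seen within the last windowMs. */
final class RollingWindowAverage {
    private static final class Sample {
        final long timeMs;
        final long value;

        Sample(long timeMs, long value) {
            this.timeMs = timeMs;
            this.value = value;
        }
    }

    private final long windowMs;
    private final Deque<Sample> samples = new ArrayDeque<>();
    private long sum;

    RollingWindowAverage(long windowMs) {
        this.windowMs = windowMs;
    }

    /** Records a sample, e.g. the latency of one storage request. */
    synchronized void add(long nowMs, long value) {
        evictOld(nowMs);
        samples.addLast(new Sample(nowMs, value));
        sum += value;
    }

    /** Returns the integer average over the current window, or 0 if it is empty. */
    synchronized long average(long nowMs) {
        evictOld(nowMs);
        return samples.isEmpty() ? 0 : sum / samples.size();
    }

    /** Drops samples that have aged out of the window and keeps the sum in step. */
    private void evictOld(long nowMs) {
        while (!samples.isEmpty() && nowMs - samples.peekFirst().timeMs >= windowMs) {
            sum -= samples.removeFirst().value;
        }
    }

    public static void main(String[] args) {
        RollingWindowAverage latency = new RollingWindowAverage(1000);
        latency.add(0, 10);
        latency.add(500, 20);
        System.out.println(latency.average(600));
    }
}
```

Passing the clock value in explicitly keeps the sketch deterministic and easy
to test; a production source would more likely read the time itself.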





[jira] [Work started] (HADOOP-10728) Metrics system for Windows Azure Storage Filesystem

2014-06-23 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HADOOP-10728 started by Mike Liddell.

> Metrics system for Windows Azure Storage Filesystem
> ---
>
> Key: HADOOP-10728
> URL: https://issues.apache.org/jira/browse/HADOOP-10728
> Project: Hadoop Common
> Issue Type: New Feature
> Components: tools
> Reporter: Mike Liddell
> Assignee: Mike Liddell
> Attachments: HADOOP-10728.2.patch


[jira] [Updated] (HADOOP-10728) Metrics system for Windows Azure Storage Filesystem

2014-06-23 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-10728:
--

Status: Patch Available  (was: In Progress)

> Metrics system for Windows Azure Storage Filesystem
> ---
>
> Key: HADOOP-10728
> URL: https://issues.apache.org/jira/browse/HADOOP-10728
> Project: Hadoop Common
> Issue Type: New Feature
> Components: tools
> Reporter: Mike Liddell
> Assignee: Mike Liddell
> Attachments: HADOOP-10728.2.patch


[jira] [Updated] (HADOOP-10728) Metrics system for Windows Azure Storage Filesystem

2014-06-23 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-10728:
--

Attachment: HADOOP-10728.2.patch

new patch: HADOOP-10728.2.patch

- moved finalizer from BandwidthGaugeUpdater to NativeAzureFileSystem.  (It 
would logically be better on the former, but those instances are still attached 
to GC roots when a filesystem instance gets GCed.  This was the root cause of 
the testFinalizerThreadShutdown failure.)
- revised testFinalizerThreadShutdown to accurately track thread counts.
- fixed pom.xml to include the metrics configuration file for testing
  (we now include * from src/test/resources)
- apache headers added to all files
- javadoc issues fixed.
- findbugs issues fixed.
- minor tweak to README.txt
- minor tweak to .gitignore

> Metrics system for Windows Azure Storage Filesystem
> ---
>
> Key: HADOOP-10728
> URL: https://issues.apache.org/jira/browse/HADOOP-10728
> Project: Hadoop Common
> Issue Type: New Feature
> Components: tools
> Reporter: Mike Liddell
> Assignee: Mike Liddell
> Attachments: HADOOP-10728.2.patch


[jira] [Updated] (HADOOP-10728) Metrics system for Windows Azure Storage Filesystem

2014-06-19 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-10728:
--

Attachment: HADOOP-10728.1.patch

> Metrics system for Windows Azure Storage Filesystem
> ---
>
> Key: HADOOP-10728
> URL: https://issues.apache.org/jira/browse/HADOOP-10728
> Project: Hadoop Common
> Issue Type: New Feature
> Components: tools
> Reporter: Mike Liddell
> Assignee: Mike Liddell
> Attachments: HADOOP-10728.1.patch


[jira] [Created] (HADOOP-10728) Metrics system for Windows Azure Storage Filesystem

2014-06-19 Thread Mike Liddell (JIRA)
Mike Liddell created HADOOP-10728:
-

 Summary: Metrics system for Windows Azure Storage Filesystem
 Key: HADOOP-10728
 URL: https://issues.apache.org/jira/browse/HADOOP-10728
 Project: Hadoop Common
 Issue Type: New Feature
 Components: tools
 Reporter: Mike Liddell
 Assignee: Mike Liddell


Add a metrics2 source for the Windows Azure Filesystem driver that was 
introduced with HADOOP-9629.

AzureFileSystemInstrumentation is the new MetricsSource.  

AzureNativeFileSystemStore and NativeAzureFileSystem have been modified to 
record statistics, and some machinery has been added to accumulate 'rolling 
average' statistics.

The primary new code appears in the org.apache.hadoop.fs.azure.metrics package.

h2. Credits and history
Credit for this work goes to the early team: [~minwei], [~davidlao], 
[~lengningliu] and [~stojanovic] as well as multiple people who have taken over 
this work since then (hope I don't forget anyone): [~dexterb], Johannes Klein, 
[~ivanmi], Michael Rys, [~mostafae], [~brian_swan], [~mikelid], [~xifang], and 
[~chuanliu].






[jira] [Updated] (HADOOP-9559) When metrics system is restarted MBean names get incorrectly flagged as dupes

2014-06-19 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9559:
-

Attachment: (was: HADOOP-9559.2.txt)

> When metrics system is restarted MBean names get incorrectly flagged as dupes
> -
>
> Key: HADOOP-9559
> URL: https://issues.apache.org/jira/browse/HADOOP-9559
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Mostafa Elhemali
> Attachments: HADOOP-9559.2.patch, HADOOP-9559.patch
>
>
> In the Metrics2 system, every source gets registered as an MBean name, which 
> gets put into a unique name pool in the singleton DefaultMetricsSystem 
> object. The problem is that when the metrics system is shut down (which 
> unregisters the MBeans) this unique name pool is left as is, so if the 
> metrics system is started again, every attempt to register the same MBean 
> names fails (the exception is swallowed and a warning is logged).
> I think the fix here is to remove the name from the unique name pool if an 
> MBean is unregistered, since it's OK at this point to add it again.
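
The failure mode and the proposed fix can be sketched with a small stand-in for the unique name pool. This is a hypothetical Java illustration that assumes the pool behaves like a set of names. NamePoolSketch and its methods are invented for the example and are not the DefaultMetricsSystem API.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical stand-in for the unique name pool described above.
 * None of these names are real Hadoop classes or methods.
 */
class NamePoolSketch {
    private final Set<String> registered = new HashSet<>();

    /** Registers a name; returns false if the pool still considers it taken. */
    synchronized boolean register(String mbeanName) {
        return registered.add(mbeanName);
    }

    /** The proposed fix: release the name when its MBean is unregistered. */
    synchronized void unregister(String mbeanName) {
        registered.remove(mbeanName);
    }
}
```

Without the unregister step, a second register call for the same name after a restart is rejected as a duplicate, which matches the reported symptom. With it, the name becomes available again.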





[jira] [Updated] (HADOOP-9559) When metrics system is restarted MBean names get incorrectly flagged as dupes

2014-06-19 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9559:
-

Attachment: HADOOP-9559.2.patch

> When metrics system is restarted MBean names get incorrectly flagged as dupes
> -
>
> Key: HADOOP-9559
> URL: https://issues.apache.org/jira/browse/HADOOP-9559
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Mostafa Elhemali
> Attachments: HADOOP-9559.2.patch, HADOOP-9559.patch
>
>


[jira] [Commented] (HADOOP-9559) When metrics system is restarted MBean names get incorrectly flagged as dupes

2014-06-19 Thread Mike Liddell (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038195#comment-14038195
 ] 

Mike Liddell commented on HADOOP-9559:
--

[~vicaya] I think DefaultMetricsSystem#sourceName is not new code (and it is 
used by MetricsSystemImpl), so no fix is needed there.

A new patch will be added with small amendments: just adding @VisibleForTesting 
to MetricsSourceAdapter#getMBeanName() and MetricsSystemImpl#getSourceAdapter.

> When metrics system is restarted MBean names get incorrectly flagged as dupes
> -
>
> Key: HADOOP-9559
> URL: https://issues.apache.org/jira/browse/HADOOP-9559
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Mostafa Elhemali
> Attachments: HADOOP-9559.2.txt, HADOOP-9559.patch
>
>


[jira] [Updated] (HADOOP-9559) When metrics system is restarted MBean names get incorrectly flagged as dupes

2014-06-19 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9559:
-

Attachment: HADOOP-9559.2.txt

> When metrics system is restarted MBean names get incorrectly flagged as dupes
> -
>
> Key: HADOOP-9559
> URL: https://issues.apache.org/jira/browse/HADOOP-9559
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Mostafa Elhemali
> Attachments: HADOOP-9559.2.txt, HADOOP-9559.patch
>
>


[jira] [Commented] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-06-09 Thread Mike Liddell (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025857#comment-14025857
 ] 

Mike Liddell commented on HADOOP-9629:
--

Thanks Chris! I have applied your patch - looks good.

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mike Liddell
> Attachments: HADOOP-9629 - Azure Filesystem - Information for 
> developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
> developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
> HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch, 
> HADOOP-9629.trunk.3.patch, HADOOP-9629.trunk.4.patch, 
> HADOOP-9629.trunk.5.patch
>
>
> h2. Description
> This JIRA adds a new file system implementation for accessing 
> Windows Azure Storage - Blob from within Hadoop, such as using blobs as input 
> to MR jobs or configuring MR jobs to put their output directly into blob 
> storage.
> h2. High level design
> At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blob storage; the scheme wasb is used for 
> accessing it over HTTP, and wasbs for accessing over HTTPS. We use the URI 
> scheme: {code}wasb[s]://@/path/to/file{code} to address 
> individual blobs. We use the standard Azure Java SDK 
> (com.microsoft.windowsazure) to do most of the work. In order to map a 
> hierarchical file system over the flat name-value pair nature of blob 
> storage, we create a specially tagged blob named path/to/dir whenever we 
> create a directory called path/to/dir, then files under that are stored as 
> normal blobs path/to/dir/file. We have many metrics implemented for it using 
> the Metrics2 interface. Tests are implemented mostly using a mock 
> implementation for the Azure SDK functionality, with an option to test 
> against a real blob storage if configured (instructions provided in 
> README.txt).
> h2. Credits and history
> This has been ongoing work for a while, and the early version of this work 
> can be seen in HADOOP-8079. This JIRA is a significant revision of that and 
> we'll post the patch here for Hadoop trunk first, then post a patch for 
> branch-1 as well for backporting the functionality if accepted. Credit for 
> this work goes to the early team: [~minwei], [~davidlao], [~lengningliu] and 
> [~stojanovic] as well as multiple people who have taken over this work since 
> then (hope I don't forget anyone): [~dexterb], Johannes Klein, [~ivanmi], 
> Michael Rys, [~mostafae], [~brian_swan], [~mikelid], [~xifang], and 
> [~chuanliu].
> h2. Test
> Besides unit tests, we have used WASB as the default file system in our 
> service product. (HDFS is also used but not as the default file system.) Various 
> customer and test workloads have been run against clusters with 
> such configurations for quite some time. The current version reflects the 
> version of the code tested and used in our production environment.
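
The directory mapping described under "High level design" can be sketched with an in-memory sorted map standing in for the flat blob namespace. This is a minimal illustration under stated assumptions: FlatStoreSketch, its Blob holder, and the directoryMarker flag are hypothetical names chosen for the example, and the code does not use the Azure SDK or the Hadoop FileSystem API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/**
 * Hypothetical in-memory model of a flat blob namespace with directory
 * marker blobs. Not the WASB implementation; all names are illustrative.
 */
class FlatStoreSketch {
    // A blob is either ordinary data or an empty, specially tagged directory marker.
    private static final class Blob {
        final boolean directoryMarker;
        final byte[] data;

        Blob(boolean directoryMarker, byte[] data) {
            this.directoryMarker = directoryMarker;
            this.data = data;
        }
    }

    // Sorted map, so all keys sharing a prefix are adjacent.
    private final TreeMap<String, Blob> blobs = new TreeMap<>();

    /** Creating directory path/to/dir stores a tagged blob named path/to/dir. */
    void mkdir(String path) {
        blobs.put(path, new Blob(true, new byte[0]));
    }

    /** Files are stored as normal blobs under their full path. */
    void writeFile(String path, byte[] data) {
        blobs.put(path, new Blob(false, data));
    }

    boolean isDirectory(String path) {
        Blob blob = blobs.get(path);
        return blob != null && blob.directoryMarker;
    }

    /** Lists the immediate children of dir by scanning keys with the "dir/" prefix. */
    List<String> list(String dir) {
        String prefix = dir + "/";
        List<String> children = new ArrayList<>();
        for (String key : blobs.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) {
                break; // past the contiguous run of keys under this directory
            }
            String rest = key.substring(prefix.length());
            if (!rest.isEmpty() && rest.indexOf('/') < 0) {
                children.add(key);
            }
        }
        return children;
    }
}
```

The marker blob is what lets an empty directory exist at all in a namespace that otherwise only knows about full blob names.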





[jira] [Updated] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-06-09 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9629:
-

Attachment: HADOOP-9629.trunk.4.patch

New patch: HADOOP-9629.trunk.4.patch
 - removed comments re: Thread.currentThread().interrupt()
   (see Review Board for the discussion)



> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mike Liddell
> Attachments: HADOOP-9629 - Azure Filesystem - Information for 
> developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
> developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
> HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch, 
> HADOOP-9629.trunk.3.patch, HADOOP-9629.trunk.4.patch
>
>


[jira] [Commented] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-06-08 Thread Mike Liddell (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021506#comment-14021506
 ] 

Mike Liddell commented on HADOOP-9629:
--

The previous comment about the new patch file had the name wrong: the new patch is 
HADOOP-9629.trunk.3.patch.

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mike Liddell
> Attachments: HADOOP-9629 - Azure Filesystem - Information for 
> developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
> developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
> HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch, 
> HADOOP-9629.trunk.3.patch
>
>


[jira] [Commented] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-06-08 Thread Mike Liddell (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021499#comment-14021499
 ] 

Mike Liddell commented on HADOOP-9629:
--

new patch: HADOOP-9629.trunk.4.patch
 - addresses code-review comments from [~cnauroth], see 
https://reviews.apache.org/r/22096/
 - adds InterfaceAudience and InterfaceStability annotations to the main 
classes.

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mike Liddell
> Attachments: HADOOP-9629 - Azure Filesystem - Information for 
> developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
> developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
> HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch, 
> HADOOP-9629.trunk.3.patch
>
>


[jira] [Updated] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-06-08 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9629:
-

Attachment: HADOOP-9629.trunk.3.patch

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mike Liddell
> Attachments: HADOOP-9629 - Azure Filesystem - Information for 
> developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
> developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
> HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch, 
> HADOOP-9629.trunk.3.patch
>
>


[jira] [Updated] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-06-08 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9629:
-

Attachment: (was: HADOOP-9629.trunk.3.patch)

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mike Liddell
> Attachments: HADOOP-9629 - Azure Filesystem - Information for 
> developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
> developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
> HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch
>
>


[jira] [Commented] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-06-08 Thread Mike Liddell (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021497#comment-14021497
 ] 

Mike Liddell commented on HADOOP-9629:
--

The annotations and suggested usages sound good.
The only changes that I suggest are:
- AzureException: Public + Evolving
- WasbFsck: Public + Evolving.

sound good?

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mike Liddell
> Attachments: HADOOP-9629 - Azure Filesystem - Information for 
> developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
> developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
> HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch, 
> HADOOP-9629.trunk.3.patch
>
>


[jira] [Updated] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-06-08 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9629:
-

Attachment: HADOOP-9629.trunk.3.patch

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mike Liddell
> Attachments: HADOOP-9629 - Azure Filesystem - Information for 
> developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
> developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
> HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch, 
> HADOOP-9629.trunk.3.patch
>
>
> h2. Description
> This JIRA incorporates adding a new file system implementation for accessing 
> Windows Azure Storage - Blob from within Hadoop, such as using blobs as input 
> to MR jobs or configuring MR jobs to put their output directly into blob 
> storage.
> h2. High level design
> At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blob storage; the scheme wasb is used for 
> accessing it over HTTP, and wasbs for accessing over HTTPS. We use the URI 
> scheme: {code}wasb[s]://@/path/to/file{code} to address 
> individual blobs. We use the standard Azure Java SDK 
> (com.microsoft.windowsazure) to do most of the work. In order to map a 
> hierarchical file system over the flat name-value pair nature of blob 
> storage, we create a specially tagged blob named path/to/dir whenever we 
> create a directory called path/to/dir, then files under that are stored as 
> normal blobs path/to/dir/file. We have many metrics implemented for it using 
> the Metrics2 interface. Tests are implemented mostly using a mock 
> implementation for the Azure SDK functionality, with an option to test 
> against a real blob storage if configured (instructions provided inside in 
> README.txt).
> h2. Credits and history
> This has been ongoing work for a while, and the early version of this work 
> can be seen in HADOOP-8079. This JIRA is a significant revision of that and 
> we'll post the patch here for Hadoop trunk first, then post a patch for 
> branch-1 as well for backporting the functionality if accepted. Credit for 
> this work goes to the early team: [~minwei], [~davidlao], [~lengningliu] and 
> [~stojanovic] as well as multiple people who have taken over this work since 
> then (hope I don't forget anyone): [~dexterb], Johannes Klein, [~ivanmi], 
> Michael Rys, [~mostafae], [~brian_swan], [~mikelid], [~xifang], and 
> [~chuanliu].
> h2. Test
> Besides unit tests, we have used WASB as the default file system in our 
> service product. (HDFS is also used, but not as the default file system.) 
> Various customer and test workloads have been run against clusters with 
> such configurations for quite some time. The current version reflects the 
> version of the code tested and used in our production environment.
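
The directory mapping described under "High level design" can be sketched
with a toy in-memory model. This is only an illustration under stated
assumptions, not the hadoop-azure code: the class FlatBlobStoreSketch, its
methods, and the boolean directory tag are all hypothetical, and the real
driver talks to the Azure SDK rather than to a map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model of a flat blob namespace; every name here is hypothetical. */
class FlatBlobStoreSketch {
    /** A blob is bytes plus a flag that tags it as a directory marker. */
    static final class Blob {
        final boolean isDirectoryMarker;
        final byte[] data;

        Blob(boolean isDirectoryMarker, byte[] data) {
            this.isDirectoryMarker = isDirectoryMarker;
            this.data = data;
        }
    }

    // Flat name -> blob map. There is no hierarchy, only '/' inside names.
    private final TreeMap<String, Blob> blobs = new TreeMap<>();

    /** mkdir: store an empty, specially tagged blob named after the directory. */
    void mkdir(String path) {
        blobs.put(path, new Blob(true, new byte[0]));
    }

    /** create: a file is a normal blob whose name extends the directory name. */
    void createFile(String path, byte[] data) {
        blobs.put(path, new Blob(false, data));
    }

    boolean isDirectory(String path) {
        Blob b = blobs.get(path);
        return b != null && b.isDirectoryMarker;
    }

    /** list: direct children are names under "dir/" with no further '/'. */
    List<String> list(String dir) {
        String prefix = dir + "/";
        List<String> children = new ArrayList<>();
        for (String name : blobs.tailMap(prefix, false).keySet()) {
            if (!name.startsWith(prefix)) {
                break; // keys are sorted, so the prefix range has ended
            }
            if (name.indexOf('/', prefix.length()) < 0) {
                children.add(name);
            }
        }
        return children;
    }
}
```

After mkdir("path/to/dir") and createFile("path/to/dir/file", ...), the store
holds two flat names, and list("path/to/dir") recovers the hierarchy purely
from the name prefix.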



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-05-16 Thread Mike Liddell (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000380#comment-14000380
 ] 

Mike Liddell commented on HADOOP-9629:
--

Added a document with information for developers / code-reviewers.

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mike Liddell
> Attachments: HADOOP-9629 - Azure Filesystem - Information for 
> developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
> developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
> HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch
>
>


[jira] [Updated] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-05-16 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9629:
-

Attachment: HADOOP-9629 - Azure Filesystem - Information for developers.docx

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mike Liddell
> Attachments: HADOOP-9629 - Azure Filesystem - Information for 
> developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
> developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
> HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch
>
>


[jira] [Updated] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-05-16 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9629:
-

Attachment: HADOOP-9629.trunk.1.patch

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mostafa Elhemali
> Attachments: HADOOP-9629.2.patch, HADOOP-9629.3.patch, 
> HADOOP-9629.patch, HADOOP-9629.trunk.1.patch
>
>


[jira] [Updated] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-05-16 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9629:
-

Attachment: HADOOP-9629 - Azure Filesystem - Information for developers.pdf

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mike Liddell
> Attachments: HADOOP-9629 - Azure Filesystem - Information for 
> developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
> developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
> HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch
>
>


[jira] [Updated] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-05-16 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9629:
-

Attachment: HADOOP-9629.trunk.2.patch

New patch:
- added apache headers to XML files
- fixed the suppression of m2e warning (in pom.xml)

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mostafa Elhemali
> Attachments: HADOOP-9629.2.patch, HADOOP-9629.3.patch, 
> HADOOP-9629.patch, HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch
>
>


[jira] [Assigned] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-05-16 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell reassigned HADOOP-9629:


Assignee: Mike Liddell  (was: Mostafa Elhemali)

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mike Liddell
> Attachments: HADOOP-9629.2.patch, HADOOP-9629.3.patch, 
> HADOOP-9629.patch, HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch
>
>


[jira] [Commented] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-05-16 Thread Mike Liddell (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999338#comment-13999338
 ] 

Mike Liddell commented on HADOOP-9629:
--

A revised approach is now being used so that the Azure driver is handled the 
same way as the open-stack driver:
 - The Azure FileSystem driver is now a separate project 
hadoop-tools\hadoop-azure

As part of moving to a separate project area, the following have also been done:
- findbugs
- checkstyle
- code cleanup based on the above and also on Apache formatting rules 
- metrics removed for now (they will come back later as a dedicated patch)

Namespace altered from org.apache.hadoop.fs.azurenative -> 
org.apache.hadoop.fs.azure

New approach is HADOOP-9629.trunk.1.patch 


> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mostafa Elhemali
> Attachments: HADOOP-9629.2.patch, HADOOP-9629.3.patch, 
> HADOOP-9629.patch, HADOOP-9629.trunk.1.patch
>
>


[jira] [Commented] (HADOOP-10124) Option to shuffle splits of equal size

2013-11-22 Thread Mike Liddell (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13830371#comment-13830371
 ] 

Mike Liddell commented on HADOOP-10124:
---

Patch added.
A new flag governs use of the new logic: mapred.submit.shuffle.equalsized.splits. 
Default=false. 
If the flag is true, JobClient will shuffle the splits that share a common size.
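
A minimal sketch of that ordering rule, for illustration only. It is not the
code in the patch: the class EqualSizeSplitShuffleSketch, the Split stand-in
and the order method are hypothetical, and sorting largest-first is an
assumption about the existing sort direction.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

class EqualSizeSplitShuffleSketch {
    /** Minimal stand-in for an input split: where it lives and how big it is. */
    static final class Split {
        final String location;
        final long length;

        Split(String location, long length) {
            this.location = location;
            this.length = length;
        }
    }

    /**
     * Sorts splits by size (largest first). When shuffleEqualSized is true,
     * also randomizes the order inside each run of equal-sized splits.
     */
    static List<Split> order(List<Split> input, boolean shuffleEqualSized, Random rng) {
        List<Split> splits = new ArrayList<>(input);
        // List.sort is stable, so equal-sized splits keep their input order here.
        splits.sort(Comparator.comparingLong((Split s) -> s.length).reversed());
        if (!shuffleEqualSized) {
            return splits; // flag off: old behavior
        }
        int start = 0;
        while (start < splits.size()) {
            int end = start + 1;
            while (end < splits.size()
                    && splits.get(end).length == splits.get(start).length) {
                end++;
            }
            // subList is a view, so shuffling it reorders the run in place.
            Collections.shuffle(splits.subList(start, end), rng);
            start = end;
        }
        return splits;
    }
}
```

With the flag off the result is the plain stable sort. With it on, the size
ordering is unchanged and only ties are permuted, which is what spreads
same-sized splits from different storage accounts across the list.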

> Option to shuffle splits of equal size
> --
>
> Key: HADOOP-10124
> URL: https://issues.apache.org/jira/browse/HADOOP-10124
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mike Liddell
> Attachments: HADOOP-10124.1.patch
>
>
> MapReduce split calculation has the following base logic (via JobClient and 
> the major InputFormat implementations):
> ◾enumerate input files in natural (aka linear) order.
> ◾create one split for each 'block-size' of each input. Apart from 
> rack-awareness, combining and so on, the input file order remains in its 
> natural order.
> ◾sort the splits by size using a stable sort based on splitsize.
> When data from multiple storage services are used in a single hadoop job, we 
> get better I/O utilization if the list of splits does round-robin or 
> random-access across the services. 
> The particular scenario arises in Azure HDInsight where jobs can easily read 
> from many storage accounts and each storage account has hard limits on 
> throughput.  Concurrent access to the accounts is substantially better than 
>  
> Two common scenarios can cause non-ideal access patterns:
>  1. many/all input files are the same size
>  2. files have different sizes, but many/all input files have size>blocksize.
>  In the second scenario, each file will have one or more splits with size 
> exactly equal to the block size, so it degenerates to the first scenario.
> There are various ways to solve the problem but the simplest is to alter the 
> mapreduce JobClient to sort splits by size _and_ randomize the order of 
> splits with equal size. This keeps the old behavior effectively unchanged 
> while also fixing both common problematic scenarios.
> Some rare scenarios will still suffer bad access patterns. For example, if 
> two storage accounts are used and the files from one storage account are all 
> smaller than those from the other, then problems can arise. Addressing these 
> scenarios would be further work, perhaps by completely randomizing the split 
> order. These problematic scenarios are considered rare and do not require 
> immediate attention.
> If further algorithms for split ordering are necessary, the implementation in 
> JobClient will become interface-based (e.g. an interface splitOrderer) 
> with various standard implementations.  At this time only two implementations 
> are needed, so a simple Boolean flag and if/then logic are used.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HADOOP-10124) Option to shuffle splits of equal size

2013-11-22 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-10124:
--

Attachment: HADOOP-10124.1.patch

> Option to shuffle splits of equal size
> --
>
> Key: HADOOP-10124
> URL: https://issues.apache.org/jira/browse/HADOOP-10124
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mike Liddell
> Attachments: HADOOP-10124.1.patch
>
>


[jira] [Created] (HADOOP-10124) Option to shuffle splits of equal size

2013-11-22 Thread Mike Liddell (JIRA)
Mike Liddell created HADOOP-10124:
-

 Summary: Option to shuffle splits of equal size
 Key: HADOOP-10124
 URL: https://issues.apache.org/jira/browse/HADOOP-10124
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Mike Liddell


MapReduce split calculation has the following base logic (via JobClient and the 
major InputFormat implementations):
◾enumerate input files in natural (aka linear) order.
◾create one split for each 'block-size' of each input. Apart from 
rack-awareness, combining and so on, the input file order remains in its 
natural order.
◾sort the splits by size using a stable sort based on splitsize.

When data from multiple storage services are used in a single hadoop job, we 
get better I/O utilization if the list of splits does round-robin or 
random-access across the services. 
The particular scenario arises in Azure HDInsight where jobs can easily read 
from many storage accounts and each storage account has hard limits on 
throughput.  Concurrent access to the accounts is substantially better than 
 
Two common scenarios can cause non-ideal access patterns:
 1. many/all input files are the same size
 2. files have different sizes, but many/all input files have size>blocksize.
 In the second scenario, each file will have one or more splits with size 
exactly equal to the block size, so it degenerates to the first scenario.

There are various ways to solve the problem but the simplest is to alter the 
mapreduce JobClient to sort splits by size _and_ randomize the order of splits 
with equal size. This keeps the old behavior effectively unchanged while also 
fixing both common problematic scenarios.

Some rare scenarios will still suffer bad access patterns. For example, if 
two storage accounts are used and the files from one storage account are all 
smaller than those from the other, then problems can arise. Addressing these 
scenarios would be further work, perhaps by completely randomizing the split 
order. These problematic scenarios are considered rare and do not require 
immediate attention.

If further algorithms for split ordering are necessary, the implementation in 
JobClient will become interface-based (e.g. an interface splitOrderer) with 
various standard implementations.  At this time only two implementations are 
needed, so a simple Boolean flag and if/then logic are used.
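
For illustration, the interface-based alternative mentioned above might look
roughly like the sketch below. Nothing here exists in Hadoop: the holder
class SplitOrderingSketch, the SplitOrderer interface and both
implementations are hypothetical, and the split type is left generic with a
caller-supplied size function.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.function.ToLongFunction;

/** Holder for a hypothetical split-ordering strategy and two implementations. */
class SplitOrderingSketch {
    /** Strategy interface: returns a new list and leaves the input untouched. */
    interface SplitOrderer<T> {
        List<T> order(List<T> splits);
    }

    /** Existing behavior (assumed): stable sort by size, largest first. */
    static final class SizeSplitOrderer<T> implements SplitOrderer<T> {
        private final ToLongFunction<T> sizeOf;

        SizeSplitOrderer(ToLongFunction<T> sizeOf) {
            this.sizeOf = sizeOf;
        }

        @Override
        public List<T> order(List<T> splits) {
            List<T> out = new ArrayList<>(splits);
            out.sort(Comparator.comparingLong(sizeOf).reversed());
            return out;
        }
    }

    /** Possible further work: ignore size and randomize the whole list. */
    static final class RandomSplitOrderer<T> implements SplitOrderer<T> {
        private final Random rng;

        RandomSplitOrderer(Random rng) {
            this.rng = rng;
        }

        @Override
        public List<T> order(List<T> splits) {
            List<T> out = new ArrayList<>(splits);
            Collections.shuffle(out, rng);
            return out;
        }
    }
}
```

A caller would pick one SplitOrderer from configuration instead of branching
on a Boolean, at the cost of more moving parts than two orderings justify
today.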






[jira] [Updated] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously

2013-03-14 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9371:
-

Attachment: HADOOP-9361.2.patch

Added HADOOP-9361.2.patch with minor edits.   
 - additional assumptions
 - changed detail for fs.delete("/")

This patch was created via svn diff and is not a delta over the original patch.

Please let me know if the patch format is incorrect.

> Define Semantics of FileSystem and FileContext more rigorously
> --
>
> Key: HADOOP-9371
> URL: https://issues.apache.org/jira/browse/HADOOP-9371
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs
>Affects Versions: 1.2.0, 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-9361.2.patch, HADOOP-9361.patch, 
> HadoopFilesystemContract.pdf
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> The semantics of {{FileSystem}} and {{FileContext}} are not completely 
> defined in terms of 
> # core expectations of a filesystem
> # consistency requirements.
> # concurrency requirements.
> # minimum scale limits
> Furthermore, methods are not defined strictly enough in terms of their 
> outcomes and failure modes.
> The requirements and method semantics should be defined more strictly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously

2013-03-13 Thread Mike Liddell (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601901#comment-13601901
 ] 

Mike Liddell commented on HADOOP-9371:
--

A few items for consideration:

Possible additions to 'implicit assumption': 
 - paths are represented as Unicode strings
 - equality/comparison of paths is based on binary content. This implies 
case-sensitivity and no locale-specific comparison rules.
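
As a rough illustration of the second point, the sketch below uses plain 
Java String comparison as a stand-in for "binary" comparison. That is an 
assumption: String.equals and String.compareTo work on UTF-16 code units, 
with no case folding, locale rules or Unicode normalization. The class and 
method names are hypothetical.

```java
public class PathEqualitySketch {
    /** Code-unit equality: differs on case and on Unicode normalization form. */
    public static boolean samePath(String a, String b) {
        return a.equals(b);
    }

    /** Code-unit ordering: independent of the default locale. */
    public static int comparePaths(String a, String b) {
        return a.compareTo(b);
    }
}
```

Under this rule "/data/File" and "/data/file" are different paths, and so are 
two spellings of the same accented character in different normalization forms.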

>>The data added to a file during a write or append MAY be visible while 
>>the write operation is in progress.
- Allowing read(s) during write seems to break the subsequent rule that 
"readers always see consistent data".

>> Deleting the root path, /, MUST fail iff recursive==false.
- If the root path is empty, it seems reasonable for delete("/",false) to 
succeed but to have no effect.

>> After a file is created, all ls operations on the file and parent directory 
>> MUST not find the file
- copy-paste error -> "after a file is deleted ..."

>> Security: if a caller has the rights to list a directory, it has the rights 
>> to list directories all the way up the tree.
- This point raises lots of interesting questions and requirements for 
individual methods.  A section on security assumptions/rules would be great.




> Define Semantics of FileSystem and FileContext more rigorously
> --
>
> Key: HADOOP-9371
> URL: https://issues.apache.org/jira/browse/HADOOP-9371
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs
>Affects Versions: 1.2.0, 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-9361.patch, HadoopFilesystemContract.pdf
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> The semantics of {{FileSystem}} and {{FileContext}} are not completely 
> defined in terms of 
> # core expectations of a filesystem
> # consistency requirements.
> # concurrency requirements.
> # minimum scale limits
> Furthermore, methods are not defined strictly enough in terms of their 
> outcomes and failure modes.
> The requirements and method semantics should be defined more strictly.



[jira] [Commented] (HADOOP-8562) Enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

2013-02-27 Thread Mike Liddell (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588848#comment-13588848
 ] 

Mike Liddell commented on HADOOP-8562:
--

+1 non-binding

> Enhancements to Hadoop for Windows Server and Windows Azure development and 
> runtime environments
> 
>
> Key: HADOOP-8562
> URL: https://issues.apache.org/jira/browse/HADOOP-8562
> Project: Hadoop Common
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Bikas Saha
>Assignee: Bikas Saha
> Attachments: branch-trunk-win.patch, branch-trunk-win.patch, 
> branch-trunk-win.patch, branch-trunk-win.patch, branch-trunk-win.patch, 
> branch-trunk-win.patch, branch-trunk-win.patch, branch-trunk-win.patch, 
> test-untar.tar, test-untar.tgz
>
>
> This JIRA tracks the work that needs to be done on trunk to enable Hadoop to 
> run on Windows Server and Azure environments. This incorporates porting 
> relevant work from the similar effort on branch 1 tracked via HADOOP-8079.



[jira] [Commented] (HADOOP-8902) Enable Gridmix v1 & v2 benchmarks on Windows platform

2012-10-17 Thread Mike Liddell (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478374#comment-13478374
 ] 

Mike Liddell commented on HADOOP-8902:
--

patch updated: HADOOP-8902.branch-1-win.contribscripts.patch

> Enable Gridmix v1 & v2 benchmarks on Windows platform
> -
>
> Key: HADOOP-8902
> URL: https://issues.apache.org/jira/browse/HADOOP-8902
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: benchmarks
>Affects Versions: 1-win
>Reporter: Mike Liddell
> Attachments: HADOOP-8902.branch-1-win.contribscripts.patch, 
> HADOOP-8902.patch
>
>
> Gridmix v1 and v2 benchmarks do not run on Windows as they require bash 
> shell.  These scripts have been ported to Windows cmd-scripts.



[jira] [Updated] (HADOOP-8902) Enable Gridmix v1 & v2 benchmarks on Windows platform

2012-10-17 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-8902:
-

Attachment: HADOOP-8902.branch-1-win.contribscripts.patch




[jira] [Updated] (HADOOP-8902) Enable Gridmix v1 & v2 benchmarks on Windows platform

2012-10-09 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-8902:
-

Status: Open  (was: Patch Available)

> Enable Gridmix v1 & v2 benchmarks on Windows platform
> -
>
> Key: HADOOP-8902
> URL: https://issues.apache.org/jira/browse/HADOOP-8902
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: benchmarks
>Affects Versions: 1-win
>Reporter: Mike Liddell
> Attachments: HADOOP-8902.patch
>
>
> Gridmix v1 and v2 benchmarks do not run on Windows as they require bash 
> shell.  These scripts have been ported to Windows cmd-scripts.



[jira] [Updated] (HADOOP-8902) Enable Gridmix v1 & v2 benchmarks on Windows platform

2012-10-08 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-8902:
-

Status: Patch Available  (was: Open)




[jira] [Updated] (HADOOP-8902) Enable Gridmix v1 & v2 benchmarks on Windows platform

2012-10-08 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-8902:
-

Attachment: HADOOP-8902.patch




[jira] [Created] (HADOOP-8902) Enable Gridmix v1 & v2 benchmarks on Windows platform

2012-10-08 Thread Mike Liddell (JIRA)
Mike Liddell created HADOOP-8902:


 Summary: Enable Gridmix v1 & v2 benchmarks on Windows platform
 Key: HADOOP-8902
 URL: https://issues.apache.org/jira/browse/HADOOP-8902
 Project: Hadoop Common
  Issue Type: Bug
  Components: benchmarks
Affects Versions: 1-win
Reporter: Mike Liddell


Gridmix v1 and v2 benchmarks do not run on Windows as they require bash shell.  
These scripts have been ported to Windows cmd-scripts.

