[jira] [Updated] (HADOOP-10670) Allow AuthenticationFilter to respect signature secret file even without AuthenticationFilterInitializer

2014-06-08 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng updated HADOOP-10670:
---

Component/s: security

> Allow AuthenticationFilter to respect signature secret file even without 
> AuthenticationFilterInitializer
> 
>
> Key: HADOOP-10670
> URL: https://issues.apache.org/jira/browse/HADOOP-10670
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Reporter: Kai Zheng
>Assignee: Kai Zheng
>Priority: Minor
> Attachments: hadoop-10670.patch
>
>
> In the Hadoop web console, AuthenticationFilterInitializer allows configuring 
> AuthenticationFilter with the required signature secret by specifying the 
> signature.secret.file property. This improvement would also allow that when 
> AuthenticationFilterInitializer isn't used, as in webhdfs.
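For context, the web-console side of this is configured in core-site.xml roughly as below; the file path is a placeholder, only the property name is real.

```xml
<property>
  <name>hadoop.http.authentication.signature.secret.file</name>
  <value>/etc/hadoop/http-auth-signature-secret</value>
</property>
```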



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10671) Single sign on between web console and webhdfs

2014-06-08 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng updated HADOOP-10671:
---

Status: Patch Available  (was: Open)

Attached a simple patch:

It allows webhdfs to access the web console's configuration properties 
prefixed with 'hadoop.http.authentication.' by simply re-mapping them under 
the new prefix 'dfs.web.authentication.', which webhdfs picks up. Any 
properties manually configured for webhdfs under 'dfs.web.authentication.' 
take higher priority, since they override the ones derived from the web 
console.
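The re-mapping described above can be sketched as plain map manipulation. This is an illustrative sketch, not code from the patch; the class and method names are made up, and only the two property prefixes come from the comment.

```java
import java.util.HashMap;
import java.util.Map;

public class AuthPrefixMapper {
    static final String HTTP_PREFIX = "hadoop.http.authentication.";
    static final String WEBHDFS_PREFIX = "dfs.web.authentication.";

    // Copy each web-console auth property under the webhdfs prefix.
    // putIfAbsent keeps any manually configured webhdfs property, so
    // explicit 'dfs.web.authentication.*' settings take priority.
    static Map<String, String> transform(Map<String, String> conf) {
        Map<String, String> out = new HashMap<>(conf);
        for (Map.Entry<String, String> e : conf.entrySet()) {
            if (e.getKey().startsWith(HTTP_PREFIX)) {
                String target = WEBHDFS_PREFIX
                        + e.getKey().substring(HTTP_PREFIX.length());
                out.putIfAbsent(target, e.getValue());
            }
        }
        return out;
    }
}
```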

> Single sign on between web console and webhdfs
> --
>
> Key: HADOOP-10671
> URL: https://issues.apache.org/jira/browse/HADOOP-10671
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Attachments: hadoop-10671.patch
>
>
> Currently single sign-on between the Hadoop web console and webhdfs is not 
> possible, since they don't share the required common configuration, such as 
> the signature secret used to sign the authentication token, the domain 
> cookie, etc. This improvement would allow SSO between the two and also 
> simplify configuration by removing the duplicated effort for both parts.
> SSO makes sense because the current web console integrates webhdfs, and we 
> should avoid redundant sign-on through different mechanisms. This is 
> necessary when an authentication mechanism other than SPNEGO is desired 
> across the web console and webhdfs.





[jira] [Updated] (HADOOP-10671) Single sign on between web console and webhdfs

2014-06-08 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng updated HADOOP-10671:
---

Attachment: hadoop-10671.patch

> Single sign on between web console and webhdfs
> --
>
> Key: HADOOP-10671
> URL: https://issues.apache.org/jira/browse/HADOOP-10671
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Attachments: hadoop-10671.patch
>
>
> Currently single sign-on between the Hadoop web console and webhdfs is not 
> possible, since they don't share the required common configuration, such as 
> the signature secret used to sign the authentication token, the domain 
> cookie, etc. This improvement would allow SSO between the two and also 
> simplify configuration by removing the duplicated effort for both parts.
> SSO makes sense because the current web console integrates webhdfs, and we 
> should avoid redundant sign-on through different mechanisms. This is 
> necessary when an authentication mechanism other than SPNEGO is desired 
> across the web console and webhdfs.





[jira] [Commented] (HADOOP-10670) Allow AuthenticationFilter to respect signature secret file even without AuthenticationFilterInitializer

2014-06-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021655#comment-14021655
 ] 

Hadoop QA commented on HADOOP-10670:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12648916/hadoop-10670.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4028//console

This message is automatically generated.

> Allow AuthenticationFilter to respect signature secret file even without 
> AuthenticationFilterInitializer
> 
>
> Key: HADOOP-10670
> URL: https://issues.apache.org/jira/browse/HADOOP-10670
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Kai Zheng
>Assignee: Kai Zheng
>Priority: Minor
> Attachments: hadoop-10670.patch
>
>
> In the Hadoop web console, AuthenticationFilterInitializer allows configuring 
> AuthenticationFilter with the required signature secret by specifying the 
> signature.secret.file property. This improvement would also allow that when 
> AuthenticationFilterInitializer isn't used, as in webhdfs.





[jira] [Created] (HADOOP-10671) Single sign on between web console and webhdfs

2014-06-08 Thread Kai Zheng (JIRA)
Kai Zheng created HADOOP-10671:
--

 Summary: Single sign on between web console and webhdfs
 Key: HADOOP-10671
 URL: https://issues.apache.org/jira/browse/HADOOP-10671
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Reporter: Kai Zheng
Assignee: Kai Zheng


Currently single sign-on between the Hadoop web console and webhdfs is not 
possible, since they don't share the required common configuration, such as 
the signature secret used to sign the authentication token, the domain 
cookie, etc. This improvement would allow SSO between the two and also 
simplify configuration by removing the duplicated effort for both parts.

SSO makes sense because the current web console integrates webhdfs, and we 
should avoid redundant sign-on through different mechanisms. This is 
necessary when an authentication mechanism other than SPNEGO is desired 
across the web console and webhdfs.





[jira] [Commented] (HADOOP-10640) Implement Namenode RPCs in HDFS native client

2014-06-08 Thread Binglin Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021649#comment-14021649
 ] 

Binglin Chang commented on HADOOP-10640:


Isn't sizeof(struct hrpc_proxy) always larger than RPC_PROXY_USERDATA_MAX? 
hrpc_proxy_alloc_sync_ctx passes sizeof(struct hrpc_proxy) rather than 
sizeof(struct hrpc_sync_ctx), so the allocation always fails:
{code}
void *hrpc_proxy_alloc_userdata(struct hrpc_proxy *proxy, size_t size)
{
    if (size > RPC_PROXY_USERDATA_MAX) {
        return NULL;
    }
    return proxy->userdata;
}

struct hrpc_sync_ctx *hrpc_proxy_alloc_sync_ctx(struct hrpc_proxy *proxy)
{
    /* sizeof(struct hrpc_proxy) here looks like it should be
     * sizeof(struct hrpc_sync_ctx) */
    struct hrpc_sync_ctx *ctx =
        hrpc_proxy_alloc_userdata(proxy, sizeof(struct hrpc_proxy));
    if (!ctx) {
        return NULL;
    }
    if (uv_sem_init(&ctx->sem, 0)) {
        return NULL;
    }
    /* this zeroes the local pointer variable, not the struct it points
     * to -- memset(ctx, 0, sizeof(*ctx)) was presumably intended, and
     * doing it after uv_sem_init would also wipe the semaphore */
    memset(&ctx, 0, sizeof(ctx));
    return ctx;
}
{code}

> Implement Namenode RPCs in HDFS native client
> -
>
> Key: HADOOP-10640
> URL: https://issues.apache.org/jira/browse/HADOOP-10640
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: native
>Affects Versions: HADOOP-10388
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HADOOP-10640-pnative.001.patch, 
> HADOOP-10640-pnative.002.patch, HADOOP-10640-pnative.003.patch
>
>
> Implement the parts of libhdfs that just involve making RPCs to the Namenode, 
> such as mkdir, rename, etc.





[jira] [Updated] (HADOOP-10670) Allow AuthenticationFilter to respect signature secret file even without AuthenticationFilterInitializer

2014-06-08 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng updated HADOOP-10670:
---

Status: Patch Available  (was: Open)

Attached a patch. Summary of changes:
1. Move the reading of the signature secret file from 
AuthenticationFilterInitializer into AuthenticationFilter.

2. In AuthenticationFilter: if SIGNATURE_SECRET is configured, use it; 
otherwise, if SIGNATURE_SECRET_FILE is configured, read the secret from that 
file; otherwise, generate a secret as before.
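That resolution order can be sketched as follows. The class and method are hypothetical; only the fall-through order and the two configuration keys come from the description above, and the random fallback is a stand-in for whatever generation the filter already does.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.SecureRandom;
import java.util.Properties;

public class SecretResolver {
    static final String SIGNATURE_SECRET = "signature.secret";
    static final String SIGNATURE_SECRET_FILE = "signature.secret.file";

    static String resolveSecret(Properties config) throws IOException {
        // 1. an explicitly configured secret wins
        String secret = config.getProperty(SIGNATURE_SECRET);
        if (secret != null) {
            return secret;
        }
        // 2. otherwise read the secret from the configured file
        String secretFile = config.getProperty(SIGNATURE_SECRET_FILE);
        if (secretFile != null) {
            return new String(Files.readAllBytes(Paths.get(secretFile))).trim();
        }
        // 3. otherwise generate a secret, as before
        return Long.toString(new SecureRandom().nextLong());
    }
}
```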


> Allow AuthenticationFilter to respect signature secret file even without 
> AuthenticationFilterInitializer
> 
>
> Key: HADOOP-10670
> URL: https://issues.apache.org/jira/browse/HADOOP-10670
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Kai Zheng
>Assignee: Kai Zheng
>Priority: Minor
> Attachments: hadoop-10670.patch
>
>
> In the Hadoop web console, AuthenticationFilterInitializer allows configuring 
> AuthenticationFilter with the required signature secret by specifying the 
> signature.secret.file property. This improvement would also allow that when 
> AuthenticationFilterInitializer isn't used, as in webhdfs.





[jira] [Updated] (HADOOP-10670) Allow AuthenticationFilter to respect signature secret file even without AuthenticationFilterInitializer

2014-06-08 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng updated HADOOP-10670:
---

Attachment: hadoop-10670.patch

> Allow AuthenticationFilter to respect signature secret file even without 
> AuthenticationFilterInitializer
> 
>
> Key: HADOOP-10670
> URL: https://issues.apache.org/jira/browse/HADOOP-10670
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Kai Zheng
>Assignee: Kai Zheng
>Priority: Minor
> Attachments: hadoop-10670.patch
>
>
> In the Hadoop web console, AuthenticationFilterInitializer allows configuring 
> AuthenticationFilter with the required signature secret by specifying the 
> signature.secret.file property. This improvement would also allow that when 
> AuthenticationFilterInitializer isn't used, as in webhdfs.





[jira] [Created] (HADOOP-10670) Allow AuthenticationFilter to respect signature secret file even without AuthenticationFilterInitializer

2014-06-08 Thread Kai Zheng (JIRA)
Kai Zheng created HADOOP-10670:
--

 Summary: Allow AuthenticationFilter to respect signature secret 
file even without AuthenticationFilterInitializer
 Key: HADOOP-10670
 URL: https://issues.apache.org/jira/browse/HADOOP-10670
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Kai Zheng
Assignee: Kai Zheng
Priority: Minor


In the Hadoop web console, AuthenticationFilterInitializer allows configuring 
AuthenticationFilter with the required signature secret by specifying the 
signature.secret.file property. This improvement would also allow that when 
AuthenticationFilterInitializer isn't used, as in webhdfs.





[jira] [Commented] (HADOOP-10561) Copy command with preserve option should handle Xattrs

2014-06-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021558#comment-14021558
 ] 

Hadoop QA commented on HADOOP-10561:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12648901/HADOOP-10561.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.datanode.TestBPOfferService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4026//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4026//console

This message is automatically generated.

> Copy command with preserve option should handle Xattrs
> --
>
> Key: HADOOP-10561
> URL: https://issues.apache.org/jira/browse/HADOOP-10561
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Affects Versions: 3.0.0
>Reporter: Uma Maheswara Rao G
>Assignee: Yi Liu
> Attachments: HADOOP-10561.1.patch, HADOOP-10561.2.patch, 
> HADOOP-10561.3.patch, HADOOP-10561.4.patch, HADOOP-10561.patch
>
>
> The design doc for XAttrs stated that we handle preserve options with copy 
> commands.
> From the doc:
> The preserve option of shell commands like “cp -p” and “distcp -p” should 
> work on XAttrs.
> If the source fs supports XAttrs but the target fs does not, XAttrs will be 
> ignored with a warning message.





[jira] [Commented] (HADOOP-10640) Implement Namenode RPCs in HDFS native client

2014-06-08 Thread Wenwu Peng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021541#comment-14021541
 ] 

Wenwu Peng commented on HADOOP-10640:
-

Ran the command "NAMENODE_URI=hdfs://localhost:8020 ./test_libhdfs_meta_ops" 
and hit this error:

hdfsBuilderConnect: ndfs failed to connect: 
org.apache.hadoop.native.HadoopCore.OutOfMemoryException: 
cnn_get_server_defaults: failed to allocate sync_ctx (error 12)
org.apache.hadoop.native.HadoopCore.OutOfMemoryException: 
cnn_get_server_defaults: failed to allocate sync_ctx
error: did not expect '0': 
'/root/hadoop-common/hadoop-native-core/src/main/native/test/fs/test_libhdfs_meta_ops.c
 at line 60'

> Implement Namenode RPCs in HDFS native client
> -
>
> Key: HADOOP-10640
> URL: https://issues.apache.org/jira/browse/HADOOP-10640
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: native
>Affects Versions: HADOOP-10388
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HADOOP-10640-pnative.001.patch, 
> HADOOP-10640-pnative.002.patch, HADOOP-10640-pnative.003.patch
>
>
> Implement the parts of libhdfs that just involve making RPCs to the Namenode, 
> such as mkdir, rename, etc.





[jira] [Commented] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-06-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021511#comment-14021511
 ] 

Hadoop QA commented on HADOOP-9629:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12648902/HADOOP-9629.trunk.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 25 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
100 warning messages.
See 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4027//artifact/trunk/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-azure hadoop-tools/hadoop-tools-dist.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4027//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4027//console

This message is automatically generated.

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mike Liddell
> Attachments: HADOOP-9629 - Azure Filesystem - Information for 
> developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
> developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
> HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch, 
> HADOOP-9629.trunk.3.patch
>
>
> h2. Description
> This JIRA incorporates adding a new file system implementation for accessing 
> Windows Azure Storage - Blob from within Hadoop, such as using blobs as input 
> to MR jobs or configuring MR jobs to put their output directly into blob 
> storage.
> h2. High level design
> At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blob storage; the scheme wasb is used for 
> accessing it over HTTP, and wasbs for accessing over HTTPS. We use the URI 
> scheme: {code}wasb[s]://<containername>@<accountname>/path/to/file{code} to address 
> individual blobs. We use the standard Azure Java SDK 
> (com.microsoft.windowsazure) to do most of the work. In order to map a 
> hierarchical file system over the flat name-value pair nature of blob 
> storage, we create a specially tagged blob named path/to/dir whenever we 
> create a directory called path/to/dir, then files under that are stored as 
> normal blobs path/to/dir/file. We have many metrics implemented for it using 
> the Metrics2 interface. Tests are implemented mostly using a mock 
> implementation for the Azure SDK functionality, with an option to test 
> against a real blob storage if configured (instructions provided in 
> README.txt).
> h2. Credits and history
> This has been ongoing work for a while, and the early version of this work 
> can be seen in HADOOP-8079. This JIRA is a significant revision of that and 
> we'll post the patch here for Hadoop trunk first, then post a patch for 
> branch-1 as well for backporting the functionality if accepted. Credit for 
> this work goes to the early team: [~minwei], [~davidlao], [~lengningliu] and 
> [~stojanovic] as well as multiple people who have taken over this work since 
> then (hope I don't forget anyone): [~dexterb], Johannes Klein, [~ivanmi], 
> Michael Rys, [~mostafae], [~brian_swan], [~mikelid], [~xifang], and 
> [~chuanliu].
> h2. Test
> Besides unit tests, we have used WASB as the default file system in our 
> service product. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. The current version reflects the 
> version of the code tested and used in our production environment.
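The directory-over-flat-namespace mapping described in the design section above can be modeled with a toy in-memory sketch. Everything here is illustrative: the marker value and class are made up (the real implementation tags the directory blob via Azure metadata), and none of it is code from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BlobNamespace {
    // flat blob name -> contents; a directory is a specially marked blob
    private final Map<String, String> blobs = new TreeMap<>();
    static final String DIR_MARKER = "<dir>";

    void mkdir(String path) { blobs.put(path, DIR_MARKER); }

    void putFile(String path, String data) { blobs.put(path, data); }

    boolean isDirectory(String path) {
        return DIR_MARKER.equals(blobs.get(path));
    }

    // List immediate children by a prefix scan over the flat names.
    List<String> list(String dir) {
        List<String> out = new ArrayList<>();
        String prefix = dir + "/";
        for (String name : blobs.keySet()) {
            if (name.startsWith(prefix)
                    && !name.substring(prefix.length()).contains("/")) {
                out.add(name);
            }
        }
        return out;
    }
}
```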





[jira] [Commented] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-06-08 Thread Mike Liddell (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021506#comment-14021506
 ] 

Mike Liddell commented on HADOOP-9629:
--

The previous comment gave the wrong name for the new patch file: the new patch 
is HADOOP-9629.trunk.3.patch.

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mike Liddell
> Attachments: HADOOP-9629 - Azure Filesystem - Information for 
> developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
> developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
> HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch, 
> HADOOP-9629.trunk.3.patch
>
>
> h2. Description
> This JIRA incorporates adding a new file system implementation for accessing 
> Windows Azure Storage - Blob from within Hadoop, such as using blobs as input 
> to MR jobs or configuring MR jobs to put their output directly into blob 
> storage.
> h2. High level design
> At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blob storage; the scheme wasb is used for 
> accessing it over HTTP, and wasbs for accessing over HTTPS. We use the URI 
> scheme: {code}wasb[s]://<containername>@<accountname>/path/to/file{code} to address 
> individual blobs. We use the standard Azure Java SDK 
> (com.microsoft.windowsazure) to do most of the work. In order to map a 
> hierarchical file system over the flat name-value pair nature of blob 
> storage, we create a specially tagged blob named path/to/dir whenever we 
> create a directory called path/to/dir, then files under that are stored as 
> normal blobs path/to/dir/file. We have many metrics implemented for it using 
> the Metrics2 interface. Tests are implemented mostly using a mock 
> implementation for the Azure SDK functionality, with an option to test 
> against a real blob storage if configured (instructions provided in 
> README.txt).
> h2. Credits and history
> This has been ongoing work for a while, and the early version of this work 
> can be seen in HADOOP-8079. This JIRA is a significant revision of that and 
> we'll post the patch here for Hadoop trunk first, then post a patch for 
> branch-1 as well for backporting the functionality if accepted. Credit for 
> this work goes to the early team: [~minwei], [~davidlao], [~lengningliu] and 
> [~stojanovic] as well as multiple people who have taken over this work since 
> then (hope I don't forget anyone): [~dexterb], Johannes Klein, [~ivanmi], 
> Michael Rys, [~mostafae], [~brian_swan], [~mikelid], [~xifang], and 
> [~chuanliu].
> h2. Test
> Besides unit tests, we have used WASB as the default file system in our 
> service product. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. The current version reflects the 
> version of the code tested and used in our production environment.





[jira] [Commented] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-06-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021502#comment-14021502
 ] 

Hadoop QA commented on HADOOP-9629:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12648900/HADOOP-9629.trunk.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 25 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
100 warning messages.
See 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4025//artifact/trunk/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-azure hadoop-tools/hadoop-tools-dist.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4025//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4025//console

This message is automatically generated.

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mike Liddell
> Attachments: HADOOP-9629 - Azure Filesystem - Information for 
> developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
> developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
> HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch, 
> HADOOP-9629.trunk.3.patch
>
>
> h2. Description
> This JIRA incorporates adding a new file system implementation for accessing 
> Windows Azure Storage - Blob from within Hadoop, such as using blobs as input 
> to MR jobs or configuring MR jobs to put their output directly into blob 
> storage.
> h2. High level design
> At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blob storage; the scheme wasb is used for 
> accessing it over HTTP, and wasbs for accessing over HTTPS. We use the URI 
> scheme: {code}wasb[s]://<containername>@<accountname>/path/to/file{code} to address 
> individual blobs. We use the standard Azure Java SDK 
> (com.microsoft.windowsazure) to do most of the work. In order to map a 
> hierarchical file system over the flat name-value pair nature of blob 
> storage, we create a specially tagged blob named path/to/dir whenever we 
> create a directory called path/to/dir, then files under that are stored as 
> normal blobs path/to/dir/file. We have many metrics implemented for it using 
> the Metrics2 interface. Tests are implemented mostly using a mock 
> implementation for the Azure SDK functionality, with an option to test 
> against a real blob storage if configured (instructions provided in 
> README.txt).
> h2. Credits and history
> This has been ongoing work for a while, and the early version of this work 
> can be seen in HADOOP-8079. This JIRA is a significant revision of that and 
> we'll post the patch here for Hadoop trunk first, then post a patch for 
> branch-1 as well for backporting the functionality if accepted. Credit for 
> this work goes to the early team: [~minwei], [~davidlao], [~lengningliu] and 
> [~stojanovic] as well as multiple people who have taken over this work since 
> then (hope I don't forget anyone): [~dexterb], Johannes Klein, [~ivanmi], 
> Michael Rys, [~mostafae], [~brian_swan], [~mikelid], [~xifang], and 
> [~chuanliu].
> h2. Test
> Besides unit tests, we have used WASB as the default file system in our 
> service product. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. The current version reflects the 
> version of the code tested and used in our production environment.





[jira] [Commented] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-06-08 Thread Mike Liddell (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021499#comment-14021499
 ] 

Mike Liddell commented on HADOOP-9629:
--

new patch: HADOOP-9629.trunk.4.patch
 - addresses code-review comments from [~cnauroth], see 
https://reviews.apache.org/r/22096/
 - adds InterfaceAudience and InterfaceStability annotations to the main 
classes.

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mike Liddell
> Attachments: HADOOP-9629 - Azure Filesystem - Information for 
> developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
> developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
> HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch, 
> HADOOP-9629.trunk.3.patch
>
>
> h2. Description
> This JIRA incorporates adding a new file system implementation for accessing 
> Windows Azure Storage - Blob from within Hadoop, such as using blobs as input 
> to MR jobs or configuring MR jobs to put their output directly into blob 
> storage.
> h2. High level design
> At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blob storage; the scheme wasb is used for 
> accessing it over HTTP, and wasbs for accessing over HTTPS. We use the URI 
> scheme: {code}wasb[s]://<containername>@<accountname>/path/to/file{code} to address 
> individual blobs. We use the standard Azure Java SDK 
> (com.microsoft.windowsazure) to do most of the work. In order to map a 
> hierarchical file system over the flat name-value pair nature of blob 
> storage, we create a specially tagged blob named path/to/dir whenever we 
> create a directory called path/to/dir, then files under that are stored as 
> normal blobs path/to/dir/file. We have many metrics implemented for it using 
> the Metrics2 interface. Tests are implemented mostly using a mock 
> implementation for the Azure SDK functionality, with an option to test 
> against a real blob storage if configured (instructions provided in 
> README.txt).
> h2. Credits and history
> This has been ongoing work for a while, and the early version of this work 
> can be seen in HADOOP-8079. This JIRA is a significant revision of that and 
> we'll post the patch here for Hadoop trunk first, then post a patch for 
> branch-1 as well for backporting the functionality if accepted. Credit for 
> this work goes to the early team: [~minwei], [~davidlao], [~lengningliu] and 
> [~stojanovic] as well as multiple people who have taken over this work since 
> then (hope I don't forget anyone): [~dexterb], Johannes Klein, [~ivanmi], 
> Michael Rys, [~mostafae], [~brian_swan], [~mikelid], [~xifang], and 
> [~chuanliu].
> h2. Test
> Besides unit tests, we have used WASB as the default file system in our 
> service product. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. The current version reflects the 
> version of the code tested and used in our production environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-06-08 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9629:
-

Attachment: HADOOP-9629.trunk.3.patch

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mike Liddell
> Attachments: HADOOP-9629 - Azure Filesystem - Information for 
> developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
> developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
> HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch, 
> HADOOP-9629.trunk.3.patch
>
>





[jira] [Updated] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-06-08 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9629:
-

Attachment: (was: HADOOP-9629.trunk.3.patch)

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mike Liddell
> Attachments: HADOOP-9629 - Azure Filesystem - Information for 
> developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
> developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
> HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch
>
>





[jira] [Updated] (HADOOP-10561) Copy command with preserve option should handle Xattrs

2014-06-08 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-10561:


Attachment: HADOOP-10561.4.patch

Thanks Uma for review. The new patch includes update for your comments.

> Copy command with preserve option should handle Xattrs
> --
>
> Key: HADOOP-10561
> URL: https://issues.apache.org/jira/browse/HADOOP-10561
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Affects Versions: 3.0.0
>Reporter: Uma Maheswara Rao G
>Assignee: Yi Liu
> Attachments: HADOOP-10561.1.patch, HADOOP-10561.2.patch, 
> HADOOP-10561.3.patch, HADOOP-10561.4.patch, HADOOP-10561.patch
>
>
> The design doc for XAttrs stated that we handle preserve options with copy 
> commands.
> From the doc:
> The preserve option of commands like the “cp -p” shell command and 
> “distcp -p” should work on XAttrs. 
> If the source fs supports XAttrs but the target fs does not, XAttrs will be 
> ignored with a warning message.
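A minimal sketch of that fallback behavior (the function and flag names here are hypothetical, not the actual FsShell code):

```python
import warnings

def copy_xattrs(src_xattrs, target_supports_xattrs):
    """Sketch of the 'cp -p' semantics above: preserve xattrs when the
    target FS supports them, otherwise ignore them with a warning.
    (Illustrative only; not the real Hadoop implementation.)"""
    if target_supports_xattrs:
        return dict(src_xattrs)          # copy every attribute across
    if src_xattrs:
        warnings.warn("target filesystem does not support XAttrs; ignoring them")
    return {}                            # nothing is preserved

print(copy_xattrs({"user.origin": b"hdfs"}, target_supports_xattrs=True))
# {'user.origin': b'hdfs'}
```

The key design point is that a missing capability on the target degrades to a warning rather than failing the whole copy.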





[jira] [Commented] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-06-08 Thread Mike Liddell (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021497#comment-14021497
 ] 

Mike Liddell commented on HADOOP-9629:
--

The annotations and suggested usages sound good.
The only changes that I suggest are:
- AzureException: Public + Evolving
- WasbFsck: Public + Evolving.

sound good?

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mike Liddell
> Attachments: HADOOP-9629 - Azure Filesystem - Information for 
> developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
> developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
> HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch, 
> HADOOP-9629.trunk.3.patch
>
>





[jira] [Updated] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-06-08 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9629:
-

Attachment: HADOOP-9629.trunk.3.patch

> Support Windows Azure Storage - Blob as a file system in Hadoop
> ---
>
> Key: HADOOP-9629
> URL: https://issues.apache.org/jira/browse/HADOOP-9629
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Mostafa Elhemali
>Assignee: Mike Liddell
> Attachments: HADOOP-9629 - Azure Filesystem - Information for 
> developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
> developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
> HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch, 
> HADOOP-9629.trunk.3.patch
>
>





[jira] [Updated] (HADOOP-10669) Avro serialization does not flush buffered serialized values causing data loss

2014-06-08 Thread Mikhail Bernadsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bernadsky updated HADOOP-10669:
---

Description: 
Found this while debugging Nutch. 

MapTask serializes keys and values to the same stream, in pairs: 

keySerializer.serialize(key); 
. 
valSerializer.serialize(value);
 . 
bb.write(b0, 0, 0); 

AvroSerializer does not flush its buffer after each serialization. So if it is 
used for valSerializer, the values are only partially written, or not written at 
all, to the output stream before the record is marked as complete (the last line 
above).

 Added HADOOP-10669_alt.patch. This is a less intrusive fix, as it does 
not try to flush the MapTask stream. Instead, we write serialized values directly 
to the MapTask stream and avoid using a buffer on the Avro side. 

  was:
Found this debugging Nutch. 

MapTask serializes keys and values to the same stream, in pairs: 

keySerializer.serialize(key); 
. 
valSerializer.serialize(value);
 . 
bb.write(b0, 0, 0); 

AvroSerializer does not flush its buffer after each serialization. So if it is 
used for valSerializer, the values are only partially written or not written at 
all to the output stream before the record is marked as complete (the last line 
above).
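The buffering hazard can be reproduced in miniature (a toy stand-in for the real AvroSerializer, assuming only that encoded bytes sit in an internal buffer until flushed):

```python
import io

class BufferingSerializer:
    """Toy serializer that, like the Avro case described above, encodes
    into an internal buffer rather than the shared output stream."""

    def __init__(self, out):
        self.out = out           # shared MapTask-style output stream
        self.buf = io.BytesIO()  # internal encoder buffer

    def serialize(self, value):
        self.buf.write(value)    # bytes land in the buffer, not in `out`

    def flush(self):
        self.out.write(self.buf.getvalue())  # push buffered bytes downstream
        self.buf = io.BytesIO()


shared = io.BytesIO()
ser = BufferingSerializer(shared)
ser.serialize(b"value-1")
# The record would be marked complete at this point, yet the shared
# stream is still empty: this is the data loss described above.
print(shared.getvalue())  # b''
ser.flush()               # fix: flush per value, or bypass the buffer
print(shared.getvalue())  # b'value-1'
```

Either remedy closes the window: flushing after each serialize, or (as the alt patch does) writing directly to the shared stream so there is no buffer to forget.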


> Avro serialization does not flush buffered serialized values causing data loss
> --
>
> Key: HADOOP-10669
> URL: https://issues.apache.org/jira/browse/HADOOP-10669
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: io
>Affects Versions: 2.4.0
>Reporter: Mikhail Bernadsky
> Attachments: HADOOP-10669.patch, HADOOP-10669_alt.patch
>
>





[jira] [Updated] (HADOOP-10669) Avro serialization does not flush buffered serialized values causing data loss

2014-06-08 Thread Mikhail Bernadsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bernadsky updated HADOOP-10669:
---

Attachment: HADOOP-10669_alt.patch

> Avro serialization does not flush buffered serialized values causing data loss
> --
>
> Key: HADOOP-10669
> URL: https://issues.apache.org/jira/browse/HADOOP-10669
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: io
>Affects Versions: 2.4.0
>Reporter: Mikhail Bernadsky
> Attachments: HADOOP-10669.patch, HADOOP-10669_alt.patch
>
>





[jira] [Commented] (HADOOP-10641) Introduce Coordination Engine

2014-06-08 Thread Plamen Jeliazkov (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021473#comment-14021473
 ] 

Plamen Jeliazkov commented on HADOOP-10641:
---

Hi Lohit, thanks for your comments!
# checkQuorum is an optimization some coordination engines may choose to 
implement in order to fail fast on client requests. In the NameNode case, if 
quorum loss were suspected, that NameNode could start issuing StandbyExceptions.
# You are correct that the ZKCoordinationEngine does not implement ZNode 
clean-up currently. That is because it was made as a proof of concept for the 
CoordinationEngine API. Nonetheless, proper clean-up can be implemented. All 
one has to do is delete the ZNodes that everyone else has already learned about.
## Suppose you have Node A, B, and C, and Agreements 1, 2, 3, 4, and 5.
## Node A and B learn Agreement 1 first. Node C is a lagging node. A & B 
contain 1. C contains nothing.
## Node A and B continue onwards, learning up to Agreement 4. A & B contain 1, 
2, 3, and 4 now. C contains nothing.
## Node C finally learns Agreement 1. A & B contain 1, 2, 3, and 4 now. C 
contains 1.
## We can now discard Agreement 1 from persistence because we know that all the 
Nodes, A, B, and C, have safely learned about and applied Agreement 1.
## We can apply this process for all other Agreements. 
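The clean-up rule in the walkthrough above reduces to discarding every agreement up to the minimum position any node has learned (a sketch of the idea, not the actual ZKCoordinationEngine code):

```python
def discardable_upto(learned):
    """Agreements with id <= the minimum learned position have been
    applied by every node, so their ZNodes can be safely deleted.
    `learned` maps node name -> highest agreement id it has applied."""
    return min(learned.values())

# The step from the example: A and B have learned agreements 1-4,
# while lagging node C has only learned agreement 1.
learned = {"A": 4, "B": 4, "C": 1}
print(discardable_upto(learned))  # 1 -> the ZNode for agreement 1 can go
```

As C catches up, the minimum rises and older agreement ZNodes become eligible for deletion in order.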

> Introduce Coordination Engine
> -
>
> Key: HADOOP-10641
> URL: https://issues.apache.org/jira/browse/HADOOP-10641
> Project: Hadoop Common
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Konstantin Shvachko
>Assignee: Plamen Jeliazkov
> Attachments: HADOOP-10641.patch, HADOOP-10641.patch, 
> HADOOP-10641.patch
>
>
> Coordination Engine (CE) is a system that allows agreement on a sequence of 
> events in a distributed system. In order to be reliable, the CE should itself 
> be distributed.
> Coordination Engine can be based on different algorithms (paxos, raft, 2PC, 
> zab) and have different implementations, depending on use cases, reliability, 
> availability, and performance requirements.
> CE should have a common API, so that it could serve as a pluggable component 
> in different projects. The immediate beneficiaries are HDFS (HDFS-6469) and 
> HBase (HBASE-10909).
> First implementation is proposed to be based on ZooKeeper.





[jira] [Commented] (HADOOP-9361) Strictly define the expected behavior of filesystem APIs and write tests to verify compliance

2014-06-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021294#comment-14021294
 ] 

Hadoop QA commented on HADOOP-9361:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12648875/HADOOP-9361-015.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 77 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs 
hadoop-tools/hadoop-openstack.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4024//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/4024//console

This message is automatically generated.

> Strictly define the expected behavior of filesystem APIs and write tests to 
> verify compliance
> -
>
> Key: HADOOP-9361
> URL: https://issues.apache.org/jira/browse/HADOOP-9361
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, test
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-9361-001.patch, HADOOP-9361-002.patch, 
> HADOOP-9361-003.patch, HADOOP-9361-004.patch, HADOOP-9361-005.patch, 
> HADOOP-9361-006.patch, HADOOP-9361-007.patch, HADOOP-9361-008.patch, 
> HADOOP-9361-009.patch, HADOOP-9361-011.patch, HADOOP-9361-012.patch, 
> HADOOP-9361-013.patch, HADOOP-9361-014.patch, HADOOP-9361-015.patch, 
> HADOOP-9361.awang-addendum.patch
>
>
> {{FileSystem}} and {{FileContract}} aren't tested rigorously enough: while 
> HDFS gets tested downstream, other filesystems, such as blobstore bindings, 
> don't.
> The only tests that are common are those of {{FileSystemContractTestBase}}, 
> which HADOOP-9258 shows is incomplete.
> I propose 
> # writing more tests which clarify expected behavior
> # testing operations in the interface in their own JUnit4 test classes, 
> instead of one big test suite. 
> # having each FS declare, via a properties file, which behaviors it offers, 
> such as atomic-rename, atomic-delete, umask, and immediate-consistency; test 
> methods can downgrade to skipped test cases if a feature is missing.
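Point 3, capability declarations driving skipped tests, could be sketched as follows (class and capability names are invented for illustration; the real work targets JUnit4):

```python
import unittest

class ContractOptions:
    """Capabilities a filesystem declares, e.g. as loaded from a
    per-filesystem properties file. Hypothetical names throughout."""
    def __init__(self, **caps):
        self.caps = caps

    def supports(self, name):
        return self.caps.get(name, False)


class RenameContractTest(unittest.TestCase):
    # A blobstore binding might declare that its rename is not atomic.
    options = ContractOptions(atomic_rename=False)

    def test_atomic_rename(self):
        if not self.options.supports("atomic_rename"):
            self.skipTest("filesystem does not declare atomic-rename")
        # ...exercise rename atomicity against the FS under test...


result = unittest.TestResult()
RenameContractTest("test_atomic_rename").run(result)
print(len(result.skipped))  # 1 -> the test downgraded to skipped
```

The test neither passes vacuously nor fails spuriously on a filesystem that honestly declares a missing feature; it is reported as skipped.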





[jira] [Commented] (HADOOP-10561) Copy command with preserve option should handle Xattrs

2014-06-08 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021221#comment-14021221
 ] 

Uma Maheswara Rao G commented on HADOOP-10561:
--

Overall latest patch looks good to me. 

Tiny nit:
input attibute --> input attribute

+1 from me on addressing this comment.

> Copy command with preserve option should handle Xattrs
> --
>
> Key: HADOOP-10561
> URL: https://issues.apache.org/jira/browse/HADOOP-10561
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Affects Versions: 3.0.0
>Reporter: Uma Maheswara Rao G
>Assignee: Yi Liu
> Attachments: HADOOP-10561.1.patch, HADOOP-10561.2.patch, 
> HADOOP-10561.3.patch, HADOOP-10561.patch
>
>





[jira] [Updated] (HADOOP-9361) Strictly define the expected behavior of filesystem APIs and write tests to verify compliance

2014-06-08 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-9361:
---

Status: Patch Available  (was: Open)

> Strictly define the expected behavior of filesystem APIs and write tests to 
> verify compliance
> -
>
> Key: HADOOP-9361
> URL: https://issues.apache.org/jira/browse/HADOOP-9361
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, test
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-9361-001.patch, HADOOP-9361-002.patch, 
> HADOOP-9361-003.patch, HADOOP-9361-004.patch, HADOOP-9361-005.patch, 
> HADOOP-9361-006.patch, HADOOP-9361-007.patch, HADOOP-9361-008.patch, 
> HADOOP-9361-009.patch, HADOOP-9361-011.patch, HADOOP-9361-012.patch, 
> HADOOP-9361-013.patch, HADOOP-9361-014.patch, HADOOP-9361-015.patch, 
> HADOOP-9361.awang-addendum.patch
>
>





[jira] [Updated] (HADOOP-10648) Service Authorization Improvements

2014-06-08 Thread Benoy Antony (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoy Antony updated HADOOP-10648:
--

Description: Umbrella jira for a set of improvements on service 
Authorization  (was: Umbrella jira for set of improvements on service 
Authorization)

> Service Authorization Improvements
> --
>
> Key: HADOOP-10648
> URL: https://issues.apache.org/jira/browse/HADOOP-10648
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Reporter: Benoy Antony
>Assignee: Benoy Antony
>
> Umbrella jira for a set of improvements on service Authorization





[jira] [Updated] (HADOOP-9361) Strictly define the expected behavior of filesystem APIs and write tests to verify compliance

2014-06-08 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-9361:
---

Attachment: HADOOP-9361-015.patch

This is revision -015 of the patch

# incorporates all of Andrew's modifications. Andrew - thanks for putting in 
the effort!
# added a section on object stores in the introduction, to clarify how they are 
different. 

Once we add a `Blobstore` marker to object stores, we can expand that a bit 
more.

> Strictly define the expected behavior of filesystem APIs and write tests to 
> verify compliance
> -
>
> Key: HADOOP-9361
> URL: https://issues.apache.org/jira/browse/HADOOP-9361
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, test
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-9361-001.patch, HADOOP-9361-002.patch, 
> HADOOP-9361-003.patch, HADOOP-9361-004.patch, HADOOP-9361-005.patch, 
> HADOOP-9361-006.patch, HADOOP-9361-007.patch, HADOOP-9361-008.patch, 
> HADOOP-9361-009.patch, HADOOP-9361-011.patch, HADOOP-9361-012.patch, 
> HADOOP-9361-013.patch, HADOOP-9361-014.patch, HADOOP-9361-015.patch, 
> HADOOP-9361.awang-addendum.patch
>
>





[jira] [Updated] (HADOOP-9361) Strictly define the expected behavior of filesystem APIs and write tests to verify compliance

2014-06-08 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-9361:
---

Status: Open  (was: Patch Available)

> Strictly define the expected behavior of filesystem APIs and write tests to 
> verify compliance
> -
>
> Key: HADOOP-9361
> URL: https://issues.apache.org/jira/browse/HADOOP-9361
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, test
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-9361-001.patch, HADOOP-9361-002.patch, 
> HADOOP-9361-003.patch, HADOOP-9361-004.patch, HADOOP-9361-005.patch, 
> HADOOP-9361-006.patch, HADOOP-9361-007.patch, HADOOP-9361-008.patch, 
> HADOOP-9361-009.patch, HADOOP-9361-011.patch, HADOOP-9361-012.patch, 
> HADOOP-9361-013.patch, HADOOP-9361-014.patch, HADOOP-9361.awang-addendum.patch
>
>


