[jira] [Commented] (HDFS-3179) failed to append data, DataStreamer throw an exception, nodes.length != original.length + 1 on single datanode cluster

2012-04-03 Thread Zhanwei.Wang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245476#comment-13245476
 ] 

Zhanwei.Wang commented on HDFS-3179:


@Uma and amith
This looks like the same issue as HDFS-3091.

I configured only one datanode and created a file with the default replication
factor (3), so existings(1) <= replication/2 (3/2 == 1) is satisfied, but the
pipeline cannot be given a replacement node because there are no extra datanodes
in the cluster.

HDFS-3091 should be backported to the 0.23.2 branch.
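
Not a fix for the policy itself, but for a single-datanode test setup one possible client-side workaround is to relax the HDFS-1606 replace-datanode-on-failure settings before creating the FileSystem. A sketch (the property names are the ones introduced by HDFS-1606; double-check them against your version):
{code}
// Sketch of a possible client-side workaround for single-datanode test clusters
// (not a fix for this issue): relax the replace-datanode-on-failure settings.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class SingleNodeClientConf {
  public static FileSystem openRelaxed() throws Exception {
    Configuration conf = new Configuration();
    // Turn the "add a replacement datanode" check off entirely...
    conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.enable", false);
    // ...or keep it enabled and pick the NEVER policy instead:
    // conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "NEVER");
    return FileSystem.get(conf);
  }
}
{code}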


 failed to append data, DataStreamer throw an exception, nodes.length != 
 original.length + 1 on single datanode cluster
 

 Key: HDFS-3179
 URL: https://issues.apache.org/jira/browse/HDFS-3179
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.23.2
Reporter: Zhanwei.Wang
Priority: Critical

 Create a single datanode cluster
 disable permissions
 enable webhdfs
 start hdfs
 run the test script
 expected result:
 a file named "test" is created and its content is "testtest"
 the result I got:
 hdfs throws an exception on the second append operation.
 {code}
 ./test.sh 
 {"RemoteException":{"exception":"IOException","javaClassName":"java.io.IOException",
  "message":"Failed to add a datanode: nodes.length != original.length + 1,
  nodes=[127.0.0.1:50010], original=[127.0.0.1:50010]"}}
 {code}
 Log in datanode:
 {code}
 2012-04-02 14:34:21,058 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer 
 Exception
 java.io.IOException: Failed to add a datanode: nodes.length != 
 original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010]
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461)
 2012-04-02 14:34:21,059 ERROR org.apache.hadoop.hdfs.DFSClient: Failed to 
 close file /test
 java.io.IOException: Failed to add a datanode: nodes.length != 
 original.length + 1, nodes=[127.0.0.1:50010], original=[127.0.0.1:50010]
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461)
 {code}
 test.sh
 {code}
 #!/bin/sh
 echo "test" > test.txt
 curl -L -X PUT "http://localhost:50070/webhdfs/v1/test?op=CREATE"
 curl -L -X POST -T test.txt "http://localhost:50070/webhdfs/v1/test?op=APPEND"
 curl -L -X POST -T test.txt "http://localhost:50070/webhdfs/v1/test?op=APPEND"
 {code}





[jira] [Commented] (HDFS-3179) failed to append data, DataStreamer throw an exception, nodes.length != original.length + 1 on single datanode cluster

2012-04-03 Thread Zhanwei.Wang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245501#comment-13245501
 ] 

Zhanwei.Wang commented on HDFS-3179:


@Uma and amith
Another question: in this test script I first create a new EMPTY file and then
append to it twice.
The first append succeeds because the file is empty; a new pipeline is created,
the stage is PIPELINE_SETUP_CREATE, and the policy is not checked.
The second append fails because the stage is PIPELINE_SETUP_APPEND and the
policy is checked.

So from the user's point of view, the first append succeeds while the second
fails. Is that a good idea?

{code}
  // get new block from namenode
  if (stage == BlockConstructionStage.PIPELINE_SETUP_CREATE) {
    if (DFSClient.LOG.isDebugEnabled()) {
      DFSClient.LOG.debug("Allocating new block");
    }
    nodes = nextBlockOutputStream(src);
    initDataStreaming();
  } else if (stage == BlockConstructionStage.PIPELINE_SETUP_APPEND) {
    if (DFSClient.LOG.isDebugEnabled()) {
      DFSClient.LOG.debug("Append to block " + block);
    }
    setupPipelineForAppendOrRecovery();  // the policy is checked here
    initDataStreaming();
  }
{code}
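
For reference, the same create-empty-then-append-twice sequence can be driven through the Java FileSystem API as well. A sketch (not the attached test.sh), assuming fs.defaultFS points at the single-datanode cluster and append is enabled; the second append is the one that goes through the PIPELINE_SETUP_APPEND path above:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendTwice {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/test");

    fs.create(p).close();                   // a new EMPTY file, like op=CREATE

    FSDataOutputStream out1 = fs.append(p); // first append: no last block yet,
    out1.write("test".getBytes("UTF-8"));   // so the pipeline uses PIPELINE_SETUP_CREATE
    out1.close();

    FSDataOutputStream out2 = fs.append(p); // second append: PIPELINE_SETUP_APPEND,
    out2.write("test".getBytes("UTF-8"));   // the ReplaceDatanodeOnFailure policy is checked
    out2.close();
  }
}
{code}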






[jira] [Commented] (HDFS-3091) Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters.

2012-04-03 Thread Zhanwei.Wang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245515#comment-13245515
 ] 

Zhanwei.Wang commented on HDFS-3091:


Hi, Nicholas
{quote}
I would say the failures are expected. The feature is to guarantee the number 
of replicas that the user is asking. However, the cluster is too small that the 
guarantee is impossible. It makes sense to fail the write requests.
{quote}

I agree with you, but have a look at the code: in HDFS-3179 I first create an
EMPTY file and append twice; the first append finishes successfully but the
second fails, since there is only one datanode and the replication factor is 3.

Is that what you want to see? I think the policy check should fail on the first
write to the file.
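
For reference, the condition being discussed, paraphrased (a sketch of my reading of the DEFAULT ReplaceDatanodeOnFailure policy, not the actual source):
{code}
// A replacement datanode is demanded when the pipeline has shrunk,
// replication >= 3, and either half of the replicas are already gone
// or we are appending / have hflushed.
static boolean needReplacement(int replication, int existings,
                               boolean isAppend, boolean isHflushed) {
  if (existings == 0 || existings >= replication) {
    return false;                              // nothing missing, nothing to replace
  }
  return replication >= 3
      && (existings <= replication / 2         // e.g. 1 <= 3/2 == 1 on a single-node cluster
          || isAppend || isHflushed);
}
{code}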



 Update the usage limitations of ReplaceDatanodeOnFailure policy in the config 
 description for the smaller clusters.
 ---

 Key: HDFS-3091
 URL: https://issues.apache.org/jira/browse/HDFS-3091
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client, name-node
Affects Versions: 0.23.0, 0.24.0
Reporter: Uma Maheswara Rao G
Assignee: Tsz Wo (Nicholas), SZE
 Fix For: 2.0.0

 Attachments: h3091_20120319.patch


 While verifying the HDFS-1606 feature, I observed a couple of issues.
 Presently the ReplaceDatanodeOnFailure policy is satisfied even though there are
 not enough DNs in the cluster to replace with, which results in a write failure.
 {quote}
 12/03/13 14:27:12 WARN hdfs.DFSClient: DataStreamer Exception
 java.io.IOException: Failed to add a datanode: nodes.length != 
 original.length + 1, nodes=[xx.xx.xx.xx:50010], original=[xx.xx.xx.xx1:50010]
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778)
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834)
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930)
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:741)
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:416)
 {quote}
 Let's take some cases:
 1) Replication factor 3, cluster size also 3, and unfortunately the pipeline
 drops to 1.
 ReplaceDatanodeOnFailure will be satisfied because *existings(1) <=
 replication/2 (3/2 == 1)*.
 But when it tries to find a new node to replace with, it obviously cannot find
 one, and the sanity check fails.
 This results in a write failure.
 2) Replication factor 10 (the user accidentally sets the replication factor
 higher than the cluster size), and the cluster has only 5 datanodes.
 Here even if only one node fails, the write fails for the same reason:
 the pipeline maximum is 5, one datanode is killed, so existings becomes 4.
 *existings(4) <= replication/2 (10/2 == 5)* is satisfied, and obviously it
 cannot replace with a new node as there are no extra nodes in the
 cluster. This results in a write failure.
 3) sync related operations also fail in this situation (will post the
 clear scenarios)





[jira] [Commented] (HDFS-3179) failed to append data, DataStreamer throw an exception, nodes.length != original.length + 1 on single datanode cluster

2012-04-03 Thread Zhanwei.Wang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245755#comment-13245755
 ] 

Zhanwei.Wang commented on HDFS-3179:


I totally agree with you about the problem of one datanode with replication 3; I
think this kind of operation should fail, or at least produce a warning.

My opinion is that the purpose of the policy check is to make sure there is no
potential data loss. In this one-datanode, three-replica case, failing the first
append would not cause any data loss, whereas the data written by a first
successful append is in danger, because it has only one replica instead of the 3
the user expected, and there is no warning to tell the user the truth.

My suggestion is to make the first write to the empty file fail if there are not
enough datanodes; in other words, make the policy check stricter. And make the
error message friendlier than "nodes.length != original.length + 1".
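
To make the suggestion concrete, a hypothetical early check (the method name is made up; nothing like it exists in HDFS today) that could run when the client first opens the file for write:
{code}
// Hypothetical sketch of the stricter check suggested above; checkEnoughDatanodes()
// does not exist in HDFS, it only shows what the first write could verify.
static void checkEnoughDatanodes(int replication, int liveDatanodes)
    throws IOException {
  if (liveDatanodes < replication) {
    throw new IOException("Requested replication " + replication
        + " but only " + liveDatanodes + " datanode(s) are available; a later"
        + " append or pipeline recovery would fail the replace-datanode policy");
  }
}
{code}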









[jira] [Commented] (HDFS-2656) Implement a pure c client based on webhdfs

2012-03-31 Thread Zhanwei.Wang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243664#comment-13243664
 ] 

Zhanwei.Wang commented on HDFS-2656:



Hi donal,
Good question; performance is an important issue and the lib needs to be
designed and implemented carefully.

On the lib side, I use libcurl to handle the HTTP protocol, plus a buffer in the
lib to optimize performance. The same design was also used in another of our
projects, and the performance of libcurl is OK.

For the transmission, HTTP uses a TCP connection. To read data from the server,
only the raw data is transferred. To write to the server, I use chunked transfer
encoding, and the overhead is just a small header per chunk.

On the server side, performance depends on the jetty server. In the previous
prototype, the jetty server or webhdfs had a performance problem when I used
HTTP/1.1 to read data from the server, but the problem could not be reproduced
when I switched to HTTP/1.0.

I did a simple performance test on the previous prototype, and more performance
testing is planned.

Currently, writing to hdfs may still fail under heavy workload; I am not sure
whether it is a bug in my code or in hdfs, and I am working on it (it seems not
to be my bug -_-). The doc is being written and the function tests are finished.
As soon as I get permission to open-source it and finish the doc, you can test it
yourself. I think it will not take too long.
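
The C code is not posted here, so purely as an illustration, here is the same chunked write against a webhdfs APPEND URL sketched in Java, assuming the datanode redirect URL has already been obtained from the namenode's 307 redirect:
{code}
// Illustration only (the real client is C/libcurl): a chunked POST to a webhdfs
// APPEND URL, so only a small per-chunk header is added on the wire.
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class ChunkedAppend {
  public static void append(String datanodeUrl, byte[] data) throws Exception {
    HttpURLConnection conn =
        (HttpURLConnection) new URL(datanodeUrl).openConnection();
    conn.setRequestMethod("POST");
    conn.setDoOutput(true);
    conn.setChunkedStreamingMode(32 * 1024);   // 32K chunks
    OutputStream out = conn.getOutputStream();
    out.write(data);
    out.close();
    if (conn.getResponseCode() != 200) {
      throw new RuntimeException("APPEND failed, HTTP " + conn.getResponseCode());
    }
    conn.disconnect();
  }
}
{code}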


 Implement a pure c client based on webhdfs
 --

 Key: HDFS-2656
 URL: https://issues.apache.org/jira/browse/HDFS-2656
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Zhanwei.Wang

 Currently, the implementation of libhdfs is based on JNI. The overhead of the
 JVM seems a little big, and libhdfs also cannot be used in an environment
 without hdfs.
 It seems a good idea to implement a pure C client by wrapping webhdfs. It
 could also be used to access different versions of hdfs.





[jira] [Commented] (HDFS-2656) Implement a pure c client based on webhdfs

2012-03-26 Thread Zhanwei.Wang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238651#comment-13238651
 ] 

Zhanwei.Wang commented on HDFS-2656:


Hi everyone,
The code proposed in this jira is almost finished and will be available soon. It
is more complicated than I thought and took more time. The following is the
current status of the code.

Status update:

1, finished:
most functions in hdfs.h of libhdfs are implemented.

2, on-going:
function tests.
unit tests.
documentation.

3, todo:
kerberos support.
http proxy support.
some performance improvements.










[jira] [Commented] (HDFS-3107) HDFS truncate

2012-03-22 Thread Zhanwei.Wang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236298#comment-13236298
 ] 

Zhanwei.Wang commented on HDFS-3107:


A problem with truncate is visibility. Since truncating a file requires
acquiring the lease first, we do not need to worry about concurrent writes, but
we do need to worry about concurrent reads while a file is being truncated. The
HDFS client buffers some block info when it opens and reads a file, and those
blocks may be truncated. Furthermore, the socket and the HDFS client may buffer
some data that is about to be truncated.

In my first edition of the truncate prototype, if the block or data the client
asks for has been truncated, the datanode throws an exception and the client
refreshes its metadata to determine whether the data was truncated or a real
error happened. But this cannot prevent the client from reading already buffered
data.

Any comments and suggestions?
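
Roughly, the client-side handling described above looks like this (a sketch only; names such as BlockTruncatedException, readFromDatanode() and refreshBlockLocations() are placeholders, not existing HDFS APIs):
{code}
int readWithTruncateCheck(byte[] buf, int off, int len) throws IOException {
  try {
    return readFromDatanode(buf, off, len);   // normal read path
  } catch (BlockTruncatedException e) {
    refreshBlockLocations();                  // re-fetch block metadata from the namenode
    if (getPos() >= getFileLength()) {
      return -1;                              // the requested range was truncated away: EOF
    }
    throw e;                                  // block still covers this range: a real error
  }
}
{code}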


 HDFS truncate
 -

 Key: HDFS-3107
 URL: https://issues.apache.org/jira/browse/HDFS-3107
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: data-node, name-node
Reporter: Lei Chang
 Attachments: HDFS_truncate_semantics_Mar15.pdf, 
 HDFS_truncate_semantics_Mar21.pdf

   Original Estimate: 1,344h
  Remaining Estimate: 1,344h

 Systems with transaction support often need to undo changes made to the
 underlying storage when a transaction is aborted. Currently HDFS does not
 support truncate (a standard POSIX operation), the reverse operation of append.
 This makes upper-layer applications use ugly workarounds (such as keeping
 track of the discarded byte range per file in a separate metadata store, and
 periodically running a vacuum process to rewrite compacted files) to overcome
 this limitation of HDFS.





[jira] [Commented] (HDFS-3107) HDFS truncate

2012-03-22 Thread Zhanwei.Wang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236362#comment-13236362
 ] 

Zhanwei.Wang commented on HDFS-3107:


To add more detail to my previous question: how do we define what may be read
from a file that is being truncated? That is the visibility problem. If a file
is opened and read just before truncation, should the truncated data be
visible? Or does it just depend on the progress of the truncation? What if a
file is opened before truncation and read after truncation?






[jira] [Commented] (HDFS-3100) failed to append data using webhdfs

2012-03-16 Thread Zhanwei.Wang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230923#comment-13230923
 ] 

Zhanwei.Wang commented on HDFS-3100:


I also ran this test on hdfs 1.0.1 and the script finished successfully, but I
found some strange things in the datanode log.
1) I also got "DataBlockScanner: Verification failed". Since my hdfs is a
single-node cluster running on the local network and local disk, nothing
should fail at all.
2) I got lots of "DataNode: Client calls recoverBlock" and I have no idea what
happened.



{code}
2012-03-15 22:43:33,572 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Receiving block blk_3505027625176300242_3086 src: /127.0.0.1:63879 dest: 
/127.0.0.1:50010
2012-03-15 22:43:33,590 INFO 
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification failed 
for blk_8647935099647661204_1101. Its ok since it not in datanode dataset 
anymore.
2012-03-15 22:43:33,659 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Reopen Block for append blk_3505027625176300242_3086
2012-03-15 22:43:33,662 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
setBlockPosition trying to set position to 275712 for block 
blk_3505027625176300242_3086 which is not a multiple of bytesPerChecksum 512
2012-03-15 22:43:33,662 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
computePartialChunkCrc sizePartialChunk 256 block blk_3505027625176300242_3086 
offset in block 275456 offset in metafile 2159
2012-03-15 22:43:33,662 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Read in partial CRC chunk from disk for block blk_3505027625176300242_3086
2012-03-15 22:43:33,664 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
/127.0.0.1:63879, dest: /127.0.0.1:50010, bytes: 307712, op: HDFS_WRITE, cliID: 
DFSClient_110493321, offset: 0, srvID: 
DS-1576952563-10.64.55.158-50010-1331870286875, blockid: 
blk_3505027625176300242_3086, duration: 2816000
2012-03-15 22:43:33,664 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder 0 for block blk_3505027625176300242_3086 terminating
2012-03-15 22:43:33,677 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Client calls recoverBlock(block=blk_3505027625176300242_3086, 
targets=[127.0.0.1:50010])
{code} 

 failed to append data using webhdfs
 ---

 Key: HDFS-3100
 URL: https://issues.apache.org/jira/browse/HDFS-3100
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.23.1
Reporter: Zhanwei.Wang
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: hadoop-wangzw-datanode-ubuntu.log, 
 hadoop-wangzw-namenode-ubuntu.log, test.sh, testAppend.patch


 STEP:
 1, deploy a single-node hdfs 0.23.1 cluster and configure hdfs as:
 A) enable webhdfs
 B) enable append
 C) disable permissions
 2, start hdfs
 3, run the attached test script
 RESULT:
 expected: a file named testFile should be created and populated with 32K *
 5000 zeros, and HDFS should be OK.
 I got: the script cannot finish; the file has been created but not populated
 as expected, and the append operation actually failed.
 The datanode log shows that the block scanner reports a bad replica and the
 namenode decides to delete it. Since it is a single-node cluster, the append
 fails. It makes no sense that the script fails every time.
 Datanode and Namenode logs are attached.





[jira] [Commented] (HDFS-3101) cannot read empty file using webhdfs

2012-03-15 Thread Zhanwei.Wang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230891#comment-13230891
 ] 

Zhanwei.Wang commented on HDFS-3101:


well done, fixed very fast

 cannot read empty file using webhdfs
 

 Key: HDFS-3101
 URL: https://issues.apache.org/jira/browse/HDFS-3101
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.23.1
Reporter: Zhanwei.Wang
Assignee: Tsz Wo (Nicholas), SZE
 Fix For: 0.24.0, 1.1.0, 0.23.2, 1.0.2, 0.23.3

 Attachments: h3101_20120315.patch, h3101_20120315_branch-1.patch


 STEP:
 1, create a new EMPTY file
 2, read it using webhdfs.
 RESULT:
 expected: get an empty file
 I got:
 {"RemoteException":{"exception":"IOException","javaClassName":"java.io.IOException",
  "message":"Offset=0 out of the range [0, 0); OPEN, path=/testFile"}}
 First of all, [0, 0) is not a valid range, and I think reading an empty file
 should be OK.
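
 For comparison, the expected behaviour sketched through the Java FileSystem API
 over webhdfs (a sketch; assumes webhdfs is enabled on localhost:50070 and the
 webhdfs filesystem is registered in the client configuration):
 {code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadEmptyFile {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(
        URI.create("webhdfs://localhost:50070"), new Configuration());
    Path p = new Path("/testFile");
    fs.create(p).close();                    // create a new EMPTY file
    FSDataInputStream in = fs.open(p);
    System.out.println(in.read());           // expected: -1 (EOF), no exception
    in.close();
  }
}
 {code}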





[jira] [Commented] (HDFS-2656) Implement a pure c client based on webhdfs

2011-12-10 Thread Zhanwei.Wang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166815#comment-13166815
 ] 

Zhanwei.Wang commented on HDFS-2656:


I am working on a new pure C hdfs client named libchdfs. It has almost the same
interface as libhdfs. Any comments are welcome.

