[jira] Commented: (HDFS-767) Job failure due to BlockMissingException

2009-12-28 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794900#action_12794900
 ] 

dhruba borthakur commented on HDFS-767:
---

+1. Code looks good.

 Job failure due to BlockMissingException
 

 Key: HDFS-767
 URL: https://issues.apache.org/jira/browse/HDFS-767
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Fix For: 0.22.0

 Attachments: HDFS-767.patch, HDFS-767_2.patch, HDFS-767_3.patch, 
 HDFS-767_4.txt


 If a block is requested by too many mappers/reducers (say, 3000) at the same 
 time, a BlockMissingException is thrown because the number of requests exceeds 
 the upper limit (256 by default, I think) on the number of threads accessing 
 the same block at the same time. The DFSClient will catch that exception and 
 retry 3 times, waiting 3 seconds before each retry. Since the wait time is a 
 fixed value, a lot of clients retry at about the same time and a large portion 
 of them fail again. After 3 retries, only about 256*4 = 1024 clients have 
 gotten the block. If there are more clients than that, the job will fail. 
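
For illustration only (the DFSClient itself is Java, and this is not the 
attached patch): the core of the improved retry policy is to randomize the 
wait so that thousands of readers that failed together do not all come back 
at once. A minimal C sketch of such a jittered backoff, with invented names 
and constants:

{noformat}
#include <stdlib.h>
#include <time.h>

static void sleep_ms(int ms)
{
    struct timespec ts = { ms / 1000, (long)(ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Retry a block fetch with a randomized ("jittered") wait so that clients
 * that failed together do not all retry together.  fetch_block is a
 * hypothetical stand-in for the real read path; callers are assumed to
 * have seeded rand() once with srand(). */
static int fetch_block_with_backoff(int (*fetch_block)(void), int max_retries)
{
    const int base_wait_ms = 3000;       /* same 3-second base as above */
    for (int attempt = 0; attempt <= max_retries; attempt++) {
        if (fetch_block() == 0)
            return 0;                    /* success */
        if (attempt == max_retries)
            break;
        /* Spread the retries over a window that grows with each attempt:
         * wait somewhere in [0, base * (attempt + 1)) milliseconds. */
        sleep_ms(rand() % (base_wait_ms * (attempt + 1)));
    }
    return -1;                           /* all retries failed */
}
{noformat}

Because the window grows with each attempt, later retries are spread over a 
wider interval, which is meant to keep the number of simultaneous readers of 
one block under the datanode's thread limit.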

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-767) Job failure due to BlockMissingException

2009-12-28 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HDFS-767:
--

   Resolution: Fixed
Fix Version/s: 0.22.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

I just committed this. Thanks Ning!

 Job failure due to BlockMissingException
 

 Key: HDFS-767
 URL: https://issues.apache.org/jira/browse/HDFS-767
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Fix For: 0.22.0

 Attachments: HDFS-767.patch, HDFS-767_2.patch, HDFS-767_3.patch, 
 HDFS-767_4.txt


 If a block is requested by too many mappers/reducers (say, 3000) at the same 
 time, a BlockMissingException is thrown because the number of requests exceeds 
 the upper limit (256 by default, I think) on the number of threads accessing 
 the same block at the same time. The DFSClient will catch that exception and 
 retry 3 times, waiting 3 seconds before each retry. Since the wait time is a 
 fixed value, a lot of clients retry at about the same time and a large portion 
 of them fail again. After 3 retries, only about 256*4 = 1024 clients have 
 gotten the block. If there are more clients than that, the job will fail. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-856) Hardcoded replication level for new files in fuse-dfs

2009-12-28 Thread Brian Bockelman (JIRA)
Hardcoded replication level for new files in fuse-dfs
-

 Key: HDFS-856
 URL: https://issues.apache.org/jira/browse/HDFS-856
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: contrib/fuse-dfs
Reporter: Brian Bockelman
Priority: Minor


In fuse-dfs, the number of replicas is always hardcoded to 3 in the arguments 
to hdfsOpenFile.  We should use the setting in the hadoop configuration instead.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-856) Hardcoded replication level for new files in fuse-dfs

2009-12-28 Thread Brian Bockelman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Bockelman updated HDFS-856:
-

Attachment: HADOOP-856.patch

This patch changes the number-of-replicas argument in hdfsOpenFile to 0, which 
the libhdfs documentation says means the default from the Hadoop configuration 
is used.
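
As a rough sketch of the call being changed (not the patch itself; the wrapper 
name is invented), this is what passing 0 for the replication argument of 
hdfsOpenFile looks like:

{noformat}
#include <fcntl.h>
#include "hdfs.h"   /* libhdfs C API */

/* Open a file for writing, letting libhdfs take the replication level from
 * the Hadoop configuration instead of hardcoding 3.  In libhdfs, passing 0
 * for bufferSize, replication, or blocksize means "use the configured
 * default". */
hdfsFile open_with_default_replication(hdfsFS fs, const char *path)
{
    return hdfsOpenFile(fs, path, O_WRONLY,
                        0,   /* bufferSize: default */
                        0,   /* replication: default (was hardcoded to 3) */
                        0);  /* blocksize: default */
}
{noformat}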

 Hardcoded replication level for new files in fuse-dfs
 -

 Key: HDFS-856
 URL: https://issues.apache.org/jira/browse/HDFS-856
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: contrib/fuse-dfs
Reporter: Brian Bockelman
Priority: Minor
 Attachments: HADOOP-856.patch


 In fuse-dfs, the number of replicas is always hardcoded to 3 in the arguments 
 to hdfsOpenFile.  We should use the setting in the hadoop configuration 
 instead.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-826) Allow a mechanism for an application to detect that datanode(s) have died in the write pipeline

2009-12-28 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794910#action_12794910
 ] 

dhruba borthakur commented on HDFS-826:
---

If we do not introduce Replicable, then applications have to typecast to 
DFSOutputStream to use the new feature. So, essentially there is not much 
difference as far as maintaining this API as a true public facing API is 
concerned. Do you agree?

 Allow a mechanism for an application to detect that datanode(s)  have died in 
 the write pipeline
 

 Key: HDFS-826
 URL: https://issues.apache.org/jira/browse/HDFS-826
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: ReplicableHdfs.txt


 HDFS does not replicate the last block of a file that is currently being 
 written to by an application. Every datanode death in the write pipeline 
 decreases the reliability of the last block of the currently-being-written 
 file. This situation can be improved if the application can be notified of a 
 datanode death in the write pipeline; the application can then decide on the 
 right course of action to take on this event.
 In our use-case, the application can close the file on the first datanode 
 death and start writing to a newly created file. This ensures that the 
 reliability guarantee of a block stays close to 3 at all times.
 One idea is to make DFSOutputStream.write() throw an exception if the number 
 of datanodes in the write pipeline falls below the minimum.replication.factor 
 that is set on the client (this is backward compatible).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-856) Hardcoded replication level for new files in fuse-dfs

2009-12-28 Thread Brian Bockelman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Bockelman updated HDFS-856:
-

Status: Patch Available  (was: Open)

Submitting patch to Hudson.  I do not believe a test case is needed, given how 
straightforward the fix is and the fact that fuse-dfs is in contrib/.

 Hardcoded replication level for new files in fuse-dfs
 -

 Key: HDFS-856
 URL: https://issues.apache.org/jira/browse/HDFS-856
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: contrib/fuse-dfs
Reporter: Brian Bockelman
Priority: Minor
 Attachments: HADOOP-856.patch


 In fuse-dfs, the number of replicas is always hardcoded to 3 in the arguments 
 to hdfsOpenFile.  We should use the setting in the hadoop configuration 
 instead.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-857) Incorrect type for fuse-dfs capacity can cause df to return negative values on 32-bit machines

2009-12-28 Thread Brian Bockelman (JIRA)
Incorrect type for fuse-dfs capacity can cause df to return negative values 
on 32-bit machines


 Key: HDFS-857
 URL: https://issues.apache.org/jira/browse/HDFS-857
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: contrib/fuse-dfs
Reporter: Brian Bockelman
Priority: Minor
 Attachments: HDFS-857.patch

On sufficiently large HDFS installs, the casting of hdfsGetCapacity to a long 
may cause df to return negative values.  tOffset should be used instead.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-857) Incorrect type for fuse-dfs capacity can cause df to return negative values on 32-bit machines

2009-12-28 Thread Brian Bockelman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Bockelman updated HDFS-857:
-

Attachment: HDFS-857.patch

This patch fixes the issue by using tOffset instead of long.
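
Roughly what the fix amounts to, as a sketch only (it assumes the capacity 
feeds a fuse statfs callback; the helper name and block-size parameter are 
invented): keep the values in libhdfs's 64-bit tOffset type instead of a long, 
which is 32 bits on the machines named in the title.

{noformat}
#include <sys/statvfs.h>
#include "hdfs.h"

/* Fill a statvfs structure for a fuse statfs callback.  Keeping the values
 * in tOffset (a 64-bit type in libhdfs) avoids the overflow that occurs when
 * the capacity of a large cluster is squeezed into a 32-bit long. */
int fill_statvfs(hdfsFS fs, struct statvfs *st, unsigned long block_size)
{
    tOffset capacity = hdfsGetCapacity(fs);   /* returns tOffset, not long */
    tOffset used     = hdfsGetUsed(fs);
    if (capacity < 0 || used < 0 || block_size == 0)
        return -1;

    st->f_bsize  = block_size;
    st->f_frsize = block_size;
    st->f_blocks = (fsblkcnt_t)(capacity / (tOffset)block_size);
    st->f_bfree  = (fsblkcnt_t)((capacity - used) / (tOffset)block_size);
    st->f_bavail = st->f_bfree;
    return 0;
}
{noformat}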

 Incorrect type for fuse-dfs capacity can cause df to return negative values 
 on 32-bit machines
 

 Key: HDFS-857
 URL: https://issues.apache.org/jira/browse/HDFS-857
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: contrib/fuse-dfs
Reporter: Brian Bockelman
Priority: Minor
 Attachments: HDFS-857.patch


 On sufficiently large HDFS installs, the casting of hdfsGetCapacity to a long 
 may cause df to return negative values.  tOffset should be used instead.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-94) The Heap Size in HDFS web ui may not be accurate

2009-12-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-94?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794928#action_12794928
 ] 

Hudson commented on HDFS-94:


Integrated in Hadoop-Hdfs-trunk-Commit #158 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/158/])


 The Heap Size in HDFS web ui may not be accurate
 --

 Key: HDFS-94
 URL: https://issues.apache.org/jira/browse/HDFS-94
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Dmytro Molkov
 Fix For: 0.22.0

 Attachments: HDFS-94.patch


 It seems that the Heap Size shown in the HDFS web UI is not accurate.  It 
 keeps showing 100% usage, e.g.
 {noformat}
 Heap Size is 10.01 GB / 10.01 GB (100%) 
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-767) Job failure due to BlockMissingException

2009-12-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794929#action_12794929
 ] 

Hudson commented on HDFS-767:
-

Integrated in Hadoop-Hdfs-trunk-Commit #158 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/158/])
HDFS-767. An improved retry policy when the DFSClient is unable to fetch a
block from the datanode.  (Ning Zhang via dhruba)


 Job failure due to BlockMissingException
 

 Key: HDFS-767
 URL: https://issues.apache.org/jira/browse/HDFS-767
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Fix For: 0.22.0

 Attachments: HDFS-767.patch, HDFS-767_2.patch, HDFS-767_3.patch, 
 HDFS-767_4.txt


 If a block is requested by too many mappers/reducers (say, 3000) at the same 
 time, a BlockMissingException is thrown because the number of requests exceeds 
 the upper limit (256 by default, I think) on the number of threads accessing 
 the same block at the same time. The DFSClient will catch that exception and 
 retry 3 times, waiting 3 seconds before each retry. Since the wait time is a 
 fixed value, a lot of clients retry at about the same time and a large portion 
 of them fail again. After 3 retries, only about 256*4 = 1024 clients have 
 gotten the block. If there are more clients than that, the job will fail. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-762) Trying to start the balancer throws a NPE

2009-12-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794927#action_12794927
 ] 

Hudson commented on HDFS-762:
-

Integrated in Hadoop-Hdfs-trunk-Commit #158 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/158/])


 Trying to start the balancer throws a NPE
 -

 Key: HDFS-762
 URL: https://issues.apache.org/jira/browse/HDFS-762
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.21.0
Reporter: Cristian Ivascu
Assignee: Cristian Ivascu
 Fix For: 0.21.0

 Attachments: 0001-corrected-balancer-constructor.patch, HDFS-762.patch


 When trying to run the balancer, I get a NullPointerException:
 2009-11-10 11:08:14,235 ERROR 
 org.apache.hadoop.hdfs.server.balancer.Balancer: 
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hdfs.server.namenode.BlockPlacementPolicy.getInstance(BlockPlacementPolicy.java:161)
 at 
 org.apache.hadoop.hdfs.server.balancer.Balancer.checkReplicationPolicyCompatibility(Balancer.java:784)
 at 
 org.apache.hadoop.hdfs.server.balancer.Balancer.init(Balancer.java:792)
 at 
 org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:814)
 This happens when trying to use bin/start-balancer or bin/hdfs balancer 
 -threshold 10
 The config files (hdfs-site and core-site) have fs.default.name set to 
 hdfs://namenode:9000.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-856) Hardcoded replication level for new files in fuse-dfs

2009-12-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794953#action_12794953
 ] 

Hadoop QA commented on HDFS-856:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12429027/HADOOP-856.patch
  against trunk revision 894233.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/163/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/163/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/163/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/163/console

This message is automatically generated.

 Hardcoded replication level for new files in fuse-dfs
 -

 Key: HDFS-856
 URL: https://issues.apache.org/jira/browse/HDFS-856
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: contrib/fuse-dfs
Reporter: Brian Bockelman
Priority: Minor
 Attachments: HADOOP-856.patch


 In fuse-dfs, the number of replicas is always hardcoded to 3 in the arguments 
 to hdfsOpenFile.  We should use the setting in the hadoop configuration 
 instead.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-858) Incorrect return codes for fuse-dfs

2009-12-28 Thread Brian Bockelman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Bockelman updated HDFS-858:
-

Attachment: HDFS-858.patch

 Incorrect return codes for fuse-dfs
 ---

 Key: HDFS-858
 URL: https://issues.apache.org/jira/browse/HDFS-858
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: contrib/fuse-dfs
Reporter: Brian Bockelman
Priority: Minor
 Attachments: HDFS-858.patch


 fuse-dfs doesn't pass proper error codes from libhdfs; places I'd like to 
 correct are hdfsFileOpen (which can result in permission denied or quota 
 violations) and hdfsWrite (which can result in quota violations).
 By returning the correct error codes, command line utilities return much 
 better error messages - especially for quota violations, which can be a devil 
 to debug.
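
To make the shape of the change concrete, a hedged sketch (the function is 
illustrative, not the code in fuse_dfs.c, and it assumes libhdfs's convention 
of setting errno on failure): a fuse callback can return -errno so that 
command line utilities print "Permission denied" or "Disk quota exceeded" 
instead of a generic I/O error.

{noformat}
#include <errno.h>
#include <fcntl.h>
#include "hdfs.h"

/* Illustrative fuse-style open helper: on failure, return -errno so the
 * caller sees EACCES, EDQUOT, etc. instead of a generic -EIO. */
int dfs_open_example(hdfsFS fs, const char *path, int flags, hdfsFile *out)
{
    *out = hdfsOpenFile(fs, path, flags, 0, 0, 0);
    if (*out == NULL)
        return errno ? -errno : -EIO;   /* preserve the libhdfs error code */
    return 0;
}
{noformat}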

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-859) fuse-dfs utime behavior causes issues with tar

2009-12-28 Thread Brian Bockelman (JIRA)
fuse-dfs utime behavior causes issues with tar
--

 Key: HDFS-859
 URL: https://issues.apache.org/jira/browse/HDFS-859
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: contrib/fuse-dfs
Reporter: Brian Bockelman
Priority: Minor


When trying to untar files onto fuse-dfs, tar will try to set the utime on all 
the files and directories.  However, setting the utime on a directory in 
libhdfs causes an error.

We should silently ignore the failure of setting a utime on a directory; this 
will allow tar to complete successfully.
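
A minimal sketch of the proposed behavior (the handler name is invented; it 
assumes the libhdfs calls hdfsUtime and hdfsGetPathInfo): if the utime fails 
and the target is a directory, report success so tar can finish.

{noformat}
#include <errno.h>
#include "hdfs.h"

/* Illustrative utime handler: swallow the failure for directories only. */
int dfs_utime_example(hdfsFS fs, const char *path, tTime mtime, tTime atime)
{
    if (hdfsUtime(fs, path, mtime, atime) == 0)
        return 0;

    hdfsFileInfo *info = hdfsGetPathInfo(fs, path);
    if (info != NULL) {
        int is_dir = (info->mKind == kObjectKindDirectory);
        hdfsFreeFileInfo(info, 1);
        if (is_dir)
            return 0;            /* ignore the error so tar can complete */
    }
    return errno ? -errno : -EIO;
}
{noformat}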

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-859) fuse-dfs utime behavior causes issues with tar

2009-12-28 Thread Brian Bockelman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Bockelman updated HDFS-859:
-

Attachment: HDFS-859.patch

 fuse-dfs utime behavior causes issues with tar
 --

 Key: HDFS-859
 URL: https://issues.apache.org/jira/browse/HDFS-859
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: contrib/fuse-dfs
Reporter: Brian Bockelman
Priority: Minor
 Attachments: HDFS-859.patch


 When trying to untar files onto fuse-dfs, tar will try to set the utime on 
 all the files and directories.  However, setting the utime on a directory in 
 libhdfs causes an error.
 We should silently ignore the failure of setting a utime on a directory; this 
 will allow tar to complete successfully.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-420) fuse_dfs is unable to connect to the dfs after a copying a large number of files into the dfs over fuse

2009-12-28 Thread Brian Bockelman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Bockelman updated HDFS-420:
-

Attachment: HDFS-420.patch

Hey Dima,

First, try applying HDFS-464 to alleviate some memory leaks present in libhdfs.

Then, try the attached file; this releases the reference to the FileSystem.

With this, we have been able to run FUSE-DFS stably for weeks at a time.  I've 
taken a heap dump of the resulting system and not found any more obvious leaks 
(with these two patches, albeit on Hadoop-0.19.x versions of them, I was able 
to set the heap size to 4MB and create tens of thousands of files).

To debug better, repeat your test with fuse_dfs in a second terminal with the 
-d option to make it stay in the foreground.  In this case, you will be able 
to capture the error messages Hadoop spits out.  Alternately, you can muck 
around with the log4j settings and get these to spit out to a log file.

Brian

 fuse_dfs is unable to connect to the dfs after a copying a large number of 
 files into the dfs over fuse
 ---

 Key: HDFS-420
 URL: https://issues.apache.org/jira/browse/HDFS-420
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: contrib/fuse-dfs
 Environment: Fedora core 10, x86_64, 2.6.27.7-134.fc10.x86_64 #1 SMP 
 (AMD 64), gcc 4.3.2, java 1.6.0 (IcedTea6 1.4 (fedora-7.b12.fc10-x86_64) 
 Runtime Environment (build 1.6.0_0-b12) OpenJDK 64-Bit Server VM (build 
 10.0-b19, mixed mode)
Reporter: Dima Brodsky
 Attachments: HDFS-420.patch


 I run the following test:
 1.  Run hadoop DFS in single node mode
 2.  start up fuse_dfs
 3.  copy my source tree, about 250 megs, into the DFS
  cp -av * /mnt/hdfs/
 in /var/log/messages I keep seeing:
 Dec 22 09:02:08 bodum fuse_dfs: ERROR: hdfs trying to utime 
 /bar/backend-trunk2/src/machinery/hadoop/output/2008/11/19 to 
 1229385138/1229963739
 and then eventually
 Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
 fuse_dfs.c:1333
 Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
 fuse_dfs.c:1333
 Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
 fuse_dfs.c:1037
 Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
 fuse_dfs.c:1333
 Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
 fuse_dfs.c:1037
 Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
 fuse_dfs.c:1333
 Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
 fuse_dfs.c:1209
 Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
 fuse_dfs.c:1037
 Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
 fuse_dfs.c:1037
 Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
 fuse_dfs.c:1037
 Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
 fuse_dfs.c:1037
 Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
 fuse_dfs.c:1037
 Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
 fuse_dfs.c:1037
 Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
 fuse_dfs.c:1037
 Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
 fuse_dfs.c:1037
 Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
 fuse_dfs.c:1037
 Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
 fuse_dfs.c:1209
 Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
 fuse_dfs.c:1037
 Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
 fuse_dfs.c:1037
 Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
 fuse_dfs.c:1037
 Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
 fuse_dfs.c:1333
 Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
 fuse_dfs.c:1209
 Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
 fuse_dfs.c:1037
 and the file system hangs.  hadoop is still running and I don't see any 
 errors in its logs.  I have to unmount the dfs and restart fuse_dfs and then 
 everything is fine again.  At some point I see the following messages in 
 /var/log/messages:
 ERROR: dfs problem - could not close file_handle(139677114350528) for 
 /bar/backend-trunk2/src/machinery/hadoop/input/2008/12/14/actionrecordlog-8339-93825052368848-1229278807.log
  fuse_dfs.c:1464
 Dec 22 09:04:49 bodum fuse_dfs: ERROR: dfs problem - could not close 
 file_handle(139676770220176) for 
 /bar/backend-trunk2/src/machinery/hadoop/input/2008/12/14/actionrecordlog-8140-93825025883216-1229278759.log
  fuse_dfs.c:1464
 Dec 22 09:05:13 bodum fuse_dfs: ERROR: dfs problem - could not close 
 file_handle(139677114812832) for 
 /bar/backend-trunk2/src/machinery/hadoop/input/2008/12/14/actionrecordlog-8138-93825070138960-1229251587.log
  fuse_dfs.c:1464
 Is this a known issue?  Am I just 

[jira] Created: (HDFS-860) fuse-dfs truncate behavior causes issues with scp

2009-12-28 Thread Brian Bockelman (JIRA)
fuse-dfs truncate behavior causes issues with scp
-

 Key: HDFS-860
 URL: https://issues.apache.org/jira/browse/HDFS-860
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Brian Bockelman
Priority: Minor


For whatever reason, once scp has finished writing a file it issues a truncate 
to trim the file to the number of bytes it has written (i.e., if a file is X 
bytes, it calls truncate(X)).

This fails on the current fuse-dfs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-860) fuse-dfs truncate behavior causes issues with scp

2009-12-28 Thread Brian Bockelman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Bockelman updated HDFS-860:
-

Attachment: HDFS-860.patch

Attaching a simple patch to get around this problem - silently suppress the 
error if you call truncate with non-zero size.

This patch should be considered carefully; for our local community, the benefit 
(scp can be used to copy files onto a remote HDFS mount) outweighs the cost 
(breaking error codes for the truncate call).

I primarily wanted to get this issue and patch documented for others to 
potentially use (and to make sure it has proper licensing :)
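
A sketch of the workaround described above (the handler and stub names are 
invented; this is not the attached patch): report success for a non-zero-size 
truncate instead of propagating the error, accepting the broken error code in 
exchange for a working scp.

{noformat}
#include <sys/types.h>

/* Hypothetical stand-in: a real handler would honor truncate-to-zero by
 * recreating the file; stubbed out for this sketch. */
static int truncate_to_zero_stub(const char *path) { (void)path; return 0; }

/* Illustrative truncate handler: HDFS cannot shorten a file to a non-zero
 * length, so instead of returning an error (which makes scp fail after a
 * successful copy), pretend the call succeeded.  This knowingly breaks
 * truncate's error reporting, as noted in the comment above. */
int dfs_truncate_example(const char *path, off_t size)
{
    if (size == 0)
        return truncate_to_zero_stub(path);
    /* scp calls truncate(X) after writing exactly X bytes. */
    return 0;
}
{noformat}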

 fuse-dfs truncate behavior causes issues with scp
 -

 Key: HDFS-860
 URL: https://issues.apache.org/jira/browse/HDFS-860
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Brian Bockelman
Priority: Minor
 Attachments: HDFS-860.patch


 For whatever reason, once scp has finished writing a file it issues a 
 truncate to trim the file to the number of bytes it has written (i.e., if a 
 file is X bytes, it calls truncate(X)).
 This fails on the current fuse-dfs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-861) fuse-dfs does not support O_RDWR

2009-12-28 Thread Brian Bockelman (JIRA)
fuse-dfs does not support O_RDWR


 Key: HDFS-861
 URL: https://issues.apache.org/jira/browse/HDFS-861
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: contrib/fuse-dfs
Reporter: Brian Bockelman
Priority: Minor


Some applications (for us, the big one is rsync) will open a file in read-write 
mode when it really only intends to read xor write (not both).  fuse-dfs should 
try to not fail until the application actually tries to write to a pre-existing 
file or read from a newly created file.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-861) fuse-dfs does not support O_RDWR

2009-12-28 Thread Brian Bockelman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Bockelman updated HDFS-861:
-

Attachment: HDFS-861.patch

This patch implements the proposed change.  O_RDWR will work in a few cases; 
fuse-dfs responds with an error when you try to do a forbidden operation (such 
as writing to an existing file) rather than when you open the file in the 
wrong mode.

This is important for our community as it allows users to rsync from a remote 
system into HDFS; primarily, I am trying to get the patch and its licensing 
documented.
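
As a rough sketch of the deferral idea, write side only (the struct and 
function names are invented, and this is not the attached patch): accept 
O_RDWR at open time, remember what the caller asked for, and fail only when a 
write actually hits a pre-existing file.

{noformat}
#include <errno.h>
#include <fcntl.h>
#include "hdfs.h"

/* Invented per-handle state for this sketch. */
typedef struct {
    hdfsFS      fs;
    hdfsFile    file;     /* NULL until the first write */
    const char *path;
    int         existed;  /* did the file exist when open() was called? */
} rw_handle;

/* Defer the O_RDWR decision: the open itself succeeds, and the error is
 * raised only if the application really writes to a pre-existing file. */
static int lazy_open_for_write(rw_handle *h)
{
    if (h->file != NULL)
        return 0;
    if (h->existed)
        return -ENOTSUP;  /* HDFS cannot overwrite a file in place */
    h->file = hdfsOpenFile(h->fs, h->path, O_WRONLY, 0, 0, 0);
    return h->file != NULL ? 0 : (errno ? -errno : -EIO);
}

int rw_write_example(rw_handle *h, const void *buf, tSize len)
{
    int rc = lazy_open_for_write(h);
    if (rc < 0)
        return rc;        /* rsync never reaches this if it only reads */
    tSize n = hdfsWrite(h->fs, h->file, buf, len);
    return n < 0 ? (errno ? -errno : -EIO) : (int)n;
}
{noformat}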

 fuse-dfs does not support O_RDWR
 

 Key: HDFS-861
 URL: https://issues.apache.org/jira/browse/HDFS-861
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: contrib/fuse-dfs
Reporter: Brian Bockelman
Priority: Minor
 Attachments: HDFS-861.patch


 Some applications (for us, the big one is rsync) will open a file in 
 read-write mode when it really only intends to read xor write (not both).  
 fuse-dfs should try to not fail until the application actually tries to write 
 to a pre-existing file or read from a newly created file.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-858) Incorrect return codes for fuse-dfs

2009-12-28 Thread Brian Bockelman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Bockelman updated HDFS-858:
-

Attachment: (was: HDFS-858.patch)

 Incorrect return codes for fuse-dfs
 ---

 Key: HDFS-858
 URL: https://issues.apache.org/jira/browse/HDFS-858
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: contrib/fuse-dfs
Reporter: Brian Bockelman
Priority: Minor

 fuse-dfs doesn't pass proper error codes from libhdfs; places I'd like to 
 correct are hdfsFileOpen (which can result in permission denied or quota 
 violations) and hdfsWrite (which can result in quota violations).
 By returning the correct error codes, command line utilities return much 
 better error messages - especially for quota violations, which can be a devil 
 to debug.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-858) Incorrect return codes for fuse-dfs

2009-12-28 Thread Brian Bockelman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Bockelman updated HDFS-858:
-

Attachment: HDFS-858.patch

Fixed compilation issue for previously attached patch.

 Incorrect return codes for fuse-dfs
 ---

 Key: HDFS-858
 URL: https://issues.apache.org/jira/browse/HDFS-858
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: contrib/fuse-dfs
Reporter: Brian Bockelman
Priority: Minor
 Attachments: HDFS-858.patch


 fuse-dfs doesn't pass proper error codes from libhdfs; places I'd like to 
 correct are hdfsFileOpen (which can result in permission denied or quota 
 violations) and hdfsWrite (which can result in quota violations).
 By returning the correct error codes, command line utilities return much 
 better error messages - especially for quota violations, which can be a devil 
 to debug.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.