[jira] [Commented] (HADOOP-13340) Compress Hadoop Archive output

2018-07-06 Thread Koji Noguchi (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534993#comment-16534993
 ] 

Koji Noguchi commented on HADOOP-13340:
---

I do sometimes want this compression feature, for example when I want to keep a backup copy 
of our users' directories or when har-archiving a bunch of job histories & configs.  
And yes, "transparent compression" with the overhead of decoding up to an entire 
codec block would be nice.

However, in addition to this overhead of finding the head of the original file, 
there is another overhead when users need to perform random reads on the 
original files. As I understand it, the suggested design would only allow us to 
decompress from the head of the file.
If we have a Hadoop job with 10 mappers reading from a single text file, this 
would be hard to do with the proposed compressed har, since each mapper would be 
trying to read the text file from a specific offset.

Maybe we can live with a semi-transparent hadoop-archive compression that would 
only let you read from the head of each file?  This would be similar to the old 
hftp implementation, where we didn't allow seek/positional reads.
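
For illustration only (this is not from any attached patch, and the class/method names below are hypothetical), a minimal sketch of what a head-only positional read costs: every read at an offset has to decompress and discard everything before that offset, and a job with 10 mappers would repeat this skip work once per mapper.

{code:java}
import java.io.IOException;
import java.io.InputStream;

class HeadOnlyCompressedReader {
  /**
   * Read len bytes starting at targetOffset from a stream that can only be
   * decompressed from its head, by decompressing and discarding everything
   * before the offset.
   */
  static int readAt(InputStream decompressed, long targetOffset,
                    byte[] buf, int off, int len) throws IOException {
    long skipped = 0;
    byte[] scratch = new byte[64 * 1024];
    while (skipped < targetOffset) {          // O(targetOffset) work per positional read
      int toRead = (int) Math.min(scratch.length, targetOffset - skipped);
      int n = decompressed.read(scratch, 0, toRead);
      if (n < 0) {
        throw new IOException("EOF before reaching offset " + targetOffset);
      }
      skipped += n;
    }
    return decompressed.read(buf, off, len);  // finally, the bytes the caller wanted
  }
}
{code}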

> Compress Hadoop Archive output
> --
>
> Key: HADOOP-13340
> URL: https://issues.apache.org/jira/browse/HADOOP-13340
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: tools
>Affects Versions: 2.5.0
>Reporter: Duc Le Tu
>Priority: Major
>  Labels: features, performance
>
> Why can't the Hadoop Archive tool compress its output like other map-reduce jobs? 
> I used some options like -D mapreduce.output.fileoutputformat.compress=true 
> -D 
> mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec
>  but it does not work. Did I do something wrong?
> If not, please support an option to compress the output of the Hadoop Archive tool; 
> it's very necessary for data retention for everyone (small-files problem and 
> compressed data).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13340) Compress Hadoop Archive output

2018-07-17 Thread Koji Noguchi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-13340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated HADOOP-13340:
--
Attachment: HADOOP-13340-example-v01.patch

> Compress Hadoop Archive output
> --
>
> Key: HADOOP-13340
> URL: https://issues.apache.org/jira/browse/HADOOP-13340
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: tools
>Affects Versions: 2.5.0
>Reporter: Duc Le Tu
>Priority: Major
>  Labels: features, performance
> Attachments: HADOOP-13340-example-v01.patch
>
>
> Why can't the Hadoop Archive tool compress its output like other map-reduce jobs? 
> I used some options like -D mapreduce.output.fileoutputformat.compress=true 
> -D 
> mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec
>  but it does not work. Did I do something wrong?
> If not, please support an option to compress the output of the Hadoop Archive tool; 
> it's very necessary for data retention for everyone (small-files problem and 
> compressed data).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13340) Compress Hadoop Archive output

2018-07-17 Thread Koji Noguchi (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16547033#comment-16547033
 ] 

Koji Noguchi commented on HADOOP-13340:
---

Just to clarify my previous comment, I tried writing an example.  It is not intended 
for commit.

This provides a compressed har, but it's not transparent like a regular har in 
that it doesn't allow random reads.

> Compress Hadoop Archive output
> --
>
> Key: HADOOP-13340
> URL: https://issues.apache.org/jira/browse/HADOOP-13340
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: tools
>Affects Versions: 2.5.0
>Reporter: Duc Le Tu
>Priority: Major
>  Labels: features, performance
> Attachments: HADOOP-13340-example-v01.patch
>
>
> Why can't the Hadoop Archive tool compress its output like other map-reduce jobs? 
> I used some options like -D mapreduce.output.fileoutputformat.compress=true 
> -D 
> mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec
>  but it does not work. Did I do something wrong?
> If not, please support an option to compress the output of the Hadoop Archive tool; 
> it's very necessary for data retention for everyone (small-files problem and 
> compressed data).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13340) Compress Hadoop Archive output

2018-07-17 Thread Koji Noguchi (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16547046#comment-16547046
 ] 

Koji Noguchi commented on HADOOP-13340:
---

Hmm, the updated unit test is failing for me.  Please ignore it; I'll upload another 
one.

> Compress Hadoop Archive output
> --
>
> Key: HADOOP-13340
> URL: https://issues.apache.org/jira/browse/HADOOP-13340
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: tools
>Affects Versions: 2.5.0
>Reporter: Duc Le Tu
>Priority: Major
>  Labels: features, performance
> Attachments: HADOOP-13340-example-v01.patch
>
>
> Why can't the Hadoop Archive tool compress its output like other map-reduce jobs? 
> I used some options like -D mapreduce.output.fileoutputformat.compress=true 
> -D 
> mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec
>  but it does not work. Did I do something wrong?
> If not, please support an option to compress the output of the Hadoop Archive tool; 
> it's very necessary for data retention for everyone (small-files problem and 
> compressed data).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13340) Compress Hadoop Archive output

2018-07-17 Thread Koji Noguchi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-13340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated HADOOP-13340:
--
Attachment: HADOOP-13340-example-v02.patch

bq. Hmm, the updated unit test is failing for me.  Please ignore it; I'll upload 
another one.

It seems the recent addition of commons-lang3 broke the unit test. Just taking 
out that jar fixed the ClassNotFoundException issue.
From the last patch example, I updated getFileBlockLocations to fake the block size 
so that applications still see the full file size 
({{HADOOP-13340-example-v02.patch}}).
This breaks another piece of transparency (or contract).
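
For illustration only (this is not the attached patch; the names below are hypothetical), the block-size faking mentioned above boils down to reporting one synthetic block that spans the full uncompressed length, since the compressed stream cannot be split anyway:

{code:java}
import org.apache.hadoop.fs.BlockLocation;

class CompressedHarBlockLocations {
  /**
   * Return one synthetic block covering the whole uncompressed length so that
   * applications still see the original file size and offsets.
   */
  static BlockLocation[] fakeLocations(long uncompressedLen,
                                       String[] names, String[] hosts) {
    return new BlockLocation[] {
      new BlockLocation(names, hosts, 0L, uncompressedLen)
    };
  }
}
{code}

The trade-off, as noted above, is that the reported length no longer corresponds to the bytes actually stored for the file.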


> Compress Hadoop Archive output
> --
>
> Key: HADOOP-13340
> URL: https://issues.apache.org/jira/browse/HADOOP-13340
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: tools
>Affects Versions: 2.5.0
>Reporter: Duc Le Tu
>Priority: Major
>  Labels: features, performance
> Attachments: HADOOP-13340-example-v01.patch, 
> HADOOP-13340-example-v02.patch
>
>
> Why can't the Hadoop Archive tool compress its output like other map-reduce jobs? 
> I used some options like -D mapreduce.output.fileoutputformat.compress=true 
> -D 
> mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec
>  but it does not work. Did I do something wrong?
> If not, please support an option to compress the output of the Hadoop Archive tool; 
> it's very necessary for data retention for everyone (small-files problem and 
> compressed data).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-16839) SparkLauncher does not read SPARK_CONF_DIR/spark-defaults.conf

2020-02-04 Thread Koji Noguchi (Jira)
Koji Noguchi created HADOOP-16839:
-

 Summary: SparkLauncher does not read 
SPARK_CONF_DIR/spark-defaults.conf 
 Key: HADOOP-16839
 URL: https://issues.apache.org/jira/browse/HADOOP-16839
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Koji Noguchi


Noticed this while running Spark e2e tests.  Somehow, Pig's SparkLauncher is not 
reading SPARK_CONF_DIR at all.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write

2017-01-11 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819365#comment-15819365
 ] 

Koji Noguchi commented on HADOOP-13114:
---

bq. Could you please elucidate your concern if its not that?

My point is that this command won't be useful unless the compressed outputs are 
directly readable by Hadoop jobs.
Avro, ORC, RCFile, SequenceFile, and other common file formats all have 
their own ways of compressing, and simply gzip/bzip-ing the entire files won't 
do any good.
Worse, I don't think the patch provides a way to uncompress them back.

bq.  but that means we'd make assumptions about Hadoop's use cases

And I'd say you're assuming users would call this distcp+compress on text 
files only.
Files in other file formats would become unreadable (until uncompressed again).


I agree with Nathan on the naming. If the command is called 
{{dist-text-compress}}, then I'll have no concerns.

> DistCp should have option to compress data on write
> ---
>
> Key: HADOOP-13114
> URL: https://issues.apache.org/jira/browse/HADOOP-13114
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.8.0, 2.7.3, 3.0.0-alpha1
>Reporter: Suraj Nayak
>Assignee: Suraj Nayak
>Priority: Minor
>  Labels: distcp
> Attachments: HADOOP-13114-trunk_2016-05-07-1.patch, 
> HADOOP-13114-trunk_2016-05-08-1.patch, HADOOP-13114-trunk_2016-05-10-1.patch, 
> HADOOP-13114-trunk_2016-05-12-1.patch, HADOOP-13114.05.patch, 
> HADOOP-13114.06.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> The DistCp utility should have the capability to store data in a user-specified 
> compression format. This avoids one hop of compressing data after transfer. 
> Backup strategies to a different cluster also benefit by saving one IO 
> operation to and from HDFS, thus saving resources, time and effort.
> * Create an option -compressOutput defaulting to 
> {{org.apache.hadoop.io.compress.BZip2Codec}}. 
> * Users will be able to change codec with {{-D 
> mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec}}
> * If distcp compression is enabled, suffix the filenames with default codec 
> extension to indicate the file is compressed. Thus users can be aware of what 
> codec was used to compress the data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write

2016-11-21 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15684893#comment-15684893
 ] 

Koji Noguchi commented on HADOOP-13114:
---

Sorry for joining this jira late, but this feature only seems to make sense 
for compressing text files.
Isn't the use case too narrow to be part of the general distcp tool?

> DistCp should have option to compress data on write
> ---
>
> Key: HADOOP-13114
> URL: https://issues.apache.org/jira/browse/HADOOP-13114
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.8.0, 2.7.3, 3.0.0-alpha1
>Reporter: Suraj Nayak
>Assignee: Suraj Nayak
>Priority: Minor
>  Labels: distcp
> Attachments: HADOOP-13114-trunk_2016-05-07-1.patch, 
> HADOOP-13114-trunk_2016-05-08-1.patch, HADOOP-13114-trunk_2016-05-10-1.patch, 
> HADOOP-13114-trunk_2016-05-12-1.patch, HADOOP-13114.05.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> The DistCp utility should have the capability to store data in a user-specified 
> compression format. This avoids one hop of compressing data after transfer. 
> Backup strategies to a different cluster also benefit by saving one IO 
> operation to and from HDFS, thus saving resources, time and effort.
> * Create an option -compressOutput defaulting to 
> {{org.apache.hadoop.io.compress.BZip2Codec}}. 
> * Users will be able to change codec with {{-D 
> mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec}}
> * If distcp compression is enabled, suffix the filenames with default codec 
> extension to indicate the file is compressed. Thus users can be aware of what 
> codec was used to compress the data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write

2017-01-10 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815124#comment-15815124
 ] 

Koji Noguchi commented on HADOOP-13114:
---

bq. I guess it'd be useful for any files which are compressible, right? 

I'm probably missing something here.
Aside from text files, is there any other file format that can benefit from 
this distcp+compression?

> DistCp should have option to compress data on write
> ---
>
> Key: HADOOP-13114
> URL: https://issues.apache.org/jira/browse/HADOOP-13114
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.8.0, 2.7.3, 3.0.0-alpha1
>Reporter: Suraj Nayak
>Assignee: Suraj Nayak
>Priority: Minor
>  Labels: distcp
> Attachments: HADOOP-13114-trunk_2016-05-07-1.patch, 
> HADOOP-13114-trunk_2016-05-08-1.patch, HADOOP-13114-trunk_2016-05-10-1.patch, 
> HADOOP-13114-trunk_2016-05-12-1.patch, HADOOP-13114.05.patch, 
> HADOOP-13114.06.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> The DistCp utility should have the capability to store data in a user-specified 
> compression format. This avoids one hop of compressing data after transfer. 
> Backup strategies to a different cluster also benefit by saving one IO 
> operation to and from HDFS, thus saving resources, time and effort.
> * Create an option -compressOutput defaulting to 
> {{org.apache.hadoop.io.compress.BZip2Codec}}. 
> * Users will be able to change codec with {{-D 
> mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec}}
> * If distcp compression is enabled, suffix the filenames with default codec 
> extension to indicate the file is compressed. Thus users can be aware of what 
> codec was used to compress the data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-8230) Enable sync by default and disable append

2012-05-07 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269678#comment-13269678
 ] 

Koji Noguchi commented on HADOOP-8230:
--

bq. , but what is that use case?

I'm probably in the minority here, but I do want "an option to disable durable 
sync", not because it could be buggy but because it could be too stable compared 
to 0.23/2.0.  Before moving all of our non-HBase clusters to 2.0, we might use 
1.1 for some time.  During this period, I do not want some production projects 
to start relying on the sync features and then find some regression/difference on 
2.0 blocking our upgrade schedule.

> Enable sync by default and disable append
> -
>
> Key: HADOOP-8230
> URL: https://issues.apache.org/jira/browse/HADOOP-8230
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 1.0.0
>Reporter: Eli Collins
>Assignee: Eli Collins
> Fix For: 1.1.0
>
> Attachments: hadoop-8230.txt
>
>
> Per HDFS-3120 for 1.x let's:
> - Always enable the sync path, which is currently only enabled if 
> dfs.support.append is set
> - Remove the dfs.support.append configuration option. We'll keep the code 
> paths though in case we ever fix append on branch-1, in which case we can add 
> the config option back

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HADOOP-8690) Shell may remove a file without going to trash even if skipTrash is not enabled

2012-08-14 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434240#comment-13434240
 ] 

Koji Noguchi commented on HADOOP-8690:
--

bq. Seems like in this case we should move it to trash but with a file name 
suffix.
I believe it does that already.

On 0.23, 
{noformat}
[knoguchi ~]$ hdfs dfs  -touchz abcdef
[knoguchi ~]$ hdfs dfs  -rm abcdef
(repeat 3 times)
[knoguchi ~]$ hdfs dfs  -ls /user/knoguchi/.Trash/Current/user/knoguchi/abc\*
Found 1 items
-rw-r--r--   3 knoguchi users  0 2012-08-14 16:30 
/user/knoguchi/.Trash/Current/user/knoguchi/abcdef
Found 1 items
-rw-r--r--   3 knoguchi users  0 2012-08-14 16:31 
/user/knoguchi/.Trash/Current/user/knoguchi/abcdef1344961878400
Found 1 items
-rw-r--r--   3 knoguchi users  0 2012-08-14 16:31 
/user/knoguchi/.Trash/Current/user/knoguchi/abcdef1344961912093
{noformat}



> Shell may remove a file without going to trash even if skipTrash is not 
> enabled
> ---
>
> Key: HADOOP-8690
> URL: https://issues.apache.org/jira/browse/HADOOP-8690
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Eli Collins
>Priority: Minor
>
> Delete.java contains the following comment:
> {noformat}
> // TODO: if the user wants the trash to be used but there is any
> // problem (ie. creating the trash dir, moving the item to be deleted,
> // etc), then the path will just be deleted because moveToTrash returns
> // false and it falls thru to fs.delete.  this doesn't seem right
> {noformat}
> If Trash#moveToAppropriateTrash returns false FsShell will delete the path 
> even if skipTrash is not enabled. The comment isn't quite right as some of 
> these failure scenarios result in exceptions not a false return value, and in 
> the case of an exception we don't unconditionally delete the path. 
> TrashPolicy#moveToTrash states that it only returns false if the item is 
> already in the trash or trash is disabled, and the expected behavior for 
> these cases is to just delete the path. However 
> TrashPolicyDefault#moveToTrash also returns false if there's a problem 
> creating the trash directory, so for this case I think we should throw an 
> exception rather than return false (and delete the path bypassing trash).
> I also question the behavior of just deleting when the item is already in the 
> trash as it may have changed since previously put in the trash and not been 
> checkpointed yet. Seems like in this case we should move it to trash but with 
> a file name suffix.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HADOOP-10820) Empty entry in libjars results in working directory being recursively localized

2014-07-21 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068776#comment-14068776
 ] 

Koji Noguchi commented on HADOOP-10820:
---

This reminds me of a very old bug, HADOOP-1386, from 7 years ago. 
Maybe it's better to fix this issue at the lower layer and disallow creation of 
a Path from an empty URI.
Simple test code is below.  If you uncomment the 'delete', it will wipe the 
entire current directory instead of throwing an IllegalArgumentException.

{code:java}
import java.net.URI;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RawLocalFileSystem;

public class TestEmptyPath {

  public static void main(String[] args) throws Exception {
    FileSystem fs = new RawLocalFileSystem();
    try {
      Path path = new Path("");
      System.out.println("Path from empty String is " + path);
      System.exit(123);
    } catch (IllegalArgumentException ex) {
      System.out.println("Empty Path creation error successfully captured [" + ex + "]");
    }

    try {
      URI emptyuri = new URI("");
      Path path = new Path(emptyuri);
      System.out.println("Path from empty URI is " + path);
      // IF YOU UNCOMMENT THIS LINE, IT WILL WIPE YOUR ENTIRE CURRENT DIR!!!
      //fs.delete(path, true);
      System.exit(234);
    } catch (IllegalArgumentException ex) {
      System.out.println("Empty Path creation error successfully captured [" + ex + "]");
      System.exit(0);
    }
  }
}
{code}

{noformat}
-bash-4.1$ ls -l
total 8
drwxr-xr-x 2 ___ users 4096 Jul 21 16:54 bcd
-rw-r--r-- 1 ___ users    5 Jul 21 16:54 efg
-bash-4.1$ java -cp . TestEmptyPath
Empty Path creation error successfully captured [java.lang.IllegalArgumentException: Can not create a Path from an empty string]
Path from empty URI is 
-bash-4.1$ echo $?
234
-bash-4.1$ ls -l
total 0
-bash-4.1$
{noformat}



> Empty entry in libjars results in working directory being recursively 
> localized
> ---
>
> Key: HADOOP-10820
> URL: https://issues.apache.org/jira/browse/HADOOP-10820
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Alex Holmes
>Priority: Minor
> Attachments: HADOOP-10820-1.patch, HADOOP-10820.patch
>
>
> An empty token (e.g. "a.jar,,b.jar") in the -libjars option causes the 
> current working directory to be recursively localized.
> Here's an example of this in action (using Hadoop 2.2.0):
> {code}
> # create a temp directory and touch three JAR files
> mkdir -p tmp/path && cd tmp && touch a.jar b.jar c.jar path/d.jar
> # Run an example job only specifying two of the JARs.
> # Include an empty entry in libjars.
> hadoop jar 
> /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar 
> pi -libjars a.jar,,c.jar 2 10
> # As the job is running examine the localized directory in HDFS.
> # Notice that not only are the two JAR's specified in libjars copied,
> # but in addition the contents of the working directory are also recursively 
> copied.
> $ hadoop fs -lsr 
> /tmp/hadoop-yarn/staging/aholmes/.staging/job_1404752711144_0018/libjars
> /tmp/hadoop-yarn/staging/aholmes/.staging/job_1404752711144_0018/libjars/a.jar
> /tmp/hadoop-yarn/staging/aholmes/.staging/job_1404752711144_0018/libjars/c.jar
> /tmp/hadoop-yarn/staging/aholmes/.staging/job_1404752711144_0018/libjars/tmp
> /tmp/hadoop-yarn/staging/aholmes/.staging/job_1404752711144_0018/libjars/tmp/a.jar
> /tmp/hadoop-yarn/staging/aholmes/.staging/job_1404752711144_0018/libjars/tmp/b.jar
> /tmp/hadoop-yarn/staging/aholmes/.staging/job_1404752711144_0018/libjars/tmp/c.jar
> /tmp/hadoop-yarn/staging/aholmes/.staging/job_1404752711144_0018/libjars/tmp/path
> /tmp/hadoop-yarn/staging/aholmes/.staging/job_1404752711144_0018/libjars/tmp/path/d.jar
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10820) Empty entry in libjars results in working directory being recursively localized

2014-07-22 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070441#comment-14070441
 ] 

Koji Noguchi commented on HADOOP-10820:
---

Given that a careless ",," started uploading the entire current working directory, 
maybe someday this same ",," could start *deleting* the entire current working 
directory for some application.  I still remember the nightmare we had when one of our 
users lost his entire homedir because his pig script contained one extra 
'space' (HADOOP-1386).  So yes, I'd love to see this fixed.  Zhihai or Alex, if 
one of you wants to create the jira with the fix, please go ahead.

As for this jira, I don't have a strong preference.  We can (see the sketch after this list):
* just fail with the IllegalArgumentException from the Path constructor, 
* catch it at GenericOptionsParser and give a better error message, or
* catch it at GenericOptionsParser and ignore the empty string.
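
For illustration only (hypothetical names; not an attached patch), the last option could look roughly like this in the -libjars handling:

{code:java}
import java.util.ArrayList;
import java.util.List;

class LibJarsParsing {
  /**
   * Split a comma-separated -libjars value, either rejecting or silently
   * skipping empty entries such as the one in "a.jar,,c.jar".
   */
  static List<String> splitLibJars(String value, boolean failOnEmpty) {
    List<String> jars = new ArrayList<String>();
    for (String token : value.split(",")) {
      String trimmed = token.trim();
      if (trimmed.isEmpty()) {
        if (failOnEmpty) {
          throw new IllegalArgumentException("Empty entry in -libjars: \"" + value + "\"");
        }
        continue;  // ignore the empty token instead of localizing the working dir
      }
      jars.add(trimmed);
    }
    return jars;
  }
}
{code}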

> Empty entry in libjars results in working directory being recursively 
> localized
> ---
>
> Key: HADOOP-10820
> URL: https://issues.apache.org/jira/browse/HADOOP-10820
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Alex Holmes
>Priority: Minor
> Attachments: HADOOP-10820-1.patch, HADOOP-10820.patch
>
>
> An empty token (e.g. "a.jar,,b.jar") in the -libjars option causes the 
> current working directory to be recursively localized.
> Here's an example of this in action (using Hadoop 2.2.0):
> {code}
> # create a temp directory and touch three JAR files
> mkdir -p tmp/path && cd tmp && touch a.jar b.jar c.jar path/d.jar
> # Run an example job only specifying two of the JARs.
> # Include an empty entry in libjars.
> hadoop jar 
> /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar 
> pi -libjars a.jar,,c.jar 2 10
> # As the job is running examine the localized directory in HDFS.
> # Notice that not only are the two JAR's specified in libjars copied,
> # but in addition the contents of the working directory are also recursively 
> copied.
> $ hadoop fs -lsr 
> /tmp/hadoop-yarn/staging/aholmes/.staging/job_1404752711144_0018/libjars
> /tmp/hadoop-yarn/staging/aholmes/.staging/job_1404752711144_0018/libjars/a.jar
> /tmp/hadoop-yarn/staging/aholmes/.staging/job_1404752711144_0018/libjars/c.jar
> /tmp/hadoop-yarn/staging/aholmes/.staging/job_1404752711144_0018/libjars/tmp
> /tmp/hadoop-yarn/staging/aholmes/.staging/job_1404752711144_0018/libjars/tmp/a.jar
> /tmp/hadoop-yarn/staging/aholmes/.staging/job_1404752711144_0018/libjars/tmp/b.jar
> /tmp/hadoop-yarn/staging/aholmes/.staging/job_1404752711144_0018/libjars/tmp/c.jar
> /tmp/hadoop-yarn/staging/aholmes/.staging/job_1404752711144_0018/libjars/tmp/path
> /tmp/hadoop-yarn/staging/aholmes/.staging/job_1404752711144_0018/libjars/tmp/path/d.jar
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10876) The constructor of Path should not take an empty URL as a parameter

2014-07-22 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070605#comment-14070605
 ] 

Koji Noguchi commented on HADOOP-10876:
---

My comment and suggestion can be found at 
https://issues.apache.org/jira/browse/HADOOP-10820?focusedCommentId=14068776&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14068776
and
https://issues.apache.org/jira/browse/HADOOP-10820?focusedCommentId=14070441&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14070441

> The constructor of Path should not take an empty URL as a parameter
> ---
>
> Key: HADOOP-10876
> URL: https://issues.apache.org/jira/browse/HADOOP-10876
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: zhihai xu
>
> The constructor of Path should not take an empty URL as a parameter, As 
> discussed in HADOOP-10820, This JIRA is to change Path constructor at public 
> Path(URI aUri) to check the empty URI and throw IllegalArgumentException for 
> empty URI.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10876) The constructor of Path should not take an empty URL as a parameter

2014-08-07 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090245#comment-14090245
 ] 

Koji Noguchi commented on HADOOP-10876:
---

Thanks Andrew, Zhihai.
(Sorry I was too lazy to create the jira myself to fix this.)

If possible, I still want to prevent Path creation from an empty URI. 
The original bug (HADOOP-10820) was about unintentionally reading the current 
working directory recursively.  I'm afraid that the next time we hit a similar 
issue, it could be about deleting the entire working directory.

Maybe I'm worrying too much.  Let me think about it.

> The constructor of Path should not take an empty URL as a parameter
> ---
>
> Key: HADOOP-10876
> URL: https://issues.apache.org/jira/browse/HADOOP-10876
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: HADOOP-10876.000.patch, HADOOP-10876.001.patch
>
>
> The constructor of Path should not take an empty URL as a parameter, As 
> discussed in HADOOP-10820, This JIRA is to change Path constructor at public 
> Path(URI aUri) to check the empty URI and throw IllegalArgumentException for 
> empty URI.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-9757) Har metadata cache can grow without limit

2013-07-31 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725577#comment-13725577
 ] 

Koji Noguchi commented on HADOOP-9757:
--

Hi Daryn,

bq. The har AFS delegates to the standard har FS which uses a cached fs.

I'm not fully sure I understand your point, but please note that 
HarFileSystem does not use FileSystem.CACHE. We have 
fs.har.impl.disable.cache=true inside core-default.xml 
(HADOOP-6097 and HADOOP-6231).
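
Roughly speaking (simplified; see the actual FileSystem.get() code for details), the cache bypass works off a per-scheme flag, which core-default.xml sets to true for har:

{code:java}
import org.apache.hadoop.conf.Configuration;

class FsCacheCheckSketch {
  /** True when instances for this scheme should NOT come from FileSystem.CACHE. */
  static boolean cacheDisabled(Configuration conf, String scheme) {
    String key = String.format("fs.%s.impl.disable.cache", scheme);
    return conf.getBoolean(key, false);  // "fs.har.impl.disable.cache" is true by default
  }
}
{code}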


> Har metadata cache can grow without limit
> -
>
> Key: HADOOP-9757
> URL: https://issues.apache.org/jira/browse/HADOOP-9757
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.0.4-alpha, 0.23.9
>Reporter: Jason Lowe
>Assignee: Cristina L. Abad
> Attachments: HADOOP_9757.branch23.patch
>
>
> MAPREDUCE-2459 added a metadata cache to the har filesystem, but the cache 
> has no upper limits.  A long-running process that accesses many har archives 
> will eventually run out of memory due to a har metadata cache that never 
> retires entries.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-9639) truly shared cache for jars (jobjar/libjar)

2013-08-27 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13751400#comment-13751400
 ] 

Koji Noguchi commented on HADOOP-9639:
--

I'm not comfortable with the proposed design, where it almost blindly trusts 
other users to do the right thing and upload the right file.
(Not that I don't trust my users :)

For example, it's easy for me to corrupt the /sharedcache directories by 
creating directories with permission 700.

But what worries me the most is that the entire security model is based on a checksum.
Quoting from [wikipedia|http://en.wikipedia.org/wiki/Checksum]:
"It is important to not use a checksum in a security related application, as a 
checksum does not have the properties required to protect data from intentional 
tampering."

> truly shared cache for jars (jobjar/libjar)
> ---
>
> Key: HADOOP-9639
> URL: https://issues.apache.org/jira/browse/HADOOP-9639
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: filecache
>Affects Versions: 2.0.4-alpha
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: shared_cache_design.pdf, shared_cache_design_v2.pdf
>
>
> Currently there is the distributed cache that enables you to cache jars and 
> files so that attempts from the same job can reuse them. However, sharing is 
> limited with the distributed cache because it is normally on a per-job basis. 
> On a large cluster, sometimes copying of jobjars and libjars becomes so 
> prevalent that it consumes a large portion of the network bandwidth, not to 
> speak of defeating the purpose of "bringing compute to where data is". This 
> is wasteful because in most cases code doesn't change much across many jobs.
> I'd like to propose and discuss feasibility of introducing a truly shared 
> cache so that multiple jobs from multiple users can share and cache jars. 
> This JIRA is to open the discussion.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (HADOOP-6857) FsShell should report raw disk usage including replication factor

2010-09-14 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909502#action_12909502
 ] 

Koji Noguchi commented on HADOOP-6857:
--

A little confused.  I thought "fs -count" shows HDFS usage the same as "fs -du", in 
the third column.

{noformat}
[knoguchi ~]$ hadoop dfs -dus /user/knoguchi
hdfs://abc-nn1.com/user/knoguchi   2603203340273
[knoguchi ~]$ hadoop dfs -count /user/knoguchi
        1580        20624      2603203340273 hdfs://abc-nn1.com/user/knoguchi
[knoguchi ~]$ 
{noformat}
If quota is enabled on that dir and "-q" is passed, it would show the remaining 
raw space available. 
{noformat}
[knoguchi ~]$ hadoop dfs -count -q /user/knoguchi
           5       27796  13194139533312   5384528402193         1580        20624      2603203340273 hdfs://abc-nn1.com/user/knoguchi
[knoguchi ~]$ 
{noformat}
You can then get the raw space usage (space quota - remaining raw space). 
However, *this only works if you have a quota enabled on that particular dir*.
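
As a worked example using the numbers above (assuming the third and fourth columns of the -count -q output are the space quota and the remaining raw space):
{noformat}
raw usage = space quota - remaining raw space
          = 13194139533312 - 5384528402193
          = 7809611131119 bytes (roughly 7.1 TiB)
{noformat}
which is about three times the 2603203340273-byte logical usage shown by "fs -dus", consistent with a replication factor of 3.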


> FsShell should report raw disk usage including replication factor
> -
>
> Key: HADOOP-6857
> URL: https://issues.apache.org/jira/browse/HADOOP-6857
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Alex Kozlov
> Fix For: 0.22.0
>
> Attachments: show-space-consumed.txt
>
>
> Currently FsShell report HDFS usage with "hadoop fs -dus " command.  
> Since replication level is per file level, it would be nice to add raw disk 
> usage including the replication factor (maybe "hadoop fs -dus -raw "?). 
>  This will allow to assess resource usage more accurately.  -- Alex K

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Reopened: (HADOOP-6857) FsShell should report raw disk usage including replication factor

2010-09-14 Thread Koji Noguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi reopened HADOOP-6857:
--


I think this number (raw usage) would be helpful.  Not sure whether it should 
be in -du or -count, and whether by default or as an option.

> FsShell should report raw disk usage including replication factor
> -
>
> Key: HADOOP-6857
> URL: https://issues.apache.org/jira/browse/HADOOP-6857
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Alex Kozlov
> Fix For: 0.22.0
>
> Attachments: show-space-consumed.txt
>
>
> Currently FsShell report HDFS usage with "hadoop fs -dus " command.  
> Since replication level is per file level, it would be nice to add raw disk 
> usage including the replication factor (maybe "hadoop fs -dus -raw "?). 
>  This will allow to assess resource usage more accurately.  -- Alex K

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-6985) Suggest that HADOOP_OPTS be preserved in hadoop-env.sh.template

2010-10-25 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924763#action_12924763
 ] 

Koji Noguchi commented on HADOOP-6985:
--

Curious.

bq. else FOO+=" -server"; fi

Where is FOO being used?

> Suggest that HADOOP_OPTS be preserved in hadoop-env.sh.template
> ---
>
> Key: HADOOP-6985
> URL: https://issues.apache.org/jira/browse/HADOOP-6985
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Ramkumar Vadali
>Assignee: Ramkumar Vadali
>Priority: Minor
> Fix For: 0.22.0
>
> Attachments: HADOOP-6985.patch
>
>
> For an administrator who wants to customize HADOOP_OPTS, it would be better 
> to have
> # if [ "$HADOOP_OPTS" == "" ]; then export HADOOP_OPTS=-server; else FOO+=" 
> -server"; fi
> instead of
> # Extra Java runtime options.  Empty by default.
> # export HADOOP_OPTS=-server

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-7104) Remove unnecessary DNS reverse lookups from RPC layer

2011-01-18 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-7104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983339#action_12983339
 ] 

Koji Noguchi commented on HADOOP-7104:
--

Kan, Devaraj, isn't this a regression bug rather than an improvement? 
A single accept thread falling behind due to reverse DNS lookups can lead to an 
unresponsive Namenode.  (Still to be confirmed.)

Nigel and dev, please consider this for 0.22.  



> Remove unnecessary DNS reverse lookups from RPC layer
> -
>
> Key: HADOOP-7104
> URL: https://issues.apache.org/jira/browse/HADOOP-7104
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: ipc, security
>Reporter: Kan Zhang
>Assignee: Kan Zhang
> Fix For: 0.23.0
>
> Attachments: 7104-few-edits.patch, c7104-01.patch, c7104-03.patch
>
>
> RPC connection authorization needs to verify client's Kerberos principal name 
> matches what specified for the protocol. For service clients like DN's, their 
> Kerberos principal names can be specified in the form of  
> "datanode/_h...@domain.com". To get the expected
> client principal name, the server needs to substitute "_HOST" with the 
> client's fully qualified domain name, which requires a reverse DNS lookup 
> from client IP address. However, for connections from clients whose principal 
> name are either unspecified or specified not using the "_HOST" convention, 
> the substitution is not required and the reverse DNS lookup should be 
> avoided. Currently the reverse DNS lookup is done for all clients, which 
> could slow services like NN down, when local named cache is not available.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-7115) Add a cache for getpwuid_r and getpwgid_r calls

2011-01-21 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985048#action_12985048
 ] 

Koji Noguchi commented on HADOOP-7115:
--

Devaraj, can we make this cache timeout period configurable?  
If yes, ops would have the option of setting this timeout to 0 to skip the caching 
altogether.
Allen, will this address your concern? 

We really had miserable weeks when our LDAP servers were going up and down.  
Don't want to go back there.

> Add a cache for getpwuid_r and getpwgid_r calls
> ---
>
> Key: HADOOP-7115
> URL: https://issues.apache.org/jira/browse/HADOOP-7115
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Arun C Murthy
>Assignee: Devaraj Das
>
> As discussed in HADOOP-6978, a cache helps a lot.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] [Commented] (HADOOP-7207) fs member of FSShell is not really needed

2011-03-24 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-7207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010787#comment-13010787
 ] 

Koji Noguchi commented on HADOOP-7207:
--

Duplicate of HADOOP-5749 ?

> fs member of FSShell is not really needed
> -
>
> Key: HADOOP-7207
> URL: https://issues.apache.org/jira/browse/HADOOP-7207
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Boris Shkolnik
>Assignee: Boris Shkolnik
>
> FileSystem object should be created on demand when needed in FSShell, instead 
> of always creating one that connects to the default FileSystem.
> It will also solve a problem of connecting to non-existant/non-functional 
> default, when it is not really needed for this run.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-7262) Update CS docs with better example configs

2011-05-09 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030901#comment-13030901
 ] 

Koji Noguchi commented on HADOOP-7262:
--

Arun, "mapred.capacity-scheduler.queue." is repeated many times in 
the tables of your pdf doc.  We should fill in the right conf names.

> Update CS docs with better example configs
> --
>
> Key: HADOOP-7262
> URL: https://issues.apache.org/jira/browse/HADOOP-7262
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
>Priority: Minor
> Fix For: 0.20.204.0
>
> Attachments: HADOOP-7262.patch, capacity_scheduler.pdf
>
>
> It will be nice to enhance CS docs with real-world example configs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-7262) Update CS docs with better example configs

2011-05-09 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030912#comment-13030912
 ] 

Koji Noguchi commented on HADOOP-7262:
--

bq. Capacity Guarantees
bq.
Isn't 'guarantee' too strong a word for the current capacity scheduler?
This is only possible when all of the queues have a max-limit equal to the queue 
capacity (since we don't have preemption).

bq. If true, priorities of jobs will be taken into account in scheduling 
decisions. 
bq.
On our internal cluster, we've never tried this.  It was once pointed out that 
we still have priority inversion problems (MAPREDUCE-314) and should not enable 
this conf.  Was this fixed?  If not, should we mention that it is not being 
used/tested yet?



> Update CS docs with better example configs
> --
>
> Key: HADOOP-7262
> URL: https://issues.apache.org/jira/browse/HADOOP-7262
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
>Priority: Minor
> Fix For: 0.20.204.0
>
> Attachments: HADOOP-7262.patch, capacity_scheduler.pdf
>
>
> It will be nice to enhance CS docs with real-world example configs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-7519) hadoop fs commands should support tar/gzip or an equivalent

2011-08-09 Thread Koji Noguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-7519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated HADOOP-7519:
-

Attachment: hadoop-7519-0.20.2XX-1.patch

Some of my users have had this need in the past, so I wrote a short wrapper around 
org.apache.tools.tar.  I got the idea from reading 
http://stuartsierra.com/2008/04/24/a-million-little-files where the author 
converted a tar file into a sequence file.

This is not ready to commit at all, but I think it gives an idea.  It needs 
ant.jar on its classpath.  A minimal sketch of the extract path follows after the 
usage output below.

{noformat}
% export HADOOP_CLASSPATH=./contrib/tar/lib/ant.jar
% hadoop jar contrib/tar/hadoop-tar.jar --help
usage: hadoop jar hadoop-tar.jar [options]
 -c,--create                 create a new archive
 -C,--directory              Set the working directory to DIR
 -f,--file                   Use archive file (default '-' for stdin/stdout)
    --help                   show help message
    --overwrite              overwrite existing directory
 -P,--absolute-names         don't strip leading / from file name
 -p,--preserve-permissions   apply recorded permissions instead of applying
                             user's umask when extracting files
    --same-group             create extracted files with the same group id
    --same-owner             create extracted files with the same ownership
 -t,--list                   list files from an archive
 -v,--verbose                print verbose output
 -x,--extract                extract files from an archive
 -z,--compress               filter the archive through compress/uncompress gzip
{noformat}
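
For illustration only (this is not the attached patch; names are hypothetical), the core of the extract path is just Ant's tar stream classes plus Hadoop FileSystem streams:

{code:java}
import java.io.IOException;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.tools.tar.TarEntry;
import org.apache.tools.tar.TarInputStream;

class HdfsUntarSketch {
  /** Extract a tar file stored on a Hadoop FileSystem into a target directory. */
  static void untar(Configuration conf, Path tar, Path outDir) throws IOException {
    FileSystem fs = tar.getFileSystem(conf);
    TarInputStream in = new TarInputStream(fs.open(tar));
    try {
      TarEntry entry;
      while ((entry = in.getNextEntry()) != null) {
        Path target = new Path(outDir, entry.getName());
        if (entry.isDirectory()) {
          fs.mkdirs(target);
        } else {
          OutputStream out = fs.create(target);
          try {
            IOUtils.copyBytes(in, out, conf, false);  // false: keep 'in' open for the next entry
          } finally {
            out.close();
          }
        }
      }
    } finally {
      in.close();
    }
  }
}
{code}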


> hadoop fs commands should support tar/gzip or an equivalent
> ---
>
> Key: HADOOP-7519
> URL: https://issues.apache.org/jira/browse/HADOOP-7519
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Affects Versions: 0.20.1
>Reporter: Keith Wiley
>Priority: Minor
>  Labels: hadoop
> Attachments: hadoop-7519-0.20.2XX-1.patch
>
>
> The "hadoop fs" subcommand should offer options for batching, unbatching, 
> compressing, and uncompressing files on hdfs.  The equivalent of "hadoop fs 
> -tar" or "hadoop fs -gzip".  These commands would greatly facilitate moving 
> large data (especially in a large number of files) back and forth from hdfs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HADOOP-6167) bin/hadoop script doesn't allow for different memory settings for each daemon type

2009-07-27 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735846#action_12735846
 ] 

Koji Noguchi commented on HADOOP-6167:
--

bq. how does this not work?

I agree with Ryan.  It does work as is (at least in our environment).
If the goal is to make it look better, I'm fine with that.

In our RHEL environment, the JVM gives preference to the latter value when an argument is 
provided twice; for example, "java -Xmx1000m -Xmx2000m ..." ends up with a 2000m max heap.


> bin/hadoop script doesn't allow for different memory settings for each daemon 
> type
> --
>
> Key: HADOOP-6167
> URL: https://issues.apache.org/jira/browse/HADOOP-6167
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 0.20.0
>Reporter: Fernando
> Attachments: hadoop, hadoop-script.diff
>
>
> bin/hadoop assumes that all daemon types ( namenode, datanode, jobtracker, 
> tasktracker ), all use the same memory settings.. (HADOOP_HEAPSIZE).
> I propose changes to that script to allow overriding the default memory ( 
> HADOOP_HEAPSIZE ), with daemon specific OPTS (HADOOP_NAMENODE_OPTS, etc ).
> Basically at the bottom of the bin/hadoop script, it will check to see if the 
> user has already set "-Xmx" in the HADOOP_OPTS variable.. if so, then it will 
> ignore the JAVA_HEAP_SIZE variable..
> as such:
> # run it
> if [[ $HADOOP_OPTS == *-Xmx* ]]; then
>   exec "$JAVA" $HADOOP_OPTS -classpath "$CLASSPATH" $CLASS "$@"
> else
>   exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS -classpath "$CLASSPATH" $CLASS "$@"
> fi
> I will attach the file as I have modified it..

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HADOOP-6202) harchive: Har doesn't work on files having '%' character

2009-08-18 Thread Koji Noguchi (JIRA)
harchive:   Har doesn't work on files having '%' character
--

 Key: HADOOP-6202
 URL: https://issues.apache.org/jira/browse/HADOOP-6202
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Reporter: Koji Noguchi


If I have a harchive file test.har that contains 
/a/b  
/a/b/abc%cde 
/a/b/fgh 

{noformat}
$ hadoop dfs -cat test.har/_masterindex
1 
0 2046275926 0 244 
$ hadoop dfs -cat test.har/_index
/ dir none 0 0 user 
/user dir none 0 0 knoguchi 
/user/knoguchi/a/b dir none 0 0 abc%cde fgh 
/user/knoguchi dir none 0 0 a 
/user/knoguchi/a dir none 0 0 b 
/user/knoguchi/a/b/fgh file part-0 8 10 
/user/knoguchi/a/b/abc%cde file part-0 0 8 
$ hadoop dfs -lsr har:///user/knoguchi/test.har/
drw-r--r--   - knoguchi users  0 2009-08-18 18:52 
/user/knoguchi/test.har/user
drw-r--r--   - knoguchi users  0 2009-08-18 18:52 
/user/knoguchi/test.har/user/knoguchi
drw-r--r--   - knoguchi users  0 2009-08-18 18:52 
/user/knoguchi/test.har/user/knoguchi/a
drw-r--r--   - knoguchi users  0 2009-08-18 18:52 
/user/knoguchi/test.har/user/knoguchi/a/b
lsr: could not get get listing for 
'har:/user/knoguchi/test.har/user/knoguchi/a/b' : File: 
har://hdfs-mithrilgold-nn1.gold.ygrid.yahoo.com:8020/user/knoguchi/test.har/user/knoguchi/a/b/abc%cde
 does not exist in har:///user/knoguchi/test.har
$
{noformat}


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-6202) harchive: Har doesn't work on files having '%' character

2009-08-18 Thread Koji Noguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated HADOOP-6202:
-

Description: 
If I have a harchive file test.har that contains 
/a/b  
/a/b/abc%cde 
/a/b/fgh 

{noformat}
$ hadoop dfs -cat test.har/_masterindex
1 
0 2046275926 0 244 
$ hadoop dfs -cat test.har/_index
/ dir none 0 0 user 
/user dir none 0 0 knoguchi 
/user/knoguchi/a/b dir none 0 0 abc%cde fgh 
/user/knoguchi dir none 0 0 a 
/user/knoguchi/a dir none 0 0 b 
/user/knoguchi/a/b/fgh file part-0 8 10 
/user/knoguchi/a/b/abc%cde file part-0 0 8 
$ hadoop dfs -lsr har:///user/knoguchi/test.har/
drw-r--r--   - knoguchi users  0 2009-08-18 18:52 
/user/knoguchi/test.har/user
drw-r--r--   - knoguchi users  0 2009-08-18 18:52 
/user/knoguchi/test.har/user/knoguchi
drw-r--r--   - knoguchi users  0 2009-08-18 18:52 
/user/knoguchi/test.har/user/knoguchi/a
drw-r--r--   - knoguchi users  0 2009-08-18 18:52 
/user/knoguchi/test.har/user/knoguchi/a/b
lsr: could not get get listing for 
'har:/user/knoguchi/test.har/user/knoguchi/a/b' : File: 
har://hdfs-aaa.bbb.ccc.com:8020/user/knoguchi/test.har/user/knoguchi/a/b/abc%cde
 does not exist in har:///user/knoguchi/test.har
$
{noformat}


  was:
If I have a harchive file test.har that contains 
/a/b  
/a/b/abc%cde 
/a/b/fgh 

{noformat}
$ hadoop dfs -cat test.har/_masterindex
1 
0 2046275926 0 244 
$ hadoop dfs -cat test.har/_index
/ dir none 0 0 user 
/user dir none 0 0 knoguchi 
/user/knoguchi/a/b dir none 0 0 abc%cde fgh 
/user/knoguchi dir none 0 0 a 
/user/knoguchi/a dir none 0 0 b 
/user/knoguchi/a/b/fgh file part-0 8 10 
/user/knoguchi/a/b/abc%cde file part-0 0 8 
$ hadoop dfs -lsr har:///user/knoguchi/test.har/
drw-r--r--   - knoguchi users  0 2009-08-18 18:52 
/user/knoguchi/test.har/user
drw-r--r--   - knoguchi users  0 2009-08-18 18:52 
/user/knoguchi/test.har/user/knoguchi
drw-r--r--   - knoguchi users  0 2009-08-18 18:52 
/user/knoguchi/test.har/user/knoguchi/a
drw-r--r--   - knoguchi users  0 2009-08-18 18:52 
/user/knoguchi/test.har/user/knoguchi/a/b
lsr: could not get get listing for 
'har:/user/knoguchi/test.har/user/knoguchi/a/b' : File: 
har://hdfs-mithrilgold-nn1.gold.ygrid.yahoo.com:8020/user/knoguchi/test.har/user/knoguchi/a/b/abc%cde
 does not exist in har:///user/knoguchi/test.har
$
{noformat}



> harchive:   Har doesn't work on files having '%' character
> --
>
> Key: HADOOP-6202
> URL: https://issues.apache.org/jira/browse/HADOOP-6202
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Reporter: Koji Noguchi
>
> If I have a harchive file test.har that contain 
> /a/b  
> /a/b/abc%cde 
> /a/b/fgh 
> {noformat}
> $ hadoop dfs -cat test.har/_masterindex
> 1 
> 0 2046275926 0 244 
> $ hadoop dfs -cat test.har/_index
> / dir none 0 0 user 
> /user dir none 0 0 knoguchi 
> /user/knoguchi/a/b dir none 0 0 abc%cde fgh 
> /user/knoguchi dir none 0 0 a 
> /user/knoguchi/a dir none 0 0 b 
> /user/knoguchi/a/b/fgh file part-0 8 10 
> /user/knoguchi/a/b/abc%cde file part-0 0 8 
> $ hadoop dfs -lsr har:///user/knoguchi/test.har/
> drw-r--r--   - knoguchi users  0 2009-08-18 18:52 
> /user/knoguchi/test.har/user
> drw-r--r--   - knoguchi users  0 2009-08-18 18:52 
> /user/knoguchi/test.har/user/knoguchi
> drw-r--r--   - knoguchi users  0 2009-08-18 18:52 
> /user/knoguchi/test.har/user/knoguchi/a
> drw-r--r--   - knoguchi users  0 2009-08-18 18:52 
> /user/knoguchi/test.har/user/knoguchi/a/b
> lsr: could not get get listing for 
> 'har:/user/knoguchi/test.har/user/knoguchi/a/b' : File: 
> har://hdfs-aaa.bbb.ccc.com:8020/user/knoguchi/test.har/user/knoguchi/a/b/abc%cde
>  does not exist in har:///user/knoguchi/test.har
> $
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-6202) harchive: Har doesn't work on files having '%' character

2009-08-18 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744687#action_12744687
 ] 

Koji Noguchi commented on HADOOP-6202:
--

This is probably happening because
HarFileSystem.makeQualified uses URI.toString(), which escapes '%' to '%25'.
So the string matching fails (and the hash value is incorrect as well).
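
A minimal illustration of the escaping (java.net.URI's multi-argument constructors always quote '%'):

{code:java}
import java.net.URI;
import java.net.URISyntaxException;

public class PercentEscapeDemo {
  public static void main(String[] args) throws URISyntaxException {
    URI u = new URI("har", null, "/user/knoguchi/a/b/abc%cde", null);
    System.out.println(u.toString());  // har:/user/knoguchi/a/b/abc%25cde
    System.out.println(u.getPath());   // /user/knoguchi/a/b/abc%cde
  }
}
{code}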

> harchive:   Har doesn't work on files having '%' character
> --
>
> Key: HADOOP-6202
> URL: https://issues.apache.org/jira/browse/HADOOP-6202
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Reporter: Koji Noguchi
>
> If I have a harchive file test.har that contain 
> /a/b  
> /a/b/abc%cde 
> /a/b/fgh 
> {noformat}
> $ hadoop dfs -cat test.har/_masterindex
> 1 
> 0 2046275926 0 244 
> $ hadoop dfs -cat test.har/_index
> / dir none 0 0 user 
> /user dir none 0 0 knoguchi 
> /user/knoguchi/a/b dir none 0 0 abc%cde fgh 
> /user/knoguchi dir none 0 0 a 
> /user/knoguchi/a dir none 0 0 b 
> /user/knoguchi/a/b/fgh file part-0 8 10 
> /user/knoguchi/a/b/abc%cde file part-0 0 8 
> $ hadoop dfs -lsr har:///user/knoguchi/test.har/
> drw-r--r--   - knoguchi users  0 2009-08-18 18:52 
> /user/knoguchi/test.har/user
> drw-r--r--   - knoguchi users  0 2009-08-18 18:52 
> /user/knoguchi/test.har/user/knoguchi
> drw-r--r--   - knoguchi users  0 2009-08-18 18:52 
> /user/knoguchi/test.har/user/knoguchi/a
> drw-r--r--   - knoguchi users  0 2009-08-18 18:52 
> /user/knoguchi/test.har/user/knoguchi/a/b
> lsr: could not get get listing for 
> 'har:/user/knoguchi/test.har/user/knoguchi/a/b' : File: 
> har://hdfs-aaa.bbb.ccc.com:8020/user/knoguchi/test.har/user/knoguchi/a/b/abc%cde
>  does not exist in har:///user/knoguchi/test.har
> $
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HADOOP-6202) harchive: Har doesn't work on files having '%' character

2009-08-19 Thread Koji Noguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi resolved HADOOP-6202.
--

Resolution: Duplicate

This is reported in HADOOP-6097.

> harchive:   Har doesn't work on files having '%' character
> --
>
> Key: HADOOP-6202
> URL: https://issues.apache.org/jira/browse/HADOOP-6202
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Reporter: Koji Noguchi
>
> If I have a harchive file test.har that contain 
> /a/b  
> /a/b/abc%cde 
> /a/b/fgh 
> {noformat}
> $ hadoop dfs -cat test.har/_masterindex
> 1 
> 0 2046275926 0 244 
> $ hadoop dfs -cat test.har/_index
> / dir none 0 0 user 
> /user dir none 0 0 knoguchi 
> /user/knoguchi/a/b dir none 0 0 abc%cde fgh 
> /user/knoguchi dir none 0 0 a 
> /user/knoguchi/a dir none 0 0 b 
> /user/knoguchi/a/b/fgh file part-0 8 10 
> /user/knoguchi/a/b/abc%cde file part-0 0 8 
> $ hadoop dfs -lsr har:///user/knoguchi/test.har/
> drw-r--r--   - knoguchi users  0 2009-08-18 18:52 
> /user/knoguchi/test.har/user
> drw-r--r--   - knoguchi users  0 2009-08-18 18:52 
> /user/knoguchi/test.har/user/knoguchi
> drw-r--r--   - knoguchi users  0 2009-08-18 18:52 
> /user/knoguchi/test.har/user/knoguchi/a
> drw-r--r--   - knoguchi users  0 2009-08-18 18:52 
> /user/knoguchi/test.har/user/knoguchi/a/b
> lsr: could not get get listing for 
> 'har:/user/knoguchi/test.har/user/knoguchi/a/b' : File: 
> har://hdfs-aaa.bbb.ccc.com:8020/user/knoguchi/test.har/user/knoguchi/a/b/abc%cde
>  does not exist in har:///user/knoguchi/test.har
> $
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-6097) Multiple bugs w/ Hadoop archives

2009-08-19 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745096#action_12745096
 ] 

Koji Noguchi commented on HADOOP-6097:
--

Ben, for the caching part, 
HarFileSystem.java has a comment 
{quote}
   * the uri of Har is 
   * har://underlyingfsscheme-host:port/archivepath.
   * or 
   * har:///archivepath.
{quote}

So it already creates a cache for each harpath?
 

> Multiple bugs w/ Hadoop archives
> 
>
> Key: HADOOP-6097
> URL: https://issues.apache.org/jira/browse/HADOOP-6097
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 0.18.0, 0.18.1, 0.18.2, 0.18.3, 0.19.0, 0.19.1, 0.20.0
>Reporter: Ben Slusky
> Fix For: 0.20.1
>
> Attachments: HADOOP-6097.patch
>
>
> Found and fixed several bugs involving Hadoop archives:
> - In makeQualified(), the sloppy conversion from Path to URI and back mangles 
> the path if it contains an escape-worthy character.
> - It's possible that fileStatusInIndex() may have to read more than one 
> segment of the index. The LineReader and count of bytes read need to be reset 
> for each block.
> - har:// connections cannot be indexed by (scheme, authority, username) -- 
> the path is significant as well. Caching them in this way limits a hadoop 
> client to opening one archive per filesystem. It seems to be safe not to 
> cache them, since they wrap another connection that does the actual 
> networking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-6097) Multiple bugs w/ Hadoop archives

2009-08-20 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745477#action_12745477
 ] 

Koji Noguchi commented on HADOOP-6097:
--

Ben, sorry about that. I was wrong.

I got confused: since HarFileSystem.getUri() returns har://archivepath, 
I mistakenly thought FileSystem.CACHE would use the path as part of the hash 
key.

{noformat}
$ hadoop dfs -ls har:///user/knoguchi/test.har har:///user/knoguchi/test2.har   
 
Found 1 items
drw-r--r--   - knoguchi users  0 2009-08-18 18:52 
/user/knoguchi/test.har/user
ls: Invalid file name: /user/knoguchi/test2.har in har:///user/knoguchi/test.har


$ hadoop dfs -ls har:///user/knoguchi/test2.har har:///user/knoguchi/test.har
Found 1 items
drw---   - knoguchi users  0 2009-08-17 19:15 
/user/knoguchi/test2.har/user
ls: Invalid file name: /user/knoguchi/test.har in har:///user/knoguchi/test2.har
$ 
{noformat}
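
(A minimal sketch of how the collision above can be avoided once the 
fs.har.impl.disable.cache switch from the attached patch is in place; the 
archive paths are the ones from the example, and the property name assumes 
that patch.)

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TwoHarArchives {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Without this, FileSystem.CACHE keys on (scheme, authority, ugi) only,
    // so test.har and test2.har end up sharing one cached har instance.
    conf.setBoolean("fs.har.impl.disable.cache", true);

    FileSystem har1 = new Path("har:///user/knoguchi/test.har").getFileSystem(conf);
    FileSystem har2 = new Path("har:///user/knoguchi/test2.har").getFileSystem(conf);

    System.out.println(har1 == har2);  // false once the cache is disabled
  }
}
{code}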

> Multiple bugs w/ Hadoop archives
> 
>
> Key: HADOOP-6097
> URL: https://issues.apache.org/jira/browse/HADOOP-6097
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 0.18.0, 0.18.1, 0.18.2, 0.18.3, 0.19.0, 0.19.1, 0.20.0
>Reporter: Ben Slusky
> Fix For: 0.20.1
>
> Attachments: HADOOP-6097.patch
>
>
> Found and fixed several bugs involving Hadoop archives:
> - In makeQualified(), the sloppy conversion from Path to URI and back mangles 
> the path if it contains an escape-worthy character.
> - It's possible that fileStatusInIndex() may have to read more than one 
> segment of the index. The LineReader and count of bytes read need to be reset 
> for each block.
> - har:// connections cannot be indexed by (scheme, authority, username) -- 
> the path is significant as well. Caching them in this way limits a hadoop 
> client to opening one archive per filesystem. It seems to be safe not to 
> cache them, since they wrap another connection that does the actual 
> networking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-6239) Command-line for append

2009-09-04 Thread Koji Noguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated HADOOP-6239:
-

Description: 
Once append is implemented in hdfs, it would be nice if users can append files 
from a command-line. 
Either have a separate 'append' command or add '-append' option for 'put' and 
'cp'

Like,  
{noformat}
% cat mylocalfile  | hadoop dfs -put -append - /user/knoguchi/myfile('-' 
for stdin)
% hadoop dfs -cp -append myhdfsfile1 myhdfsfile2  
{noformat}

  was:
Once append is implemented in hdfs, it would be nice if users can append files 
from a command-line. 
Either have a separate 'append' command or add '-append' option for 'put' and 
'cp'

Like,  
% cat mylocalfile  | hadoop dfs -put -append - /user/knoguchi/myfile('-' 
for stdin)
% hadoop dfs -cp -append myhdfsfile1 myhdfsfile2  



> Command-line for append
> ---
>
> Key: HADOOP-6239
> URL: https://issues.apache.org/jira/browse/HADOOP-6239
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs
>Reporter: Koji Noguchi
>
> Once append is implemented in hdfs, it would be nice if users can append 
> files from a command-line. 
> Either have a separate 'append' command or add '-append' option for 'put' and 
> 'cp'
> Like,  
> {noformat}
> % cat mylocalfile  | hadoop dfs -put -append - /user/knoguchi/myfile('-' 
> for stdin)
> % hadoop dfs -cp -append myhdfsfile1 myhdfsfile2  
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HADOOP-6239) Command-line for append

2009-09-04 Thread Koji Noguchi (JIRA)
Command-line for append
---

 Key: HADOOP-6239
 URL: https://issues.apache.org/jira/browse/HADOOP-6239
 Project: Hadoop Common
  Issue Type: New Feature
  Components: fs
Reporter: Koji Noguchi


Once append is implemented in hdfs, it would be nice if users can append files 
from a command-line. 
Either have a separate 'append' command or add '-append' option for 'put' and 
'cp'

Like,  
% cat mylocalfile  | hadoop dfs -put -append - /user/knoguchi/myfile('-' 
for stdin)
% hadoop dfs -cp -append myhdfsfile1 myhdfsfile2  
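
(Roughly, such an append would boil down to something like the sketch below, 
assuming the FileSystem.append() API is enabled; the path is just the example 
above, and this is an illustration, not a proposed implementation.)

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class AppendFromStdin {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path target = new Path("/user/knoguchi/myfile");   // example path
    FileSystem fs = target.getFileSystem(conf);

    // Roughly what "hadoop dfs -put -append - <target>" would do:
    // open the existing file for append and stream stdin into it.
    FSDataOutputStream out = fs.append(target);
    IOUtils.copyBytes(System.in, out, conf, true);      // closes the stream when done
  }
}
{code}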


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HADOOP-6280) Uniform way of setting default param for ToolRunner (like passing -Ddfs.umask)

2009-09-23 Thread Koji Noguchi (JIRA)
Uniform way of setting default param for ToolRunner (like passing -Ddfs.umask)
--

 Key: HADOOP-6280
 URL: https://issues.apache.org/jira/browse/HADOOP-6280
 Project: Hadoop Common
  Issue Type: New Feature
  Components: util
Reporter: Koji Noguchi
Priority: Minor


Sometimes our users want to override the dfs.umask setting we have on our 
cluster (but continue to use the configdir we set up).
They would need to explicitly insert 

hadoop dfs -Ddfs.umask=23 -put ...
or 
hadoop jar myjar.jar org.MyMain -Ddfs.umask=23  ...

for all the hadoop related calls.

It would be nice if this could be done by setting an environment variable like 

export HADOOP_TOOL_OPTS=-Ddfs.umask=23

to cover all cases.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-6280) Uniform way of setting default param for ToolRunner (like passing -Ddfs.umask)

2009-09-23 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758779#action_12758779
 ] 

Koji Noguchi commented on HADOOP-6280:
--

My initial thought was to simply insert 

+ exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS -classpath "$CLASSPATH" $CLASS 
$HADOOP_TOOL_OPTS "$@"

in the hadoop script. 

But this won't work because
# if $CLASS doesn't implement Tool, the command would fail
# the hadoop jar case won't work.

If I have 
% export HADOOP_TOOL_OPTS=-Ddfs.umask=23
% hadoop jar $HADOOP_HOME/hadoop-streaming.jar -input ...

I want to have 

java org.apache.hadoop.util.RunJar $HADOOP_HOME/hadoop-streaming.jar 
-Ddfs.umask=23 -input 

and NOT

java org.apache.hadoop.util.RunJar -Ddfs.umask=23 
$HADOOP_HOME/hadoop-streaming.jar -input


There must be something obvious I'm missing...
If not, can I ask the ToolRunner to read in the HADOOP_TOOL_OPTS property 
directly?
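
(To make that concrete, a minimal sketch of what "ToolRunner reading 
HADOOP_TOOL_OPTS directly" could look like; HADOOP_TOOL_OPTS is the proposed 
variable, not something ToolRunner knows about today, and the whitespace 
splitting is naive.)

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class EnvAwareToolRunner {
  /** Prepend options from HADOOP_TOOL_OPTS so -Ddfs.umask=23 etc. apply to every run. */
  public static int run(Configuration conf, Tool tool, String[] args) throws Exception {
    String envOpts = System.getenv("HADOOP_TOOL_OPTS");
    String[] merged = args;
    if (envOpts != null && envOpts.trim().length() > 0) {
      String[] extra = envOpts.trim().split("\\s+");   // naive: no quoting support
      merged = new String[extra.length + args.length];
      System.arraycopy(extra, 0, merged, 0, extra.length);
      System.arraycopy(args, 0, merged, extra.length, args.length);
    }
    // GenericOptionsParser inside ToolRunner then picks up the -D options as usual.
    return ToolRunner.run(conf, tool, merged);
  }
}
{code}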

> Uniform way of setting default param for ToolRunner (like passing -Ddfs.umask)
> --
>
> Key: HADOOP-6280
> URL: https://issues.apache.org/jira/browse/HADOOP-6280
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: util
>Reporter: Koji Noguchi
>Priority: Minor
>
> Sometimes our users want to overwrite the dfs.umask setting we have on our 
> cluster  (but continue to use the configdir we setup).
> They would need to explicitly insert 
> hadoop dfs -Ddfs.umask=23 -put ...
> or 
> hadoop jar myjar.jar org.MyMain -Ddfs.umask=23  ...
> for all the hadoop related calls.
> It would be nice if this  can be done by setting an environment like 
> export HADOOP_TOOL_OPTS=-Ddfs.umask=23
> and cover all cases.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HADOOP-6284) Any hadoop commands crashing jvm (SIGBUS) when /tmp (tmpfs) is full

2009-09-23 Thread Koji Noguchi (JIRA)
Any hadoop commands crashing jvm (SIGBUS)  when /tmp (tmpfs) is full


 Key: HADOOP-6284
 URL: https://issues.apache.org/jira/browse/HADOOP-6284
 Project: Hadoop Common
  Issue Type: Improvement
  Components: scripts
Reporter: Koji Noguchi
Priority: Minor


{noformat}
[knoguchi@ ~]$ df /tmp
Filesystem   1K-blocks  Used Available Use% Mounted on
tmpfs   524288524288 0 100% /tmp
[knoguchi@ ~]$ hadoop dfs -ls 
#
# An unexpected error has been detected by Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0x00824077, pid=19185, tid=4160617360
#
# Java VM: Java HotSpot(TM) Server VM (10.0-b22 mixed mode linux-x86)
# Problematic frame:
# C  [libc.so.6+0x6e077]  memset+0x37
#
# An error report file with more information is saved as:
# /homes/knoguchi/hs_err_pid19185.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#
Aborted
[knoguchi@ ~]$ 
{noformat}

This does not happen when /tmp is not in tmpfs.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-6284) Any hadoop commands crashing jvm (SIGBUS) when /tmp (tmpfs) is full

2009-09-23 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758896#action_12758896
 ] 

Koji Noguchi commented on HADOOP-6284:
--

Reproducing this error, it is crashing when trying to create 
/tmp/hsperfdata_knoguchi

[pid 17137] open("/tmp/hsperfdata_knoguchi/17135", O_RDWR|O_CREAT|O_TRUNC, 
0600) = 3
[pid 17137] ftruncate(3, 32768) = 0
[pid 17137] mmap2(NULL, 32768, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0xf7fb817c) 
= 0xf7fec000
[pid 17137] close(3)= 0
[pid 17137] --- SIGBUS (Bus error) @ 0 (0) ---

Since /tmp is a tmpfs, the open itself succeeds, which confuses the jvm.

It would have been nice if we could set a different /tmp, but this is hard coded 
in java.
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6447182

It is suggested that 
"One workaround would be to disable the temporary mapping of hsperfdata file by 
using "-XX:-UsePerfData". "


This can almost be done by setting HADOOP_CLIENT_OPTS  but we also have this in 
the hadoop script.
{noformat}
  JAVA_PLATFORM=`CLASSPATH=${CLASSPATH} ${JAVA} -Xmx32m 
org.apache.hadoop.util.PlatformName | sed -e "s/ /_/g"`
{noformat}
which also fails when /tmp is full. 

Can we have a way to set options for this command or hardcode  
"-XX:-UsePerfData" in the above line?

We have had a couple of incidents where one user filled up /tmp and all the 
hadoop commands from that node started failing.


> Any hadoop commands crashing jvm (SIGBUS)  when /tmp (tmpfs) is full
> 
>
> Key: HADOOP-6284
> URL: https://issues.apache.org/jira/browse/HADOOP-6284
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: scripts
>Reporter: Koji Noguchi
>Priority: Minor
>
> {noformat}
> [knoguchi@ ~]$ df /tmp
> Filesystem   1K-blocks  Used Available Use% Mounted on
> tmpfs   524288524288 0 100% /tmp
> [knoguchi@ ~]$ hadoop dfs -ls 
> #
> # An unexpected error has been detected by Java Runtime Environment:
> #
> #  SIGBUS (0x7) at pc=0x00824077, pid=19185, tid=4160617360
> #
> # Java VM: Java HotSpot(TM) Server VM (10.0-b22 mixed mode linux-x86)
> # Problematic frame:
> # C  [libc.so.6+0x6e077]  memset+0x37
> #
> # An error report file with more information is saved as:
> # /homes/knoguchi/hs_err_pid19185.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://java.sun.com/webapps/bugreport/crash.jsp
> #
> Aborted
> [knoguchi@ ~]$ 
> {noformat}
> This does not happen when /tmp is not in tmpfs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-6284) Any hadoop commands crashing jvm (SIGBUS) when /tmp (tmpfs) is full

2009-09-23 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758913#action_12758913
 ] 

Koji Noguchi commented on HADOOP-6284:
--

bq. How about we add a HADOOP_JVM_OPTS?

I only want this set for the 'JAVA_PLATFORM=`CLASSPATH... ${JAVA}' command. 
(Since UsePerfData looks to be required for java tools to connect to the jvm.)

HADOOP_JVM_OPTS sounds too general for that.


> Any hadoop commands crashing jvm (SIGBUS)  when /tmp (tmpfs) is full
> 
>
> Key: HADOOP-6284
> URL: https://issues.apache.org/jira/browse/HADOOP-6284
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: scripts
>Reporter: Koji Noguchi
>Priority: Minor
>
> {noformat}
> [knoguchi@ ~]$ df /tmp
> Filesystem   1K-blocks  Used Available Use% Mounted on
> tmpfs   524288524288 0 100% /tmp
> [knoguchi@ ~]$ hadoop dfs -ls 
> #
> # An unexpected error has been detected by Java Runtime Environment:
> #
> #  SIGBUS (0x7) at pc=0x00824077, pid=19185, tid=4160617360
> #
> # Java VM: Java HotSpot(TM) Server VM (10.0-b22 mixed mode linux-x86)
> # Problematic frame:
> # C  [libc.so.6+0x6e077]  memset+0x37
> #
> # An error report file with more information is saved as:
> # /homes/knoguchi/hs_err_pid19185.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://java.sun.com/webapps/bugreport/crash.jsp
> #
> Aborted
> [knoguchi@ ~]$ 
> {noformat}
> This does not happen when /tmp is not in tmpfs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-6097) Multiple bugs w/ Hadoop archives

2009-09-24 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759193#action_12759193
 ] 

Koji Noguchi commented on HADOOP-6097:
--

Do we want a test case for handling two har files at once?
Other than that, +1.

> Multiple bugs w/ Hadoop archives
> 
>
> Key: HADOOP-6097
> URL: https://issues.apache.org/jira/browse/HADOOP-6097
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 0.18.0, 0.18.1, 0.18.2, 0.18.3, 0.19.0, 0.19.1, 0.19.2, 
> 0.20.0, 0.20.1
>Reporter: Ben Slusky
>Assignee: Ben Slusky
> Fix For: 0.20.2
>
> Attachments: HADOOP-6097-0.20.patch, HADOOP-6097-0.20.patch, 
> HADOOP-6097-v2.patch, HADOOP-6097.patch
>
>
> Found and fixed several bugs involving Hadoop archives:
> - In makeQualified(), the sloppy conversion from Path to URI and back mangles 
> the path if it contains an escape-worthy character.
> - It's possible that fileStatusInIndex() may have to read more than one 
> segment of the index. The LineReader and count of bytes read need to be reset 
> for each block.
> - har:// connections cannot be indexed by (scheme, authority, username) -- 
> the path is significant as well. Caching them in this way limits a hadoop 
> client to opening one archive per filesystem. It seems to be safe not to 
> cache them, since they wrap another connection that does the actual 
> networking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-6097) Multiple bugs w/ Hadoop archives

2009-09-24 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759316#action_12759316
 ] 

Koji Noguchi commented on HADOOP-6097:
--

bq. since the core problem was fixed in HADOOP-6231 which already adds test to 
it, I dont think we need specific tests for har... 

I wanted a testcase so that it would catch it if someone takes out 
{noformat}
<property>
  <name>fs.har.impl.disable.cache</name>
  <value>true</value>
</property>
{noformat} 
from core-default.xml 



> Multiple bugs w/ Hadoop archives
> 
>
> Key: HADOOP-6097
> URL: https://issues.apache.org/jira/browse/HADOOP-6097
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 0.18.0, 0.18.1, 0.18.2, 0.18.3, 0.19.0, 0.19.1, 0.19.2, 
> 0.20.0, 0.20.1
>Reporter: Ben Slusky
>Assignee: Ben Slusky
> Fix For: 0.20.2
>
> Attachments: HADOOP-6097-0.20.patch, HADOOP-6097-0.20.patch, 
> HADOOP-6097-v2.patch, HADOOP-6097.patch
>
>
> Found and fixed several bugs involving Hadoop archives:
> - In makeQualified(), the sloppy conversion from Path to URI and back mangles 
> the path if it contains an escape-worthy character.
> - It's possible that fileStatusInIndex() may have to read more than one 
> segment of the index. The LineReader and count of bytes read need to be reset 
> for each block.
> - har:// connections cannot be indexed by (scheme, authority, username) -- 
> the path is significant as well. Caching them in this way limits a hadoop 
> client to opening one archive per filesystem. It seems to be safe not to 
> cache them, since they wrap another connection that does the actual 
> networking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-6284) Any hadoop commands crashing jvm (SIGBUS) when /tmp (tmpfs) is full

2009-09-24 Thread Koji Noguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated HADOOP-6284:
-

Attachment: hadoop-6284-patch-v1.txt

It's a silly patch, but it introduces a new env var HADOOP_JAVA_PLATFORM_OPTS.

With this patch applied but no option set:
{noformat}
[knoguchi@ ~]$ df /tmp
Filesystem   1K-blocks  Used Available Use% Mounted on
tmpfs   524288524288 0 100% /tmp

[knoguchi@ ~]$ hadoop dfs -ls /
#
# An unexpected error has been detected by Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0x00824077, pid=12811, tid=4160617360
#
# Java VM: Java HotSpot(TM) Server VM (10.0-b22 mixed mode linux-x86)
# Problematic frame:
# C  [libc.so.6+0x6e077]  memset+0x37
#
# An error report file with more information is saved as:
# /homes/knoguchi/hs_err_pid12811.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#
Abort
{noformat}

Setting HADOOP_CLIENT_OPTS

{noformat}
[knoguchi@ ~]$ setenv HADOOP_CLIENT_OPTS '-XX:-UsePerfData'
[knoguchi@ ~]$ $HADOOP_HOME/bin/hadoop dfs -ls /
Exception in thread "main" java.lang.NoClassDefFoundError: 
#_An_unexpected_error_has_been_detected_by_Java_Runtime_Environment:
Caused by: java.lang.ClassNotFoundException: 
#_An_unexpected_error_has_been_detected_by_Java_Runtime_Environment:
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)

{noformat}

This is because hadoop is executing 
java -Xmx1000m -Djava.library.path=/.../hadoop/bin/../lib/native/# 
#_An_unexpected_error_has_been_detected_by_Java_Runtime_Environment: #

(basically JAVA_PLATFORM became a long error message)

and then 
{noformat}
[knoguchi@ ~]$ setenv HADOOP_JAVA_PLATFORM_OPTS '-XX:-UsePerfData'
[knoguchi@ ~]$ $HADOOP_HOME/bin/hadoop dfs -ls /   
Found 10 items
drwx--   - ...
{noformat}

works.

I'm reluctant to put -XX:-UsePerfData directly in the hadoop script since I don't 
know when java will stop supporting this option.

> Any hadoop commands crashing jvm (SIGBUS)  when /tmp (tmpfs) is full
> 
>
> Key: HADOOP-6284
> URL: https://issues.apache.org/jira/browse/HADOOP-6284
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: scripts
>Reporter: Koji Noguchi
>Priority: Minor
> Attachments: hadoop-6284-patch-v1.txt
>
>
> {noformat}
> [knoguchi@ ~]$ df /tmp
> Filesystem   1K-blocks  Used Available Use% Mounted on
> tmpfs   524288524288 0 100% /tmp
> [knoguchi@ ~]$ hadoop dfs -ls 
> #
> # An unexpected error has been detected by Java Runtime Environment:
> #
> #  SIGBUS (0x7) at pc=0x00824077, pid=19185, tid=4160617360
> #
> # Java VM: Java HotSpot(TM) Server VM (10.0-b22 mixed mode linux-x86)
> # Problematic frame:
> # C  [libc.so.6+0x6e077]  memset+0x37
> #
> # An error report file with more information is saved as:
> # /homes/knoguchi/hs_err_pid19185.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://java.sun.com/webapps/bugreport/crash.jsp
> #
> Aborted
> [knoguchi@ ~]$ 
> {noformat}
> This does not happen when /tmp is not in tmpfs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-6284) Any hadoop commands crashing jvm (SIGBUS) when /tmp (tmpfs) is full

2009-09-25 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759693#action_12759693
 ] 

Koji Noguchi commented on HADOOP-6284:
--

bq. I think a bug needs to be filed against Hotspot for this... 

I'll look into it.  But in the meantime, this Jira is asking for a way to pass 
an option for JAVA_PLATFORM.
(We could have done this for "-Xmx32m" HADOOP-5564 as well.)

> Any hadoop commands crashing jvm (SIGBUS)  when /tmp (tmpfs) is full
> 
>
> Key: HADOOP-6284
> URL: https://issues.apache.org/jira/browse/HADOOP-6284
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: scripts
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: hadoop-6284-patch-v1.txt
>
>
> {noformat}
> [knoguchi@ ~]$ df /tmp
> Filesystem   1K-blocks  Used Available Use% Mounted on
> tmpfs   524288524288 0 100% /tmp
> [knoguchi@ ~]$ hadoop dfs -ls 
> #
> # An unexpected error has been detected by Java Runtime Environment:
> #
> #  SIGBUS (0x7) at pc=0x00824077, pid=19185, tid=4160617360
> #
> # Java VM: Java HotSpot(TM) Server VM (10.0-b22 mixed mode linux-x86)
> # Problematic frame:
> # C  [libc.so.6+0x6e077]  memset+0x37
> #
> # An error report file with more information is saved as:
> # /homes/knoguchi/hs_err_pid19185.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://java.sun.com/webapps/bugreport/crash.jsp
> #
> Aborted
> [knoguchi@ ~]$ 
> {noformat}
> This does not happen when /tmp is not in tmpfs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-6284) Any hadoop commands crashing jvm (SIGBUS) when /tmp (tmpfs) is full

2009-10-05 Thread Koji Noguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated HADOOP-6284:
-

Attachment: HADOOP-6284-y0.20.1.patch

Patch for 0.20. (not meant for commit)

> Any hadoop commands crashing jvm (SIGBUS)  when /tmp (tmpfs) is full
> 
>
> Key: HADOOP-6284
> URL: https://issues.apache.org/jira/browse/HADOOP-6284
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: scripts
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: hadoop-6284-patch-v1.txt, HADOOP-6284-y0.20.1.patch
>
>
> {noformat}
> [knoguchi@ ~]$ df /tmp
> Filesystem   1K-blocks  Used Available Use% Mounted on
> tmpfs   524288524288 0 100% /tmp
> [knoguchi@ ~]$ hadoop dfs -ls 
> #
> # An unexpected error has been detected by Java Runtime Environment:
> #
> #  SIGBUS (0x7) at pc=0x00824077, pid=19185, tid=4160617360
> #
> # Java VM: Java HotSpot(TM) Server VM (10.0-b22 mixed mode linux-x86)
> # Problematic frame:
> # C  [libc.so.6+0x6e077]  memset+0x37
> #
> # An error report file with more information is saved as:
> # /homes/knoguchi/hs_err_pid19185.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://java.sun.com/webapps/bugreport/crash.jsp
> #
> Aborted
> [knoguchi@ ~]$ 
> {noformat}
> This does not happen when /tmp is not in tmpfs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-4933) ConcurrentModificationException in JobHistory.java

2009-10-30 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771969#action_12771969
 ] 

Koji Noguchi commented on HADOOP-4933:
--

FYI, when hitting this bug,
1) Job history look-up can fail with an error as reported on this Jira,
or
2) Job history look-up can show information for a completely different job.

(2) is worse. Maybe backport to 0.20 as well?


> ConcurrentModificationException in JobHistory.java
> --
>
> Key: HADOOP-4933
> URL: https://issues.apache.org/jira/browse/HADOOP-4933
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 0.20.0
>Reporter: Amar Kamat
>Assignee: Amar Kamat
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: HADOOP-4933-v1.1.patch
>
>
> {{JobHistory.java}} throws {{ConcurrentModificationException}} while finding 
> out the job history version.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-6335) Support reading of concatenated gzip files

2009-10-30 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772183#action_12772183
 ] 

Koji Noguchi commented on HADOOP-6335:
--

Is this a duplicate of MAPREDUCE-469 ?

Also, can we make a fix for the native compression as well (maybe in a new 
Jira)?

> Support reading of concatenated gzip files
> --
>
> Key: HADOOP-6335
> URL: https://issues.apache.org/jira/browse/HADOOP-6335
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Ravi Gummadi
>Assignee: Ravi Gummadi
>
> GzipCodec.GzipInputStream needs to support reading of concatenated gzip files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-6284) Any hadoop commands crashing jvm (SIGBUS) when /tmp (tmpfs) is full

2009-12-03 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785423#action_12785423
 ] 

Koji Noguchi commented on HADOOP-6284:
--

FYI, we deployed the fix with the '-XX:-UsePerfData' config change to our clusters, 
only to find out this option hangs each jvm for 4 seconds when shutting down...
A single ls call (java_platform + dfsclient) used to take less than 0.1 second, 
but took 7-8 seconds after the change... We ended up reverting the config and 
are now changing the /tmp configuration instead.

> Any hadoop commands crashing jvm (SIGBUS)  when /tmp (tmpfs) is full
> 
>
> Key: HADOOP-6284
> URL: https://issues.apache.org/jira/browse/HADOOP-6284
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: scripts
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: hadoop-6284-patch-v1.txt, HADOOP-6284-y0.20.1.patch
>
>
> {noformat}
> [knoguchi@ ~]$ df /tmp
> Filesystem   1K-blocks  Used Available Use% Mounted on
> tmpfs   524288524288 0 100% /tmp
> [knoguchi@ ~]$ hadoop dfs -ls 
> #
> # An unexpected error has been detected by Java Runtime Environment:
> #
> #  SIGBUS (0x7) at pc=0x00824077, pid=19185, tid=4160617360
> #
> # Java VM: Java HotSpot(TM) Server VM (10.0-b22 mixed mode linux-x86)
> # Problematic frame:
> # C  [libc.so.6+0x6e077]  memset+0x37
> #
> # An error report file with more information is saved as:
> # /homes/knoguchi/hs_err_pid19185.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://java.sun.com/webapps/bugreport/crash.jsp
> #
> Aborted
> [knoguchi@ ~]$ 
> {noformat}
> This does not happen when /tmp is not in tmpfs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HADOOP-6669) zlib.compress.level ignored for DefaultCodec initialization

2010-03-31 Thread Koji Noguchi (JIRA)
zlib.compress.level  ignored for DefaultCodec initialization 
-

 Key: HADOOP-6669
 URL: https://issues.apache.org/jira/browse/HADOOP-6669
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Reporter: Koji Noguchi
Assignee: Koji Noguchi
Priority: Minor


HADOOP-5879 added a compression level for codecs, but DefaultCodec seems to 
ignore this conf value when initialized.
This is only when codec is first created.  reinit() probably sets it right. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-6669) zlib.compress.level ignored for DefaultCodec initialization

2010-03-31 Thread Koji Noguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated HADOOP-6669:
-

Status: Patch Available  (was: Open)

> zlib.compress.level  ignored for DefaultCodec initialization 
> -
>
> Key: HADOOP-6669
> URL: https://issues.apache.org/jira/browse/HADOOP-6669
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: io
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: HADOOP-6669-0.patch
>
>
> HADOOP-5879 added a compression level for codecs, but DefaultCodec seems to 
> ignore this conf value when initialized.
> This is only when codec is first created.  reinit() probably sets it right. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-6669) zlib.compress.level ignored for DefaultCodec initialization

2010-03-31 Thread Koji Noguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated HADOOP-6669:
-

Attachment: HADOOP-6669-0.patch

Attaching a patch that
1) Passes the conf when the compressor is created.
2) Adds one test case to make sure no compression is done when NO_COMPRESSION is set.
3) Adds one extra check in the codec test which compares the original and compressed 
output. Not directly related to this Jira.
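
(The check in 2) is roughly of the following shape; this is a sketch of the 
idea, not the attached patch, and the repeated-byte input is just there to 
make the size difference obvious.)

{code:java}
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.DefaultCodec;

public class NoCompressionCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("zlib.compress.level", "NO_COMPRESSION");  // the setting this Jira is about

    DefaultCodec codec = new DefaultCodec();
    codec.setConf(conf);                                 // should honor the level on first use

    byte[] input = new byte[64 * 1024];
    Arrays.fill(input, (byte) 'a');                      // highly compressible on purpose

    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    CompressionOutputStream out = codec.createOutputStream(baos);
    out.write(input, 0, input.length);
    out.close();

    // With NO_COMPRESSION honored, the output stays at least as big as the input;
    // if the level is silently ignored, the 64KB of 'a's shrinks dramatically.
    System.out.println("raw=" + input.length + " compressed=" + baos.size());
  }
}
{code}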

> zlib.compress.level  ignored for DefaultCodec initialization 
> -
>
> Key: HADOOP-6669
> URL: https://issues.apache.org/jira/browse/HADOOP-6669
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: io
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: HADOOP-6669-0.patch
>
>
> HADOOP-5879 added a compression level for codecs, but DefaultCodec seems to 
> ignore this conf value when initialized.
> This is only when codec is first created.  reinit() probably sets it right. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-6669) zlib.compress.level ignored for DefaultCodec initialization

2010-04-01 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852428#action_12852428
 ] 

Koji Noguchi commented on HADOOP-6669:
--

Hm. I didn't know that the native library is not used in hudson testing.

My unit test was testing
1) gzipcodec-native
2) defaultcodec-native
3) defaultcodec-non-native

(1) and (2) are skipped in hudson testing.

Hudson testReport showing
bq. 2010-04-01 00:15:52,882 WARN  compress.TestCodec 
(TestCodec.java:testCodecInitWithCompressionLevel(373)) - 
testCodecInitWithCompressionLevel for native skipped: native libs not loaded

Testing manually, (2) and (3) failed without the patch and succeeded with the 
patch.


> zlib.compress.level  ignored for DefaultCodec initialization 
> -
>
> Key: HADOOP-6669
> URL: https://issues.apache.org/jira/browse/HADOOP-6669
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: io
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: HADOOP-6669-0.patch
>
>
> HADOOP-5879 added a compression level for codecs, but DefaultCodec seems to 
> ignore this conf value when initialized.
> This is only when codec is first created.  reinit() probably sets it right. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-6702) Incorrect exit codes for "dfs -chown", "dfs -chgrp" when input is given in wildcard format.

2010-04-16 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857862#action_12857862
 ] 

Koji Noguchi commented on HADOOP-6702:
--

Incorrect return codes for wildcards in hadoop are not limited to chown/chgrp.  
It's everywhere.
For example, 
in 'ls', this is how unix behaves:

{noformat}
% ls nonexist*
ls: No match.
% echo $?
1
% ls nonexist* file*
fileA
% echo $?
0
% ls file* nonexist* 
fileA
% echo $?
0
{noformat} 
*It returns 0 as long as one of the glob patterns matches*.
and in hadoop 'ls'
{noformat}

% hadoop dfs -ls file\* nonexist\*
Found 1 items
-rw---   3 knoguchi users  7 2010-04-08 15:57 /user/knoguchi/fileA
ls: Cannot access nonexist*: No such file or directory.
% echo $?
255

% hadoop dfs -ls nonexist\* file\* 
ls: Cannot access nonexist*: No such file or directory.
Found 1 items
-rw---   3 knoguchi users  7 2010-04-08 15:57 /user/knoguchi/fileA
% echo $?
0
% 
{noformat}

hadoop 'ls' simply returns the result of the last globbing.

This behavior is also inconsistent from command to command. 
Picking three hadoop commands: chgrp, ls, and du.

|| command || single globbing || multiple globbing ||
| chown/chgrp/etc | X (returns 0 even if the globbing returns empty) | X (returns 0 even if all the globbings return empty) |
| ls | O | X (returns the last globbing's result) |
| du | O | X (returns non-zero even if one of the globbings fails) |

The suggested fix in this Jira would simply change the behavior of 'chown/chgrp' 
to be 'du'-like, which means multiple globbing would still be handled incorrectly.
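
(To make the unix-like semantics concrete, a rough sketch, not a patch, of glob 
handling that fails only when every pattern comes up empty, matching the ls 
behavior described above.)

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GlobExitCodes {
  /** Returns 0 if at least one pattern matched something, non-zero otherwise. */
  static int lsLike(FileSystem fs, String[] patterns) throws IOException {
    boolean anyMatch = false;
    for (String p : patterns) {
      FileStatus[] matches = fs.globStatus(new Path(p));
      if (matches == null || matches.length == 0) {
        System.err.println("ls: Cannot access " + p + ": No such file or directory.");
        continue;
      }
      anyMatch = true;
      for (FileStatus st : matches) {
        System.out.println(st.getPath());
      }
    }
    return anyMatch ? 0 : 1;
  }

  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    System.exit(lsLike(fs, args));
  }
}
{code}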


> Incorrect exit codes for "dfs -chown", "dfs -chgrp"  when input is given in 
> wildcard format.
> 
>
> Key: HADOOP-6702
> URL: https://issues.apache.org/jira/browse/HADOOP-6702
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 0.19.1, 0.20.0, 0.20.1, 0.20.2
>Reporter: Ravi Phulari
>Assignee: Ravi Phulari
>Priority: Minor
> Fix For: 0.20.3, 0.21.0, 0.22.0
>
>
> Currently incorrect exit codes  are given for "dfs -chown", "dfs -chgrp"  
> when input is given in wildcard format.
> This bug is due to missing update of errors count in {{FsShell.java}}.
> {code:title=FsShell.java|borderStyle=solid}
> int runCmdHandler(CmdHandler handler, String[] args,
>int startIndex, boolean recursive) 
>throws IOException {
> int errors = 0;
> 
> for (int i=startIndex; i<args.length; i++) {
>   Path srcPath = new Path(args[i]);
>   FileSystem srcFs = srcPath.getFileSystem(getConf());
>   Path[] paths = FileUtil.stat2Paths(srcFs.globStatus(srcPath), srcPath);
>   for(Path path : paths) {
> try {
>   FileStatus file = srcFs.getFileStatus(path);
>   if (file == null) {
> System.err.println(handler.getName() + 
>": could not get status for '" + path + "'");
> errors++;
>   } else {
> errors += runCmdHandler(handler, file, srcFs, recursive);
>   }
> } catch (IOException e) {
>   String msg = (e.getMessage() != null ? e.getLocalizedMessage() :
> (e.getCause().getMessage() != null ? 
> e.getCause().getLocalizedMessage() : "null"));
>   System.err.println(handler.getName() + ": could not get status for 
> '"
> + path + "': " + msg.split("\n")[0]);
>   errors++;
> }
>   }
> }
>  {code}
> If there are no files on HDFS matching to wildcard input then  
> {{srcFs.globStatus(srcpath)}} returns 0. 
> {{ Path[] paths = FileUtil.stat2Paths(srcFs.globStatus(srcPath), srcPath);}}
> Resulting no increment in {{errors}} and command exits with 0 even though 
> file/directory does not exist.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HADOOP-6713) The RPC server Listener thread is a scalability bottleneck

2010-04-22 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859905#action_12859905
 ] 

Koji Noguchi commented on HADOOP-6713:
--

Nice. Any performance numbers?

> The RPC server Listener thread is a scalability bottleneck
> --
>
> Key: HADOOP-6713
> URL: https://issues.apache.org/jira/browse/HADOOP-6713
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: ipc
>Affects Versions: 0.21.0
>Reporter: dhruba borthakur
>Assignee: Dmytro Molkov
> Attachments: HADOOP-6713.patch
>
>
> The Hadoop RPC Server implementation has a single Listener thread that reads 
> data from the socket and puts them into a call queue. This means that this 
> single thread can pull RPC requests off the network only as fast as a single 
> CPU can execute. This is a scalability bottlneck in our cluster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-6702) Incorrect exit codes for "dfs -chown", "dfs -chgrp" when input is given in wildcard format.

2010-05-10 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865958#action_12865958
 ] 

Koji Noguchi commented on HADOOP-6702:
--

bq. HADOOP-6701  Fixes this issue. 

Ravi, HADOOP-6701 just moved the error handling of globbing from one type to 
another. 
Should I create a separate Jira for glob handling?

> Incorrect exit codes for "dfs -chown", "dfs -chgrp"  when input is given in 
> wildcard format.
> 
>
> Key: HADOOP-6702
> URL: https://issues.apache.org/jira/browse/HADOOP-6702
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 0.19.1, 0.20.0, 0.20.1, 0.20.2
>Reporter: Ravi Phulari
>Assignee: Ravi Phulari
>Priority: Minor
> Fix For: 0.20.3, 0.21.0, 0.22.0
>
>
> Currently incorrect exit codes  are given for "dfs -chown", "dfs -chgrp"  
> when input is given in wildcard format.
> This bug is due to missing update of errors count in {{FsShell.java}}.
> {code:title=FsShell.java|borderStyle=solid}
> int runCmdHandler(CmdHandler handler, String[] args,
>int startIndex, boolean recursive) 
>throws IOException {
> int errors = 0;
> 
> for (int i=startIndex; i<args.length; i++) {
>   Path srcPath = new Path(args[i]);
>   FileSystem srcFs = srcPath.getFileSystem(getConf());
>   Path[] paths = FileUtil.stat2Paths(srcFs.globStatus(srcPath), srcPath);
>   for(Path path : paths) {
> try {
>   FileStatus file = srcFs.getFileStatus(path);
>   if (file == null) {
> System.err.println(handler.getName() + 
>": could not get status for '" + path + "'");
> errors++;
>   } else {
> errors += runCmdHandler(handler, file, srcFs, recursive);
>   }
> } catch (IOException e) {
>   String msg = (e.getMessage() != null ? e.getLocalizedMessage() :
> (e.getCause().getMessage() != null ? 
> e.getCause().getLocalizedMessage() : "null"));
>   System.err.println(handler.getName() + ": could not get status for 
> '"
> + path + "': " + msg.split("\n")[0]);
>   errors++;
> }
>   }
> }
>  {code}
> If there are no files on HDFS matching to wildcard input then  
> {{srcFs.globStatus(srcpath)}} returns 0. 
> {{ Path[] paths = FileUtil.stat2Paths(srcFs.globStatus(srcPath), srcPath);}}
> Resulting no increment in {{errors}} and command exits with 0 even though 
> file/directory does not exist.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-6766) Spill can fail with bad call to Random

2010-05-15 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867950#action_12867950
 ] 

Koji Noguchi commented on HADOOP-6766:
--

Peter, if you're using yahoo clusters, it could be ops accidentally deleting the 
local dirs after the TaskTracker came up.

> Spill can fail with bad call to Random
> --
>
> Key: HADOOP-6766
> URL: https://issues.apache.org/jira/browse/HADOOP-6766
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 0.20.2
>Reporter: Peter Arthur Ciccolo
>Priority: Minor
>
> java.lang.IllegalArgumentException: n must be positive
> at java.util.Random.nextInt(Random.java:250)
> at 
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.confChanged(LocalDirAllocator.java:243)
> at 
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:289)
> at 
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
> at 
> org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107)
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1221)
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1129)
> at 
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:549)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:623)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> at org.apache.hadoop.mapred.Child.main(Child.java:159)
> confChanged assumes that the list of dirs it creates 
> (LocalDirAllocator.java:215) has at least one element in it by the end of the 
> function. If, for each local dir, either the conditional on line 221 is 
> false, or the call to DiskChecker.checkDir() throws an exception, this 
> assumption will not hold. In this case, dirIndexRandomizer.nextInt() is 
> called on the number of elements in dirs, which is 0. Since 
> dirIndexRandomizer (195) is an instance of Random(), it needs a positive 
> (non-zero) argument to nextInt().
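
(Illustrating that failure mode in isolation, with made-up names rather than the 
actual LocalDirAllocator fields: Random.nextInt() needs a positive bound, so the 
empty-list case has to be rejected before picking a start index.)

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class EmptyDirListDemo {
  public static void main(String[] args) throws IOException {
    List<String> dirs = new ArrayList<String>();   // every local dir was rejected
    Random dirIndexRandomizer = new Random();

    if (dirs.isEmpty()) {
      // Without this guard, nextInt(0) below throws
      // "IllegalArgumentException: n must be positive", as in the stack trace above.
      throw new IOException("no valid local directories configured");
    }
    int start = dirIndexRandomizer.nextInt(dirs.size());
    System.out.println("starting at dir index " + start);
  }
}
{code}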

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-6760) WebServer shouldn't increase port number in case of negative port setting caused by Jetty's race

2010-05-19 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869298#action_12869298
 ] 

Koji Noguchi commented on HADOOP-6760:
--

bq. Thus a daemon's webserver becomes pretty much useless.

Just for the record, we observed multiple Datanodes/TaskTrackers constantly 
failing as below.

{noformat}
2010-04-01 00:36:39,706 INFO org.apache.hadoop.tools.DistCp: FAIL 4/part-00031 
: java.io.FileNotFoundException: 
http://abc1234.com:50076/streamFile?filename=/data/4/part-00031&ugi=knoguchi
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1288)
at org.apache.hadoop.hdfs.HftpFileSystem.open(HftpFileSystem.java:143)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.copy(DistCp.java:415)
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:543)
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:310)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:159)
{noformat}

{noformat}
2010-04-27 14:48:48,641 WARN org.apache.hadoop.mapred.ReduceTask: 
java.io.FileNotFoundException: 
http://abcl51349.com:50061/mapOutput?job=job_201004211814_&map=attempt_201004211814__m_000791_0&reduce=0
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1288)
at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1500)
at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1381)
at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1293)
at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1224)
{noformat}

Map tasks on these tasktrackers kept on failing with 
"Status : FAILED Too many fetch-failures"


> WebServer shouldn't increase port number in case of negative port setting 
> caused by Jetty's race
> 
>
> Key: HADOOP-6760
> URL: https://issues.apache.org/jira/browse/HADOOP-6760
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 0.20.3
>Reporter: Konstantin Boudnik
>Assignee: Konstantin Boudnik
> Fix For: 0.21.0
>
> Attachments: HADOOP-6760.0.20.patch, HADOOP-6760.0.20.patch, 
> HADOOP-6760.0.20.patch, HADOOP-6760.0.20.patch, HADOOP-6760.patch, 
> HADOOP-6760.patch, HADOOP-6760.patch
>
>
> When a negative port is assigned to a webserver socket (because of a race 
> inside of the Jetty server) the workaround from HADOOP-6386 is increasing the 
> original port number on the next bind attempt. Apparently, this is an 
> incorrect logic and next bind attempt should happen on the same port number 
> if possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.