How to enumerate files in the directories?
Hello, how can one determine the names of the files in a particular hadoop directory, programmatically?
Re: Ganglia 3.1 on Hadoop 0.20.2 ...
Brian, Works for me now.. one should point the servers param to the multicast address that gmond writes to and listens on... and not the ganglia server. Started working once I did this. thanks for your inputs, -G.

On Tue, Aug 24, 2010 at 7:12 PM, Brian Bockelman bbock...@cse.unl.edu wrote:

On Aug 24, 2010, at 8:27 AM, Gautam wrote: I was trying to get Ganglia 3.1 to work with the stable hadoop-0.20.2 version from Apache. I patched this release from HADOOP-4675 using HADOOP-4675-v7.patch as suggested by the CDH3 release notes [1]. I am unable to see any hadoop metrics on the Ganglia monitoring UI. The other metrics that gmond spews (system CPU/memory etc.) seem to work. When I switch to FileContext the metrics are written properly to the log file. Once I moved to GangliaContext31 it doesn't show anything. I tried pointing the servers param to localhost:8649 while listening on that port using netcat on that machine... nothing comes up on netcat. Has anyone faced this issue?

This is possibly misleading - netcat won't work if Hadoop is using UDP. My advice is to do: telnet $Ganglia_Server 9988 and see if it spits out a bunch of XML. In the typical Ganglia configuration, it is set up to listen on UDP and write on TCP of the same port. A third thing to test is to switch the hadoop-metrics back to the file output, and make sure something gets written to the log file. The issue might be upstream. Brian

This is what most of my hadoop-metrics looks like:

dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
dfs.period=10
dfs.fileName=/tmp/dfsmetrics.log
dfs.servers=$Ganglia_Server:9988

# Configuration of the mapred context for null
mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
mapred.period=10
mapred.fileName=/tmp/mrmetrics.log
mapred.servers=$Ganglia_Server:9988

# Configuration of the jvm context for null
jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
jvm.period=10
jvm.fileName=/tmp/jvmmetrics.log
jvm.servers=$GANGLIA_SERVER:9988

-G.

[1] - http://archive.cloudera.com/cdh/3/hadoop-0.20.2+228.CHANGES.txt

-- If you really want something in this life, you have to work for it. Now, quiet! They're about to announce the lottery numbers...
Re: Ganglia 3.1 on Hadoop 0.20.2 ...
Hi Gautam, Yup - that's one possible way to configure Ganglia and is common at many sites. That's why I usually recommend the telnet trick to determine what IP address your configuration is using. Brian

On Aug 25, 2010, at 5:53 AM, Gautam wrote: Brian, Works for me now.. one should point the servers param to the multicast address that gmond writes to and listens on... and not the ganglia server. Started working once I did this. thanks for your inputs, -G.
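For reference, the fix Gautam describes amounts to pointing each metrics context in hadoop-metrics.properties at gmond's multicast group rather than at the gmetad host. A minimal sketch, assuming the stock gmond multicast address 239.2.11.71:8649; a given site may well use something else:

# dfs context shown as an example; the mapred and jvm contexts follow the same pattern
dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
dfs.period=10
dfs.servers=239.2.11.71:8649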
Re: How to enumerate files in the directories?
@Override
public HDFSFile[] getFiles(String directory) {
    // executeCommand() and HDFSFile are the poster's own helpers, not part of the Hadoop API:
    // this approach shells out to "hadoop fs -ls" and parses the listing text.
    String result = executeCommand("hadoop fs -ls " + directory);
    String[] items = result.split("\n");
    List<HDFSFile> holder = new ArrayList<HDFSFile>();
    for (int i = 1; i < items.length; i++) {
        String item = items[i];
        if (item.length() > MIN__FILE_LENGTH) {
            try {
                holder.add(new HDFSFile(item));
            } catch (Exception e) {
            }
        }
    }
    HDFSFile[] ret = new HDFSFile[holder.size()];
    holder.toArray(ret);
    return ret;
}

On Wed, Aug 25, 2010 at 12:36 AM, Denim Live denim.l...@yahoo.com wrote: Hello, how can one determine the names of the files in a particular hadoop directory, programmatically? -- Steven M. Lewis PhD Institute for Systems Biology Seattle WA
Searching more Hadoop-Common content
Hello guys, Over at http://search-hadoop.com we index the Hadoop-Common subproject's mailing lists, wiki, web site, source code, javadoc, jira... Would the community be interested in a patch that replaces the Google-powered search with that from search-hadoop.com, set to search only the Hadoop-Common project by default? We are looking into adding this search service for all of Hadoop's sub-projects. Assuming people are for this, any suggestions for how the search should function by default, or any specific instructions for how the search box should be modified, would be great! Thank you, Alex Baranau. P.S. The HBase community has already accepted our proposal (please refer to https://issues.apache.org/jira/browse/HBASE-2886) and the new version (0.90) will include the new search box. Also, the patch is available for TIKA (we are in the process of discussing some details now): https://issues.apache.org/jira/browse/TIKA-488. Hadoop-Common's site looks much like Avro's, for which we also created a patch recently ( https://issues.apache.org/jira/browse/AVRO-626).
Custom partitioner for hadoop
I came across the tutorial on creating a custom partitioner on Hadoop ( http://philippeadjiman.com/blog/2009/12/20/hadoop-tutorial-series-issue-2-getting-started-with-customized-partitioning/ ). I am trying to create my own partitioner on Hadoop, and the above blog has given me a good starting point. I had a question on the partitioner. In the code given in the blog they have:

if (nbOccurences < 3)
    return 0;
else
    return 1;

I want to do something similar, but I need the key to be in a range, like the following:

if (nbOccurences > lbrange0 && nbOccurences < ubrange0) return 0;
if (nbOccurences > lbrange1 && nbOccurences < ubrange1) return 1;

The range boundaries lbrange0, lbrange1, ubrange0, ubrange1 are calculated by reading a histogram that is stored on the HDFS. I initially thought I could read the histogram from the custom Partitioner class and calculate the range boundaries, but in that case the ranges get recalculated for every K,V pair emitted by the mapper. In order to avoid this I was thinking of passing the range boundaries to the partitioner. How would I do that? Is there an alternative? Any suggestion would prove useful. Thank you, Mithila Ph.D. Candidate, C.S., Arizona State University
Re: Hadoop startup problem - directory name required
Hmm. Without the / in the property tag, isn't the file malformed XML? I am pretty sure Hadoop complains in such cases? On Wed, Aug 25, 2010 at 4:44 AM, cliff palmer palmercl...@gmail.com wrote: Thanks Allen - that has resolved the problem. Good catch! Cliff On Tue, Aug 24, 2010 at 3:05 PM, Allen Wittenauer awittena...@linkedin.com wrote: On Aug 23, 2010, at 6:49 AM, cliff palmer wrote: Thanks Harsh, but I am still not sure I understand what is going on. The directory specified in the dfs.name.dir property, /var/lib/hadoop-0.20/dfsname, does exist and rights to that directory have been granted to the OS user that is running the Hadoop startup script. The directory mentioned in the error message is /var/lib/hadoop-0.20/cache/hadoop/dfs/name. I can create this directory and that would (I assume) remove the error, but I want to understand how the name is derived.

From here:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/lib/hadoop-0.20/cache/hadoop</value>
</property>
</configuration>

because:

<property>
  <name>dfs.name.dir</name>
  <value>/DFS/dfsname,/var/lib/hadoop-0.20/dfsname</value>
<property>

is missing a / in the property line.
Re: Hadoop startup problem - directory name required
No complaints from hadoop, other than the error for the mangled directory name. On Wed, Aug 25, 2010 at 2:04 PM, Hemanth Yamijala yhema...@gmail.com wrote: Hmm. Without the / in the property tag, isn't the file malformed XML? I am pretty sure Hadoop complains in such cases?
Re: How to enumerate files in the directories?
You should use the FileStatus API to access file metadata. See the example below. http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileStatus.html

Configuration conf = new Configuration(); // takes default conf
FileSystem fs = FileSystem.get(conf);
Path dir = new Path("/dir");
FileStatus[] stats = fs.listStatus(dir);
for (FileStatus stat : stats) {
    stat.getPath().toUri().getPath(); // gives directory name
    stat.getModificationTime();
    stat.getReplication();
    stat.getBlockSize();
    stat.getOwner();
    stat.getGroup();
    stat.getPermission().toString();
}

From: Denim Live denim.l...@yahoo.com Date: Wed, 25 Aug 2010 07:36:11 + (GMT) To: common-user@hadoop.apache.org Subject: How to enumerate files in the directories? Hello, how can one determine the names of the files in a particular hadoop directory, programmatically?
Re: Custom partitioner for hadoop
On 08/25/2010 12:40 PM, Mithila Nagendra wrote: In order to avoid this I was thinking of passing the range boundaries to the partitioner. How would I do that? Is there an alternative? Any suggestion would prove useful. We use a custom partitioner, for which we pass in configuration data that gets used in the partitioning calculations. We do it by making the Partitioner implement Configurable, and then grab the needed config data from the configuration object that we're given. (We set the needed config data on the config object when we submit the job.) i.e., like so:

import java.io.IOException;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;

public class OurPartitioner extends Partitioner<BytesWritable, Writable> implements Configurable {
    ...
    public int getPartition(BytesWritable key, Writable value, int numPartitions) {
        ...
    }

    public Configuration getConf() {
        return conf;
    }

    public void setConf(Configuration conf) {
        this.conf = conf;
        configure();
    }

    @SuppressWarnings("unchecked")
    private void configure() throws IOException {
        String parmValue = conf.get("parmKey");
        if (parmValue == null) {
            throw new RuntimeException("...");
        }
    }

    private Configuration conf;
}

HTH, DR
Re: Custom partitioner for hadoop
In which of the three functions would I have to set the ranges? In the configure function? Would the configure be called once for every mapper? Thank you! On Wed, Aug 25, 2010 at 12:50 PM, David Rosenstrauch dar...@darose.net wrote: We use a custom partitioner, for which we pass in configuration data that gets used in the partitioning calculations. We do it by making the Partitioner implement Configurable, and then grab the needed config data from the configuration object that we're given. (We set the needed config data on the config object when we submit the job.) HTH, DR
Re: Custom partitioner for hadoop
If you define a Hadoop object as implementing Configurable, then its setConf() method will be called once, right after it gets instantiated. So each partitioner that gets instantiated will have its setConf() method called right afterwards. I'm taking advantage of that fact by calling my own (private) configure() method when the Partitioner gets its configuration. So in that configure method, you would grab the ranges from out of the configuration object. The flip side of this is that your ranges won't just magically appear in the configuration object. You'll have to set them on the configuration object used in the Job that you're submitting. A copy of the job's config object will then get passed to each task in your job, which you can then use to configure that task. HTH, DR On 08/25/2010 04:23 PM, Mithila Nagendra wrote: In which of the three functions would I have to set the ranges? In the configure function? Would the configure be called once for every mapper? Thank you! On Wed, Aug 25, 2010 at 12:50 PM, David Rosenstrauchdar...@darose.netwrote: On 08/25/2010 12:40 PM, Mithila Nagendra wrote: In order to avoid this I was thinking of passing the range boundaries to the partitioner. How would I do that? Is there an alternative? Any suggestion would prove useful. We use a custom partitioner, for which we pass in configuration data that gets used in the partitioning calculations. We do it by making the Partitioner implement Configurable, and then grab the needed config data from the configuration object that we're given. (We set the needed config data on the config object when we submit the job).
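Putting DR's two halves together, here is a minimal sketch of what this could look like for the range case. The property names (partition.lbrange0 and so on) and the LongWritable key type are only assumptions for illustration, not anything from the thread:

import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class RangePartitioner extends Partitioner<LongWritable, Text> implements Configurable {
    private Configuration conf;
    private long lbrange0, ubrange0;  // boundaries read once per task, not once per key/value pair

    @Override
    public void setConf(Configuration conf) {
        this.conf = conf;
        // Called once, right after the partitioner is instantiated in each task JVM.
        lbrange0 = conf.getLong("partition.lbrange0", 0L);
        ubrange0 = conf.getLong("partition.ubrange0", Long.MAX_VALUE);
    }

    @Override
    public Configuration getConf() {
        return conf;
    }

    @Override
    public int getPartition(LongWritable key, Text value, int numPartitions) {
        long nbOccurences = key.get();
        return (nbOccurences > lbrange0 && nbOccurences < ubrange0) ? 0 : 1;
    }
}

On the driver side, the boundaries computed from the histogram would be placed on the job's configuration before submission, e.g. job.getConfiguration().setLong("partition.lbrange0", lbrange0); followed by job.setPartitionerClass(RangePartitioner.class);.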
Re: How to enumerate files in the directories?
I would use the FileSystem API. Here is a quick-and-dirty example:

import java.io.*;
import java.util.*;
import java.lang.*;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FileStatus;

public class dirc {
    public static void main(String args[]) {
        try {
            String dirname = args[0];
            Configuration conf = new Configuration(true);
            FileSystem fs = FileSystem.get(conf);
            Path path = new Path(dirname);
            FileStatus fstatus[] = fs.listStatus(path);
            for (FileStatus f : fstatus) {
                System.out.println(f.getPath().toUri().getPath());
            }
        } catch (IOException e) {
            System.out.println("Usage dirc <directory>");
            return;
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("Usage dirc <directory>");
            return;
        }
    }
}
Hadoop Benchmark Results
Hi, I'm reading the Hadoop: The Definitive Guide book. I've tried to benchmark my hadoop cluster. I got some results, but when I compared them I was surprised, because there were several interesting differences. I don't understand whether this is good or bad. Please help me. Regards...

Writing Test
hadoop jar hadoop-0.20.2-test.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
10/08/25 23:49:40 INFO mapred.FileInputFormat: - TestDFSIO - : write
10/08/25 23:49:40 INFO mapred.FileInputFormat: Date & time: Wed Aug 25 23:49:40 EEST 2010
10/08/25 23:49:40 INFO mapred.FileInputFormat: Number of files: 10
10/08/25 23:49:40 INFO mapred.FileInputFormat: Total MBytes processed: 1
10/08/25 23:49:40 INFO mapred.FileInputFormat: Throughput mb/sec: 36.09482833299645
10/08/25 23:49:40 INFO mapred.FileInputFormat: Average IO rate mb/sec: 49.026153564453125
10/08/25 23:49:40 INFO mapred.FileInputFormat: IO rate std deviation: 22.15250292439401
10/08/25 23:49:40 INFO mapred.FileInputFormat: Test exec time sec: 175.537

Reading Test
hadoop jar hadoop-0.20.2-test.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
10/08/25 23:54:11 INFO mapred.FileInputFormat: - TestDFSIO - : read
10/08/25 23:54:11 INFO mapred.FileInputFormat: Date & time: Wed Aug 25 23:54:11 EEST 2010
10/08/25 23:54:11 INFO mapred.FileInputFormat: Number of files: 10
10/08/25 23:54:11 INFO mapred.FileInputFormat: Total MBytes processed: 1
10/08/25 23:54:11 INFO mapred.FileInputFormat: Throughput mb/sec: 152.87948510189418
10/08/25 23:54:11 INFO mapred.FileInputFormat: Average IO rate mb/sec: 152.8846893310547
10/08/25 23:54:11 INFO mapred.FileInputFormat: IO rate std deviation: 0.8895501092647955
10/08/25 23:54:11 INFO mapred.FileInputFormat: Test exec time sec: 61.618
svn/git revisions for 0.20.2
Hey folks, can somebody tell me how to get the source versions from git/svn for hadoop-hdfs and hadoop-mapreduce ? In hadoop-common there are branches and tags for the release. But how to get the corresponding version of the other 2 projects ? any help would be appreciated! Johannes
quota?
Is it possible to tell hadoop to restrict space usage of a specific dfs folder in the cluster, e.g. a user home directory (/user/accountA in dfs)? And is there a way to restrict the size of map/reduce output that can be saved to dfs? E.g. if a job creates over-limit data, it won't be allowed to save the result to the dfs. Thanks, Michael
Re: quota?
Refer to http://hadoop.apache.org/common/docs/r0.20.0/hdfs_quota_admin_guide.html#Space+Quotas On Wed, Aug 25, 2010 at 3:43 PM, jiang licht licht_ji...@yahoo.com wrote: Is it possible to tell hadoop to restrict space usage of a specific dfs folder in the cluster, e.g. a user home directory (/user/accountA in dfs)? And is there a way to restrict the size of map/reduce output that can be saved to dfs? E.g. if a job creates over-limit data, it won't be allowed to save the result to the dfs. Thanks, Michael
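As a concrete illustration of the space quotas described in that guide, the commands look roughly like this; the 1t size and the /user/accountA directory are just placeholders, and note that the space quota counts raw space, so replication counts against it:

hadoop dfsadmin -setSpaceQuota 1t /user/accountA   # cap the tree at 1 TB of raw (post-replication) space
hadoop fs -count -q /user/accountA                 # report the quotas and remaining space for the directory
hadoop dfsadmin -clrSpaceQuota /user/accountA      # remove the space quota again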
Re: quota?
Thanks, Ted. Michael --- On Wed, 8/25/10, Ted Yu yuzhih...@gmail.com wrote: From: Ted Yu yuzhih...@gmail.com Subject: Re: quota? To: common-user@hadoop.apache.org Date: Wednesday, August 25, 2010, 5:47 PM Refer to http://hadoop.apache.org/common/docs/r0.20.0/hdfs_quota_admin_guide.html#Space+Quotas On Wed, Aug 25, 2010 at 3:43 PM, jiang licht licht_ji...@yahoo.com wrote: Is it possible to tell hadoop to restrict space usage of a specific dfs folder in the cluster, e.g. a user home directory (/user/accountA in dfs)? And is there a way to restrict the size of map/reduce output that can be saved to dfs? E.g. if a job creates over-limit data, it won't be allowed to save the result to the dfs. Thanks, Michael
Re: svn/git revisions for 0.20.2
On Aug 25, 2010, at 3:20 PM, Johannes Zillmann wrote: Hey folks, can somebody tell me how to get the source versions from git/svn for hadoop-hdfs and hadoop-mapreduce ? In hadoop-common there are branches and tags for the release. But how to get the corresponding version of the other 2 projects ? 0.20 was pre-project split, so common included hdfs and mapreduce. -- Owen
data node disk usage per volume?
dfs.datanode.du.reserved can limit the size of space per volume that can be used by a data node. I have 2 related questions. 1. How volume is defined here? Say I have 2 folders listed for dfs.data.dir and each resides on a different disk. By setting dfs.datanode.du.reserved to N, does it mean 2N bytes reserved for non-dfs use, with N bytes on each disk? 2. Is it possible that we can reserve different size of space to non-dfs use on different disks? Thanks, Michael
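For reference, the property goes into hdfs-site.xml and takes a plain byte count; a sketch with an arbitrary 10 GB value follows (the per-volume semantics asked about in question 1 are described in the property's entry in hdfs-default.xml):

<property>
  <name>dfs.datanode.du.reserved</name>
  <!-- bytes to leave free for non-DFS use; 10737418240 = 10 GB, an arbitrary example -->
  <value>10737418240</value>
</property>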
Basic question
job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); Does this mean the input to the reducer should be Text/IntWritable or the output of the reducer is Text/IntWritable? What is the inverse of this.. setInputKeyClass/setInputValueClass? Is this inferred by the JobInputFormatClass? Would someone mind briefly explaining? Thanks
Re: Basic question
The output of the reducer is Text/IntWritable. To set the input to the reducer you set the mapper output classes. Cheers James Sent from my mobile. Please excuse the typos. On 2010-08-25, at 8:13 PM, Mark static.void@gmail.com wrote: job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); Does this mean the input to the reducer should be Text/IntWritable or the output of the reducer is Text/IntWritable? What is the inverse of this.. setInputKeyClass/setInputValueClass? Is this inferred by the JobInputFormatClass? Would someone mind briefly explaining? Thanks
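To make James' point concrete, here is a small sketch against the org.apache.hadoop.mapreduce API; Text/IntWritable simply mirror the question, substitute whatever your job actually emits:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class OutputTypesExample {
    public static void main(String[] args) throws IOException {
        Job job = new Job(new Configuration(), "output-types-example");

        // Types the mapper emits, i.e. what the reducer receives as input.
        // If these calls are omitted, the map output types default to the job output types below.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Types the reducer writes, i.e. the final job output.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // There is no setInputKeyClass/setInputValueClass: the map input types come from
        // the InputFormat (e.g. TextInputFormat supplies LongWritable keys and Text values).
    }
}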
data in compression format affect mapreduce speed
Will data stored in a compressed format affect MapReduce job speed? Will it increase or decrease it, or is the relationship between the two more complex? Can anybody give some explanation in detail? 2010-08-26 shangan
Re: How can I run the other test cases except those defined in CoreTestDriver?
Can anyone help me with this? On Sun, Aug 22, 2010 at 1:18 PM, Min Zhou coderp...@gmail.com wrote: Hi all, When I run the command

bin/hadoop jar hadoop-common-test-*.jar org.apache.hadoop.io.file.tfile.TestTFileSeqFileComparison

a prompt is shown like below:

Valid program names are:
  testarrayfile: A test for flat files of binary key/value pairs.
  testipc: A test for ipc.
  testrpc: A test for rpc.
  testsetfile: A test for flat files of binary key/value pairs.

AFAIK, this script finds the Main-Class from a jar file essentially through RunJar, which first looks up the manifest to decide whether this jar has defined a main class or not. If there is no definition, the first of the uneaten arguments from the command line will be considered the main class's name. But org/apache/hadoop/test/CoreTestDriver has been taken as the main class of the hadoop test jar in build.xml. Therefore, I can only run 4 test cases (testarrayfile, testipc, testrpc, testsetfile). How can I run the other test cases except those defined in that class? By running ant test -Dxxx? Thanks, Min -- My research interests are distributed systems, parallel computing and bytecode based virtual machine. My profile: http://www.linkedin.com/in/coderplay My blog: http://coderplay.javaeye.com -- My research interests are distributed systems, parallel computing and bytecode based virtual machine. My profile: http://www.linkedin.com/in/coderplay My blog: http://coderplay.javaeye.com
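One hedged suggestion, since the test-case selection lives in build.xml rather than in the jar's driver class: the Ant test targets in Hadoop builds of that era generally honour a testcase property, so something along these lines may work (worth verifying against your checkout's build.xml):

ant test-core -Dtestcase=TestTFileSeqFileComparison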
Re: data in compression format affect mapreduce speed
Compressed data would increase processing time in mapper/reducer but decrease the amount of data transferred between tasktracker nodes. Normally you should consider applying some form of compression. On Wed, Aug 25, 2010 at 7:32 PM, shangan shan...@corp.kaixin001.com wrote: will data stored in compression format affect mapreduce job speed? increase or decrease? or more complex relationship between these two ? can anybody give some explanation in detail? 2010-08-26 shangan
JIRA down
JIRA seems to be down FYI. Database errors are being returned: *Cause: * org.apache.commons.lang.exception.NestableRuntimeException: com.atlassian.jira.exception.DataAccessException: org.ofbiz.core.entity.GenericDataSourceException: Unable to esablish a connection with the database. (FATAL: database is not accepting commands to avoid wraparound data loss in database postgres) *Stack Trace: * [hide] org.apache.commons.lang.exception.NestableRuntimeException: com.atlassian.jira.exception.DataAccessException: org.ofbiz.core.entity.GenericDataSourceException: Unable to esablish a connection with the database. (FATAL: database is not accepting commands to avoid wraparound data loss in database postgres) at com.atlassian.jira.web.component.TableLayoutFactory.getUserColumns(TableLayoutFactory.java:239) at com.atlassian.jira.web.component.TableLayoutFactory.getStandardLayout(TableLayoutFactory.java:42) at org.apache.jsp.includes.navigator.table_jsp._jspService(table_jsp.java:178) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) at javax.servlet.http.HttpServlet.service(HttpServlet.java:717) at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:374) at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:342) at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:267)
Re: JIRA down
In case you need to access JIRA tonight, google JIRA number and click on Cached link. You would see: http://webcache.googleusercontent.com/search?q=cache:Tgi71phHrUoJ:https://issues.apache.org/jira/browse/HBASE-2893+hbase+metadata+layercd=4hl=enct=clnkgl=usclient=firefox-a On Wed, Aug 25, 2010 at 8:47 PM, Bill Graham billgra...@gmail.com wrote: JIRA seems to be down FYI. Database errors are being returned: *Cause: * org.apache.commons.lang.exception.NestableRuntimeException: com.atlassian.jira.exception.DataAccessException: org.ofbiz.core.entity.GenericDataSourceException: Unable to esablish a connection with the database. (FATAL: database is not accepting commands to avoid wraparound data loss in database postgres) *Stack Trace: * [hide] org.apache.commons.lang.exception.NestableRuntimeException: com.atlassian.jira.exception.DataAccessException: org.ofbiz.core.entity.GenericDataSourceException: Unable to esablish a connection with the database. (FATAL: database is not accepting commands to avoid wraparound data loss in database postgres) at com.atlassian.jira.web.component.TableLayoutFactory.getUserColumns(TableLayoutFactory.java:239) at com.atlassian.jira.web.component.TableLayoutFactory.getStandardLayout(TableLayoutFactory.java:42) at org.apache.jsp.includes.navigator.table_jsp._jspService(table_jsp.java:178) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) at javax.servlet.http.HttpServlet.service(HttpServlet.java:717) at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:374) at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:342) at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:267)
Re: data in compression format affect mapreduce speed
Logically it 'should' increase time as it's an extra step beyond the Mapper/Reducer. But while your processing time would slightly (very very slightly) increase, your IO and network transfer time would decrease by a large margin -- giving you a clear impression that your total job time has decreased overall. The difference being in writing out say 10 GB before, and writing out 5-7 GB this time (a crude example). With the fast CPUs available these days, compressing and decompressing should hardly take a noticeable amount of extra time. It's almost negligible in the case of using gzip, lzo or plain deflate. On Thu, Aug 26, 2010 at 9:13 AM, Ted Yu yuzhih...@gmail.com wrote: Compressed data would increase processing time in mapper/reducer but decrease the amount of data transferred between tasktracker nodes. Normally you should consider applying some form of compression. On Wed, Aug 25, 2010 at 7:32 PM, shangan shan...@corp.kaixin001.com wrote: will data stored in compression format affect mapreduce job speed? increase or decrease? or more complex relationship between these two ? can anybody give some explanation in detail? 2010-08-26 shangan -- Harsh J www.harshj.com
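If it helps, here is a rough sketch of switching compression on for a 0.20-era job, covering both the shuffled map output and the final output that Harsh and Ted mention; the gzip choice and the property names are just the common 0.20 ones, adjust to taste:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CompressionSetupExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Compress the intermediate map output that gets shuffled between tasktracker nodes.
        conf.setBoolean("mapred.compress.map.output", true);
        conf.setClass("mapred.map.output.compression.codec", GzipCodec.class, CompressionCodec.class);

        Job job = new Job(conf, "compressed-output-job");
        // Compress the final job output written back to HDFS.
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
        // ... set mapper, reducer and input/output paths as usual, then submit.
    }
}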
Re: Re: data in compression format affect mapreduce speed
I agree with you for the most part, but I have some other questions. Mappers work on the local machine, so there are no network transfers during that process; if the original data stored in HDFS is compressed, it will only decrease the IO time. My main doubt is whether a mapper can work on only part of the whole data when the data is compressed, since it seems compressed data can't be split? I tried a select sum() in Hive and traced the job: the .tar.gz data could only be worked on by one single machine and was stuck there for quite a long time (it seemed to need to wait for other parts of the data to be copied from other machines), while data that was not compressed could be worked on by different machines in parallel. Do you know something about this? 2010-08-26 shangan

From: Harsh J Sent: 2010-08-26 12:15:49 To: common-user Subject: Re: data in compression format affect mapreduce speed
Logically it 'should' increase time as it's an extra step beyond the Mapper/Reducer. But while your processing time would slightly (very very slightly) increase, your IO and network transfer time would decrease by a large margin -- giving you a clear impression that your total job time has decreased overall.