Re: Writable questions
Wait - do you want a print-to-user method or a 'serialize/deserialize' method?

On Tue, Aug 31, 2010 at 2:42 PM, David Rosenstrauch <dar...@darose.net> wrote:
> On 08/31/2010 02:09 PM, Mark wrote:
>> On 8/31/10 10:07 AM, David Rosenstrauch wrote:
>>> On 08/31/2010 12:58 PM, Mark wrote:
>>>> I have a question regarding outputting Writable objects. I thought all
>>>> Writables know how to serialize themselves to output. For example, I have
>>>> an ArrayWritable of strings (or Texts), but when I output it to a file it
>>>> shows up as 'org.apache.hadoop.io.arraywrita...@21f7186f'. Am I missing
>>>> something? I would have expected it to output String1 String2 String3 etc.
>>>> If I am going about this the wrong way, can someone explain the proper way
>>>> for my reduce phase to output a key and a list of values. Thanks
>>>
>>> Writables know how to serialize and deserialize themselves (i.e., to a
>>> binary I/O stream). But that doesn't necessarily mean that they have a
>>> toString method for generating human-readable output.
>>>
>>> DR
>>
>> Ok, that makes sense. How would I go about outputting an ArrayWritable
>> then? Use a StringBuilder?
>
> Hmmm... Maybe something like this?
>
> Arrays.toString((TheArrayElementClass[]) ArrayWritable.toArray())
>
> HTH,
> DR

--
Lance Norskog
goks...@gmail.com
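The StringBuilder route mentioned in the thread can be sketched without Hadoop on the classpath. This is a minimal, hypothetical stand-in for a custom array-valued Writable (the class and field names are invented for illustration); the point is that human-readable output comes from overriding toString(), not from the Writable serialization methods:

```java
// Hypothetical stand-in for a custom ArrayWritable-style class. Only the
// toString() part is shown; the Writable read/write methods are omitted.
public class StringArrayValue {
    private final String[] values;

    public StringArrayValue(String[] values) {
        this.values = values;
    }

    // Join the elements with a delimiter by hand, as the thread suggests.
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(" ");
            }
            sb.append(values[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        StringArrayValue v = new StringArrayValue(
            new String[] { "String1", "String2", "String3" });
        System.out.println(v);  // String1 String2 String3
    }
}
```

With a toString() like this in place, writing the value through a text output format would produce the "String1 String2 String3" output the original poster expected.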
Re: DataDrivenDBInputFormat setInput with boundingQuery
Thank you for mentioning this problem - it's something fairly mysterious to me.

On Tue, Aug 31, 2010 at 8:06 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> On Tue, Aug 31, 2010 at 10:32 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>> I am working with DataDrivenDBInputFormat from trunk. None of the unit
>> tests seem to test the bounded queries.
>>
>> Configuration conf = new Configuration();
>> Job job = new Job(conf);
>> job.setJarByClass(TestZ.class);
>> job.setInputFormatClass(DataDrivenDBInputFormat.class);
>> job.setMapperClass(PrintlnMapper.class);
>> job.setOutputFormatClass(NullOutputFormat.class);
>> job.setMapOutputKeyClass(NullWritable.class);
>> job.setMapOutputValueClass(NullDBWritable.class);
>> job.setOutputKeyClass(NullWritable.class);
>> job.setOutputValueClass(NullWritable.class);
>> job.setNumReduceTasks(0);
>> job.getConfiguration().setInt("mapreduce.map.tasks", 2);
>> DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
>>     "jdbc:mysql://localhost:3306/test", null, null);
>> DataDrivenDBInputFormat.setInput(job, NullDBWritable.class,
>>     "SELECT * FROM name WHERE $CONDITIONS", "SELECT MIN(id),MAX(id) FROM name");
>> int ret = job.waitForCompletion(true) ? 0 : 1;
>>
>> Exception in thread "main" java.lang.RuntimeException:
>> java.lang.RuntimeException: java.lang.NullPointerException
>>     at org.apache.hadoop.mapreduce.lib.db.DBInputFormat.setConf(DBInputFormat.java:165)
>>
>> Can someone tell me what I am missing here?
>> Thanks,
>> Edward
>
> Nevermind:
>
> DBConfiguration.configureDB(job.getConfiguration(), "com.mysql.jdbc.Driver",
>     "jdbc:mysql://localhost:3306/test", null, null);
>
> That is 4 hours of my life I won't get back.

--
Lance Norskog
goks...@gmail.com
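The fix Edward found hinges on one behavior: new Job(conf) takes a copy of the passed-in Configuration, so properties set on the original conf object afterwards never reach the job. A minimal model of that copy semantics, with no Hadoop dependency (the Mini* class names below are hypothetical stand-ins, not Hadoop classes):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical miniature of Hadoop's Configuration/Job relationship:
// the job copies the configuration at construction time, so later
// writes to the original object are invisible to the job.
public class ConfigCopyPitfall {
    static class MiniConfiguration {
        final Map<String, String> props = new HashMap<>();
        void set(String k, String v) { props.put(k, v); }
        String get(String k) { return props.get(k); }
    }

    static class MiniJob {
        private final MiniConfiguration conf;
        MiniJob(MiniConfiguration original) {
            // Copy, the way new Job(conf) clones the Configuration it is given.
            this.conf = new MiniConfiguration();
            this.conf.props.putAll(original.props);
        }
        MiniConfiguration getConfiguration() { return conf; }
    }

    public static void main(String[] args) {
        MiniConfiguration conf = new MiniConfiguration();
        MiniJob job = new MiniJob(conf);

        // Wrong: configuring the original object after the copy was taken.
        conf.set("mapreduce.jdbc.url", "jdbc:mysql://localhost:3306/test");
        System.out.println(job.getConfiguration().get("mapreduce.jdbc.url")); // null

        // Right: configure the job's own copy, as in the corrected
        // configureDB(job.getConfiguration(), ...) call.
        job.getConfiguration().set("mapreduce.jdbc.url", "jdbc:mysql://localhost:3306/test");
        System.out.println(job.getConfiguration().get("mapreduce.jdbc.url"));
    }
}
```

This is why configureDB(conf, ...) produced a NullPointerException inside DBInputFormat.setConf: the JDBC properties were written to a Configuration the job never saw.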
Over-replication in Hadoop
I configured a cluster in Hadoop (1 master and 2 slaves) with replication factor 3:

<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>

Since there are only 2 datanodes, is Hadoop aware of the over-replication, setting the real replication factor to 2?

Thanks

--
View this message in context: http://old.nabble.com/Over-replication-in-Hadoop-tp29602656p29602656.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
Do I need to write a RawComparator if my custom writable is not used as a Key?
Hello,

Do I need to write a RawComparator, to improve performance, if my custom Writable is not used as a key?

Regards,
Vitaliy S
Re: Do I need to write a RawComparator if my custom writable is not used as a Key?
No, a RawComparator is only needed for keys.

-- Owen

On Sep 2, 2010, at 3:35, Vitaliy Semochkin <vitaliy...@gmail.com> wrote:
> Hello,
> Do I need to write a RawComparator, to improve performance, if my custom
> Writable is not used as a key?
> Regards,
> Vitaliy S
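For keys, the payoff of a RawComparator is comparing serialized bytes directly instead of deserializing both operands first, which the framework otherwise does for every sort comparison. A dependency-free sketch of that idea for big-endian-encoded non-negative ints (the helper names here are invented; Hadoop's own version lives in WritableComparator):

```java
import java.nio.ByteBuffer;

// Sketch of the raw-comparison idea behind RawComparator: decide ordering
// straight from the serialized bytes, skipping deserialization entirely.
public class RawIntComparison {
    // Serialize an int as 4 big-endian bytes, as DataOutput.writeInt does.
    static byte[] serialize(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Compare two serialized non-negative ints byte by byte. Big-endian
    // encoding makes unsigned byte order agree with numeric order.
    // (Negative ints would need the sign bit handled separately.)
    static int compareRaw(byte[] a, byte[] b) {
        for (int i = 0; i < 4; i++) {
            int x = a[i] & 0xff;  // treat bytes as unsigned
            int y = b[i] & 0xff;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(compareRaw(serialize(3), serialize(200)));  // -1
        System.out.println(compareRaw(serialize(200), serialize(3)));  // 1
        System.out.println(compareRaw(serialize(7), serialize(7)));    // 0
    }
}
```

Values, as in Vitaliy's case, are never sorted by the framework, so none of this machinery applies to them.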
Re: Writable questions
On 9/1/10 11:28 PM, Lance Norskog wrote:
> Wait - do you want a print-to-user method or a 'serialize/deserialize' method?
>
> [snip - earlier thread quoted above]

I wanted to output an ArrayWritable from my reducer in a human-readable format... something like this:

ArrayWritable values = new ArrayWritable(new String[] { "value1", "value2", "value3" });
context.write(new Text("My Key"), values);

I would have thought it would output the values with some configurable delimiter.
Why does GenericOptionsParser only take the first -D option?
This is 0.20.0. I have an Eclipse run configuration passing these as arguments:

-D hive2rdbms.jdbc.driver=com.mysql.jdbc.Driver
-D hive2rdbms.connection.url=jdbc:mysql://localhost:3306/test
-D hive2rdbms.data.query=SELECT id,name FROM name WHERE $CONDITIONS
-D hive2rdbms.bounding.query=SELECT min(id),max(id) FROM name
-D hive2rdbms.output.strategy=HDFS
-D hive2rdbms.ouput.hdfs.path=/tmp/a

My code does this:

public int run(String[] args) throws Exception {
    conf = getConf();
    GenericOptionsParser parser = new GenericOptionsParser(conf, args);
    for (String arg : parser.getRemainingArgs()) {
        System.out.println(arg);
    }
    ...
}

And this is the output:

hive2rdbms.connection.url=jdbc:mysql://localhost:3306/test
-D hive2rdbms.data.query=SELECT id,name FROM name WHERE $CONDITIONS
-D hive2rdbms.bounding.query=SELECT min(id),max(id) FROM name
-D hive2rdbms.output.strategy=HDFS
-D hive2rdbms.ouput.hdfs.path=/tmp/a
10/09/02 13:04:04 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
Exception in thread "main" java.io.IOException: hive2rdbms.connection.url not specified
    at com.media6.hive2rdbms.job.Rdbms2Hive.checkArgs(Rdbms2Hive.java:70)
    at com.media6.hive2rdbms.job.Rdbms2Hive.run(Rdbms2Hive.java:46)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at com.media6.hive2rdbms.job.Rdbms2Hive.main(Rdbms2Hive.java:145)

So what gives - does GenericOptionsParser only take Hadoop arguments like mapred.map.tasks? If so, how come it sucks up my first -D argument and considers the other ones remaining arguments?

Any ideas?
Re: Why does Generic Options Parser only take the first -D option?
I checked GenericOptionsParser from 0.20.2. processGeneralOptions() should be able to process all -D options:

if (line.hasOption('D')) {
    String[] property = line.getOptionValues('D');
    for (String prop : property) {
        String[] keyval = prop.split("=", 2);
        if (keyval.length == 2) {
            conf.set(keyval[0], keyval[1]);
        }
    }
}

You can add a log after the getOptionValues('D') line to verify that all -D options are returned.

On Thu, Sep 2, 2010 at 10:09 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> This is 0.20.0. I have an Eclipse run configuration passing these as
> arguments:
>
> [snip - original message quoted above]
>
> So what gives - does GenericOptionsParser only take Hadoop arguments like
> mapred.map.tasks? If so, how come it sucks up my first -D argument and
> considers the other ones remaining arguments?
>
> Any ideas?
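The -D handling quoted from 0.20.2 is easy to model without Hadoop or commons-cli on the classpath. This is a minimal sketch of the same loop (the MiniOptionsParser class is hypothetical), showing that every "-D key=value" pair lands in the properties map when the arguments reach the parser intact:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal model of GenericOptionsParser's -D handling: collect every
// "-D key=value" pair into a properties map; everything else becomes
// a remaining argument.
public class MiniOptionsParser {
    final Map<String, String> props = new HashMap<>();
    final List<String> remaining = new ArrayList<>();

    MiniOptionsParser(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-D") && i + 1 < args.length) {
                // Same split("=", 2) as the 0.20.2 code quoted above.
                String[] keyval = args[++i].split("=", 2);
                if (keyval.length == 2) {
                    props.put(keyval[0], keyval[1]);
                }
            } else {
                remaining.add(args[i]);
            }
        }
    }

    public static void main(String[] args) {
        MiniOptionsParser p = new MiniOptionsParser(new String[] {
            "-D", "hive2rdbms.jdbc.driver=com.mysql.jdbc.Driver",
            "-D", "hive2rdbms.connection.url=jdbc:mysql://localhost:3306/test",
            "input-path"
        });
        System.out.println(p.props.size());  // 2 - both -D pairs are kept
        System.out.println(p.remaining);     // [input-path]
    }
}
```

Given that the loop itself keeps every pair, one plausible place for Edward's symptom to arise is before the parser: values containing spaces (like the unquoted SELECT queries) can be split into separate tokens by the IDE or shell, at which point the fragments no longer look like -D pairs at all. That is an assumption worth checking against the actual argv, not a confirmed diagnosis.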
Exception while archiving
Hi,

I am trying to archive a folder containing text files using the following command, from the Hadoop home dir:

./bin/hadoop archive -archiveName xxx.har /root/new/* /root/

and receiving the following output:

Exception null

Any idea what is going wrong? Is there any log which I can check for details?

regards
ranjib
Re: Writable questions
You could use the standard List.toString() method, which does a nice job of printing something like this:

[A1, A2, A3]

assuming the objects contained in the list implement toString() to something you'd want to see. Use it in conjunction with java.util.Arrays.asList() and ArrayWritable.toStrings(), like this:

ArrayWritable values = new ArrayWritable(new String[] { "value1", "value2", "value3" });
String humanReadableString = Arrays.asList(values.toStrings()).toString();

Steve

On Thu, Sep 2, 2010 at 9:34 AM, Mark <static.void@gmail.com> wrote:
> On 9/1/10 11:28 PM, Lance Norskog wrote:
>> Wait - do you want a print-to-user method or a 'serialize/deserialize' method?
>>
>> [snip - earlier thread quoted above]
>
> I wanted to output an ArrayWritable from my reducer in a human-readable
> format... something like this:
>
> ArrayWritable values = new ArrayWritable(new String[] { "value1", "value2", "value3" });
> context.write(new Text("My Key"), values);
>
> I would have thought it would output the values with some configurable delimiter.
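The List.toString() trick above is plain java.util behavior, so it can be checked without Hadoop on the classpath. In this sketch a String[] stands in for what ArrayWritable.toStrings() would return:

```java
import java.util.Arrays;

// Demonstrates the List.toString() formatting used in Steve's suggestion.
// The String[] stands in for the result of ArrayWritable.toStrings().
public class ArrayToReadableString {
    static String humanReadable(String[] values) {
        // Arrays.asList wraps the array in a List, whose toString()
        // renders "[elem1, elem2, ...]".
        return Arrays.asList(values).toString();
    }

    public static void main(String[] args) {
        String[] values = { "value1", "value2", "value3" };
        System.out.println(humanReadable(values));  // [value1, value2, value3]
    }
}
```

If the bracketed "[value1, value2, value3]" form isn't what you want in the output file, the same approach works with a hand-rolled StringBuilder loop and whatever delimiter you prefer.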