Re: mini node in a cluster
Hi Pat,

Sounds like you would just turn off the datanode and the tasktracker. Your config will still point to the Namenode and JT, so you can still launch jobs and read/write from HDFS. You'll probably want to replicate the data off first, of course.

Thanks, Tom

On Mon, Jun 4, 2012 at 2:06 PM, Pat Ferrel p...@occamsmachete.com wrote: I have a machine that is part of the cluster, but I'd like to dedicate it to being the web server and running the db while still having access to starting jobs and getting data out of HDFS. In other words, I'd like to have the cores, memory, and disk only minimally affected by running jobs on the cluster, yet still have easy access when I need to get data out. I assume I can do something like set the max number of jobs for the node to 0 and something similar for HDFS? Is there a recommended way to go about this?
Re: mini node in a cluster
Hi Pat,

Sounds like the trick. This node is a slave, so its datanode and tasktracker are started from the master.

- how do I start the cluster without starting the datanode and the tasktracker on the mini-node slave? Remove it from slaves?

There's no main cluster software; just don't start those services on that node. If you're on Linux and have init.d scripts, look for the ones that are appended with datanode and tasktracker.

- what do I minimally need to start on the mini-node?

Nothing except the Hadoop jars. The presence of the config files in your CLASSPATH is all you need to talk to your cluster. So, if you can run hadoop dfs -ls /some/path/in/hdfs and it succeeds, you're probably OK.

Also, I have replication set to 2, so the data will just get re-replicated once the mini-node is reconfigured, right? There should be another copy somewhere on the cluster.

Probably. It's not really a mini-node at that point; it's really just a client, and it's not known by your cluster. You could configure your laptop or any other machine to do the same thing, for example.

Thanks, Tom
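For what it's worth, here's roughly what that client-only setup looks like from Java rather than the shell - a minimal sketch, assuming the cluster's core-site.xml/hdfs-site.xml are on the classpath, with a made-up path:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HdfsClientCheck {
      public static void main(String[] args) throws Exception {
          // Picks up fs.default.name and friends from the config files on the classpath.
          Configuration conf = new Configuration();
          FileSystem fs = FileSystem.get(conf);
          for (FileStatus status : fs.listStatus(new Path("/some/path/in/hdfs"))) {
              System.out.println(status.getPath());
          }
      }
  }

If that lists your files, the machine is a working client even with no daemons running locally.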
Re: how to unit test my RawComparator
Hi Chris and all, hope you don't mind if I inject a question in here. It's highly related, IMO (famous last words).

On Sat, Mar 31, 2012 at 2:18 PM, Chris White chriswhite...@gmail.com wrote: You can serialize your Writables to a ByteArrayOutputStream and then get its underlying byte array:

  ByteArrayOutputStream baos = new ByteArrayOutputStream();
  DataOutputStream dos = new DataOutputStream(baos);
  Writable myWritable = new Text("text");
  myWritable.write(dos);
  byte[] bytes = baos.toByteArray();

I popped this into a quick test and it failed. What I want are the exact bytes back from the Writable (in my case, BytesWritable). So, this fails for me:

  @Test
  public void byteswritabletest() {
      ByteArrayOutputStream baos = new ByteArrayOutputStream();
      DataOutputStream dos = new DataOutputStream(baos);
      BytesWritable myBW = new BytesWritable("test".getBytes());
      try {
          myBW.write(dos);
      } catch (IOException e) {
          e.printStackTrace();
      }
      byte[] bytes = baos.toByteArray();
      assertEquals("test".getBytes().length, bytes.length); // I get expected: 4, actual: 8 with this assertion
  }

I see that in newer versions of Text and BytesWritable there is a .copyBytes() method that gives us that: https://reviews.apache.org/r/182/diff/

Is there another way (without the upgrade) to achieve that?

Thanks, Tom
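For the archives: my understanding is that BytesWritable.write() emits a 4-byte length prefix before the payload, which is why "test" comes back as 8 bytes above. A minimal sketch of one workaround that doesn't need copyBytes() - copy the valid region out of the backing buffer via getBytes()/getLength() (note that getBytes() can return a buffer longer than the actual data):

  import java.util.Arrays;
  import org.apache.hadoop.io.BytesWritable;
  import org.junit.Test;
  import static org.junit.Assert.assertEquals;

  public class BytesWritablePayloadTest {
      @Test
      public void payloadWithoutCopyBytes() {
          BytesWritable myBW = new BytesWritable("test".getBytes());
          // getLength() is the number of valid bytes; getBytes() is the (possibly padded) buffer.
          byte[] payload = Arrays.copyOf(myBW.getBytes(), myBW.getLength());
          assertEquals("test".getBytes().length, payload.length); // 4 == 4
      }
  }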
Re: I am trying to run a large job and it is consistently failing with timeout - nothing happens for 600 sec
Sounds like mapred.task.timeout? The default is 10 minutes. http://hadoop.apache.org/common/docs/current/mapred-default.html

Thanks, Tom

On Wed, Jan 18, 2012 at 2:05 PM, Steve Lewis lordjoe2...@gmail.com wrote: The map tasks fail, timing out after 600 sec. I am processing one 9 GB file with 16,000,000 records. Each record (think of it as a line) generates hundreds of key-value pairs. The job is unusual in that the output of the mapper, in terms of records or bytes, is orders of magnitude larger than the input. I have no idea what is slowing down the job except that the problem is in the writes. If I change the job to merely bypass a fraction of the context.write statements, the job succeeds. These are the counters for one map task that failed and one that succeeded - I cannot understand how a write can take so long or what else the mapper might be doing.

JOB FAILED WITH TIMEOUT
Parser: TotalProteins 90,103; NumberFragments 10,933,089
FileSystemCounters: HDFS_BYTES_READ 67,245,605; FILE_BYTES_WRITTEN 444,054,807
Map-Reduce Framework: Combine output records 10,033,499; Map input records 90,103; Spilled Records 10,032,836; Map output bytes 3,520,182,794; Combine input records 10,844,881; Map output records 10,933,089

Same code but fewer writes - JOB SUCCEEDED
Parser: TotalProteins 90,103; NumberFragments 206,658,758
FileSystemCounters: FILE_BYTES_READ 111,578,253; HDFS_BYTES_READ 67,245,607; FILE_BYTES_WRITTEN 220,169,922
Map-Reduce Framework: Combine output records 4,046,128; Map input records 90,103; Spilled Records 4,046,128; Map output bytes 662,354,413; Combine input records 4,098,609; Map output records 2,066,588

Any bright ideas?

--
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
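Two things that might be worth trying if the writes are genuinely slow rather than hung - raise the timeout, or report progress from the mapper so the tasktracker knows the task is alive. A rough sketch (new API; the loop variable and output key/value names are just placeholders for your own code):

  // Driver side: raise the task timeout to 30 minutes (value is in milliseconds).
  Configuration conf = new Configuration();
  conf.setLong("mapred.task.timeout", 30 * 60 * 1000L);

  // Mapper side: ping the framework during long bursts of writes.
  for (int i = 0; i < generatedPairs.size(); i++) {   // generatedPairs is hypothetical
      context.write(outKey, outValue);
      context.progress();   // resets the inactivity timer
  }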
Re: Question about accessing another HDFS
I'm hoping there is a better answer, but I'm thinking you could load another configuration (with B.mycompany.com in it) using Configuration, grab a FileSystem object with that, and then go forward. Seems like some unnecessary overhead, though.

Thanks, Tom

On Thu, Dec 8, 2011 at 2:42 PM, Frank Astier fast...@yahoo-inc.com wrote: Hi - We have two namenodes set up at our company, say:

hdfs://A.mycompany.com
hdfs://B.mycompany.com

From the command line, I can do:

hadoop fs -ls hdfs://A.mycompany.com//some-dir

and

hadoop fs -ls hdfs://B.mycompany.com//some-other-dir

I'm now trying to do the same from a Java program that uses the HDFS API. No luck there. I get an exception: "Wrong FS". Any idea what I'm missing in my Java program?

Thanks, Frank
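Concretely, what I had in mind is something like this - a rough sketch using the FileSystem.get(URI, Configuration) overload; the "Wrong FS" error generally means the path's scheme/authority doesn't match the FileSystem instance it was handed:

  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class TwoClusters {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // Default FS, presumably hdfs://A.mycompany.com from the config on the classpath.
          FileSystem fsA = FileSystem.get(conf);
          // Explicitly ask for the other namenode instead of the default.
          FileSystem fsB = FileSystem.get(URI.create("hdfs://B.mycompany.com"), conf);
          for (FileStatus status : fsA.listStatus(new Path("/some-dir"))) {
              System.out.println(status.getPath());
          }
          for (FileStatus status : fsB.listStatus(new Path("/some-other-dir"))) {
              System.out.println(status.getPath());
          }
      }
  }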
Re: Hadoop Streaming
So that code 126 should be kicked out by your program - do you know what that means? Can your code read from stdin?

Thanks, Tom

On Sat, Dec 3, 2011 at 7:09 PM, Daniel Yehdego dtyehd...@miners.utep.edu wrote: I have the following error in running Hadoop streaming:

PipeMapRed.waitOutputThreads(): subprocess failed with code 126
  at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
  at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
  at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
  at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
  at org.apache.hadoop.mapred.Child.main(Child.java:170)

I couldn't find any other error information. Any help?
Re: Hadoop Streaming
Hi Daniel,

I see from your other thread that your HADOOP script has a line like:

#!/bin/shrm -f temp.txt

I'm not sure what that is, exactly. I suspect the -f is reading from some file, and the while loop you had listed reads from stdin, it seems. What does your input look like? I think what's happening is that you might be expecting lines of input and you're getting splits. You might want to try this:

-inputformat org.apache.hadoop.mapred.lib.NLineInputFormat

Thanks, Tom

On Sat, Dec 3, 2011 at 7:22 PM, Daniel Yehdego dtyehd...@miners.utep.edu wrote: Thanks Tom for your reply, I think my code is reading from stdin, because I tried it locally using the following command and it's running:

$ bin/hadoop fs -cat /user/yehdego/Hadoop-Data-New/RF00171_A.bpseqL3G1_seg_Optimized_Method.txt | head -2 | ./HADOOP

But when I tried streaming, it failed and gave me error code 126.

Date: Sat, 3 Dec 2011 19:14:20 -0800 Subject: Re: Hadoop Streaming From: t...@supertom.com To: common-user@hadoop.apache.org

So that code 126 should be kicked out by your program - do you know what that means? Can your code read from stdin?

Thanks, Tom

On Sat, Dec 3, 2011 at 7:09 PM, Daniel Yehdego dtyehd...@miners.utep.edu wrote: I have the following error in running Hadoop streaming:

PipeMapRed.waitOutputThreads(): subprocess failed with code 126
  at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
  at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
  at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
  at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
  at org.apache.hadoop.mapred.Child.main(Child.java:170)

I couldn't find any other error information. Any help?
Re: Hadoop Streaming
Oh, I see - the line wrapped. My bad. Either way, I think NLineInputFormat is what you need. I'm assuming you want one line of input to execute on one mapper.

Thanks, Tom

On Sat, Dec 3, 2011 at 7:57 PM, Daniel Yehdego dtyehd...@miners.utep.edu wrote: TOM, what the HADOOP script does is read each line from STDIN and execute the program pknotsRG. temp.txt is a temporary file. The script is like this:

#!/bin/sh
rm -f temp.txt;
while read line
do
  echo $line >> temp.txt;
done
exec /data/yehdego/hadoop-0.20.2/PKNOTSRG/src/pknotsRG -k 0 -F temp.txt;

Date: Sat, 3 Dec 2011 19:49:46 -0800 Subject: Re: Hadoop Streaming From: t...@supertom.com To: common-user@hadoop.apache.org

Hi Daniel,

I see from your other thread that your HADOOP script has a line like:

#!/bin/shrm -f temp.txt

I'm not sure what that is, exactly. I suspect the -f is reading from some file, and the while loop you had listed reads from stdin, it seems. What does your input look like? I think what's happening is that you might be expecting lines of input and you're getting splits. You might want to try this:

-inputformat org.apache.hadoop.mapred.lib.NLineInputFormat

Thanks, Tom

On Sat, Dec 3, 2011 at 7:22 PM, Daniel Yehdego dtyehd...@miners.utep.edu wrote: Thanks Tom for your reply, I think my code is reading from stdin, because I tried it locally using the following command and it's running:

$ bin/hadoop fs -cat /user/yehdego/Hadoop-Data-New/RF00171_A.bpseqL3G1_seg_Optimized_Method.txt | head -2 | ./HADOOP

But when I tried streaming, it failed and gave me error code 126.

Date: Sat, 3 Dec 2011 19:14:20 -0800 Subject: Re: Hadoop Streaming From: t...@supertom.com To: common-user@hadoop.apache.org

So that code 126 should be kicked out by your program - do you know what that means? Can your code read from stdin?

Thanks, Tom

On Sat, Dec 3, 2011 at 7:09 PM, Daniel Yehdego dtyehd...@miners.utep.edu wrote: I have the following error in running Hadoop streaming:

PipeMapRed.waitOutputThreads(): subprocess failed with code 126
  at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
  at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
  at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
  at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
  at org.apache.hadoop.mapred.Child.main(Child.java:170)

I couldn't find any other error information. Any help?
Re: How do I programmatically get total job execution time?
On Fri, Dec 2, 2011 at 9:57 AM, W.P. McNeill bill...@gmail.com wrote: After my Hadoop job has successfully completed, I'd like to log the total amount of time it took. This is the "Finished in" statistic in the web UI. How do I get this number programmatically? Is there some way I can query the Job object? I didn't see anything in the API documentation.

This probably *doesn't* help you, but if you're using (or planning on using) Oozie, it has a RESTful API that can give you this information.

Thanks, Tom
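The low-tech option, if you control the driver, is to time waitForCompletion() yourself - a minimal sketch (new API), nothing Oozie-specific:

  long start = System.currentTimeMillis();
  boolean ok = job.waitForCompletion(true);   // job is your configured org.apache.hadoop.mapreduce.Job
  long elapsed = System.currentTimeMillis() - start;
  System.out.println("Job " + (ok ? "finished" : "failed") + " in " + (elapsed / 1000) + "s");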
questions regarding data storage and inputformat
Hi Folks,

I have a bunch of binary files which I've stored in a sequence file. The name of the file is the key, the data is the value, and I've stored them sorted by key. (I'm not tied to using a sequence file for this.) The current test data is only 50MB, but the real data will be 500MB - 1GB. My M/R job requires that its input be several of these records in the sequence file, which is determined by the key. The sorting mentioned above keeps these all packed together.

1. Any reason not to use a sequence file for this? Perhaps a mapfile? Since I've sorted it, I don't need random access, but I do need to be aware of the keys, as I need to be sure that I get all of the relevant keys sent to a given mapper.

2. Looks like I want a custom inputformat for this, extending SequenceFileInputFormat. Do you agree? I'll gladly take some opinions on this, as I ultimately want to split based on what's in the file, which might be a little unorthodox.

3. Another idea might be to create separate seq files for each chunk of records and make them non-splittable, ensuring that they go to a single mapper. Assuming I can get away with this, see any pros/cons with that approach?

Thanks, Tom

--
===
Skybox is hiring. http://www.skyboximaging.com/careers/jobs
Re: questions regarding data storage and inputformat
3. Another idea might be to create separate seq files for each chunk of records and make them non-splittable, ensuring that they go to a single mapper. Assuming I can get away with this, see any pros/cons with that approach?

Separate sequence files would require the least amount of custom code.

Thanks for the response, Joey. So, if I were to do the above, I would still need a custom record reader to put all the keys and values together, right?

Thanks, Tom

--
===
Skybox is hiring. http://www.skyboximaging.com/careers/jobs
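For option 3 (one sequence file per chunk, each going to exactly one mapper), the part I had sketched in my head is just an isSplitable() override - a rough sketch, assuming the new-API SequenceFileInputFormat and Text/BytesWritable key/value types; the grouping then falls out of each file becoming a single split:

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.BytesWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.JobContext;
  import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;

  public class WholeFileSequenceInputFormat
          extends SequenceFileInputFormat<Text, BytesWritable> {
      @Override
      protected boolean isSplitable(JobContext context, Path file) {
          // Each sequence file becomes exactly one split, so all of its
          // records are read by the same mapper.
          return false;
      }
  }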
Re: Custom FileOutputFormat / RecordWriter
Hi Harsh,

Cool, thanks for the details. For anyone interested, with your tip and description I was able to find an example inside the Hadoop in Action book (Chapter 7, p. 168).

Another question, though: it doesn't look like MultipleOutputs will let me control the filename in a per-key (per-map) manner. So, basically, if my map receives a key of mykey, I want my file to be mykey-someotherstuff.foo (this is a binary file). Am I right about this?

Thanks, Tom

On Tue, Jul 26, 2011 at 1:34 AM, Harsh J ha...@cloudera.com wrote: Tom, What I meant to say was that doing this is well supported with the existing API/libraries:

- The class MultipleOutputs supports providing a filename for an output. See MultipleOutputs.addNamedOutput usage [1].
- The type NullWritable is a special writable that doesn't do anything. So if it's configured into the above filename addition as a key type, and you pass NullWritable.get() as the key in every write operation, you will end up just writing the value part of (key, value).
- This way you do not have to write a custom OutputFormat for your use-case.

[1] - http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html (Also available for the new API, depending on which version/distribution of Hadoop you are on)

On Tue, Jul 26, 2011 at 3:36 AM, Tom Melendez t...@supertom.com wrote: Hi Harsh, Thanks for the response. Unfortunately, I'm not following your response. :-) Could you elaborate a bit? Thanks, Tom

On Mon, Jul 25, 2011 at 2:10 PM, Harsh J ha...@cloudera.com wrote: You can use MultipleOutputs (or MultipleTextOutputFormat for direct key-file mapping, but I'd still prefer the stable MultipleOutputs). Your sinking key can be of NullWritable type, and you can keep passing an instance of NullWritable.get() to it in every cycle. This would write just the value, while the filenames are added/sourced from the key inside the mapper code. This, if you are not comfortable writing your own code and maintaining it, I s'pose. Your approach is correct as well, if the question was specifically that.

On Tue, Jul 26, 2011 at 1:55 AM, Tom Melendez t...@supertom.com wrote: Hi Folks, Just doing a sanity check here. I have a map-only job, which produces a filename for a key and data as a value. I want to write the value (data) into the key (filename) in the path specified when I run the job. The value (data) doesn't need any formatting; I can just write it to HDFS without modification. So, looking at this link (the Output Formats section): http://developer.yahoo.com/hadoop/tutorial/module5.html

Looks like I want to:
- create a new output format
- override write, and tell it not to call writeKey, as I don't want that written
- add a new getRecordWriter method that uses the key as the filename and calls my output format

Sound reasonable?

Thanks, Tom

--
===
Skybox is hiring. http://www.skyboximaging.com/careers/jobs

--
Harsh J

--
===
Skybox is hiring. http://www.skyboximaging.com/careers/jobs
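A follow-up for anyone who finds this in the archives: the newer (mapreduce-package) MultipleOutputs has a write(key, value, baseOutputPath) overload that takes the output path per call, so the name can be derived from the key; whether you have it depends on your Hadoop version/distribution. A rough sketch (the mykey-someotherstuff.foo naming is just illustrative, and Hadoop still appends its own part suffix like -m-00000):

  // Inside the mapper (new API)
  private MultipleOutputs<NullWritable, BytesWritable> mos;

  @Override
  protected void setup(Context context) {
      mos = new MultipleOutputs<NullWritable, BytesWritable>(context);
  }

  @Override
  protected void map(Text key, BytesWritable value, Context context)
          throws IOException, InterruptedException {
      // baseOutputPath is relative to the job's output directory.
      mos.write(NullWritable.get(), value, key.toString() + "-someotherstuff.foo");
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
      mos.close();
  }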
Re: Custom FileOutputFormat / RecordWriter
Hi Robert,

In this specific case, that's OK. I'll never write to the same file from two different mappers. Otherwise, think it's cool? I haven't played with the outputformat before.

Thanks, Tom

On Mon, Jul 25, 2011 at 1:30 PM, Robert Evans ev...@yahoo-inc.com wrote: Tom, That assumes that you will never write to the same file from two different mappers or processes. HDFS currently does not support writing to a single file from multiple processes. --Bobby

On 7/25/11 3:25 PM, Tom Melendez t...@supertom.com wrote: Hi Folks, Just doing a sanity check here. I have a map-only job, which produces a filename for a key and data as a value. I want to write the value (data) into the key (filename) in the path specified when I run the job. The value (data) doesn't need any formatting; I can just write it to HDFS without modification. So, looking at this link (the Output Formats section): http://developer.yahoo.com/hadoop/tutorial/module5.html

Looks like I want to:
- create a new output format
- override write, and tell it not to call writeKey, as I don't want that written
- add a new getRecordWriter method that uses the key as the filename and calls my output format

Sound reasonable?

Thanks, Tom

--
===
Skybox is hiring. http://www.skyboximaging.com/careers/jobs

--
===
Skybox is hiring. http://www.skyboximaging.com/careers/jobs
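For the record, the sort of thing I was sketching above - a rough sketch, assuming the new API with Text keys (filenames) and BytesWritable values. Note it writes straight into the job output directory and sidesteps the output committer, so it assumes speculative execution is off and that no two mappers ever emit the same key:

  import java.io.IOException;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.BytesWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.RecordWriter;
  import org.apache.hadoop.mapreduce.TaskAttemptContext;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class KeyAsFilenameOutputFormat extends FileOutputFormat<Text, BytesWritable> {
      @Override
      public RecordWriter<Text, BytesWritable> getRecordWriter(TaskAttemptContext context)
              throws IOException, InterruptedException {
          final Path outDir = FileOutputFormat.getOutputPath(context);
          final FileSystem fs = outDir.getFileSystem(context.getConfiguration());
          return new RecordWriter<Text, BytesWritable>() {
              @Override
              public void write(Text key, BytesWritable value)
                      throws IOException, InterruptedException {
                  // One file per key: the key is the filename, only the value bytes are written.
                  FSDataOutputStream out = fs.create(new Path(outDir, key.toString()));
                  out.write(value.getBytes(), 0, value.getLength());
                  out.close();
              }
              @Override
              public void close(TaskAttemptContext ctx)
                      throws IOException, InterruptedException {
                  // Nothing to clean up; each file is closed as it is written.
              }
          };
      }
  }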
Re: Custom FileOutputFormat / RecordWriter
Hi Bobby,

Yeah, that won't be a big deal in this case. It will create about 40 files of roughly 60MB each. This job is kind of an odd one that won't be run very often.

Thanks, Tom

On Mon, Jul 25, 2011 at 1:34 PM, Robert Evans ev...@yahoo-inc.com wrote: Tom, I also forgot to mention that if you are writing to lots of little files it could cause issues too. HDFS is designed to handle relatively few BIG files. There is some work to improve this, but it is still a ways off. So it is likely going to be very slow and put a big load on the namenode if you are going to create lots of small files using this method. --Bobby

On 7/25/11 3:30 PM, Robert Evans ev...@yahoo-inc.com wrote: Tom, That assumes that you will never write to the same file from two different mappers or processes. HDFS currently does not support writing to a single file from multiple processes. --Bobby

On 7/25/11 3:25 PM, Tom Melendez t...@supertom.com wrote: Hi Folks, Just doing a sanity check here. I have a map-only job, which produces a filename for a key and data as a value. I want to write the value (data) into the key (filename) in the path specified when I run the job. The value (data) doesn't need any formatting; I can just write it to HDFS without modification. So, looking at this link (the Output Formats section): http://developer.yahoo.com/hadoop/tutorial/module5.html

Looks like I want to:
- create a new output format
- override write, and tell it not to call writeKey, as I don't want that written
- add a new getRecordWriter method that uses the key as the filename and calls my output format

Sound reasonable?

Thanks, Tom

--
===
Skybox is hiring. http://www.skyboximaging.com/careers/jobs

--
===
Skybox is hiring. http://www.skyboximaging.com/careers/jobs
Re: Custom FileOutputFormat / RecordWriter
Hi Harsh,

Thanks for the response. Unfortunately, I'm not following your response. :-) Could you elaborate a bit?

Thanks, Tom

On Mon, Jul 25, 2011 at 2:10 PM, Harsh J ha...@cloudera.com wrote: You can use MultipleOutputs (or MultipleTextOutputFormat for direct key-file mapping, but I'd still prefer the stable MultipleOutputs). Your sinking key can be of NullWritable type, and you can keep passing an instance of NullWritable.get() to it in every cycle. This would write just the value, while the filenames are added/sourced from the key inside the mapper code. This, if you are not comfortable writing your own code and maintaining it, I s'pose. Your approach is correct as well, if the question was specifically that.

On Tue, Jul 26, 2011 at 1:55 AM, Tom Melendez t...@supertom.com wrote: Hi Folks, Just doing a sanity check here. I have a map-only job, which produces a filename for a key and data as a value. I want to write the value (data) into the key (filename) in the path specified when I run the job. The value (data) doesn't need any formatting; I can just write it to HDFS without modification. So, looking at this link (the Output Formats section): http://developer.yahoo.com/hadoop/tutorial/module5.html

Looks like I want to:
- create a new output format
- override write, and tell it not to call writeKey, as I don't want that written
- add a new getRecordWriter method that uses the key as the filename and calls my output format

Sound reasonable?

Thanks, Tom

--
===
Skybox is hiring. http://www.skyboximaging.com/careers/jobs

--
Harsh J

--
===
Skybox is hiring. http://www.skyboximaging.com/careers/jobs
Re: tips and tools to optimize cluster
Thanks Chris, these are quite helpful.

Thanks, Tom

On Tue, May 24, 2011 at 11:13 AM, Chris Smith csmi...@gmail.com wrote: Worth a look at OpenTSDB ( http://opentsdb.net/ ), as it doesn't lose precision on the historical data. It also has some neat tricks around the collection and display of data.

Another useful tool is 'collectl' ( http://collectl.sourceforge.net/ ), which is a lightweight Perl script that both captures and compresses the metrics, manages its metrics data files, and then filters and presents the metrics as requested. I find collectl lightweight and useful enough that I set it up to capture everything and then leave it running in the background on most systems I build, because when you need the measurement data the event is usually in the past and difficult to reproduce. With collectl running I have a week to recognise the event and analyse/save the relevant data file(s); the data file is approx. 21MB/node/day gzipped.

With a little bit of bash or awk or perl scripting you can convert the collectl output into a form easily loadable into Pig. Pig also has User Defined Functions (UDFs) that can import the Hadoop job history, so with some Pig Latin you can marry your infrastructure metrics with your job metrics; a bit like the cluster eating its own dog food.

BTW, watch out for a little gotcha with Ganglia. It doesn't seem to report the full JVM metrics via gmond, although if you output the JVM metrics to file you get a record for each JVM on the node. I haven't looked into it in detail yet, but it looks like Ganglia only reports the last JVM record in each batch. Anyone else seen this?

Chris

On 24 May 2011 01:48, Tom Melendez t...@supertom.com wrote: Hi Folks, I'm looking for tips, tricks and tools to get at node utilization to optimize our cluster. I want to answer questions like:

- what nodes ran a particular job?
- how long did it take for those nodes to run the tasks for that job?
- how/why did Hadoop pick those nodes to begin with?

More detailed questions like:

- how much memory did the task for the job use on that node?
- average CPU load on that node during the task run

And more aggregate questions like:

- are some nodes favored more than others?
- utilization averages (generally, how many cores on that node are in use, etc.)

There are plenty more that I'm not asking, but you get the point. So, what are you guys using for this? I see some mentions of Ganglia, so I'll definitely look into that. Anything else? Anything you're using to monitor in real time (like a 'top' across the nodes or something like that)? Any info or war stories greatly appreciated.

Thanks, Tom
tips and tools to optimize cluster
Hi Folks,

I'm looking for tips, tricks and tools to get at node utilization to optimize our cluster. I want to answer questions like:

- what nodes ran a particular job?
- how long did it take for those nodes to run the tasks for that job?
- how/why did Hadoop pick those nodes to begin with?

More detailed questions like:

- how much memory did the task for the job use on that node?
- average CPU load on that node during the task run

And more aggregate questions like:

- are some nodes favored more than others?
- utilization averages (generally, how many cores on that node are in use, etc.)

There are plenty more that I'm not asking, but you get the point. So, what are you guys using for this? I see some mentions of Ganglia, so I'll definitely look into that. Anything else? Anything you're using to monitor in real time (like a 'top' across the nodes or something like that)? Any info or war stories greatly appreciated.

Thanks, Tom
Re: Linker errors with Hadoop pipes
I'm on Ubuntu and use pipes. These are my SSL packages; notice libssl and libssl-dev in particular:

supertom@hadoop-2:~/h-v8$ dpkg -l | grep -i ssl
ii  libopenssl-ruby     4.2                OpenSSL interface for Ruby
ii  libopenssl-ruby1.8  1.8.7.249-2        OpenSSL interface for Ruby 1.8
ii  libssl-dev          0.9.8k-7ubuntu8.6  SSL development libraries, header files and
ii  libssl0.9.8         0.9.8k-7ubuntu8.6  SSL shared libraries
ii  openssl             0.9.8k-7ubuntu8    Secure Socket Layer (SSL) binary and related
ii  python-openssl      0.10-1             Python wrapper around the OpenSSL library
ii  ssl-cert            1.0.23ubuntu2      simple debconf wrapper for OpenSSL

Hope that helps,

Thanks, Tom

On Thu, May 19, 2011 at 3:28 PM, tdp2110 thomas.d.pet...@gmail.com wrote: n00b here, just started playing around with pipes. I'm getting linker errors while compiling a simple WordCount example using hadoop-0.20.203 (current most recent version) that did not appear for the same code in hadoop-0.20.2. Linker errors of the form: undefined reference to `EVP_sha1' in HadoopPipes.cc. EVP_sha1 (and all of the undefined references I get) are part of the openssl library, which HadoopPipes.cc from hadoop-0.20.203 uses but hadoop-0.20.2 does not. I've tried adjusting my makefile to link to the ssl libraries, but I'm still out of luck. Any ideas would be greatly appreciated. Thanks!

PS, here is my current makefile:

CC = g++
HADOOP_INSTALL = /usr/local/hadoop-0.20.203.0
SSL_INSTALL = /usr/local/ssl
PLATFORM = Linux-amd64-64
CPPFLAGS = -m64 -I$(HADOOP_INSTALL)/c++/$(PLATFORM)/include -I$(SSL_INSTALL)/include

WordCount: WordCount.cc
	$(CC) $(CPPFLAGS) $< -Wall -Wextra -L$(SSL_INSTALL)/lib -lssl -lcrypto -L$(HADOOP_INSTALL)/c++/$(PLATFORM)/lib -lhadooppipes -lhadooputils -lpthread -g -O2 -o $@

--
View this message in context: http://old.nabble.com/Linker-errors-with-Hadoop-pipes-tp31634596p31634596.html Sent from the Hadoop core-user mailing list archive at Nabble.com.
passing classpath through to datanodes?
Hi Folks,

I'm having trouble getting a custom classpath through to the datanodes in my cluster. I'm using libhdfs and pipes, and the hdfsConnect call in libhdfs requires that the classpath be set. My code executes fine on a standalone machine, but when I take it to the cluster, I can see that the classpath is not set, as the error is emitted into the logs.

I've mucked around with the hadoop-env.sh file and restarted the tasktracker and datanode, but since I'm new to tinkering with this file, I'm hoping that someone here can help me with the steps for getting my classpath set correctly. Maybe hadoop-env.sh is NOT the right way to do this?

Thanks, Tom