Re: copy data from one hadoop cluster to another hadoop cluster + can't use distcp

2015-06-20 Thread SF Hadoop
It really depends on your requirements for the format of the data. The easiest way I can think of is to stream batches of data into a pub/sub system that the target system can access and then consume. Verify each batch and then ditch it. You can throttle the size of the intermediary
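A minimal sketch of that approach in Java, assuming Kafka as the pub/sub layer; the broker address, topic name, and 1 MB batch size are illustrative, not from the thread:

    import java.io.InputStream;
    import java.util.Properties;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class HdfsToPubSub {
        public static void main(String[] args) throws Exception {
            // Read from the source cluster's HDFS.
            FileSystem fs = FileSystem.get(new Configuration());

            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092"); // illustrative
            props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");

            try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props);
                 InputStream in = fs.open(new Path(args[0]))) {
                byte[] buf = new byte[1 << 20]; // 1 MB batches
                int n;
                long seq = 0;
                while ((n = in.read(buf)) > 0) {
                    byte[] chunk = new byte[n];
                    System.arraycopy(buf, 0, chunk, 0, n);
                    // Key each batch by file and sequence number so the
                    // consumer on the target cluster can verify completeness.
                    producer.send(new ProducerRecord<>("cluster-copy",
                        args[0] + ":" + seq++, chunk));
                }
            }
        }
    }

The consumer on the target cluster reads each keyed batch, verifies it, and writes it into its own HDFS; the topic's retention then ditches the batches.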

Hadoop / HBase hotspotting / overloading specific nodes

2014-10-09 Thread SF Hadoop
I'm not sure if this is an HBase issue or a Hadoop issue, so if this is off-topic please forgive me. I am having a problem with Hadoop maxing out drive space on a select few nodes when I am running an HBase job. The scenario is this: - The job is a data import using Map/Reduce / HBase - The data

Re: Hadoop / HBase hotspotting / overloading specific nodes

2014-10-09 Thread SF Hadoop
full. hdfs-site.xml: <property> <name>dfs.datanode.du.reserved</name> <value></value> <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.</description> </property> 2014-10-09 14:01 GMT+08:00 SF Hadoop sfhad...@gmail.com
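A filled-in version of that property for reference; the 10 GB value is illustrative (the original message leaves the value empty):

    <property>
      <name>dfs.datanode.du.reserved</name>
      <!-- 10 GB reserved per volume; illustrative value -->
      <value>10737418240</value>
      <description>Reserved space in bytes per volume. Always leave
      this much space free for non dfs use.</description>
    </property>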

Re: Hadoop / HBase hotspotting / overloading specific nodes

2014-10-09 Thread SF Hadoop
node? Cheers On Oct 8, 2014, at 11:01 PM, SF Hadoop sfhad...@gmail.com wrote: I'm not sure if this is an HBase issue or a Hadoop issue, so if this is off-topic please forgive me. I am having a problem with Hadoop maxing out drive space on a select few nodes when I am

Re: Hadoop configuration for cluster machines with different memory capacity / # of cores etc.

2014-10-09 Thread SF Hadoop
Yes, you are correct. Just keep in mind that for every spec-X machine you have to maintain a spec-X version of the Hadoop configs (deployed only on the spec-X machines); spec-Y configs reside only on spec-Y machines, and so on. But yes, it is possible. On Thu, Oct 9, 2014 at 9:40 AM, Manoj Samel
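As an illustration (machine specs and values here are hypothetical, not from the thread), the spec-X copy of yarn-site.xml might cap the NodeManager at what that hardware class can offer:

    <!-- yarn-site.xml shipped only to the 64 GB / 16-core machines -->
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>57344</value>
    </property>
    <property>
      <name>yarn.nodemanager.resource.cpu-vcores</name>
      <value>14</value>
    </property>

The smaller machine class carries its own copy of the same file with proportionally smaller values.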

Re: Standby Namenode and Datanode coexistence

2014-10-09 Thread SF Hadoop
You can run any of the daemons on any machine you want; you just have to be aware of the trade-offs you are making with RAM allocation. I am hoping this is a DEV cluster. This is definitely not a configuration you would want to use in production. If you are asking in regard to a production
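If you do co-locate the daemons, the RAM trade-off is made in hadoop-env.sh; a sketch with illustrative heap sizes:

    # hadoop-env.sh -- heap sizes are illustrative; size them so both
    # daemons plus the OS fit in physical RAM
    export HADOOP_NAMENODE_OPTS="-Xmx4g $HADOOP_NAMENODE_OPTS"
    export HADOOP_DATANODE_OPTS="-Xmx1g $HADOOP_DATANODE_OPTS"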

Re: MapReduce jobs start only on the PC they are typed on

2014-10-09 Thread SF Hadoop
What is in /etc/hadoop/conf/slaves? Something tells me it just says 'localhost'. You need to specify your slaves in that file (see the sketch below). On Thu, Oct 9, 2014 at 2:24 PM, Piotr Kubaj pku...@riseup.net wrote: Hi. I'm trying to run Hadoop on a 2-PC cluster (I need to do some benchmarks for my bachelor
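For a two-PC cluster, the file would simply list both hostnames, one per line (the hostnames here are hypothetical):

    # /etc/hadoop/conf/slaves
    master-pc
    worker-pc

Restart the daemons (or rerun start-dfs.sh / start-yarn.sh) after editing it.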

Block placement without rack aware

2014-10-02 Thread SF Hadoop
What is the block placement policy Hadoop follows when rack awareness is not enabled? Does it just round-robin? Thanks.

Re: Block placement without rack aware

2014-10-02 Thread SF Hadoop
-file-locality-in-hdfs.html On Thu, Oct 2, 2014 at 4:12 PM, SF Hadoop sfhad...@gmail.com wrote: What is the block placement policy Hadoop follows when rack awareness is not enabled? Does it just round-robin? Thanks.
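To see where the blocks of a given file actually landed, the standard check is fsck (the path below is illustrative):

    hdfs fsck /user/test/data.txt -files -blocks -locations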

Re: Data node with multiple disks

2014-05-17 Thread SF Hadoop
Just set your replication factor to 1 and you will be fine. On Tue, May 13, 2014 at 8:12 AM, Marcos Sousa falecom...@marcossousa.com wrote: Yes, I don't want to replicate, just use it as one disk. Isn't it possible to make this work? Best regards, Marcos On Tue, May 13, 2014 at 6:55 AM,
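Replication can be set cluster-wide in hdfs-site.xml, or adjusted per path from the shell; the path below is illustrative:

    <!-- hdfs-site.xml -->
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>

    # or, for data already in HDFS:
    hdfs dfs -setrep -w 1 /data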

Re: Data node with multiple disks

2014-05-13 Thread SF Hadoop
Your question is unclear. Please restate it and describe what you are attempting to do. Thanks. On Monday, May 12, 2014, Marcos Sousa falecom...@marcossousa.com wrote: Hi, I have 20 servers, each with 10 x 400GB SATA HDs. I'd like to use them as my datanodes: /vol1/hadoop/data
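For reference, a datanode spreads its blocks across multiple disks via a comma-separated dfs.datanode.data.dir; the list below extends the poster's /vol1 naming and is assumed, not quoted:

    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/vol1/hadoop/data,/vol2/hadoop/data,/vol3/hadoop/data</value>
    </property>

HDFS then round-robins new blocks across the listed volumes on that node.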

Re: No information in Job History UI

2014-03-04 Thread SF Hadoop
-and-access-in-yarn/ Thanks. On Mon, Mar 3, 2014 at 4:29 PM, SF Hadoop sfhad...@gmail.com wrote: Thanks for that info, Jian. You said "there are no job logs generated on the server that is running the job." So am I correct in assuming the logs will be in the dir specified

No information in Job History UI

2014-03-03 Thread SF Hadoop
Hadoop 2.2.0, CentOS 6.4, viewing the UI in various browsers. I am having a problem where no information is visible in my Job History UI. I run test jobs and they complete without error, but no information ever populates the NodeManager or JobHistory Server UI. Also, there are no job logs generated on

Re: No information in Job History UI

2014-03-03 Thread SF Hadoop
config yarn.nodemanager.delete.debug-delay-sec to delay the deletion of the logs. Jian On Mon, Mar 3, 2014 at 10:45 AM, SF Hadoop sfhad...@gmail.com wrote: Hadoop 2.2.0, CentOS 6.4, viewing the UI in various browsers. I am having a problem where no information is visible in my Job History UI
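Both of the relevant properties go in yarn-site.xml; the 600-second delay is an illustrative value, and log aggregation (off by default in 2.2.0) is what makes completed-job logs visible through the history server:

    <property>
      <name>yarn.nodemanager.delete.debug-delay-sec</name>
      <value>600</value>
    </property>
    <property>
      <name>yarn.log-aggregation-enable</name>
      <value>true</value>
    </property>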

Java version with Hadoop 2.0

2013-10-09 Thread SF Hadoop
I am preparing to deploy multiple clusters / distros of Hadoop for testing / benchmarking. In my research I have noticed discrepancies in the version of the JDK that various groups are using. Example: Hortonworks suggests JDK 6u31; CDH recommends either 6 or 7, provided you stick to some

Re: Java version with Hadoop 2.0

2013-10-09 Thread SF Hadoop
I hadn't. Thank you!!! Very helpful. Andy On Wed, Oct 9, 2013 at 2:25 PM, Patai Sangbutsarakum patai.sangbutsara...@turn.com wrote: Maybe you've already seen this: http://wiki.apache.org/hadoop/HadoopJavaVersions On Oct 9, 2013, at 2:16 PM, SF Hadoop sfhad...@gmail.com wrote