[yarn] job is not getting assigned

2013-08-29 Thread Andre Kelpe
Hi,

I am in the middle of setting up a hadoop 2 cluster. I am using the hadoop
2.1-beta tarball.

My cluster has 1 master node running the hdfs namenode, the resourcemanager
and the job history server. Next to that I have 3 nodes acting as
datanodes and nodemanagers.

In order to test whether everything is working, I submitted the teragen job
from the hadoop-examples jar like this:

$ hadoop jar
$HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.1.0-beta.jar
teragen 1000 /user/vagrant/teragen

The job starts up and I get the following output:

13/08/29 14:42:46 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
13/08/29 14:42:47 INFO client.RMProxy: Connecting to ResourceManager at
master.local/192.168.7.10:8032
13/08/29 14:42:48 INFO terasort.TeraSort: Generating 1000 using 2
13/08/29 14:42:48 INFO mapreduce.JobSubmitter: number of splits:2
13/08/29 14:42:48 WARN conf.Configuration: user.name is deprecated.
Instead, use mapreduce.job.user.name
13/08/29 14:42:48 WARN conf.Configuration: mapred.jar is deprecated.
Instead, use mapreduce.job.jar
13/08/29 14:42:48 WARN conf.Configuration: mapred.reduce.tasks is
deprecated. Instead, use mapreduce.job.reduces
13/08/29 14:42:48 WARN conf.Configuration: mapred.output.value.class is
deprecated. Instead, use mapreduce.job.output.value.class
13/08/29 14:42:48 WARN conf.Configuration: mapreduce.map.class is
deprecated. Instead, use mapreduce.job.map.class
13/08/29 14:42:48 WARN conf.Configuration: mapred.job.name is deprecated.
Instead, use mapreduce.job.name
13/08/29 14:42:48 WARN conf.Configuration: mapreduce.inputformat.class is
deprecated. Instead, use mapreduce.job.inputformat.class
13/08/29 14:42:48 WARN conf.Configuration: mapred.output.dir is deprecated.
Instead, use mapreduce.output.fileoutputformat.outputdir
13/08/29 14:42:48 WARN conf.Configuration: mapreduce.outputformat.class is
deprecated. Instead, use mapreduce.job.outputformat.class
13/08/29 14:42:48 WARN conf.Configuration: mapred.map.tasks is deprecated.
Instead, use mapreduce.job.maps
13/08/29 14:42:48 WARN conf.Configuration: mapred.output.key.class is
deprecated. Instead, use mapreduce.job.output.key.class
13/08/29 14:42:48 WARN conf.Configuration: mapred.working.dir is
deprecated. Instead, use mapreduce.job.working.dir
13/08/29 14:42:49 INFO mapreduce.JobSubmitter: Submitting tokens for job:
job_1377787324271_0001
13/08/29 14:42:50 INFO impl.YarnClientImpl: Submitted application
application_1377787324271_0001 to ResourceManager at master.local/
192.168.7.10:8032
13/08/29 14:42:50 INFO mapreduce.Job: The url to track the job:
http://master.local:8088/proxy/application_1377787324271_0001/
13/08/29 14:42:50 INFO mapreduce.Job: Running job: job_1377787324271_0001

and then it stops. If I check the UI, I see this:

application_1377787324271_0001 | vagrant | TeraGen | MAPREDUCE | default |
Thu, 29 Aug 2013 14:42:49 GMT | N/A | ACCEPTED | UNDEFINED | UNASSIGNED
I have no idea why it is not starting, nor what to look for. Any pointers
are more than welcome!

Thanks!

- André

-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


Re: [yarn] job is not getting assigned

2013-08-30 Thread Andre Kelpe
Hi Vinod,

I found the issue: the yarn.nodemanager.resource.memory-mb value was too
low. I set it back to the default value and the job runs fine now.
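
For reference, this is roughly what the yarn-site.xml entry looks like
(8192 MB is the stock default; the value has to leave room for at least one
container, otherwise nothing gets allocated and jobs sit in ACCEPTED):

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>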

Thanks!

- André


On Thu, Aug 29, 2013 at 7:36 PM, Vinod Kumar Vavilapalli  wrote:

>
> This usually means there are no available resources as seen by the
> ResourceManager. Do you see "Active Nodes" on the RM web UI first page? If
> not, you'll have to check the NodeManager logs to see if they crashed for
> some reason.
>
> Thanks,
> +Vinod Kumar Vavilapalli
> Hortonworks Inc.
> http://hortonworks.com/
>
> On Aug 29, 2013, at 7:52 AM, Andre Kelpe wrote:
>
> Hi,
>
> I am in the middle of setting up a hadoop 2 cluster. I am using the hadoop
> 2.1-beta tarball.
>
> My cluster has 1 master node running the hdfs namenode, the resourcemanger
> and the job history server. Next to that I have  3 nodes acting as
> datanodes and nodemanagers.
>
> In order to test, if everything is working, I submitted the teragen job
> from the hadoop-examples jar like this:
>
> $ hadoop jar
> $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.1.0-beta.jar
> teragen 1000 /user/vagrant/teragen
>
> The job starts up and I  get the following output:
>
> 13/08/29 14:42:46 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 13/08/29 14:42:47 INFO client.RMProxy: Connecting to ResourceManager at
> master.local/192.168.7.10:8032
> 13/08/29 14:42:48 INFO terasort.TeraSort: Generating 1000 using 2
> 13/08/29 14:42:48 INFO mapreduce.JobSubmitter: number of splits:2
> 13/08/29 14:42:48 WARN conf.Configuration: user.name is deprecated.
> Instead, use mapreduce.job.user.name
> 13/08/29 14:42:48 WARN conf.Configuration: mapred.jar is deprecated.
> Instead, use mapreduce.job.jar
> 13/08/29 14:42:48 WARN conf.Configuration: mapred.reduce.tasks is
> deprecated. Instead, use mapreduce.job.reduces
> 13/08/29 14:42:48 WARN conf.Configuration: mapred.output.value.class is
> deprecated. Instead, use mapreduce.job.output.value.class
> 13/08/29 14:42:48 WARN conf.Configuration: mapreduce.map.class is
> deprecated. Instead, use mapreduce.job.map.class
> 13/08/29 14:42:48 WARN conf.Configuration: mapred.job.name is deprecated.
> Instead, use mapreduce.job.name
> 13/08/29 14:42:48 WARN conf.Configuration: mapreduce.inputformat.class is
> deprecated. Instead, use mapreduce.job.inputformat.class
> 13/08/29 14:42:48 WARN conf.Configuration: mapred.output.dir is
> deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
> 13/08/29 14:42:48 WARN conf.Configuration: mapreduce.outputformat.class is
> deprecated. Instead, use mapreduce.job.outputformat.class
> 13/08/29 14:42:48 WARN conf.Configuration: mapred.map.tasks is deprecated.
> Instead, use mapreduce.job.maps
> 13/08/29 14:42:48 WARN conf.Configuration: mapred.output.key.class is
> deprecated. Instead, use mapreduce.job.output.key.class
> 13/08/29 14:42:48 WARN conf.Configuration: mapred.working.dir is
> deprecated. Instead, use mapreduce.job.working.dir
> 13/08/29 14:42:49 INFO mapreduce.JobSubmitter: Submitting tokens for job:
> job_1377787324271_0001
> 13/08/29 14:42:50 INFO impl.YarnClientImpl: Submitted application
> application_1377787324271_0001 to ResourceManager at master.local/
> 192.168.7.10:8032
> 13/08/29 14:42:50 INFO mapreduce.Job: The url to track the job:
> http://master.local:8088/proxy/application_1377787324271_0001/
> 13/08/29 14:42:50 INFO mapreduce.Job: Running job: job_1377787324271_0001
>
> and then it stops. If I check the UI, I see this:
>
> application_1377787324271_0001<http://master.local:8088/cluster/app/application_1377787324271_0001>
> vagrantTeraGenMAPREDUCEdefaultThu, 29 Aug 2013 14:42:49 GMTN/AACCEPTED
> UNDEFINED
>
> UNASSIGNED <http://master.local:8088/cluster/apps#>
> I have no idea, why it is not starting, nor what to look for. Any pointers
> are more than welcome!
>
> Thanks!
>
> - André
>
> --
> André Kelpe
> an...@concurrentinc.com
> http://concurrentinc.com
>
>




-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


verifying tarball downloads

2013-09-01 Thread Andre Kelpe
Hi,

I am looking for the most obvious way to verify the downloads of
hadoop tarballs. I saw that you provide a .mds file containing MD5,
SHA1 and other checksums, which is produced by gpg --print-mds. I
cannot find the right way to verify those reliably. I came
up with this:

 md5sum --check  <(grep "MD5 = " hadoop-1.2.1.tar.gz.mds | sed -e
"s/MD5 = //g;s/ //g" | awk -F: '{print tolower($2), "", $1}')

and that works, but it cannot be the right way of doing it. Please
point me to the part of the docs that I am missing.
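
A simpler eyeball check that seems to work (assuming the .mds file keeps the
layout that gpg --print-mds produces) is to compare the digests by hand:

 $ gpg --print-md SHA1 hadoop-1.2.1.tar.gz
 $ grep -A1 "SHA1" hadoop-1.2.1.tar.gz.mds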

Thanks!

- André

P.S.: Since you are using gpg already, why are you not signing the
releases, like other projects do?

-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


Re: java.net.ConnectException when using Httpfs

2013-09-03 Thread Andre Kelpe
Something is wrong with your name resolution. If you look at the error
message, it says you are trying to connect to 127.0.0.1 instead of the
remote host.
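
A quick way to check what the hostname from your error message actually
resolves to (if server1.local comes back as 127.0.0.1, fix /etc/hosts or
your DNS first):

 $ getent hosts server1.local
 $ grep server1 /etc/hosts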

-André

On Tue, Sep 3, 2013 at 12:05 PM, Visioner Sadak
 wrote:
> Hello Hadoopers,
>
>   I am trying to configure httpfs below are my
> configurations in
>
>
> httpfs-site.xml
>
> <property>
>   <name>httpfs.fsAccess.conf:fs.default.name</name>
>   <value>hdfs://132.168.0.10:8020</value>
> </property>
>
> and in core-site.xml
>
> <property>
>   <name>hadoop.proxyuser.hadoop.hosts</name>
>   <value>132.168.0.10</value>
> </property>
>
> <property>
>   <name>fs.default.name</name>
>   <value>viewfs:///</value>
> </property>
>
> <property>
>   <name>fs.viewfs.mounttable.default.link./NN1Home</name>
>   <value>hdfs://132.168.0.10:8020/NN1Home</value>
> </property>
>
>  I am able to start httpfs but when i try  to access a file through httpfs
> it throws the below error
>
> {"RemoteException":{"message":"Call From server1.local\/127.0.0.1 to
> localhost:8020 failed on connection exception: java.net.ConnectException:
> Connection refused; For more details see:
> http:\/\/wiki.apache.org\/hadoop\/ConnectionRefused","exception":"ConnectException","javaClassName":"java.net.ConnectException"}}
>
>
>
>
>
>
>



-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


Re: Running Contrail on hadoop

2013-09-03 Thread Andre Kelpe
This is usually a String.format() problem: the developer was using an
English locale and was not aware of the fact that String.format() is
locale-dependent.

Try this:

export LANG=en_EN.UTF-8
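
(Note that en_EN is not a locale that exists on most systems; en_US.UTF-8 is
the safer choice. The effect itself is easy to see with bash's printf, which
is locale-aware in the same way as String.format without an explicit Locale,
assuming the respective locales are generated on your box:)

 $ LC_ALL=pt_BR.UTF-8 printf "%.2f\n" 1
 1,00
 $ LC_ALL=en_US.UTF-8 printf "%.2f\n" 1
 1.00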


- André

On Tue, Sep 3, 2013 at 3:20 PM, Felipe Gutierrez
 wrote:
> Hi,
>
> I am trying to run Contrail on Hadoop and it starts ok, but after some time
> throws an error. Some java class can't convert a string to float because
> there is a comma.
>
> I think the problem is not at contrail code. Maybe the location of my
> timezone? Does anyone already passed through this?
>
> $ bin/hadoop jar contrail-1.0-SNAPSHOT.jar contrail.Contrail -asm assembly
> -k 25 -reads reads
> == Starting time 2013-09-03 10:08:59
> Preprocess reads:   2
> job_201309030936_0005 9 s  250 converted
> Build Initial:  4
>
> job_201309030936_0006 358 s  2814 nodes [250 (100,00%) good reads, 9000 bp]
>   Quick Merge:  java.lang.NumberFormatException: For input string: "1,00"
> at
> sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1242)
> at java.lang.Float.parseFloat(Float.java:439)
> at contrail.Node.cov(Node.java:1096)
> at contrail.Node.toNodeMsg(Node.java:664)
> at contrail.QuickMerge$QuickMergeMapper.map(QuickMerge.java:69)
> at contrail.QuickMerge$QuickMergeMapper.map(QuickMerge.java:47)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
> at org.apache.hadoop.mapred.Child.main(Child.java:249)
>
>
>
> --
> --
> -- Felipe Oliveira Gutierrez
> -- felipe.o.gutier...@gmail.com
> -- https://sites.google.com/site/lipe82/Home/diaadia



-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


Re: Running Contrail on hadoop

2013-09-03 Thread Andre Kelpe
I don't know anything about Contrail; I believe it is a better idea to
ask on their mailing list for help:
http://sourceforge.net/p/contrail-bio/mailman/?source=navbar

- André

On Tue, Sep 3, 2013 at 5:27 PM, Felipe Gutierrez
 wrote:
> I configure the languages but the error persists
>
> I wrote:
> $ export LANG=en_EN.UTF-8
> $ export LANGUAGE=en_US:en
> $ locale
> locale: Cannot set LC_CTYPE to default locale: No such file or directory
> locale: Cannot set LC_MESSAGES to default locale: No such file or directory
> locale: Cannot set LC_ALL to default locale: No such file or directory
> LANG=en_EN.UTF-8
> LANGUAGE=en_US:en
> LC_CTYPE="en_EN.UTF-8"
> LC_NUMERIC="en_EN.UTF-8"
> LC_TIME="en_EN.UTF-8"
> LC_COLLATE="en_EN.UTF-8"
> LC_MONETARY="en_EN.UTF-8"
> LC_MESSAGES="en_EN.UTF-8"
> LC_PAPER="en_EN.UTF-8"
> LC_NAME="en_EN.UTF-8"
> LC_ADDRESS="en_EN.UTF-8"
> LC_TELEPHONE="en_EN.UTF-8"
> LC_MEASUREMENT="en_EN.UTF-8"
> LC_IDENTIFICATION="en_EN.UTF-8"
> LC_ALL=
>
> $ bin/hadoop jar contrail-1.0-SNAPSHOT.jar contrail.Contrail -asm assembly
> -k 25 -reads reads
> == Starting time 2013-09-03 12:12:56
> Preprocess reads:   job_201309030936_0016 9 s  250 converted
> Build Initial:  job_201309030936_0017 363 s  2814 nodes [250 (100.00%) good
> reads, 9000 bp]
>   Quick Merge:  java.lang.NumberFormatException: For input string: "1,00"
> at
> sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1242)
> at java.lang.Float.parseFloat(Float.java:439)
> at contrail.Node.cov(Node.java:1098)
> at contrail.Node.toNodeMsg(Node.java:664)
> at contrail.QuickMerge$QuickMergeMapper.map(QuickMerge.java:69)
> at contrail.QuickMerge$QuickMergeMapper.map(QuickMerge.java:47)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:416)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
> at org.apache.hadoop.mapred.Child.main(Child.java:249)
>
>
>
>
> On Tue, Sep 3, 2013 at 11:00 AM, Andre Kelpe 
> wrote:
>>
>> This is usually a String.format() problem, when the developer was
>> using an English locale and was not aware of the fact, that
>> String.format is locale dependent.
>>
>> Try this:
>>
>> export LANG=en_EN.UTF-8
>> 
>>
>> - André
>>
>> On Tue, Sep 3, 2013 at 3:20 PM, Felipe Gutierrez
>>  wrote:
>> > Hi,
>> >
>> > I am trying to run Contrail on Hadoop and it starts ok, but after some
>> > time
>> > throws an error. Some java class can't convert a string to float because
>> > there is a comma.
>> >
>> > I think the problem is not at contrail code. Maybe the location of my
>> > timezone? Does anyone already passed through this?
>> >
>> > $ bin/hadoop jar contrail-1.0-SNAPSHOT.jar contrail.Contrail -asm
>> > assembly
>> > -k 25 -reads reads
>> > == Starting time 2013-09-03 10:08:59
>> > Preprocess reads:   2
>> > job_201309030936_0005 9 s  250 converted
>> > Build Initial:  4
>> >
>> > job_201309030936_0006 358 s  2814 nodes [250 (100,00%) good reads, 9000
>> > bp]
>> >   Quick Merge:  java.lang.NumberFormatException: For input string:
>> > "1,00"
>> > at
>> > sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1242)
>> > at java.lang.Float.parseFloat(Float.java:439)
>> > at contrail.Node.cov(Node.java:1096)
>> > at contrail.Node.toNodeMsg(Node.java:664)
>> > at contrail.QuickMerge$QuickMergeMapper.map(QuickMerge.java:69)
>> > at contrail.QuickMerge$QuickMergeMapper.map(QuickMerge.java:47)
>> > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>> > at
>> > org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
>> > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>> > at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>> > at java.security.AccessController.doPrivileged(Native Method)
>> > at javax.security.auth.Subject.doAs(Subject.java:416)
>> > at
>> >
>> > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
>> > at org.apache.hadoop.mapred.Child.main(Child.java:249)
>> >
>> >
>> >
>> > --
>> > --
>> > -- Felipe Oliveira Gutierrez
>> > -- felipe.o.gutier...@gmail.com
>> > -- https://sites.google.com/site/lipe82/Home/diaadia
>>
>>
>>
>> --
>> André Kelpe
>> an...@concurrentinc.com
>> http://concurrentinc.com
>
>
>
>
> --
> --
> -- Felipe Oliveira Gutierrez
> -- felipe.o.gutier...@gmail.com
> -- https://sites.google.com/site/lipe82/Home/diaadia



-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


junit jars shipping with hadoop

2013-09-04 Thread Andre Kelpe
Hi,

while running some local tests, I had trouble with junit, and it turned
out that hadoop itself (1.2.1 and 2.1.0-beta) ships with a junit jar.
I am wondering if that is a bug or a feature. If it is a feature,
upgrading to something newer than 4.5 would be nice for the stable
version of hadoop. If it is a bug, I can submit a JIRA (+ patch).
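
(A quick way to check which version a given tarball bundles, assuming the
usual tarball layout:)

 $ find $HADOOP_PREFIX -name 'junit-*.jar'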

Thanks!

André

-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


Re: DataNode: Can't replicate block

2013-09-06 Thread Andre Kelpe
It looks like an overflow somewhere: 9223372036854775807 ==
0x7fffffffffffffffL == java.lang.Long.MAX_VALUE.

- André

On Fri, Sep 6, 2013 at 5:42 PM, Logan Hardy  wrote:
> I'm seeing thousands of the following messages per day on my Datanodes. In
> every single message the NameNode recorded length is exactly
> "9223372036854775807". That's 8192 Petabytes. Ideas? I posted this error as
> part of a larger question that went's unanswered on the CDH user group but I
> really want to understand this specific message so I'm breaking it out here.
>
> hadoop-hdfs-datanode-2.0.0+1357-1.cdh4.3.0
>
> 2013-09-05 03:57:56,765 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: Can't replicate block
> BP-1353222332-10.45.100.25-1360087637866:blk_-917008810045374143_520
> because on-disk length 843090 is shorter than NameNode recorded length
> 9223372036854775807
> 2013-09-05 03:57:59,765 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: Can't replicate block
> BP-1353222332-10.45.100.25-1360087637866:blk_4638855469361221841_522
> because on-disk length 108990 is shorter than NameNode recorded length
> 9223372036854775807
> 2013-09-05 03:57:59,766 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: Can't replicate block
> BP-1353222332-10.45.100.25-1360087637866:blk_-3096141510869821776_526
> because on-disk length 706015 is shorter than NameNode recorded length
> 9223372036854775807
> 2013-09-05 03:58:02,767 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: Can't replicate block
> BP-1353222332-10.45.100.25-1360087637866:blk_1061908974542106736_528
> because on-disk length 659123 is shorter than NameNode recorded length
> 9223372036854775807
> 2013-09-05 03:58:08,772 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: Can't replicate block
> BP-1353222332-10.45.100.25-1360087637866:blk_235372784090791353_585
> because on-disk length 4609178 is shorter than NameNode recorded length
> 9223372036854775807
> 2013-09-05 03:59:17,805 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: Can't replicate block
> BP-1353222332-10.45.100.25-1360087637866:blk_8980835874496757824_2223043
> because on-disk length 166263 is shorter than NameNode recorded length
> 9223372036854775807
> 2013-09-05 03:59:20,807 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: Can't replicate block
> BP-1353222332-10.45.100.25-1360087637866:blk_9066832331074569403_2223100
> because on-disk length 33742 is shorter than NameNode recorded length
> 9223372036854775807
> 2013-09-05 03:59:20,808 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: Can't replicate block
> BP-1353222332-10.45.100.25-1360087637866:blk_-3987096455745701841_2223102
> because on-disk length 4609178 is shorter than NameNode recorded length
> 9223372036854775807
> 2013-09-05 03:59:56,831 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: Can't replicate block
> BP-1353222332-10.45.100.25-1360087637866:blk_3246916891446829945_2223298
> because on-disk length 108990 is shorter than NameNode recorded length
> 9223372036854775807
> 2013-09-05 03:59:56,832 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: Can't replicate block
> BP-1353222332-10.45.100.25-1360087637866:blk_-8519971491515089660_2223300
> because on-disk length 697887 is shorter than NameNode recorded length
> 9223372036854775807
> 2013-09-05 03:59:59,833 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: Can't replicate block
> BP-1353222332-10.45.100.25-1360087637866:blk_713523039348005_2223308
> because on-disk length 241259 is shorter than NameNode recorded length
> 9223372036854775807
> 2013-09-05 03:59:59,834 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: Can't replicate block
> BP-1353222332-10.45.100.25-1360087637866:blk_3360556618070774912_2223310
> because on-disk length 166263 is shorter than NameNode recorded length
> 9223372036854775807
>
> - Logan-



-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


Re: Java version with Hadoop 2.0

2013-10-09 Thread Andre Kelpe
Also keep in mind that Java 6 no longer gets "public" updates from
Oracle: http://www.oracle.com/technetwork/java/eol-135779.html

- André

On Wed, Oct 9, 2013 at 11:48 PM, SF Hadoop  wrote:
> I hadn't.  Thank you!!!  Very helpful.
>
> Andy
>
>
> On Wed, Oct 9, 2013 at 2:25 PM, Patai Sangbutsarakum
>  wrote:
>>
>> maybe you've already seen this.
>>
>> http://wiki.apache.org/hadoop/HadoopJavaVersions
>>
>>
>> On Oct 9, 2013, at 2:16 PM, SF Hadoop 
>>  wrote:
>>
>> I am preparing to deploy multiple cluster / distros of Hadoop for testing
>> / benchmarking.
>>
>> In my research I have noticed discrepancies in the version of the JDK that
>> various groups are using.  Example:  Hortonworks is suggesting JDK6u31, CDH
>> recommends either 6 or 7 providing you stick to some guidelines for each and
>> Apache Hadoop seems to be somewhat of a "no mans land"; a lot of people
>> using a lot of different versions.
>>
>> Does anyone have any insight they could share about how to approach
>> choosing the best JDK release?  (I'm a total Java newb, so any info /
>> further reading you guys can provide is appreciated.)
>>
>> Thanks.
>>
>> sf
>>
>>
>



-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


Re: Create Multiple VM's on Mac

2013-10-11 Thread Andre Kelpe
Have a look at our Vagrant hadoop cluster, which does just that (using
Ubuntu though):

https://github.com/Cascading/vagrant-cascading-hadoop-cluster

-- André

On Sat, Oct 12, 2013 at 12:33 AM, Raj Hadoop  wrote:
> All,
>
> I have a CentOS VM image and want to replicate it four times on my Mac
> computer. How
> can I set it up so that I can have 4 individual machines that can be used as
> nodes
> in my Hadoop cluster.
>
> Please advise.
>
>
> Thanks,
> Raj



-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


Re: Error in documentation

2013-10-18 Thread Andre Kelpe
The best thing to do is to open a JIRA here:
https://issues.apache.org/jira/secure/Dashboard.jspa You might also
want to submit a patch, which is very easy.
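
For reference, the corrected property from that page should read:

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>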

- André

On Fri, Oct 18, 2013 at 11:28 AM, Siddharth Tiwari
 wrote:
> The installation documentation for Hadoop yarn at this link
> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
> has error in the yarn-site for property yarn.nodemanager.aux-services. it
> should be  mapreduce_shuffle rather than mapreduce.shuffle.
>
>
>
>
>
>
> **
> Cheers !!!
> Siddharth Tiwari
> Have a refreshing day !!!
> "Every duty is holy, and devotion to duty is the highest form of worship of
> God.”
> "Maybe other people will try to limit me but I don't limit myself"



-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


Re: Error in documentation

2013-10-18 Thread Andre Kelpe
Now get a copy of the code, fix the mistake and attach the patch to the JIRA.

- André

On Fri, Oct 18, 2013 at 11:49 AM, Siddharth Tiwari
 wrote:
> Opened a Jira https://issues.apache.org/jira/browse/YARN-1319
>
>
>
> **
> Cheers !!!
> Siddharth Tiwari
> Have a refreshing day !!!
> "Every duty is holy, and devotion to duty is the highest form of worship of
> God.”
> "Maybe other people will try to limit me but I don't limit myself"
>
>
>> Date: Fri, 18 Oct 2013 11:42:29 +0200
>> Subject: Re: Error in documentation
>> From: ake...@concurrentinc.com
>> To: user@hadoop.apache.org
>>
>> The best thing to do is to open a JIRA here:
>> https://issues.apache.org/jira/secure/Dashboard.jspa You might also
>> want to submit a patch, which is very easy.
>>
>> - André
>>
>> On Fri, Oct 18, 2013 at 11:28 AM, Siddharth Tiwari
>>  wrote:
>> > The installation documentation for Hadoop yarn at this link
>> >
>> > http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
>> > has error in the yarn-site for property yarn.nodemanager.aux-services.
>> > it
>> > should be mapreduce_shuffle rather than mapreduce.shuffle.
>> >
>> >
>> >
>> >
>> >
>> >
>> > **
>> > Cheers !!!
>> > Siddharth Tiwari
>> > Have a refreshing day !!!
>> > "Every duty is holy, and devotion to duty is the highest form of worship
>> > of
>> > God.”
>> > "Maybe other people will try to limit me but I don't limit myself"
>>
>>
>>
>> --
>> André Kelpe
>> an...@concurrentinc.com
>> http://concurrentinc.com



-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


Re: C++ example for hadoop-2.2.0

2013-11-04 Thread Andre Kelpe
I reported the 32bit/64bit problem a few weeks ago. There hasn't been
much activity around it though:
https://issues.apache.org/jira/browse/HADOOP-9911

- André

On Mon, Nov 4, 2013 at 2:20 PM, Salman Toor  wrote:
> Hi,
>
> Ok so 2.x is not a new version its another branch. Good to know! Actually
> 32bit will be difficult as the code I got have already some dependencies on
> 64 bit.
>
> Otherwise I will continue with 1.x version. Can you suggest some version
> with 1.x series which is stable and work on the cluster environment?
> especially with C++ ...
>
> Regards..
> Salman.
>
> Salman Toor, PhD
> salman.t...@it.uu.se
>
>
>
> On Nov 4, 2013, at 1:54 PM, Amr Shahin wrote:
>
> Well, the 2 series isn't exactly the "next version". It's a continuation of
> branch .2.
> Also, the error message from the gcc indicates that the library you're
> trying to link to isn't compatible which made me suspicious. check the
> documentation to see if hadoop has 64 libraries, or otherwise compile
> against the 32 ones
>
>
> On Mon, Nov 4, 2013 at 4:51 PM, Salman Toor  wrote:
>>
>> Hi,
>>
>> Thanks for your answer!
>>
>> But are you sure about it? Actually Hadoop version 1.2 have both 32 and 64
>> bit libraries so I believe the the next version should have both... But I am
>> not sure just a random guess :-(
>>
>> Regards..
>> Salman.
>>
>> Salman Toor, PhD
>> salman.t...@it.uu.se
>>
>>
>>
>> On Nov 4, 2013, at 1:38 PM, Amr Shahin wrote:
>>
>> I believe hadoop isn't compatible with 64 architecture. Try installing the
>> 32 libraries and compile against them.
>> This error (skipping incompatible
>> /home/sztoor/hadoop-2.2.0/lib/native/libhadooppipes.a when searching
>> -lhadooppipes) indicates so
>>
>>
>> On Mon, Nov 4, 2013 at 2:44 PM, Salman Toor  wrote:
>>>
>>> Hi,
>>>
>>> Can someone give a pointer?
>>>
>>>
>>> Thanks in advance.
>>>
>>> Regards..
>>> Salman.
>>>
>>>
>>> Salman Toor, PhD
>>> salman.t...@it.uu.se
>>>
>>>
>>>
>>> On Nov 3, 2013, at 11:31 PM, Salman Toor wrote:
>>>
>>> Hi,
>>>
>>> I am quite new to the Hadoop world, previously was running hadoop-1.2.0
>>> stable version on my small cluster and encountered some strange problems
>>> like the local path to the mapper  file didn't copy to the hdfs  It
>>> works fine on the single node setup but on multiple node simple word-count
>>> python example didn't work...  I read on the blog that it might be the
>>> problem in the version I am using. So I thought to change the version and
>>> downloaded the Hadoop 2.2.0. This version has yarn together with many new
>>> features that I hope I will learn in the future. Now simple wordcount
>>> example works without any problem on the multi-node setup. I am using simple
>>> python example.
>>>
>>> Now I would like to compile my C++ code. Since the directory structure
>>> together with other things have been changed. I have started to get the
>>> following error:
>>>
>>> 
>>> /urs/bin/ld:  skipping incompatible
>>> /home/sztoor/hadoop-2.2.0/lib/native/libhadooputils.a when searching
>>> -lhadooputils
>>> cannot find -lhadooputils
>>>
>>>  /urs/bin/ld:  skipping incompatible
>>> /home/sztoor/hadoop-2.2.0/lib/native/libhadooppipes.a when searching
>>> -lhadooppipes
>>> cannot find -lhadooppipes
>>> --
>>>
>>> I have managed to run the c++ example successfully with the 1.2.0 version
>>> on single node setup.
>>>
>>> I am having 64bit Ubuntu machine. previously I was using Linux-amd64-64
>>>
>>> Now in new version, "lib" and "include" directories are in the
>>> hadoop-2.2.0 directory. No build.xml is available...
>>>
>>> Can someone please give me an example of a makefile based on the version
>>> 2.2.0? Or suggest me which version I should go for? Or if there are some
>>> prerequisites that I should do before compiling my code?
>>>
>>> Thanks in advance.
>>>
>>> Regards..
>>> Salman.
>>>
>>>
>>>
>>> Salman Toor, PhD
>>> salman.t...@it.uu.se
>>>
>>>
>>>
>>>
>>
>>
>
>



-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


Re: C++ example for hadoop-2.2.0

2013-11-04 Thread Andre Kelpe
No, because I was trying to set up a cluster automatically with the
tarballs from apache.org.

- André

On Mon, Nov 4, 2013 at 3:05 PM, Salman Toor  wrote:
> Hi,
>
> Did you tried to compile with source?
>
> /Salman.
>
>
> Salman Toor, PhD
> salman.t...@it.uu.se
>
>
>
> On Nov 4, 2013, at 2:55 PM, Andre Kelpe wrote:
>
> I reported the 32bit/64bit problem a few weeks ago. There hasn't been
> much activity around it though:
> https://issues.apache.org/jira/browse/HADOOP-9911
>
> - André
>
> On Mon, Nov 4, 2013 at 2:20 PM, Salman Toor  wrote:
>
> Hi,
>
>
> Ok so 2.x is not a new version its another branch. Good to know! Actually
>
> 32bit will be difficult as the code I got have already some dependencies on
>
> 64 bit.
>
>
> Otherwise I will continue with 1.x version. Can you suggest some version
>
> with 1.x series which is stable and work on the cluster environment?
>
> especially with C++ ...
>
>
> Regards..
>
> Salman.
>
>
> Salman Toor, PhD
>
> salman.t...@it.uu.se
>
>
>
>
> On Nov 4, 2013, at 1:54 PM, Amr Shahin wrote:
>
>
> Well, the 2 series isn't exactly the "next version". It's a continuation of
>
> branch .2.
>
> Also, the error message from the gcc indicates that the library you're
>
> trying to link to isn't compatible which made me suspicious. check the
>
> documentation to see if hadoop has 64 libraries, or otherwise compile
>
> against the 32 ones
>
>
>
> On Mon, Nov 4, 2013 at 4:51 PM, Salman Toor  wrote:
>
>
> Hi,
>
>
> Thanks for your answer!
>
>
> But are you sure about it? Actually Hadoop version 1.2 have both 32 and 64
>
> bit libraries so I believe the the next version should have both... But I am
>
> not sure just a random guess :-(
>
>
> Regards..
>
> Salman.
>
>
> Salman Toor, PhD
>
> salman.t...@it.uu.se
>
>
>
>
> On Nov 4, 2013, at 1:38 PM, Amr Shahin wrote:
>
>
> I believe hadoop isn't compatible with 64 architecture. Try installing the
>
> 32 libraries and compile against them.
>
> This error (skipping incompatible
>
> /home/sztoor/hadoop-2.2.0/lib/native/libhadooppipes.a when searching
>
> -lhadooppipes) indicates so
>
>
>
> On Mon, Nov 4, 2013 at 2:44 PM, Salman Toor  wrote:
>
>
> Hi,
>
>
> Can someone give a pointer?
>
>
>
> Thanks in advance.
>
>
> Regards..
>
> Salman.
>
>
>
> Salman Toor, PhD
>
> salman.t...@it.uu.se
>
>
>
>
> On Nov 3, 2013, at 11:31 PM, Salman Toor wrote:
>
>
> Hi,
>
>
> I am quite new to the Hadoop world, previously was running hadoop-1.2.0
>
> stable version on my small cluster and encountered some strange problems
>
> like the local path to the mapper  file didn't copy to the hdfs  It
>
> works fine on the single node setup but on multiple node simple word-count
>
> python example didn't work...  I read on the blog that it might be the
>
> problem in the version I am using. So I thought to change the version and
>
> downloaded the Hadoop 2.2.0. This version has yarn together with many new
>
> features that I hope I will learn in the future. Now simple wordcount
>
> example works without any problem on the multi-node setup. I am using simple
>
> python example.
>
>
> Now I would like to compile my C++ code. Since the directory structure
>
> together with other things have been changed. I have started to get the
>
> following error:
>
>
> 
>
> /urs/bin/ld:  skipping incompatible
>
> /home/sztoor/hadoop-2.2.0/lib/native/libhadooputils.a when searching
>
> -lhadooputils
>
> cannot find -lhadooputils
>
>
> /urs/bin/ld:  skipping incompatible
>
> /home/sztoor/hadoop-2.2.0/lib/native/libhadooppipes.a when searching
>
> -lhadooppipes
>
> cannot find -lhadooppipes
>
> --
>
>
> I have managed to run the c++ example successfully with the 1.2.0 version
>
> on single node setup.
>
>
> I am having 64bit Ubuntu machine. previously I was using Linux-amd64-64
>
>
> Now in new version, "lib" and "include" directories are in the
>
> hadoop-2.2.0 directory. No build.xml is available...
>
>
> Can someone please give me an example of a makefile based on the version
>
> 2.2.0? Or suggest me which version I should go for? Or if there are some
>
> prerequisites that I should do before compiling my code?
>
>
> Thanks in advance.
>
>
> Regards..
>
> Salman.
>
>
>
>
> Salman Toor, PhD
>
> salman.t...@it.uu.se
>
>
>
>
>
>
>
>
>
>
>
>
> --
> André Kelpe
> an...@concurrentinc.com
> http://concurrentinc.com
>
>



-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


Re: Hadoop 2.2.0 from source configuration

2013-12-02 Thread Andre Kelpe
Hi Daniel,

first of all, before posting to a mailing list, take a deep breath and
let your frustrations out. Then write the email. Using words like
"crappy", "toxicware", "nightmare" are not going to help you getting
useful responses.

While I agree that the docs can be confusing, we should try to stay
constructive. You haven't mentioned which documentation you are
using. I found the cluster tutorial sufficient to get me started:
http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html

If you are looking for an easy way to spin up a small cluster with
hadoop 2.2, try the hadoop2 branch of this vagrant setup:

https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2

- André

On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard  wrote:
> I am trying to configure hadoop 2.2.0 from source code and I found the
> instructions really crappy and incomplete. It is like they were written to
> avoid someone can do the job himself and must contract someone else to do it
> or buy a packaged version.
>
> It is about three days I am struggling with this stuff with partial success.
> The documentation is less than clear and most of the stuff out there apply
> to earlier version and they haven't been updated for version 2.2.0.
>
> I was able to setup HDFS, however I am still unable to use it. I am doing a
> single node installation and the instruction page doesn't explain anything
> beside telling you to do this and that without documenting what each thing
> is doing and what choices are available and what guidelines you should
> follow. There is even environment variables you are told to set, but nothing
> is said about what they mean and to which value they should be set. It seems
> it assumes prior knowledge of everything about hadoop.
>
> Anyone knows a site with proper documentation about hadoop or it's hopeless
> and this whole thing is just a piece of toxicware?
>
> I am already looking for alternate solutions to hadoop which for sure will
> be a nightmare to manage and install each time a new version, release will
> become available.
>
> TIA
> -
> Daniel Savard



-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


Re: Permission problem in Junit test - Hadoop 2.2.0

2013-12-17 Thread Andre Kelpe
You have to start eclipse from an environment that has the correct umask
set; otherwise it will not inherit the setting.

Open a terminal, run "umask 022 && eclipse", and re-try the tests.

- André


On Wed, Dec 18, 2013 at 12:35 AM, Karim Awara wrote:

>
> Yes. Nothing yet. I should mention I compiled hadoop 2.2 from the src
> using maven on a single machine (mac os x). It seems whatever I do in the
> permissions, the error persists.
>
> --
> Best Regards,
> Karim Ahmed Awara
>
>
> On Wed, Dec 18, 2013 at 2:24 AM, Ted Yu  wrote:
>
>> Have you set umask to 022 ?
>>
>> See https://issues.apache.org/jira/browse/HDFS-2556
>>
>> Cheers
>>
>>
>> On Tue, Dec 17, 2013 at 3:12 PM, Karim Awara wrote:
>>
>>> Hi,
>>>
>>> I am running Junit test on hadoop 2.2.0 on eclipse on mac os x. Whenever
>>> I run the test, I am faced with the following error
>>>
>>> It seems there is a problem with the permission test data dir.   Please
>>> advise.
>>>
>>>
>>> 2013-12-18 02:09:19,326 ERROR hdfs.MiniDFSCluster
>>> (MiniDFSCluster.java:initMiniDFSCluster(647)) - IOE creating namenodes.
>>> Permissions dump:
>>> path 'build/test/data/dfs/data':
>>>
>>> absolute:/Volumes/Me/kepler_workspace/hadoop-hdfs/build/test/data/dfs/data
>>> permissions: 
>>> path 'build/test/data/dfs':
>>> absolute:/Volumes/Me/kepler_workspace/hadoop-hdfs/build/test/data/dfs
>>> permissions: drwx
>>> path 'build/test/data':
>>> absolute:/Volumes/Me/kepler_workspace/hadoop-hdfs/build/test/data
>>> permissions: drwx
>>> path 'build/test':
>>> absolute:/Volumes/Me/kepler_workspace/hadoop-hdfs/build/test
>>> permissions: drwx
>>> path 'build':
>>> absolute:/Volumes/Me/kepler_workspace/hadoop-hdfs/build
>>> permissions: drwx
>>>
>>>
>>>
>>>
>>> --
>>> Best Regards,
>>> Karim Ahmed Awara
>>>
>>> --
>>> This message and its contents, including attachments are intended solely
>>> for the original recipient. If you are not the intended recipient or have
>>> received this message in error, please notify me immediately and delete
>>> this message from your computer system. Any unauthorized use or
>>> distribution is prohibited. Please consider the environment before printing
>>> this email.
>>
>>
>>
>
> --
> This message and its contents, including attachments are intended solely
> for the original recipient. If you are not the intended recipient or have
> received this message in error, please notify me immediately and delete
> this message from your computer system. Any unauthorized use or
> distribution is prohibited. Please consider the environment before printing
> this email.
>



-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


Re: from relational to bigger data

2013-12-20 Thread Andre Kelpe
You could also give cascading lingual a try:
http://www.cascading.org/lingual/ http://docs.cascading.org/lingual/1.0/

We have a connector for Oracle
(https://github.com/Cascading/cascading-jdbc#oracle), so you could read the
data from Oracle, do the processing on a Hadoop cluster, and write it back
into Oracle, all via SQL or a combination of SQL and Java/Cascading
(https://github.com/Cascading/cascading-jdbc#in-lingual).

- André




On Thu, Dec 19, 2013 at 9:35 PM, Jay Vee  wrote:

> We have a large relational database ( ~ 500 GB, hundreds of tables ).
>
> We have summary tables that we rebuild from scratch each night that takes
> about 10 hours.
> From these summary tables, we have a web interface that accesses the
> summary tables to build reports.
>
> There is a business reason for doing a complete rebuild of the summary
> tables each night, and using
> views (as in the sense of Oracle views) is not an option at this time.
>
> If I wanted to leverage Big Data technologies to speed up the summary
> table rebuild, what would be the first step into getting all data into some
> big data storage technology?
>
> Ideally in the end, we want to retain the summary tables in a relational
> database and have reporting work the same without modifications.
>
> It's just the crunching of the data and building these relational summary
> tables where we need a significant performance increase.
>
>
>


-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


Re: Can the file storage in HDFS be customized?

2014-02-25 Thread Andre Kelpe
This might get you further: https://github.com/mraad/Shapefile

- André


On Tue, Feb 25, 2014 at 11:29 AM, Sugandha Naolekar
wrote:

> Hello,
>
> I have a huge shapefile which has some 500 polygon  geometries. Is there a
> way to store this shapefile in such a format in HDFS that each block will
> have 100 polygon geometries. And each block representing a quad core
> machine.
>
> Thus, 5 machines, with 5 blocks, which have in total 500 polygon
> geometries.
>
> Internally, I would like to read each of the block of HDFS in such a way
> where, each polygon geometry is fed to the map() task. THus, 100 map()
> tasks per block per machine.
>
> --
> Thanks & Regards,
> Sugandha Naolekar
>
>
>
>


-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


Re: mr1 and mr2

2014-05-14 Thread Andre Kelpe
May I recommend using Cascading instead of using MR directly? Cascading
supports Hadoop 1.x and Hadoop 2.x based distros and you don't have to
wrestle with these things all the time: http://www.cascading.org/ It's OSS,
ASL v2 licensed and all the good stuff.

- André


On Sun, May 11, 2014 at 1:52 AM, Tony Dean  wrote:

>  Hi,
>
>
>
> I am trying to write a Java application that works with either MR1 and
> MR2.  At the present I have MR2 (YARN) implementation deployed and
> running.  I am using mapred API.  I believe that I read mapred and
> mapreduce APIs are compatible so either should work.  The only thing that
> is different is the configuration properties that need to be specified
> depending on whether the back-end is MR1 or MR2. BTW: I’m using CDH 4.6
> (Hadoop 2.0).
>
>
>
> My problem is that I can’t seem to submit a job to the cluster.  It always
> runs locally.  I setup JobConf with appropriate properties and submit the
> jobs using JobClient.  The properties that I set on JobConf are as follows:
>
>
>
> mapreduce.jobtracker.address=host:port (I know this is for MR1, but I’m
> trying everything)
>
> mapreduce.framework.name=yarn
>
> yarn.resourcemanager.address=host:port
>
> yarn.resourcemanager.host=host:port
>
>
>
> The last 2 are the same but I read 2 different ways to set it in different
> conflicting documentations.
>
>
>
> Anyway, can someone explain how to get this seemingly simple deployment to
> work?  What am I missing?
>
>
>
> Thanks!!!
>



-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


Re: Hadoop virtual machine

2014-07-06 Thread Andre Kelpe
We have a multi-VM or single-VM setup with Apache Hadoop, if you want to
give that a spin:
https://github.com/Cascading/vagrant-cascading-hadoop-cluster

- André


On Sun, Jul 6, 2014 at 9:05 AM, MrAsanjar .  wrote:

> For my hadoop development and testing I use LXC (linux container) instead
> of VM, mainly due to its light weight resource consumption. As mater of
> fact as I am typing, my ubuntu system is automatically building a 6 nodes
> hadoop cluster on my 16G labtop.
> If you have an Ubuntu system you could install a fully configurable Hadoop
> 2.2.0 single node or multi-node cluster in less then 10 minutes.
> Here what you need to do:
> 1) Install and learn Ubuntu Juju (shouldn't take an hour)- instructions :
> https://juju.ubuntu.com/docs/getting-started.html
> 2) there are two types hadoop charms:
>  a) Single node for hadoop development :
> https://jujucharms.com/?text=hadoop2-devel
>  b) multi-node for testing  testing :
> https://jujucharms.com/?text=hadoop
> Let me know if you need more help
>
>
> On Sun, Jul 6, 2014 at 7:59 AM, Marco Shaw  wrote:
>
>> Note that the CDH link is for Cloudera which only provides Hadoop for
>> Linux.
>>
>> HDP has "pre-built VMs" for both Linux and Windows hosts.
>>
>> You can also search for "HDInsight emulator" which runs on Windows and is
>> based on HDP.
>>
>> Marco
>>
>> On Jul 6, 2014, at 12:38 AM, Gavin Yue  wrote:
>>
>> http://hortonworks.com/products/hortonworks-sandbox/
>>
>> or
>>
>> CDH5
>>
>> http://www.cloudera.com/content/cloudera-content/cloudera-docs/DemoVMs/Cloudera-QuickStart-VM/cloudera_quickstart_vm.html
>>
>>
>>
>> On Sat, Jul 5, 2014 at 11:27 PM, Manar Elkady 
>> wrote:
>>
>>> Hi,
>>>
>>> I am a newcomer in using Hadoop, and I read many online tutorial to set
>>> up Hadoop on Window by using virtual machines, but all of them link to old
>>> versions of Hadoop virtual machines.
>>> Could any one help me to find a Hadoop virtual machine, which include a
>>> newer version of hadoop? Or should I do it myself from scratch?
>>> Also, any well explained Hadoop installing tutorial and any other
>>> helpful material are appreciated.
>>>
>>>
>>> Manar,
>>>
>>>
>>> --
>>>
>>>
>>
>


-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


Re: Started learning Hadoop. Which distribution is best for native install in pseudo distributed mode?

2014-08-13 Thread Andre Kelpe
Why don't you just use the apache tarball? We even have that automated, if
vagrant is your thing:
https://github.com/Cascading/vagrant-cascading-hadoop-cluster

- André


On Tue, Aug 12, 2014 at 10:12 PM, mani kandan  wrote:

> Which distribution are you people using? Cloudera vs Hortonworks vs
> Biginsights?
>



-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


Re: Hadoop 2.4.1 Snappy Smoke Test failed

2014-08-19 Thread Andre Kelpe
Could this be caused by the fact that hadoop no longer ships with 64bit
libs? https://issues.apache.org/jira/browse/HADOOP-9911
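
Independent of that, a quick way to see which native codecs your build
actually picks up (snappy should report true if it is wired up correctly):

 $ hadoop checknative -a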

- André


On Tue, Aug 19, 2014 at 5:40 PM, arthur.hk.c...@gmail.com <
arthur.hk.c...@gmail.com> wrote:

> Hi,
>
> I am trying Snappy in Hadoop 2.4.1, here are my steps:
>
> (CentOS 64-bit)
> 1)
> yum install snappy snappy-devel
>
> 2)
> added the following
> (core-site.xml)
> <property>
>   <name>io.compression.codecs</name>
>   <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
> </property>
>
> 3)
> mapred-site.xml
> <property>
>   <name>mapreduce.admin.map.child.java.opts</name>
>   <value>-server -XX:NewRatio=8
>     -Djava.library.path=/usr/lib/hadoop/lib/native/
>     -Djava.net.preferIPv4Stack=true</value>
>   <final>true</final>
> </property>
>
> <property>
>   <name>mapreduce.admin.reduce.child.java.opts</name>
>   <value>-server -XX:NewRatio=8
>     -Djava.library.path=/usr/lib/hadoop/lib/native/
>     -Djava.net.preferIPv4Stack=true</value>
>   <final>true</final>
> </property>
>
> 4) smoke test
> bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar  teragen
> 10 /tmp/teragenout
>
> I got the following warning, actually there is no any test file created in
> hdfs:
>
> 14/08/19 22:50:10 WARN mapred.YARNRunner: Usage of -Djava.library.path in
> mapreduce.admin.map.child.java.opts can cause programs to no longer
> function if hadoop native libraries are used. These values should be set as
> part of the LD_LIBRARY_PATH in the map JVM env using
> mapreduce.admin.user.env config settings.
> 14/08/19 22:50:10 WARN mapred.YARNRunner: Usage of -Djava.library.path in
> mapreduce.admin.reduce.child.java.opts can cause programs to no longer
> function if hadoop native libraries are used. These values should be set as
> part of the LD_LIBRARY_PATH in the reduce JVM env using
> mapreduce.admin.user.env config settings.
>
> Can anyone please advise how to install and enable SNAPPY in Hadoop 2.4.1?
> or what would be wrong? or is my new change in mapred-site.xml incorrect?
>
> Regards
> Arthur
>
>
>
>
>
>


-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


FSDownload, LocalFileSystem and DistributedCache permissions

2014-08-20 Thread Andre Kelpe
Hi,

I am trying to use the DistributedCache and I am running into problems in a
test when using the LocalFileSystem. FSDownload complains about
permissions like so (this is hadoop 2.4.1 with JDK 6 on Linux):

Caused by: java.io.IOException: Resource file:/path/to/some/file is not
publicly accessable and as such cannot be part of the public cache.
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:257)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:60)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:355)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:353)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:352)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)

I looked into the code and saw that there is an exception being made for
Windows and the LocalFileSystem:
https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java#L145

I am wondering if this exception could be extended to every case where the
LocalFileSystem is used, and if not, what sort of permissions I have to
set in my test to make this work.
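
From what I can tell, the check requires the file to be world-readable and
every directory above it to be world-executable, so something along these
lines (using the placeholder path from the error above) should satisfy it in
a test:

 $ chmod o+x /path /path/to /path/to/some
 $ chmod o+r /path/to/some/file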

Thanks!

- André

-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


Re: MiniMRClientCluster and JAVA_HOME

2014-08-21 Thread Andre Kelpe
On Wed, Aug 20, 2014 at 11:54 PM, Ken Krugler 
wrote:

>
>
> PS - And why, oh why is "target" hard-coded all over the place in the
> mini-cluster code as the directory (from CWD) for logs, data blocks, etc?
>
>
https://issues.apache.org/jira/browse/YARN-1442

- André


-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


Re: hadoop cluster crash problem

2014-09-17 Thread Andre Kelpe
VirtualBox is known to cause instabilities in the host kernel (or at
least it used to). You might be better off asking for support there:
https://www.virtualbox.org/wiki/Bugtracker

- André

On Wed, Sep 17, 2014 at 4:25 AM, Li Li  wrote:

> hi all,
> I know it's not a problem related to hadoop but administrator can
> not find any clues.
> I have a machine with 24 core and 64GB memory with ubuntu 12.04
> LTS. we use virtual box to create 4 virtual machine. Each vm has 10GB
> memory and 6 core.
> I have setup a small hadoop 1.2.1 cluster with one
> jobtracker/namenode and 3 tasktracker/datanode. Each tasktrack has 4
> mapper slots and 4 reducers slot.
> But it always crashs(the host machine crash, not vm crash).
> Sometimes it crashes for the first map-reduce job. Sometimes it can
> run a few jobs.
> is there any clues? I have checked the sys log and can find any
> thing useful. Using monitor system, The cpu and io is not abnormal.
> The only abnormal phenomenon is context switch is high. about 40k.
>



-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


Re: datanodes not connecting

2014-11-24 Thread Andre Kelpe
Did you format your namenode before starting HDFS?
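
If not, the namenode never comes up properly and the datanodes have nothing
to register with; the usual first-time sequence is roughly:

 $ hdfs namenode -format
 $ start-dfs.sh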

- André

On Sun, Nov 23, 2014 at 7:24 PM, Tim Dunphy  wrote:

> Hey all,
>
>  OK thanks for your advice on setting up a hadoop test environment to get
> started in learning how to use hadoop! I'm very excited to be able to start
> to take this plunge!
>
> Although rather than using BigTop or Cloudera, I just decided to go for a
> straight apache hadoop install. I setup 3 t2micro instances on EC2 for my
> training purposes. And that seemed to go alright! As far as installing
> hadoop and starting the services goes.
>
> I went so far as to setup the ssh access that the nodes will need. And the
> services seem to start without issue:
>
> bash-4.2$ whoami
> hadoop
>
> bash-4.2$ start-dfs.sh
>
> Starting namenodes on [hadoop1.mydomain.com]
>
> hadoop1.mydomain.com: starting namenode, logging to
> /home/hadoop/logs/hadoop-hadoop-namenode-hadoop1.out
>
> hadoop2.mydomain.com: starting datanode, logging to
> /home/hadoop/logs/hadoop-hadoop-datanode-hadoop2.out
>
> hadoop3.mydomain.com: starting datanode, logging to
> /home/hadoop/logs/hadoop-hadoop-datanode-hadoop3.out
>
> Starting secondary namenodes [0.0.0.0]
>
> 0.0.0.0: starting secondarynamenode, logging to
> /home/hadoop/logs/hadoop-hadoop-secondarynamenode-hadoop1.out
>
> bash-4.2$ start-yarn.sh
>
> starting yarn daemons
>
> starting resourcemanager, logging to
> /home/hadoop/logs/yarn-hadoop-resourcemanager-hadoop1.out
>
> hadoop2.mydomain.com: starting nodemanager, logging to
> /home/hadoop/logs/yarn-hadoop-nodemanager-hadoop2.out
>
> hadoop3.mydomain.com: starting nodemanager, logging to
> /home/hadoop/logs/yarn-hadoop-nodemanager-hadoop3.out
>
> And I opened up these ports on the security groups for the two data nodes:
>
> [root@hadoop2:~] #netstat -tulpn | grep -i listen | grep java
>
> tcp0  0 0.0.0.0:*50010*   0.0.0.0:*
> LISTEN  21405/java
>
> tcp0  0 0.0.0.0:*50075*   0.0.0.0:*
> LISTEN  21405/java
>
> tcp0  0 0.0.0.0:*50020*   0.0.0.0:*
> LISTEN  21405/java
> But when I go to the hadoop web interface at:
>
> http://hadoop1.mydomain.com:50070 
>
> And click on the data node tab, I see no nodes are connected!
>
> I see that the hosts are listening on all interfaces.
>
> I also put all hosts into the /etc/hosts file on the master node.
>
> Using the first data node as an example I can telnet into each port on
> both datanodes from the master node:
>
> bash-4.2$ telnet hadoop2.mydomain.com *50010*
>
> Trying 172.31.63.42...
>
> Connected to hadoop2.mydomain.com.
>
> Escape character is '^]'.
>
> ^]
>
> telnet> quit
>
> Connection closed.
>
> bash-4.2$ telnet hadoop2.mydomain.com *50075*
>
> Trying 172.31.63.42...
>
> Connected to hadoop2.mydomain.com.
>
> Escape character is '^]'.
>
> ^]
>
> telnet> quit
>
> Connection closed.
>
> bash-4.2$ telnet hadoop2.mydomain.com *50020*
>
> Trying 172.31.63.42...
>
> Connected to hadoop2.mydomain.com.
>
> Escape character is '^]'.
>
> ^]
>
> telnet> quit
>
> Connection closed.
>
> So apparently I've hit my first snag in setting up a hadoop cluster. Can
> anyone give me some tips as to how I can get the data nodes to show as
> connected to the master?
>
>
> Thanks
>
> Tim
>
>
>
>
> --
> GPG me!!
>
> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>
>


-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


Re: Split files into 80% and 20% for building model and prediction

2014-12-12 Thread Andre Kelpe
Try Cascading multitool: http://docs.cascading.org/multitool/2.6/

- André

On Fri, Dec 12, 2014 at 10:30 AM, unmesha sreeveni 
wrote:

> I am trying to divide my HDFS file into 2 parts/files
> 80% and 20% for classification algorithm(80% for modelling and 20% for
> prediction)
> Please provide suggestion for the same.
> To take 80% and 20% to 2 seperate files we need to know the exact number
> of record in the data set
> And it is only known if we go through the data set once.
> so we need to write 1 MapReduce Job for just counting the number of
> records and
> 2 nd Mapreduce Job for separating 80% and 20% into 2 files using Multiple
> Inputs.
>
>
> Am I in the right track or there is any alternative for the same.
> But again a small confusion how to check if the reducer get filled with
> 80% data.
>
>
> --
> *Thanks & Regards *
>
>
> *Unmesha Sreeveni U.B*
> *Hadoop, Bigdata Developer*
> *Centre for Cyber Security | Amrita Vishwa Vidyapeetham*
> http://www.unmeshasreeveni.blogspot.in/
>
>
>


-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


Re: Any working VM of Apache Hadoop ?

2015-01-18 Thread Andre Kelpe
Try our vagrant setup:
https://github.com/Cascading/vagrant-cascading-hadoop-cluster

- André

On Sat, Jan 17, 2015 at 10:07 PM, Krish Donald  wrote:

> Hi,
>
> I am looking for working VM of Apache Hadoop.
> Not looking for cloudera or Horton works VMs.
> If anybody has it and if they can share that would be great .
>
> Thanks
> Krish
>



-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


Re: Simple example for Cluster configuration of Hadoop 2.6.0

2015-02-04 Thread Andre Kelpe
See here:
https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/ClusterSetup.html

- André

On Wed, Feb 4, 2015 at 2:47 PM, Fernando Carvalho <
fernandocarvalhocoe...@gmail.com> wrote:

> Dear Hadoop users,
>
> I would like to know if some have an simple example of how to setup Hadoop
> 2.6.0 in Cluster mode.
> I'm googling for this, but I'm only able to find examples for earlier
> versions of Hadoop which are not fully compatible with the current.
> Can someone help me?
>
> --
> Fernando
>
>


-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


Re: Hadoop svn cannot connected.

2015-02-06 Thread Andre Kelpe
Hadoop has moved to git: https://wiki.apache.org/hadoop/GitAndHadoop
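
A checkout then looks like this (the same repo URL that "hadoop version"
reports):

 $ git clone https://git-wip-us.apache.org/repos/asf/hadoop.git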

-- André

On Fri, Feb 6, 2015 at 9:13 AM, Azuryy Yu  wrote:

> Hi,
>
> http://svn.apache.org/viewcvs.cgi/hadoop/common/trunk/
>
> I cannot open this URL. does that anybody can access it?
>
> another, I cannot "svn up" the new release in branch-2. It always stay in
> Aug 2014.
>
>
>
>


-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


Re: Hadoop imports Fail

2015-04-14 Thread Andre Kelpe
Please just use a build tool like maven or gradle for your build.
There is no way to manage your classpath like this and stay sane.
Nobody does this and you shouldn't either.
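
With maven, a single dependency pulls in everything those imports need; a
minimal sketch of the relevant pom.xml fragment, assuming your 2.6.0 install:

  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.6.0</version>
  </dependency>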

- André

On Tue, Apr 14, 2015 at 7:34 AM, Anand Murali  wrote:
> Dear Naik:
>
> I have already set path both for Hadoop and Java in .hadoop and run it prior
> to compilation
>
> Export JAVA_HOME=/home/anand_vihar/jdk1.7.0_75
> Export HADOOP_INSTALL=/home/anand_vihar/hadoop-2.6.0
> PATH=:$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin:$HADOOP_INSTALL/share:$JAVA_HOME
>
>
> So, is there not a way that the compiler goes thru all folders/jar and pick
> up the right class and copile with just
>
> $javac *.java.
>
>
> If not is there a way to do so, else every time compilation needs to be done
> then a complete path list has to be typed and who knows which class is
> located in which jar which is needed for compilation, which when not found
> gives a compile error which becomes difficult to debug.
>
> Thanks
>
> Regards
>
> Anand Murali
> 11/7, 'Anand Vihar', Kandasamy St, Mylapore
> Chennai - 600 004, India
> Ph: (044)- 28474593/ 43526162 (voicemail)
>
>
>
> On Monday, April 13, 2015 9:58 PM, Ravindra Kumar Naik
>  wrote:
>
>
> You need to set classpath to hadoop jar
>
> java -classpath " MaxTemperatureMapper.java
>
>
>
> On Mon, Apr 13, 2015 at 11:33 AM, Anand Murali 
> wrote:
>
> Dear Mr. Ted:
>
> Please find details
>
> Last login: Sun Apr 12 16:33:18 2015 from localhost
> anand_vihar@Latitude-E5540:~$ cd hadoop-2.6.0
> anand_vihar@Latitude-E5540:~/hadoop-2.6.0$ . .hadoop
> anand_vihar@Latitude-E5540:~/hadoop-2.6.0$ cd etc/hadoop
> anand_vihar@Latitude-E5540:~/hadoop-2.6.0/etc/hadoop$ sh hadoop-env.sh
> anand_vihar@Latitude-E5540:~/hadoop-2.6.0/etc/hadoop$ java -version
> java version "1.7.0_75"
> Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
> anand_vihar@Latitude-E5540:~/hadoop-2.6.0/etc/hadoop$ hadoop version
> Hadoop 2.6.0
> Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r
> e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1
> Compiled by jenkins on 2014-11-13T21:10Z
> Compiled with protoc 2.5.0
> From source with checksum 18e43357c8f927c0695f1e9522859d6a
> This command was run using
> /home/anand_vihar/hadoop-2.6.0/share/hadoop/common/hadoop-common-2.6.0.jar
> anand_vihar@Latitude-E5540:~/hadoop-2.6.0/etc/hadoop$ cd ..
> anand_vihar@Latitude-E5540:~/hadoop-2.6.0/etc$ cd ..
> anand_vihar@Latitude-E5540:~/hadoop-2.6.0$ start-dfs.sh --config
> /home/anand_vihar/hadoop-2.6.0/conf
>
> Error Messages
>
>
> $javac MaxTemperatureMapper.java
>
> symbol: class Mapper
> MaxTemperatureMapper.java:8: error: cannot find symbol
> public class MaxTemperatureMapper extends Mapper IntWritable>
>  ^
>   symbol: class LongWritable
> MaxTemperatureMapper.java:8: error: cannot find symbol
> public class MaxTemperatureMapper extends Mapper IntWritable>
>^
>   symbol: class Text
> MaxTemperatureMapper.java:8: error: cannot find symbol
> public class MaxTemperatureMapper extends Mapper IntWritable>
>  ^
>   symbol: class Text
> MaxTemperatureMapper.java:8: error: cannot find symbol
> public class MaxTemperatureMapper extends Mapper IntWritable>
>^
>   symbol: class IntWritable
> MaxTemperatureMapper.java:15: error: cannot find symbol
> public void map(LongWritable key, Text value, Context context)
> ^
>   symbol:   class LongWritable
>   location: class MaxTemperatureMapper
> MaxTemperatureMapper.java:15: error: cannot find symbol
> public void map(LongWritable key, Text value, Context context)
>   ^
>   symbol:   class Text
>   location: class MaxTemperatureMapper
> MaxTemperatureMapper.java:15: error: cannot find symbol
> public void map(LongWritable key, Text value, Context context)
>   ^
>   symbol:   class Context
>   location: class MaxTemperatureMapper
> MaxTemperatureMapper.java:25: error: cannot find symbol
> airtemperature = Integer.parseInt(line.substring(87,92));
> ^
>   symbol:   variable airtemperature
>   location: class MaxTemperatureMapper
> MaxTemperatureMapper.java:28: error: cannot find symbol
> if (airTemparature != MISSING && quality.matches("[01459]"))
> ^
>   symbol:   variable airTemparature
>   location: class MaxTemperatureMapper
> MaxTemperatureMapper.java:30: error: cannot find symbol
> context.write(new Text(year), new IntWritable(airTemperature));
>   ^
>   symbol:   class Text
>   location: class MaxTemperatureMapper
> MaxTemperatureMapper.java:30: error: cannot find symbol
> context.writ

Re: Will Hadoop 2.6.1 be released soon?

2015-04-23 Thread Andre Kelpe
Are you serious?

"This release is *not* yet ready for production use. Critical issues
   are being ironed out via testing and downstream adoption. Production
users
   should wait for a *2.7.1/2.7.2* release."

- André

On Thu, Apr 23, 2015 at 3:13 PM, Ted Yu  wrote:

> Can you use 2.7.0 ?
>
>
> http://search-hadoop.com/m/LgpTk2Kk956/Vinod+hadoop+2.7.0&subj=+ANNOUNCE+Apache+Hadoop+2+7+0+Release
>
> Cheers
>
> On Apr 23, 2015, at 3:21 AM, Казаков Сергей Сергеевич <
> skaza...@skbkontur.ru> wrote:
>
>  Hi!
>
>
>
> We see some serious issues in HDFS of 2.6.0, which were, according to
> JIRA, fixed in 2.6.1. Any plans to release this patch or if it already was,
> where can we download it?
>
>
>
> Kind regards,
>
> Sergey Kazakov
>
>
>
>


-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


Re: How to display all Hadoop failed jobs logs files directly on console

2015-10-08 Thread Andre Kelpe
yarn logs -applicationId application_1434970319691_0135 should do the
trick. Note that you have to enable log aggregation in yarn to make that
work.
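
Log aggregation is switched on in yarn-site.xml, roughly like this, followed
by a restart of the nodemanagers:

  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>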


- André

On Thu, Oct 8, 2015 at 9:39 AM, Dhanashri Desai 
wrote:

> Everytime when I run a job on hadoop cluster, I have to see failed job log
> files on cluster by putting command like: hdfs dfs -cat
> /var/log/hadoop-yarn/apps/hduser/logs/application_1434970319691_0135/UbuntuD7.abc.net_42182.
> Is there any way to make all hadoop failed job log messages files to print
> directly on console as soon as job fails??
>
>
> Thanks,
> Dhanashri
>



-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com