RE: Windows and Linux hadoop cluster

2016-07-20 Thread Alexander Alten-Lorenz
Security:
Windows doesn’t have a working OpenSSL implementation
Malware, viruses and other typical Windows-based threats
Disk-level permissions and encryption

Performance:
Different thread handling per OS
The YARN implementation differs, which can cause performance degradation
Windows CPU / core scaling isn't the same as on Linux

I would not go with a mixed environment in production, and I see no sense 
behind it. Stable solutions often use CentOS, since the TCO is much smaller 
than in Windows environments. If you’re a Windows shop, go with Azure.

Cheers,
 --alex


From: Prachi Sharma

RE: Windows and Linux hadoop cluster

2016-07-20 Thread Alexander Alten-Lorenz
Hi,

That should be possible, but it will have performance impacts, require additional 
configuration and carry potential for misbehavior. In general it should work for 
YARN, but not for MRv1.
https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/SecureContainer.html

cheers, 
 --alex

--
b: mapredit.blogspot.com 

From: Prachi Sharma

Re: JobHistory location/configuration

2016-01-20 Thread Alexander Alten-Lorenz
Robert,

you should also take a look at:

mapreduce.jobhistory.max-age-ms

It configures how long the history files are retained (default: one week). And you’re right, 
you just have to wait for the configured move interval before the logs show up in the 
done directory.
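
For reference, a minimal mapred-site.xml sketch with the two relevant properties 
(the values shown are the usual defaults, so treat them as illustrative):

  <property>
    <name>mapreduce.jobhistory.move.interval-ms</name>
    <value>180000</value>     <!-- move intermediate files to done every 3 minutes -->
  </property>
  <property>
    <name>mapreduce.jobhistory.max-age-ms</name>
    <value>604800000</value>  <!-- keep history files for one week -->
  </property>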


> On Jan 20, 2016, at 11:21 AM, Robert Schmidtke  wrote:
> 
> Nvm, it would seem mapreduce.jobhistory.move.interval-ms specifies exactly 
> that.
> 
> On Wed, Jan 20, 2016 at 10:56 AM, Robert Schmidtke wrote:
> Okay so I assumed I could specify paths on my local filesystem, which is not 
> the case. After doing a hadoop fs -ls -R on my HDFS I found the done and 
> done_intermediate folders. However only the intermediate folder contains 
> files (even after the job finished). Is there any amount of time I have to 
> wait before they're moved to the done directory? Thanks!
> 
> Robert
> 
> On Tue, Jan 19, 2016 at 4:21 PM, Robert Schmidtke wrote:
> Hi everyone,
> 
> I'm running Hadoop 2.7.1 and I'd like to be able to examine the details of my 
> jobs after they're finished. My setup does not permit me to view the history 
> server's UI, so I'm bound to the CLI (bin/hadoop job -history all ). I 
> have configured:
> 
> mapreduce.jobhistory.address,
> mapreduce.jobhistory.webapp.address,
> mapreduce.jobhistory.intermediate-done-dir, and
> mapreduce.jobhistory.done-dir,
> 
> but there are no history files to be found. Am I missing something here? How 
> can I get a hold of the files, or make MR produce them at all? Thanks a bunch 
> in advance!
> 
> Robert
> 
> 
> 
> -- 
> My GPG Key ID: 336E2680



Re: ANSWER PLEASE

2015-10-28 Thread Alexander Alten-Lorenz
LOL - that's gorgeous! Well spoken, Kai :) 

> On Oct 28, 2015, at 12:50 PM, Kai Voigt  wrote:
> 
> No, the correct answer is „Don’t cheat on a Cloudera exam“ :-) This has been 
> reported to certificat...@cloudera.com 
> 
> Looks like you won’t get that certificate...
> 
>> Am 28.10.2015 um 11:46 schrieb t...@bentzn.com :
>> 
>> The correct answer would be:
>> 
>> do your own homework :-D
>> 
>> 
>> 
>> 
>> -----Original Message-----
>> From: "Sajid Mohammed"
>> To: user@hadoop.apache.org 
>> Date: 28-10-2015 11:32
>> Subject: ANSWER PLEASE
>> 
>> You have a cluster running with the fair Scheduler enabled. There are 
>> currently no jobs running on the cluster, and you submit a job A, so that 
>> only job A is running on the cluster. A while later, you submit Job B. now 
>> Job A and Job B are running on the cluster at the same time. How will the 
>> Fair Scheduler handle these two jobs? (Choose 2)
>> 
>> 
>> 
>> A. When Job B gets submitted, it will get assigned tasks, while job A 
>> continues to run with fewer tasks.
>> 
>> B. When Job B gets submitted, Job A has to finish first, before Job B can 
>> get scheduled.
>> 
>> C. When Job A gets submitted, it doesn't consume all the task slots.
>> 
>> D. When Job A gets submitted, it consumes all the task slots.
>> 
> 
> Kai Voigt
> Am Germaniahafen 1, 24143 Kiel, Germany
> k...@123.org 
> +49 160 96683050
> @KaiVoigt
> 



Re: Teragen-Terasort=10GB fails

2015-05-27 Thread Alexander Alten-Lorenz
FSError: java.io.IOException: No space left on device

Check the local disks for enough free temp space under the configured local dirs (e.g. /hadoop/).
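
A quick sketch to check (the paths are illustrative - use whatever 
yarn.nodemanager.local-dirs points to on your nodes):

  df -h /hadoop/yarn/local
  du -sh /hadoop/yarn/local/usercache/*    # per-user localized data and map spills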

best,

--
Alexander Alten-Lorenz
m: wget.n...@gmail.com
b: mapredit.blogspot.com

 On 27 May 2015, at 6:29 am, Pratik Gadiya pratik_gad...@persistent.com 
 wrote:
 
  
 Hi All,
  
 When I run teragen-terasort test on my hadoop deployed cluster, I get 
 following error
  
 15/05/27 06:24:36 INFO mapreduce.Job: map 57% reduce 18%
 15/05/27 06:24:39 INFO mapreduce.Job: Task Id : 
 attempt_1432720271082_0005_r_00_0, Status : FAILED
 Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in 
 shuffle in InMemoryMerger - Thread to merge in-memory shuffled map-outputs
 at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
 at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
 Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not 
 find any valid local directory for 
 output/attempt_1432720271082_0005_r_00_0/map_38.out
 at 
 org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:402)
 at 
 org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
 at 
 org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
 at 
 org.apache.hadoop.mapred.YarnOutputFiles.getInputFileForWrite(YarnOutputFiles.java:213)
 at 
 org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl$InMemoryMerger.merge(MergeManagerImpl.java:457)
 at 
 org.apache.hadoop.mapreduce.task.reduce.MergeThread.run(MergeThread.java:94)
 
 15/05/27 06:24:40 INFO mapreduce.Job: map 57% reduce 0%
 15/05/27 06:24:46 INFO mapreduce.Job: Task Id : 
 attempt_1432720271082_0005_m_41_0, Status : FAILED
 FSError: java.io.IOException: No space left on device
 15/05/27 06:24:48 INFO mapreduce.Job: Task Id : 
 attempt_1432720271082_0005_m_46_0, Status : FAILED
 FSError: java.io.IOException: No space left on device
 15/05/27 06:24:49 INFO mapreduce.Job: Task Id : 
 attempt_1432720271082_0005_m_44_0, Status : FAILED
 Error: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
 any valid local directory for 
 attempt_1432720271082_0005_m_44_0_spill_0.out
 at 
 org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:402)
 at 
 org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
 at 
 org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
 at 
 org.apache.hadoop.mapred.YarnOutputFiles.getSpillFileForWrite(YarnOutputFiles.java:159)
 at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1584)
 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1482)
 at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:720)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:790)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
 
 15/05/27 06:24:50 INFO mapreduce.Job: Task Id : 
 attempt_1432720271082_0005_m_45_0, Status : FAILED
 Error: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
 any valid local directory for 
 attempt_1432720271082_0005_m_45_0_spill_0.out
 at 
 org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:402)
 at 
 org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
 at 
 org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
 at 
 org.apache.hadoop.mapred.YarnOutputFiles.getSpillFileForWrite(YarnOutputFiles.java:159)
 at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1584)
 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1482)
 at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:720)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:790)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs

Re: Question about Block size configuration

2015-05-11 Thread Alexander Alten-Lorenz
If you set the block size to less than 64 MB, you’ll run into NameNode issues when a 
larger file is read by a client: the client has to ask the NameNode for every block - 
imagine what happens when you want to read a 1 TB file.
The optimal block size is 128 MB. Keep in mind that every block will be replicated 
(typically 3 times). And since Hadoop is made to store large files in a JBOD (just a 
bunch of disks) configuration, a block size of less than 64 MB would also overwhelm 
the physical disks. 
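
As a rough illustration (just back-of-the-envelope numbers): a 1 TB file stored in 
128 MB blocks is 8,192 blocks, while the same file in 2 MB blocks is 524,288 blocks, 
so the NameNode has to track and serve roughly 64x more block metadata. The 
cluster-wide default is set in hdfs-site.xml; a minimal sketch (the value is illustrative):

  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>  <!-- 128 MB -->
  </property>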

BR,
 Alex


 On 12 May 2015, at 07:47, Krishna Kishore Bonagiri write2kish...@gmail.com 
 wrote:
 
 The default HDFS block size of 64 MB means it is the maximum size of a block of 
 data written to HDFS. So, if you write 4 MB files, they will still occupy only 
 one block of 4 MB, not more than that. If your file is larger than 64 MB, it 
 gets split into multiple blocks.
 
 If you set the HDFS block size to 2 MB, then your 4 MB file will get split 
 into two blocks.
 
 On Tue, May 12, 2015 at 8:38 AM, Himawan Mahardianto mahardia...@ugm.ac.id wrote:
 Hi guys, I have a couple of questions about HDFS block size:
 
 What if I set my HDFS block size from the default 64 MB to 2 MB per block - what 
 will happen?
 
 I want to decrease the block size because I want to store image files 
 (jpeg, png etc.) of about 4 MB each - what is your opinion or 
 suggestion?
 
 What will happen if I don't change the default block size and 
 then store an image file of 4 MB - will Hadoop use a full 64 MB block, or 
 will it create a 4 MB block instead of a 64 MB one?
 
 How much RAM is used to store each block if my block size is 64 MB, or 
 if my block size is 4 MB?
 
 Is there anyone who has experience with this? Any suggestions are welcome.
 Thank you
 



Re: Access on HDFS

2015-05-08 Thread Alexander Alten-Lorenz
By default only hdfs (the superuser) has permission to write into the HDFS root (/). 
Just sudo to the hdfs user, create the root folders, change their permissions to your 
user and try again.
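
A minimal sketch of what that could look like (directory name and user are taken 
from the thread below, so adjust as needed):

  sudo -u hdfs hadoop fs -mkdir /input
  sudo -u hdfs hadoop fs -chown admin:admin /input
  # now user admin can write into /input without touching /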

BR,

--
Alexander Alten-Lorenz
m: wget.n...@gmail.com
b: mapredit.blogspot.com

 On May 8, 2015, at 6:26 PM, Pratik Gadiya pratik_gad...@persistent.com 
 wrote:
 
 @Matt:
 [root@vmkdev0022 ~]# hadoop fs -ls -d /
 drwxr-xr-x   - hdfs hdfs  0 2015-05-08 10:44 /
  
 @Olivier:
 [admin@vmkdev0022 root]$ hadoop fs -mkdir /tptp
 mkdir: Permission denied: user=admin, access=WRITE, 
 inode=/:hdfs:hdfs:drwxr-xr-x
  
 [admin@vmkdev0022 root]$ groups admin
 admin : admin hdfs
  
 [admin@vmkdev0022 root]$ id admin
 uid=500(admin) gid=500(admin) groups=500(admin),498(hdfs)
  
  
 Please suggest what should be the change for the permissions on / or if there 
 is any other way to resolve this.
  
 Thanks,
 Pratik
 From: Matt Narrell [mailto:matt.narr...@gmail.com] 
 Sent: Friday, May 08, 2015 9:37 PM
 To: user@hadoop.apache.org
 Subject: Re: Access on HDFS
  
 It looks like the group permissions on / is set to prohibit writes.
  
 mn
  
 On May 8, 2015, at 9:59 AM, Olivier Renault orena...@hortonworks.com wrote:
  
 I meant did you check that user admin is part of the hdfs group on the 
 namenode. 
  
 Olivier
  
  
 From: Pratik Gadiya
 Reply-To: user@hadoop.apache.org
 Date: Friday, 8 May 2015 16:57
 To: user@hadoop.apache.org
 Subject: RE: Access on HDFS
  
 Confirmed, that I have checked it on Namenode.
 In addition to that, I have checked on all of the nodes in observed that 
 superusergroup is set as “hdfs” in hdfs-site.xml for all nodes.
  
 Thanks,
 Pratik Gadiya
  
 From: Olivier Renault [mailto:orena...@hortonworks.com] 
 Sent: Friday, May 08, 2015 8:59 PM
 To: user@hadoop.apache.org
 Subject: Re: Access on HDFS
  
 Could you confirm that you are doing it on the namenode ? ( I tend to do it 
 on all nodes )
  
 Thanks,
 Olivier
  
  
 From: Pratik Gadiya
 Reply-To: user@hadoop.apache.org
 Date: Friday, 8 May 2015 16:14
 To: user@hadoop.apache.org
 Subject: RE: Access on HDFS
  
 Checked it’s hdfs
  
  <property>
    <name>dfs.permissions.superusergroup</name>
    <value>hdfs</value>
  </property>
  
 Thanks,
 Pratik
  
 From: Olivier Renault [mailto:orena...@hortonworks.com] 
 Sent: Friday, May 08, 2015 8:39 PM
 To: user@hadoop.apache.org
 Subject: Re: Access on HDFS
  
 You need to double check which group is setup in hdfs-site.xml. The parameter 
 is: dfs.permissions.superusergroup
  
 Thanks
 Olivier
  
 From: Pratik Gadiya
 Reply-To: user@hadoop.apache.org
 Date: Friday, 8 May 2015 15:46
 To: user@hadoop.apache.org
 Subject: Access on HDFS
  
 Hi All,
  
 I want to add a user who can create/remove and perform all the operations on 
 hdfs.
  
 Here is what I tried:
  
 1.   Create user: adduser admin
 2.   /usr/sbin/usermod -a -G hdfs admin
  
 And when I try to create a directory on hdfs under “/” it raises error as 
 follows,
 [root@vmkdev0021 ~]# id admin
 uid=500(admin) gid=500(admin) groups=500(admin),498(hdfs)
  
 [root@vmkdev0021 ~]# su admin
 [admin@vmkdev0021 root]$ hadoop  fs -mkdir /input
 mkdir: Permission denied: user=admin, access=WRITE, 
 inode=/:hdfs:hdfs:drwxr-xr-x
  
 Please let me know how can I overcome this, this is reallly bugging me.
  
 With Regards,
 Pratik Gadiya
  
 DISCLAIMER == This e-mail may contain privileged and confidential 
 information which is the property of Persistent Systems Ltd. It is intended 
 only for the use of the individual or entity to which it is addressed. If you 
 are not the intended recipient, you are not authorized to read, retain, copy, 
 print, distribute or use this message. If you have received this 
 communication in error, please notify the sender and delete all copies of 
 this message. Persistent Systems Ltd. does not accept any liability for virus 
 infected mails.

Re: Connect c language with HDFS

2015-05-04 Thread Alexander Alten-Lorenz
Google:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/LibHdfs.html

--
Alexander Alten-Lorenz
m: wget.n...@gmail.com
b: mapredit.blogspot.com

 On May 4, 2015, at 10:57 AM, unmesha sreeveni unmeshab...@gmail.com wrote:
 
 Hi 
    Can we connect C with HDFS using the Cloudera Hadoop distribution?
 
 -- 
  Thanks & Regards
  
  Unmesha Sreeveni U.B
  Hadoop, Bigdata Developer
  Centre for Cyber Security | Amrita Vishwa Vidyapeetham
  http://www.unmeshasreeveni.blogspot.in/
 
 



Re: Connect c language with HDFS

2015-05-04 Thread Alexander Alten-Lorenz
That depends on the installation source (rpm, tgz or parcels). Usually, when 
you use parcels, libhdfs.so* should be within /opt/cloudera/parcels/CDH/lib64/ 
(or similar). Or just use Linux's locate (locate libhdfs.so*) to find the 
library.
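
Once the library is found, a compile sketch for a libhdfs program could look like 
this (the file name my_hdfs_client.c, the paths and JAVA_HOME are assumptions for a 
parcel install - adjust to your layout):

  export CDH=/opt/cloudera/parcels/CDH
  gcc my_hdfs_client.c -o my_hdfs_client \
      -I$CDH/include -L$CDH/lib64 -lhdfs \
      -L$JAVA_HOME/jre/lib/amd64/server -ljvm
  export CLASSPATH=$(hadoop classpath --glob)   # libhdfs needs the Hadoop jars at runtime (--glob needs Hadoop 2.6+)
  ./my_hdfs_client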




--
Alexander Alten-Lorenz
m: wget.n...@gmail.com
b: mapredit.blogspot.com

 On May 4, 2015, at 11:39 AM, unmesha sreeveni unmeshab...@gmail.com wrote:
 
 Thanks Alex,
   I have gone through the same, but when I checked my Cloudera distribution I 
 was not able to find those folders. That's why I posted here. I don't know if I 
 made any mistake.
 
 On Mon, May 4, 2015 at 2:40 PM, Alexander Alten-Lorenz wget.n...@gmail.com wrote:
 Google:
 http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/LibHdfs.html
 
 --
 Alexander Alten-Lorenz
 m: wget.n...@gmail.com
 b: mapredit.blogspot.com
 
 On May 4, 2015, at 10:57 AM, unmesha sreeveni unmeshab...@gmail.com wrote:
 
 Hi 
    Can we connect C with HDFS using the Cloudera Hadoop distribution?
 
 -- 
  Thanks & Regards
  
  Unmesha Sreeveni U.B
  Hadoop, Bigdata Developer
  Centre for Cyber Security | Amrita Vishwa Vidyapeetham
  http://www.unmeshasreeveni.blogspot.in/
 
 
 
 
 
 
 -- 
  Thanks & Regards
  
  Unmesha Sreeveni U.B
  Hadoop, Bigdata Developer
  Centre for Cyber Security | Amrita Vishwa Vidyapeetham
  http://www.unmeshasreeveni.blogspot.in/
 
 



Re: YARN Exceptions

2015-04-26 Thread Alexander Alten-Lorenz
Please have a closer look at the quoted error - the user (User edhdtaesvc not 
found) doesn't exist on your Hadoop nodes, which makes the MR job fail.
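
A quick sketch to verify (assuming shell access to the NodeManager hosts - the 
useradd line is only an illustration, in practice the account is usually provisioned 
via LDAP/SSSD or your deployment tooling):

  id edhdtaesvc                 # run on every NodeManager host
  sudo useradd edhdtaesvc       # only if the account really is missing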

BR,
 AL

--
Alexander Alten-Lorenz
m: wget.n...@gmail.com
b: mapredit.blogspot.com

 On Apr 25, 2015, at 4:40 PM, Kumar Jayapal kjayapa...@gmail.com wrote:
 
 15/04/25 13:37:40 INFO mapreduce.Job: Job job_1429968417065_0004 failed with 
 state FAILED due to: Application application_1429968417065_0004 failed 2 
 times due to AM Container for appattempt_1429968417065_0004_02 exited 
 with  exitCode: -1000 due to: Application application_1429968417065_0004 
 initialization failed (exitCode=255) with output: User edhdtaesvc not found



Re: ResourceLocalizationService: Localizer failed when running pi example

2015-04-19 Thread Alexander Alten-Lorenz
As you said, that looks like a config issue. I would look at the NM's local 
scratch dirs (yarn.nodemanager.local-dirs). 

But without a complete stack trace, it's a blind call.
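
An NPE in LocalDirAllocator.confChanged might mean the NodeManager cannot resolve any 
usable local dir, so it is worth double-checking that the property is set and that every 
listed directory exists and is writable by the yarn user. A minimal yarn-site.xml sketch 
(the paths are illustrative):

  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data/1/yarn/local,/data/2/yarn/local</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/data/1/yarn/logs,/data/2/yarn/logs</value>
  </property>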

BR,
 AL

--
mapredit.blogspot.com

 On Apr 18, 2015, at 6:24 PM, Fernando O. fot...@gmail.com wrote:
 
 Hey All,
  It's me again with another noob question: I deployed a cluster (HA mode) and 
  everything looked good, but when I tried to run the pi example:
 
  bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar 
 pi 16 100
 
 the same error occurs if I try to generate data with teragen 1 
 /test/data
 
 
 2015-04-18 15:49:04,090 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Localizer failed
 java.lang.NullPointerException
   at 
 org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.confChanged(LocalDirAllocator.java:268)
   at 
 org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:344)
   at 
 org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
   at 
 org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
   at 
 org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
   at 
 org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:420)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1075)
 
 
  I'm guessing it's a configuration issue but I don't know what I am missing :S



Re: Data doesn't write in HDFS

2015-03-27 Thread Alexander Alten-Lorenz
Hi 

Have a closer look at:

java.io.IOException: File 
/testing/syncfusion/C#/JMS_message.1427429746967.log.tmp could only be 
replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) 
running and 1 node(s) are excluded in this operation.
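
A couple of quick checks that might help narrow it down (the commands are a sketch; 
the host name is a placeholder and 50010 is the usual default transfer port):

  hdfs dfsadmin -report          # is the single DataNode live, and does it report free space?
  telnet <datanode-host> 50010   # the DataNode transfer port must be reachable from the Flume machine

Since Flume and Hadoop sit on different machines here, the client on the Flume host 
has to be able to reach the DataNode directly, not only the NameNode.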

BR,
 AL


 On 27 Mar 2015, at 05:48, Ramesh Rocky rmshkumar...@outlook.com wrote:
 
 Hi,
 
 I am trying to write data into HDFS using Flume on a Windows machine. When I 
 configure Flume and Hadoop on the same machine and write data into HDFS, it works 
 perfectly.
 
 But when I configure Hadoop and Flume on different machines (both are Windows 
 machines) and try to write data into HDFS, I get the following error.
 
 15/03/27 09:46:35 WARN security.UserGroupInformation: No groups available for 
 user SYSTEM
 15/03/27 09:46:35 WARN security.UserGroupInformation: No groups available for 
 user SYSTEM
 15/03/27 09:46:35 WARN security.UserGroupInformation: No groups available for 
 user SYSTEM
 15/03/27 09:46:36 INFO namenode.FSEditLog: Number of transactions: 2 Total 
 time for transactions(ms): 28 Number of transactions batched in Syncs: 0 
 Number of syncs: 2 SyncTimes(ms): 46
 15/03/27 09:46:37 WARN security.UserGroupInformation: No groups available for 
 user SYSTEM
 15/03/27 09:46:39 INFO hdfs.StateChange: BLOCK* allocateBlock: 
 /testing/syncfusion/C#/JMS_message.1427429746967.log.tmp. 
 BP-412829692-192.168.56.1-1427371070417
  blk_1073741836_1012{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
 replicas=[ReplicaUnderConstruction[[DISK]DS-57962794-b57c-476e-a811-ebcf871f4f12:NORMAL:192.168.56.1:50010|RBW]]}
 15/03/27 09:46:42 WARN blockmanagement.BlockPlacementPolicy: Failed to place 
 enough replicas, still in need of 1 to reach 1 (unavailableStorages=[],
 storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
 creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) 
 For more information, please enable DEBUG log level on 
 org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
 15/03/27 09:46:42 WARN protocol.BlockStoragePolicy: Failed to place enough 
 replicas: expected size is 1 but only 0 storage types can be selected 
 (replication=1,
  selected=[], unavailable=[DISK], removed=[DISK], 
 policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], 
 replicationFallbacks=[ARCHIVE]})
 15/03/27 09:46:42 WARN blockmanagement.BlockPlacementPolicy: Failed to place 
 enough replicas, still in need of 1 to reach 1 (unavailableStorages=[DISK], 
 storage
 Policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], 
 replicationFallbacks=[ARCHIVE]}, newBlock=true) All required storage types 
 are unava
 ilable:  unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, 
 storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
 15/03/27 09:46:42 INFO ipc.Server: IPC Server handler 9 on 9000, call 
 org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 
 192.168.15.242:57416 Call#7 Retry#0
 java.io.IOException: File 
 /testing/syncfusion/C#/JMS_message.1427429746967.log.tmp could only be 
 replicated to 0 nodes instead of minReplication (=1).  There are 1 
 datanode(s) running and 1 node(s) are excluded in this operation.
 at 
 org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1549)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3200)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:641)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:482)
 at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
 15/03/27 09:46:46 WARN namenode.FSNamesystem: trying to get DT with no secret 
 manager running
 
 Please anybody know about this issue..
 Thanks  Regards
 Ramesh



Re: Total memory available to NameNode

2015-03-26 Thread Alexander Alten-Lorenz
Hi Mich,

the book Hadoop Operations may be a good start:
https://books.google.de/books?id=drbI_aro20oCpg=PA308lpg=PA308dq=hadoop+memory+namenodesource=blots=t_yltgk_i7sig=_6LXkcSjfuwwqfz_kDGDi9ytgqUhl=ensa=Xei=Nt8TVfn9AcjLPZyXgKACved=0CFYQ6AEwBg#v=onepageq=hadoop%20memory%20namenodef=false

BR,
 AL


 On 26 Mar 2015, at 11:16, Mich Talebzadeh m...@peridale.co.uk wrote:
 
 Is there any parameter that sets the total memory that NameNode can use?
  
 Thanks
  
 Mich Talebzadeh
  
 http://talebzadehmich.wordpress.com
  
 Publications due shortly:
 Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and 
 Coherence Cache
  
 NOTE: The information in this email is proprietary and confidential. This 
 message is for the designated recipient only, if you are not the intended 
 recipient, you should destroy it immediately. Any information in this message 
 shall not be understood as given or endorsed by Peridale Ltd, its 
 subsidiaries or their employees, unless expressly so stated. It is the 
 responsibility of the recipient to ensure that this email is virus free, 
 therefore neither Peridale Ltd, its subsidiaries nor their employees accept 
 any responsibility.
  
 From: Mirko Kämpf [mailto:mirko.kae...@gmail.com] 
 Sent: 25 March 2015 16:08
 To: user@hadoop.apache.org; m...@peridale.co.uk
 Subject: Re: can block size for namenode be different from wdatanode block 
 size?
  
 Correct, let's say you run the NameNode with just 1GB of RAM.
 This would be a very strong limitation for the cluster. For each file we need 
 about 200 bytes and for each block as well. Now we can estimate the max. 
 capacity depending on HDFS-Blocksize and average File size.
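 
 A rough back-of-the-envelope sketch of that estimate (the ~200 bytes per object is 
 only an approximation):
 
   1 GB heap          ~ 1,073,741,824 bytes
   per small file     ~ 1 file object + 1 block object ~ 2 x 200 bytes
   max files          ~ 1,073,741,824 / 400 ~ 2.6 million files
   addressable data   ~ 2.6 million x 128 MB blocks ~ a few hundred TB
 
 which is also why many small files hurt the NameNode far more than a few large ones.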
  
 Cheers,
 Mirko
  
 2015-03-25 15:34 GMT+00:00 Mich Talebzadeh m...@peridale.co.uk:
 Hi Mirko,
 
 Thanks for feedback.
 
 Since i have worked with in memory databases, this metadata caching sounds 
 more like an IMDB that caches data at start up from disk resident storage.
 
 IMDBs tend to get issues when the cache cannot hold all data. Is this the 
 case the case with metada as well?
 
 Regards,
 
 Mich
 Let your email find you with BlackBerry from Vodafone
 From: Mirko Kämpf mirko.kae...@gmail.com 
 Date: Wed, 25 Mar 2015 15:20:03 +
 To: user@hadoop.apache.org
 ReplyTo: user@hadoop.apache.org
 Subject: Re: can block size for namenode be different from datanode block 
 size?
  
 Hi Mich,
  
 please see the comments in your text.
 
  
  
 2015-03-25 15:11 GMT+00:00 Dr Mich Talebzadeh m...@peridale.co.uk:
 
 Hi,
 
 The block size for HDFS is currently set to 128MB by default. This is
 configurable.
 Correct, an HDFS client can override the cfg property and define a different 
 block size for HDFS blocks. 
 
 My point is that I assume this parameter in hadoop-core.xml sets the
 block size for both namenode and datanode. 
 Correct, the block size is an HDFS-wide setting, but in general the 
 HDFS client makes the blocks.
   
 However, the storage and
 random access for metadata in the namenode is different and suits smaller
 block sizes.
 HDFS blocksize has no impact here. NameNode metadata is held in memory. For 
 reliability it is dumped to the local disks of the server.
  
 
 For example in Linux the OS block size is 4k, which means one HDFS block
 of 128MB can hold 32K OS blocks. For metadata this may not be
 useful and a smaller block size would be suitable, hence my question.
 Remember, metadata is in memory. The fsimage file, which contains the 
 metadata, is loaded on startup of the NameNode.
  
 Please be not confused by the two types of block-sizes.
  
 Hope this helps a bit.
 Cheers,
 Mirko
  
 
 Thanks,
 
 Mich



Re: Total memory available to NameNode

2015-03-26 Thread Alexander Alten-Lorenz
Ah, yes. Tom's book is a good start, and Eric Sammer's book Hadoop Operations too 
:) 

BR,
 AL


 On 26 Mar 2015, at 11:50, Mich Talebzadeh m...@peridale.co.uk wrote:
 
 Many thanks AL. I believe you meant “Hadoop: The Definitive Guide” :)
  
 Mich Talebzadeh
  
  http://talebzadehmich.wordpress.com
  
 Publications due shortly:
 Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and 
 Coherence Cache
  
 NOTE: The information in this email is proprietary and confidential. This 
 message is for the designated recipient only, if you are not the intended 
 recipient, you should destroy it immediately. Any information in this message 
 shall not be understood as given or endorsed by Peridale Ltd, its 
 subsidiaries or their employees, unless expressly so stated. It is the 
 responsibility of the recipient to ensure that this email is virus free, 
 therefore neither Peridale Ltd, its subsidiaries nor their employees accept 
 any responsibility.
  
 From: Alexander Alten-Lorenz [mailto:wget.n...@gmail.com] 
 Sent: 26 March 2015 10:30
 To: user@hadoop.apache.org
 Subject: Re: Total memory available to NameNode
  
 Hi Mich,
  
  the book Hadoop Operations may be a good start:
  https://books.google.de/books?id=drbI_aro20oCpg=PA308lpg=PA308dq=hadoop+memory+namenodesource=blots=t_yltgk_i7sig=_6LXkcSjfuwwqfz_kDGDi9ytgqUhl=ensa=Xei=Nt8TVfn9AcjLPZyXgKACved=0CFYQ6AEwBg#v=onepageq=hadoop%20memory%20namenodef=false
  
 BR,
  AL
  
  
  On 26 Mar 2015, at 11:16, Mich Talebzadeh m...@peridale.co.uk wrote:
  
 Is there any parameter that sets the total memory that NameNode can use?
  
 Thanks
  
 Mich Talebzadeh
  
  http://talebzadehmich.wordpress.com
  
 Publications due shortly:
 Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and 
 Coherence Cache
  
 NOTE: The information in this email is proprietary and confidential. This 
 message is for the designated recipient only, if you are not the intended 
 recipient, you should destroy it immediately. Any information in this 
 message shall not be understood as given or endorsed by Peridale Ltd, its 
 subsidiaries or their employees, unless expressly so stated. It is the 
 responsibility of the recipient to ensure that this email is virus free, 
 therefore neither Peridale Ltd, its subsidiaries nor their employees accept 
 any responsibility.
  
  From: Mirko Kämpf [mailto:mirko.kae...@gmail.com] 
  Sent: 25 March 2015 16:08
  To: user@hadoop.apache.org; m...@peridale.co.uk
 Subject: Re: can block size for namenode be different from wdatanode block 
 size?
  
 Correct, let's say you run the NameNode with just 1GB of RAM.
 This would be a very strong limitation for the cluster. For each file we 
 need about 200 bytes and for each block as well. Now we can estimate the 
 max. capacity depending on HDFS-Blocksize and average File size.
  
 Cheers,
 Mirko
  
  2015-03-25 15:34 GMT+00:00 Mich Talebzadeh m...@peridale.co.uk:
 Hi Mirko,
 
 Thanks for feedback.
 
 Since i have worked with in memory databases, this metadata caching sounds 
 more like an IMDB that caches data at start up from disk resident storage.
 
 IMDBs tend to get issues when the cache cannot hold all data. Is this the 
 case the case with metada as well?
 
 Regards,
 
 Mich
 Let your email find you with BlackBerry from Vodafone
  From: Mirko Kämpf mirko.kae...@gmail.com 
  Date: Wed, 25 Mar 2015 15:20:03 +
  To: user@hadoop.apache.org
  ReplyTo: user@hadoop.apache.org
 Subject: Re: can block size for namenode be different from datanode block 
 size?
  
 Hi Mich,
  
 please see the comments in your text.
 
  
  
  2015-03-25 15:11 GMT+00:00 Dr Mich Talebzadeh m...@peridale.co.uk:
 
 Hi,
 
  The block size for HDFS is currently set to 128MB by default. This is
  configurable.
  Correct, an HDFS client can override the cfg property and define a 
  different block size for HDFS blocks. 
  
  My point is that I assume this parameter in hadoop-core.xml sets the
  block size for both namenode and datanode. 
  Correct, the block size is an HDFS-wide setting, but in general the 
  HDFS client makes the blocks.
   
  However, the storage and
  random access for metadata in the namenode is different and suits smaller
  block sizes.
  HDFS blocksize has no impact here. NameNode metadata is held in memory. For 
  reliability it is dumped to the local disks of the server.
  
 
 For example in Linux

Re: Trusted-realm vs default-realm kerberos issue

2015-03-25 Thread Alexander Alten-Lorenz
Do you have mapping rules that tell Hadoop that the trusted realm is allowed 
to log in? 
http://mapredit.blogspot.de/2015/02/hadoop-and-trusted-mitv5-kerberos-with.html
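
For reference, a minimal core-site.xml sketch of such mapping rules (the realm 
name is purely illustrative - replace it with your trusted realm):

  <property>
    <name>hadoop.security.auth_to_local</name>
    <value>
      RULE:[1:$1@$0](.*@CORP.EXAMPLE.COM)s/@.*//
      RULE:[2:$1@$0](.*@CORP.EXAMPLE.COM)s/@.*//
      DEFAULT
    </value>
  </property>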

BR,
 Alex


 On 24 Mar 2015, at 18:21, Michael Segel michael_se...@hotmail.com wrote:
 
 So… 
 
 If I understand, you’re saying you have a one way trust set up so that the 
 cluster’s AD trusts the Enterprise AD? 
 
 And by AD you really mean KDC? 
 
  On Mar 17, 2015, at 2:22 PM, John Lilley john.lil...@redpoint.net wrote:
 
 AD
 
 The opinions expressed here are mine, while they may reflect a cognitive 
 thought, that is purely accidental. 
 Use at your own risk. 
 Michael Segel
  michael_segel (AT) hotmail.com
 
 
 
 
 



Re: Ambari based uninstall

2015-02-26 Thread Alexander Alten-Lorenz
Steve,

I wrote up that howto some months ago - worth a test?
http://mapredit.blogspot.de/2014/06/remove-hdp-and-ambari-completely.html

R,
 Alexander 


 On 26 Feb 2015, at 16:21, Steve Edison sediso...@gmail.com wrote:
 
 Team,
 
 I am using Ambari to install a cluster which now needs to be deleted and 
 re-installed.
 
  Is there a clean way to uninstall the cluster, clean up all the binaries from 
  all the nodes and do a fresh install?
 
 There is no data on the cluster, so nothing to worry.
 
 Thanks in advance



Re: Impala CREATE TABLE AS AVRO Requires Redundant Schema - Why?

2015-02-26 Thread Alexander Alten-Lorenz
Hi,

Impala is a product of Cloudera. You might request help via:
https://groups.google.com/a/cloudera.org/forum/#!forum/impala-user

BR, 
 Alex


 On 26 Feb 2015, at 17:15, Vitale, Tom thomas.vit...@credit-suisse.com wrote:
 
 I used sqoop to import an MS SQL Server table into an Avro file on HDFS.  No 
 problem. Then I tried to create an external Impala table using the following 
 DDL:
  
 CREATE EXTERNAL TABLE AvroTable
 STORED AS AVRO
 LOCATION '/tmp/AvroTable';
  
 I got the error “ERROR: AnalysisException: Error loading Avro schema: No Avro 
 schema provided in SERDEPROPERTIES or TBLPROPERTIES for table: 
 default.AvroTable”
  
 So I extracted the schema from the Avro file using the avro-tools-1.7.4.jar 
 (-getschema) into a JSON file, then per the recommendation above, changed the 
 DDL to point to it:
  
 CREATE EXTERNAL TABLE AvroTable
 STORED AS AVRO
 LOCATION '/tmp/AvroTable'
 TBLPROPERTIES(
 'serialization.format'='1',
 
  'avro.schema.url'='hdfs://...net/tmp/AvroTable.schema'
 );
  
 This worked fine.  But my question is, why do you have to do this?  The 
 schema is already in the Avro file – that’s where I got the JSON schema file 
 that I point to in the TBLPROPERTIES parameter!
  
 Thanks, Tom
  
 Tom Vitale
 CREDIT SUISSE
  Information Technology | Infra Arch & Strategy NY, KIVP
 Eleven Madison Avenue | 10010-3629 New York | United States
 Phone +1 212 538 0708
  thomas.vit...@credit-suisse.com | www.credit-suisse.com
  
 
 
 
 ==
 Please access the attached hyperlink for an important electronic 
 communications disclaimer:
  http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html
 ==



Re: recombining split files after data is processed

2015-02-23 Thread Alexander Alten-Lorenz
You could run the hadoop dfs command as a bootstrap action or an extra step, see:
http://stackoverflow.com/questions/12055595/emr-how-to-join-files-into-one
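
A minimal sketch of the merge itself, run on the master node (the HDFS paths and 
file names are illustrative):

  hadoop fs -getmerge /user/hadoop/job-output merged.out   # pulls all part-* files into one local file
  hadoop fs -put merged.out /user/hadoop/merged/           # optionally push the single file back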

BR,
 Alex


 On 23 Feb 2015, at 08:10, Jonathan Aquilina jaquil...@eagleeyet.net wrote:
 
 Thanks Alex. Where would that command be placed - in a mapper, in a reducer, or run 
 as a command? Here at work we are looking to use Amazon EMR to do our number 
 crunching and we have access to the master node, but not really the rest of 
 the cluster. Can this be added as a step to be run after initial processing?
 
  
 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T
 On 2015-02-23 08:05, Alexander Alten-Lorenz wrote:
 
 Hi,
  
 You can use a single reducer 
 (http://wiki.apache.org/hadoop/HowManyMapsAndReduces) for smaller datasets, 
 or 'getmerge': hadoop dfs -getmerge /hdfs/path local_file_name
 
  
 BR,
  Alex
  
 
  On 23 Feb 2015, at 08:00, Jonathan Aquilina jaquil...@eagleeyet.net wrote:
 
 Hey all,
 
 I understand that the purpose of splitting files is to distribute the data 
 to multiple core and task nodes in a cluster. My question is that after the 
 output is complete is there a way one can combine all the parts into a 
 single file?
 
  
 -- 
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T



Re: java.net.UnknownHostException on one node only

2015-02-22 Thread Alexander Alten-Lorenz
What is important is that the canonical name (the first name after the IP address in 
/etc/hosts) must be the FQDN of the server, see the example:

1.1.1.1 one.one.org one namenode
1.1.1.2 two.one.org two datanode

If DNS is used, the system's hostname must match the FQDN in forward as well as 
reverse name resolution.
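
A quick sketch to verify the resolution on the failing node (the host name is taken 
from the error below, the IP is a placeholder):

  hostname -f                         # should print the FQDN
  getent hosts 101-master10           # forward lookup via /etc/hosts or DNS
  getent hosts <ip-of-101-master10>   # reverse lookup should return the same FQDN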

BR,
 Alex


 On 22 Feb 2015, at 21:51, tesm...@gmail.com wrote:
 
 I am getting a java.net.UnknownHostException continuously on one node during Hadoop 
 MapReduce execution.
 
 That node is accessible via SSH. The node is shown in the yarn node -list and 
 hdfs dfsadmin -report queries.
 
 Below is the log from execution
 
 15/02/22 20:17:42 INFO mapreduce.Job: Task Id : 
 attempt_1424622614381_0008_m_43_0, Status : FAILED
 Container launch failed for container_1424622614381_0008_01_16 : 
 java.lang.IllegalArgumentException: java.net.UnknownHostException: 
 101-master10
   at 
 org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:373)
   at 
 org.apache.hadoop.security.SecurityUtil.setTokenService(SecurityUtil.java:352)
   at 
 org.apache.hadoop.yarn.util.ConverterUtils.convertFromYarn(ConverterUtils.java:237)
   at 
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:218)
   at 
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.init(ContainerManagementProtocolProxy.java:196)
   at 
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117)
   at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403)
   at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
   at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.net.UnknownHostException: 101-master10
   ... 12 more
 
 
 
 15/02/22 20:17:44 INFO
 
 Regards,
 Tariq



Re: Kerberos Security in Hadoop

2015-02-19 Thread Alexander Alten-Lorenz
I wrote an AD <=> MIT v5 trust tutorial a few days ago, just to keep things current:
http://mapredit.blogspot.de/2015/02/hadoop-and-trusted-mitv5-kerberos-with.html

BR
- Alexander


 On 19 Feb 2015, at 01:49, Krish Donald gotomyp...@gmail.com wrote:
 
 Hi,
 
  Has anybody worked on Kerberos security on Hadoop?
  Can you please guide me? Any document link will be appreciated.
 
 Thanks
 Krish



Re: Apache Sentry

2015-02-19 Thread Alexander Alten-Lorenz
Hi,

Pros: safe and easy to use for a DWH based on Hadoop (Beeline, Metastore, 
HiveServer2, Impala, HDFS …)
Cons: does not work with HCatalog, Pig, the old HiveServer1 or /usr/bin/hive (you have to 
use Beeline); the HBase-Hive handler doesn't work either

BR,
 Alex

 On 19 Feb 2015, at 02:53, Krish Donald gotomyp...@gmail.com wrote:
 
 Hi,
 
  Has anybody worked on Apache Sentry?
  If yes, what are the challenges, pros and cons?
 
 Thanks
 Krish



Re: Yarn AM is abending job when submitting a remote job to cluster

2015-02-19 Thread Alexander Alten-Lorenz
Hi,

https://issues.apache.org/jira/browse/YARN-1116 
https://issues.apache.org/jira/browse/YARN-1058

Looks like the history server received an unclean shutdown, or a previous 
job didn't finish or wasn't cleaned up after finishing (2015-02-15 
07:51:07,241 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: 
YARN_AM_RM_TOKEN, Service: , Ident: 
(org.apache.hadoop.yarn.security.AMRMTokenIdentifier@33be1aa0) …. 
Previous history file is at 
hdfs://mycluster.com:8020/user/cloudera/.staging/job_1424003606313_0001/job_1424003606313_0001_1.jhist).

BR,
 Alex


 On 19 Feb 2015, at 13:27, Roland DePratti roland.depra...@cox.net wrote:
 
 Daemeon,
  
  Thanks for the reply. I have about 6 months of exposure to Hadoop and am new to 
  SSL, so I did some digging after reading your message.
   
  In the HDFS config, I have hadoop.ssl.enabled using the default, which is 
  ‘false’ (which I understand sets it for all Hadoop daemons).
  
 I assumed this meant that it is not in use and not a factor in job submission 
 (ssl certs not needed).
  
 Do I misunderstand and are you saying that it needs to be set to ‘true’ with 
 valid certs and store setup for me to submit a remote job (this is a POC 
 setup without exposure to outside my environment)?
  
 -  rd
  
 From: daemeon reiydelle [mailto:daeme...@gmail.com] 
 Sent: Wednesday, February 18, 2015 10:22 PM
 To: user@hadoop.apache.org
 Subject: Re: Yarn AM is abending job when submitting a remote job to cluster
  
 I would guess you do not have your ssl certs set up, client or server, based 
 on the error. 
 
 
 ...
 “Life should not be a journey to the grave with the intention of arriving 
 safely in a
 pretty and well preserved body, but rather to skid in broadside in a cloud of 
 smoke,
 thoroughly used up, totally worn out, and loudly proclaiming “Wow! What a 
 Ride!” 
 - Hunter Thompson
 
 Daemeon C.M. Reiydelle
 USA (+1) 415.501.0198
 London (+44) (0) 20 8144 9872
  
  On Wed, Feb 18, 2015 at 5:19 PM, Roland DePratti roland.depra...@cox.net wrote:
  I have been searching for a handle on a problem with very few clues. 
  Any help pointing me in the right direction would be huge.
  I have not received any input from the Cloudera Google groups. Perhaps this 
  is more YARN based and I am hoping I have more luck here.
 Any help is greatly appreciated.
  
 I am running a Hadoop cluster using CDH5.3. I also have a client machine with 
 a standalone one node setup (VM).
  
 All environments are running CentOS 6.6.
  
  I have submitted some Java MapReduce jobs locally on both the cluster and the 
  standalone environment with successful completions.
  
 I can submit a remote HDFS job from client to cluster using -conf 
 hadoop-cluster.xml (see below) and get data back from the cluster with no 
 problem.
 
  When I submit the mapreduce jobs remotely, I get an AM error:
  
 AM fails the job with the error: 
 
SecretManager$InvalidToken: appattempt_1424003606313_0001_02 
 not found in AMRMTokenSecretManager
 
 I searched /var/log/secure on the client and cluster with no unusual messages.
 
 Here is the contents of hadoop-cluster.xml:
 
  <?xml version="1.0" encoding="UTF-8"?>
  
  <!--generated by Roland-->
  <configuration>
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://mycluser:8020</value>
    </property>
    <property>
      <name>mapreduce.jobtracker.address</name>
      <value>hdfs://mycluster:8032</value>
    </property>
    <property>
      <name>yarn.resourcemanager.address</name>
      <value>hdfs://mycluster:8032</value>
    </property>
 
 Here is the output from the job log on the cluster:  
 
 2015-02-15 07:51:06,544 INFO [main] 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
 application appattempt_1424003606313_0001_02
 2015-02-15 07:51:06,949 WARN [main] org.apache.hadoop.conf.Configuration: 
 job.xml:an attempt to override final parameter: 
 hadoop.ssl.require.client.cert;  Ignoring.
 2015-02-15 07:51:06,952 WARN [main] org.apache.hadoop.conf.Configuration: 
 job.xml:an attempt to override final parameter: 
 mapreduce.job.end-notification.max.retry.interval;  Ignoring.
 2015-02-15 07:51:06,952 WARN [main] org.apache.hadoop.conf.Configuration: 
 job.xml:an attempt to override final parameter: hadoop.ssl.client.conf;  
 Ignoring.
 2015-02-15 07:51:06,954 WARN [main] org.apache.hadoop.conf.Configuration: 
 job.xml:an attempt to override final parameter: 
 hadoop.ssl.keystores.factory.class;  Ignoring.
 2015-02-15 07:51:06,957 WARN [main] org.apache.hadoop.conf.Configuration: 
 job.xml:an attempt to override final parameter: hadoop.ssl.server.conf;  
 Ignoring.
 2015-02-15 07:51:06,973 WARN [main] org.apache.hadoop.conf.Configuration: 
 job.xml:an attempt to override final parameter: 
 

Re: Coding mappers and reducers in java

2015-02-18 Thread Alexander Alten-Lorenz
Hi,

http://www.slideshare.net/AmazonWebServices/amazon-elastic-mapreduce-deep-dive-and-best-practices-bdt404-aws-reinvent-2013

seems quite good.

BR,
 AL

 On 18 Feb 2015, at 13:50, Jonathan Aquilina jaquil...@eagleeyet.net wrote:
 
 Hey guys,
 
 New to the list, but I am wondering does anyone have any good tutorials on 
 how to write mappers and reducers for Amazon EMR
 
  
 -- 
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T



Re: Copying many files to HDFS

2015-02-13 Thread Alexander Alten-Lorenz
Kevin,

Slurper can help here:
https://github.com/alexholmes/hdfs-file-slurper
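
If you prefer plain tooling, a parallel-put sketch on the client machine can also get 
you quite far (the paths and the parallelism are illustrative; tune -P to what the NFS 
mount and the network can sustain):

  cd /mnt/nfs/data
  find . -maxdepth 1 -type f | \
      xargs -P 8 -n 50 sh -c 'hadoop fs -put "$@" /ingest/' _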

BR,
 Alexander 


 On 13 Feb 2015, at 14:28, Kevin kevin.macksa...@gmail.com wrote:
 
 Hi,
 
 I am setting up a Hadoop cluster (CDH5.1.3) and I need to copy a thousand or 
 so files into HDFS, which totals roughly 1 TB. The cluster will be isolated 
 on its own private LAN with a single client machine that is connected to the 
 Hadoop cluster as well as the public network. The data that needs to be 
 copied into HDFS is mounted as an NFS on the client machine.
 
 I can run `hadoop fs -put` concurrently on the client machine to try and 
 increase the throughput.
 
 If these files were able to be accessed by each node in the Hadoop cluster, 
 then I could write a MapReduce job to copy a number of files from the network 
 into HDFS. I could not find anything in the documentation saying that 
 `distcp` works with locally hosted files (its code in the tools package 
 doesn't tell any sign of it either) - but I wouldn't expect it to.
 
 In general, are there any other ways of copying a very large number of 
 client-local files to HDFS? I searched the mail archives to find a similar 
 question and I didn't come across one. I'm sorry if this is a duplicate 
 question.
 
 Thanks for your time,
 Kevin



Re: Failed to start datanode due to bind exception

2015-02-12 Thread Alexander Alten-Lorenz
/var/run/hdfs-sockets has to have the right permissions. By default that is 755, owned by hdfs:hdfs.
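
A sketch of how to reset that (the path matches dfs.domain.socket.path from the error 
below; ownership and mode follow the defaults mentioned above):

  sudo rm -f /var/run/hdfs-sockets/datanode              # remove a stale socket file
  sudo install -d -o hdfs -g hdfs -m 755 /var/run/hdfs-sockets
  # then restart the DataNode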

BR,
 Alexander 

 On 10 Feb 2015, at 19:39, Rajesh Thallam rajesh.thal...@gmail.com wrote:
 
 There are no contents in the hdfs-sockets directory 
 Apache Hadoop Base version if 2.5.0 (using CDH 5.3.0)
 
  On Tue, Feb 10, 2015 at 10:24 AM, Ted Yu yuzhih...@gmail.com wrote:
 The exception came from DomainSocket so using netstat wouldn't reveal the 
 conflict.
 
 What's the output from:
 ls -l /var/run/hdfs-sockets/datanode
 
 Which hadoop release are you using ?
 
 Cheers
 
  On Tue, Feb 10, 2015 at 10:12 AM, Rajesh Thallam rajesh.thal...@gmail.com wrote:
 I have been repeatedly trying to start datanode but it fails with bind 
 exception saying address is already in use even though port is free
 
 I used below commands to check
 
 netstat -a -t --numeric-ports -p | grep 500
 
  
 I have overridden default port 50070 with 50081 but the issue still persists.
 
 Starting DataNode with maxLockedMemory = 0
  Opened streaming server at /172.19.7.160:50081
 Balancing bandwith is 10485760 bytes/s
 Number threads for balancing is 5
 Waiting for threadgroup to exit, active threads is 0
 Shutdown complete.
 Exception in secureMain
 java.net.BindException: bind(2) error: Address already in use when trying to 
 bind to '/var/run/hdfs-sockets/datanode'
 at org.apache.hadoop.net.unix.DomainSocket.bind0(Native Method)
 at 
 org.apache.hadoop.net.unix.DomainSocket.bindAndListen(DomainSocket.java:191)
 at 
 org.apache.hadoop.hdfs.net.DomainPeerServer.init(DomainPeerServer.java:40)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.getDomainPeerServer(DataNode.java:907)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.initDataXceiver(DataNode.java:873)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1066)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.init(DataNode.java:411)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2297)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2184)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2231)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2407)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2431)
 Exiting with status 1
 
  hdfs-site.xml
    <property>
      <name>dfs.datanode.address</name>
      <value>hostname.dc.xx.org:50010</value>
    </property>
    <property>
      <name>dfs.datanode.ipc.address</name>
      <value>hostname.dc.xx.org:50020</value>
    </property>
    <property>
      <name>dfs.datanode.http.address</name>
      <value>hostname.dc.xx.org:50075</value>
    </property>
 Regards,
 RT
 
 
 
 
 -- 
 Cheers,
 RT



Re: commodity hardware

2015-02-12 Thread Alexander Alten-Lorenz
Typically that term means standard hardware which should be present by default 
in an enterprise, without any extras like RAID, high-speed NICs, dual power 
supplies and so on.
But that is changing more and more, since new independent frameworks and tools 
are entering the market, like Spark, Kafka, Storm etc. 

Conclusion: for me the term “commodity” hardware says very little anymore. 
Today you need to consider different types of hardware for different use cases, 
depending on the goals you want to achieve. 

BR,
 Alexander 



 On 12 Feb 2015, at 17:45, Adaryl Wakefield adaryl.wakefi...@hotmail.com 
 wrote:
 
 Does anybody have a good definition of commodity hardware? I'm having a hard 
 time explaining it to people. I have no idea when a piece of HW is 
 commodity or whatever the opposite of commodity is.
  
 B.



Re: Transferring security tokens to remote machines

2015-02-12 Thread Alexander Alten-Lorenz
Hi Robert,

forgive me if I'm wrong, but as far as I understand Flink uses nearly the same 
model as HDFS (or maybe not at all). That is, the master receives an action and 
distributes it to the workers (more or less ;)) 
HDFS, as an example, does not use a push mechanism; the DN clients fetch the token from 
the NN when they need it. Could that be a solution, too?

https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1/src/hdfs/org/apache/hadoop/hdfs/tools/DelegationTokenFetcher.java

MapReduce gets the token from the JT with getDelegationToken()

http://hadoop.apache.org/docs/r2.5.2/api/org/apache/hadoop/mapreduce/Cluster.html#getDelegationToken(org.apache.hadoop.io.Text)
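
On the operational side, a minimal sketch of the fetch-and-ship idea (the exact 
fetchdt flags vary between Hadoop versions, so check hdfs fetchdt -h; the principal, 
host names and paths are illustrative):

  kinit alice@EXAMPLE.COM
  hdfs fetchdt --renewer yarn /tmp/alice.dt        # fetch an HDFS delegation token on the master
  scp /tmp/alice.dt worker1:/tmp/alice.dt          # ship the token file to a worker
  # on the worker, UserGroupInformation picks the token up via:
  export HADOOP_TOKEN_FILE_LOCATION=/tmp/alice.dt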


BR,
 Alexander 


 On 12 Feb 2015, at 15:28, Robert Metzger rmetz...@apache.org wrote:
 
 Hi,
 
 I'm a committer at the Apache Flink project.
 One of our users asked for adding support for reading from a secured HDFS 
 cluster.
 
 Flink has a master-worker model. Since its not really feasible for users to 
 login with their kerberos credentials on all workers, I wanted to acquire the 
 security token on the master and send it to all workers.
 For that, I wrote the following code to get the tokens in to a byte array:
 
 UserGroupInformation.setConfiguration(hdConf);
 Credentials credentials = new Credentials();
 UserGroupInformation currUsr = UserGroupInformation.getCurrentUser();
 
  Collection<Token<? extends TokenIdentifier>> usrTok = currUsr.getTokens();
  for (Token<? extends TokenIdentifier> token : usrTok) {
final Text id = new Text(token.getIdentifier());
credentials.addToken(id, token);
 }
 DataOutputBuffer dob = new DataOutputBuffer();
 credentials.writeTokenStorageToStream(dob);
 dob.flush();
 However, the collection currUsr.getTokens() is empty, hence the output buffer 
 doesn't contain much data.
 I suspect that I didn't fully understand the Hadoop security concepts yet.
 It would be great if somebody from the list could clarify how to properly 
 acquire the tokens.
 
 Also, I was wondering if there is any document describing how the 
 UserGroupInformation class is working (when is it loading the credentials, 
 does it only work for Kerberos, ...)
 
 Best,
 Robert
 



Re: Stopping ntpd signals SIGTERM, then causes namenode exit

2015-02-09 Thread Alexander Alten-Lorenz
I would focus on 

Jan  7 14:52:48 host1 ntpd[44765]: no servers reachable

That looks to me like a network / DNS issue. You can also check with dmesg what is going 
on.
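
A few commands that usually narrow such things down (sketch; ntpstat may not be installed 
everywhere):

ntpq -p           # configured peers, reachability and offset
ntpstat           # short summary of the sync state
dmesg | tail -50  # kernel messages (NIC flaps, clock source changes, ...)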

BR
- Alexander

 On 09 Feb 2015, at 17:57, daemeon reiydelle daeme...@gmail.com wrote:
 
 Absolutely a critical error to lose the configured ntpd time source in 
 Hadoop. The replication and many other services require absolutely 
 millisecond time sync between the nodes. Interesting that your SRE design 
 called for ntpd running on each node. Curious.
 
 What is the problem you are trying to solve by stopping ntpd on the local 
 host? Did someone not understand how ntpd works? Did someone configure it to 
 (I sure hope not) be free running?
 
 
 
 ...
 “Life should not be a journey to the grave with the intention of arriving 
 safely in a
 pretty and well preserved body, but rather to skid in broadside in a cloud of 
 smoke,
 thoroughly used up, totally worn out, and loudly proclaiming “Wow! What a 
 Ride!” 
 - Hunter Thompson
 
 Daemeon C.M. Reiydelle
 USA (+1) 415.501.0198
 London (+44) (0) 20 8144 9872
 
 On Sun, Feb 8, 2015 at 7:30 PM, David chen c77...@163.com wrote:
 A shell script is deployed on every node of HDFS cluster, the script is 
 invoked hourly by crontab, and its content is as follows:
 #!/bin/bash
 service ntpd stop
 ntpdate 192.168.0.1 #it's a valid ntpd server in LAN
 service ntpd start
 chkconfig ntpd on
 
 After several days, NameNode crashed suddenly, but its log seemed no other 
 errors except the following:
 2015-01-07 14:00:00,709 ERROR 
 org.apache.hadoop.hdfs.server.namenode.NameNode: RECEIVED SIGNAL 15: SIGTERM
 
 Inspected the Linux log(Centos /var/log/messages), also found the following 
 clues:
 Jan  7 14:00:01 host1 ntpd[32101]: ntpd exiting on signal 15
 Jan  7 13:59:59 host1 ntpd[44764]: ntpd 4.2.4p8@1.1612-o Fri Feb 22 11:23:27 
 UTC 2013 (1)
 Jan  7 13:59:59 host1 ntpd[44765]: precision = 0.143 usec
 Jan  7 13:59:59 host1 ntpd[44765]: Listening on interface #0 wildcard, 
 0.0.0.0#123 Disabled
 Jan  7 13:59:59 host1 ntpd[44765]: Listening on interface #1 wildcard, ::#123 
 Disabled
 Jan  7 13:59:59 host1 ntpd[44765]: Listening on interface #2 lo, ::1#123 
 Enabled
 Jan  7 13:59:59 host1 ntpd[44765]: Listening on interface #3 em2, 
 fe80::ca1f:66ff:fee1:eed#123 Enabled
 Jan  7 13:59:59 host1 ntpd[44765]: Listening on interface #4 lo, 
 127.0.0.1#123 Enabled
 Jan  7 13:59:59 host1 ntpd[44765]: Listening on interface #5 em2, 
 192.168.1.151#123 Enabled
 Jan  7 13:59:59 host1 ntpd[44765]: Listening on routing socket on fd #22 for 
 interface updates
 Jan  7 13:59:59 host1 ntpd[44765]: kernel time sync status 2040
 Jan  7 13:59:59 host1 ntpd[44765]: frequency initialized 499.399 PPM from 
 /var/lib/ntp/drift
 Jan  7 14:00:01 host1 ntpd_initres[32103]: parent died before we finished, 
 exiting
 Jan  7 14:04:17 host1 ntpd[44765]: synchronized to 192.168.0.191, stratum 2
 Jan  7 14:04:17 host1 ntpd[44765]: kernel time sync status change 2001
 Jan  7 14:26:02 host1 snmpd[4842]: Received TERM or STOP signal...  shutting 
 down...
 Jan  7 14:26:02 host1 kernel: netlink: 12 bytes leftover after parsing 
 attributes.
 Jan  7 14:26:02 host1 snmpd[45667]: NET-SNMP version 5.5
 Jan  7 14:52:48 host1 ntpd[44765]: no servers reachable
 
 It looks likely that NameNode received the SIGTERM signal sent by stopping 
 ntpd command.
 Up to now, the problem has happened three times repeatedly, the time point 
 was Jan  7 14:00:00, Jan 14 14:00:00 and Feb  4 14:00:00 respectively.
 Although the script used to synchronize time is a little improper (and I know 
 the correct way to synchronize it), I wonder why the NameNode receives the 
 SIGTERM signal sent by the ntpd stop command, and why all three occurrences 
 happened at 14:00:00?
 Any ideas can be appreciated.
 



Re: tools.DistCp: Invalid arguments

2015-02-03 Thread Alexander Alten-Lorenz
Hi,

Can you please try webhdfs instead of hdfs?
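
Port 50070 is the NameNode HTTP port, not the RPC port, which is why the hdfs:// URI runs 
into the protobuf end-group tag error. Two variants that should work (default ports assumed):

hadoop distcp webhdfs://hadoop-coc-1:50070/input1 webhdfs://hadoop-coc-2:50070/input1
hadoop distcp hdfs://hadoop-coc-1:8020/input1 hdfs://hadoop-coc-2:8020/input1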

- Alexander 

 On 03 Feb 2015, at 12:05, xeonmailinglist xeonmailingl...@gmail.com wrote:
 
 Maybe this has to do with this error… I can’t do ls to my own machine using 
 the command below. Can this be related to the other problem? Shouldn't I list 
 the files with this command?
 vagrant@hadoop-coc-1:~$ hdfs dfs -ls hdfs://192.168.56.100/
 ls: Call From hadoop-coc-1/192.168.56.100 to hadoop-coc-1:8020 failed on 
 connection exception: java.net.ConnectException: Connection refused; For more 
 details see: http://wiki.apache.org/hadoop/ConnectionRefused
 On 02-02-2015 19:59, Alexander Alten-Lorenz wrote:
 
 
 
 Have a closer look:
 
 hdfs://hadoop-coc-2:50070/
 No Path is given.
 
 
 On 02 Feb 2015, at 20:52, xeonmailinglist xeonmailingl...@gmail.com wrote:
 
 Hi,
 
 I am trying to copy data using distcp but I get this error. Both hadoop 
 runtime are working properly. Why is this happening?
 
 
 vagrant@hadoop-coc-1:~/Programs/hadoop$ hadoop distcp 
 hdfs://hadoop-coc-1:50070/input1 hdfs://hadoop-coc-2:50070/
 15/02/02 19:46:37 ERROR tools.DistCp: Invalid arguments: 
 java.io.IOException: Failed on local exception: 
 com.google.protobuf.InvalidProtocolBufferException: Protocol message 
 end-group tag did not match expected tag.; Host Details : local host is: 
 hadoop-coc-1/127.0.1.1; destination host is: hadoop-coc-2:50070; 
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
 at org.apache.hadoop.ipc.Client.call(Client.java:1472)
 at org.apache.hadoop.ipc.Client.call(Client.java:1399)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
 at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
 at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
 at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1988)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1118)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
 at 
 org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
 at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1400)
 at org.apache.hadoop.tools.DistCp.setTargetPathExists(DistCp.java:188)
 at org.apache.hadoop.tools.DistCp.run(DistCp.java:111)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
 Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol 
 message end-group tag did not match expected tag.
 at 
 com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:94)
 at 
 com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:124)
 at 
 com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:202)
 at 
 com.google.protobuf.AbstractParser.parsePartialDelimitedFrom(AbstractParser.java:241)
 at 
 com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:253)
 at 
 com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:259)
 at 
 com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:49)
 at 
 org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto.parseDelimitedFrom(RpcHeaderProtos.java:3167)
 at 
 org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1072)
 at org.apache.hadoop.ipc.Client$Connection.run(Client.java:966)
 Invalid arguments: Failed on local exception: 
 com.google.protobuf.InvalidProtocolBufferException: Protocol message 
 end-group tag did not match expected tag.; Host Details : local host is: 
 hadoop-coc-1/127.0.1.1; destination host is: hadoop-coc-2:50070; 
 usage: distcp OPTIONS [source_path...] target_path
 Thanks,
 
 
 
 



Re: tools.DistCp: Invalid arguments

2015-02-03 Thread Alexander Alten-Lorenz
Ah, good. Cross-posting :)

BR,
 Alex

 On 03 Feb 2015, at 12:41, xeonmailinglist xeonmailingl...@gmail.com wrote:
 
 I have found the problem. I started to use `webhdfs` and everything is ok.
 
 
 On 03-02-2015 10:40, xeonmailinglist wrote:
 What do you mean by no path is given? Even if I launch this command, I get 
 the same error…. What path should I put here?
 
 $ hadoop distcp
 hdfs://hadoop-coc-1:50070/input1
 hdfs://hadoop-coc-2:50070/input1
 
 Thanks,
 On 02-02-2015 19:59, Alexander Alten-Lorenz wrote:
 
 Have a closer look:
 
 hdfs://hadoop-coc-2:50070/
 No Path is given.
 
 
 On 02 Feb 2015, at 20:52, xeonmailinglist xeonmailingl...@gmail.com wrote:
 
 Hi,
 
 I am trying to copy data using distcp but I get this error. Both hadoop 
 runtime are working properly. Why is this happening?
 
 
 vagrant@hadoop-coc-1:~/Programs/hadoop$ hadoop distcp 
 hdfs://hadoop-coc-1:50070/input1 hdfs://hadoop-coc-2:50070/
 15/02/02 19:46:37 ERROR tools.DistCp: Invalid arguments: 
 java.io.IOException: Failed on local exception: 
 com.google.protobuf.InvalidProtocolBufferException: Protocol message 
 end-group tag did not match expected tag.; Host Details : local host is: 
 hadoop-coc-1/127.0.1.1; destination host is: hadoop-coc-2:50070; 
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
 at org.apache.hadoop.ipc.Client.call(Client.java:1472)
 at org.apache.hadoop.ipc.Client.call(Client.java:1399)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
 at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
 at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
 at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1988)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1118)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
 at 
 org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
 at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1400)
 at org.apache.hadoop.tools.DistCp.setTargetPathExists(DistCp.java:188)
 at org.apache.hadoop.tools.DistCp.run(DistCp.java:111)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
 Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol 
 message end-group tag did not match expected tag.
 at 
 com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:94)
 at 
 com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:124)
 at 
 com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:202)
 at 
 com.google.protobuf.AbstractParser.parsePartialDelimitedFrom(AbstractParser.java:241)
 at 
 com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:253)
 at 
 com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:259)
 at 
 com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:49)
 at 
 org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto.parseDelimitedFrom(RpcHeaderProtos.java:3167)
 at 
 org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1072)
 at org.apache.hadoop.ipc.Client$Connection.run(Client.java:966)
 Invalid arguments: Failed on local exception: 
 com.google.protobuf.InvalidProtocolBufferException: Protocol message 
 end-group tag did not match expected tag.; Host Details : local host is: 
 hadoop-coc-1/127.0.1.1; destination host is: hadoop-coc-2:50070; 
 usage: distcp OPTIONS [source_path...] target_path
 Thanks,
 
 
 
 



Re: tools.DistCp: Invalid arguments

2015-02-02 Thread Alexander Alten-Lorenz
Have a closer look:

 hdfs://hadoop-coc-2:50070/

No Path is given.


 On 02 Feb 2015, at 20:52, xeonmailinglist xeonmailingl...@gmail.com wrote:
 
 Hi,
 
 I am trying to copy data using distcp but I get this error. Both hadoop 
 runtime are working properly. Why is this happening?
 
 
 vagrant@hadoop-coc-1:~/Programs/hadoop$ hadoop distcp 
 hdfs://hadoop-coc-1:50070/input1 hdfs://hadoop-coc-2:50070/
 15/02/02 19:46:37 ERROR tools.DistCp: Invalid arguments: 
 java.io.IOException: Failed on local exception: 
 com.google.protobuf.InvalidProtocolBufferException: Protocol message 
 end-group tag did not match expected tag.; Host Details : local host is: 
 hadoop-coc-1/127.0.1.1; destination host is: hadoop-coc-2:50070; 
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
 at org.apache.hadoop.ipc.Client.call(Client.java:1472)
 at org.apache.hadoop.ipc.Client.call(Client.java:1399)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
 at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
 at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
 at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1988)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1118)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
 at 
 org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
 at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1400)
 at org.apache.hadoop.tools.DistCp.setTargetPathExists(DistCp.java:188)
 at org.apache.hadoop.tools.DistCp.run(DistCp.java:111)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
 Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol 
 message end-group tag did not match expected tag.
 at 
 com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:94)
 at 
 com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:124)
 at 
 com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:202)
 at 
 com.google.protobuf.AbstractParser.parsePartialDelimitedFrom(AbstractParser.java:241)
 at 
 com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:253)
 at 
 com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:259)
 at 
 com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:49)
 at 
 org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto.parseDelimitedFrom(RpcHeaderProtos.java:3167)
 at 
 org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1072)
 at org.apache.hadoop.ipc.Client$Connection.run(Client.java:966)
 Invalid arguments: Failed on local exception: 
 com.google.protobuf.InvalidProtocolBufferException: Protocol message 
 end-group tag did not match expected tag.; Host Details : local host is: 
 hadoop-coc-1/127.0.1.1; destination host is: hadoop-coc-2:50070; 
 usage: distcp OPTIONS [source_path...] target_path
 Thanks,
 



Re: Multiple separate Hadoop clusters on same physical machines

2015-02-02 Thread Alexander Alten-Lorenz
http://blog.sequenceiq.com/blog/2014/06/19/multinode-hadoop-cluster-on-docker/

Ambari based, but works quite well


 On 02 Feb 2015, at 10:33, Ashish Kumar9 ashis...@in.ibm.com wrote:
 
 Is there any good reference material available to follow to test docker and 
 hadoop integration . 
 
 
 
 
 From:hadoop.supp...@visolve.com 
 To:'Harun Reşit Zafer' harun.za...@tubitak.gov.tr, 
 user@hadoop.apache.org 
 Date:02/02/2015 02:57 PM 
 Subject:RE: Multiple separate Hadoop clusters on same physical 
 machines 
 
 
 
 Hello Harun, 
   
 Your question is very interesting and will be useful for future Hadoop setups 
 for startup/individuals too. 
   
 Normally for testing purposes, we prefer you to use pseudo-distributed 
 environments (i.e. installation of all cluster files in single node). You can 
 refer few links which will guide you through the whole process below for 
 reference: 
   
 https://districtdatalabs.silvrback.com/creating-a-hadoop-pseudo-distributed-environment 
   
 Individual Pseudo Distributed Cluster Implementation: 
   
 http://www.thegeekstuff.com/2012/02/hadoop-pseudo-distributed-installation/ 
 http://hbase.apache.org/book.html#quickstart_pseudo 
 and please check for others. 
   
 From our 20 years of server & related industry experience, we recommend 
 you to use VMs/instances for production & business-critical environments. The other 
 way around, if you are developing products related to Hadoop, you can 
 use Docker & other related resources for development, as shipment to 
 production will become stress-free with the use of these tools together with a cluster 
 environment setup. 
   
 Feel free to ask for further queries. 
   
 Thanks and Regards, 
 S.RagavendraGanesh 
 Hadoop Support Team 
 ViSolve Inc. | www.visolve.com 
   
   
   
 From: Alexander Pivovarov [mailto:apivova...@gmail.com] 
 Sent: Monday, February 02, 2015 12:56 PM
 To: user@hadoop.apache.org
 Subject: Re: Multiple separate Hadoop clusters on same physical machines 
   
 start several vms and install hadoop on each vm 
 keywords: kvm, QEMU 
   
 On Mon, Jan 26, 2015 at 1:18 AM, Harun Reşit Zafer 
 harun.za...@tubitak.gov.tr wrote: 
 Hi everyone,
 
 We have set up and been playing with Hadoop 1.2.x and its friends (Hbase, 
 pig, hive etc.) on 7 physical servers. We want to test Hadoop (maybe 
 different versions) and ecosystem on physical machines (virtualization is not 
 an option) from different perspectives.
 
 As a bunch of developer we would like to work in parallel. We want every team 
 member play with his/her own cluster. However we have limited amount of 
 servers (strong machines though).
 
 So the question is, by changing port numbers, environment variables and other 
 configuration parameters, is it possible to setup several independent 
 clusters on same physical machines. Is there any constraints? What are the 
 possible difficulties we are to face?
 
 Thanks in advance
 
 -- 
 Harun Reşit Zafer
 TÜBİTAK BİLGEM BTE
 Bulut Bilişim ve Büyük Veri Analiz Sistemleri Bölümü
 T +90 262 675 3268
 W http://www.hrzafer.com 
   



Re: Multiple separate Hadoop clusters on same physical machines

2015-02-01 Thread Alexander Alten-Lorenz
I don't see how federation would help to run different clusters on the _same_ 
machines. On top of that, federation isn't really production ready; the NameNode can 
have massive GC issues on highly loaded systems, which will be the case here.
To run multiple, possibly single-node, clusters the best way is to use cloud-based 
solutions, e.g. OpenStack with Docker containers. A Mesos-driven setup can help here as 
well; there are some good tutorials available.
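
If you nevertheless end up putting several clusters on the same bare-metal hosts, the 
minimum you have to separate per cluster is the configuration directory (with distinct 
RPC/HTTP ports) plus distinct data, log, pid and tmp directories. A rough, untested sketch 
of starting one such instance with its own settings (paths and ports are placeholders):

export HADOOP_CONF_DIR=/etc/hadoop-clusterA/conf   # own fs.default.name, mapred.job.tracker, dfs.*.address ports, dfs.data.dir, ...
export HADOOP_LOG_DIR=/var/log/hadoop-clusterA
export HADOOP_PID_DIR=/var/run/hadoop-clusterA
start-dfs.sh && start-mapred.sh                    # Hadoop 1.x start scripts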

BG,
 Alexander 

 On 26 Jan 2015, at 10:34, Azuryy Yu azury...@gmail.com wrote:
 
 Hi,
 
 I think the best way is deploy HDFS federation with Hadoop 2.x.
 
 On Mon, Jan 26, 2015 at 5:18 PM, Harun Reşit Zafer 
 harun.za...@tubitak.gov.tr wrote:
 Hi everyone,
 
 We have set up and been playing with Hadoop 1.2.x and its friends (Hbase, 
 pig, hive etc.) on 7 physical servers. We want to test Hadoop (maybe 
 different versions) and ecosystem on physical machines (virtualization is not 
 an option) from different perspectives.
 
 As a bunch of developer we would like to work in parallel. We want every team 
 member play with his/her own cluster. However we have limited amount of 
 servers (strong machines though).
 
 So the question is, by changing port numbers, environment variables and other 
 configuration parameters, is it possible to setup several independent 
 clusters on same physical machines. Is there any constraints? What are the 
 possible difficulties we are to face?
 
 Thanks in advance
 
 -- 
 Harun Reşit Zafer
 TÜBİTAK BİLGEM BTE
 Bulut Bilişim ve Büyük Veri Analiz Sistemleri Bölümü
 T +90 262 675 3268
 W http://www.hrzafer.com
 
 



Re: Start Hadoop, ERROR security.UserGroupInformation: PriviledgedActionException

2015-01-17 Thread Alexander Alten-Lorenz
The JobTracker isn't running, as the log says. Check whether your start script actually starts one.
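
A quick way to check (sketch, assuming a Hadoop 1.x layout with the default log location):

jps | grep -i JobTracker                                 # is the daemon up at all?
tail -n 100 $HADOOP_HOME/logs/hadoop-*-jobtracker-*.log  # why it did not come up or finish initializing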

BR,
 AL


 On 16 Jan 2015, at 21:55, Ruhua Jiang ruhua.ji...@gmail.com wrote:
 
 15/01/14 12:21:08 ERROR security.UserGroupInformation: 
 PriviledgedActionException as:hpc-ruhua 
 cause:org.apache.hadoop.ipc.RemoteException: 
 org.apache.hadoop.mapred.JobTrackerNotYetInitializedException: JobTracker is 
 not yet RUNNING



Re: issue about run MR job use system user in CDH5

2014-07-22 Thread Alexander Alten-Lorenz

Please post vendor specific questions to the mailinglists of the vendor:
https://groups.google.com/a/cloudera.org/forum/#!forum/cdh-user

Look closer at:

security.UserGroupInformation: PriviledgedActionException as:hbase 
(auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: 
Permission denied: user=hbase, access=EXECUTE, 
inode=/data:mapred:hadoop:drwxrwx---


/data doesn't have the proper permissions.
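
Keep in mind that HDFS resolves group membership on the NameNode host, so adding hbase to 
the hadoop group somewhere else won't help. A sketch of two ways out ('hdfs' is assumed to 
be the HDFS superuser here):

# either give the staging root /tmp-like permissions (sticky bit, world-writable):
sudo -u hdfs hadoop fs -chmod 1777 /data
# or add the submitting users (hbase, ...) to the hadoop group on the NameNode host;
# the existing mapred:hadoop drwxrwx--- mode is then already sufficient.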

- Alex


-- Original message --
From: ch huang justlo...@gmail.com
To: user@hadoop.apache.org
Sent: 22.07.2014 08:14:50
Subject: issue about run MR job use system user in CDH5


hi,maillist:

i set up CDH5 yarn cluster ,and set the following option in my 
mapred-site.xml file


<property>
  <name>yarn.app.mapreduce.am.staging-dir</name>
  <value>/data</value>
</property>


The MapReduce history server will put its history dir under the directory /data, 
but if I submit an MR job as another user, I get an error. I also added the user to 
the hadoop group, but it doesn't help. Why? How can I do it? Thanks


2014-07-22 14:07:06,734 INFO  [main] mapreduce.TableOutputFormat: 
Created table instance for test_1
2014-07-22 14:07:06,765 WARN  [main] security.UserGroupInformation: 
PriviledgedActionException as:hbase (auth:SIMPLE) 
cause:org.apache.hadoop.security.AccessControlException: Permission 
denied: user=hbase, access=EXECUTE, 
inode=/data:mapred:hadoop:drwxrwx---
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:265)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:251)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:205)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:168)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5490)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3499)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:764)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:764)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)

at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)

at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)

Exception in thread main 
org.apache.hadoop.security.AccessControlException: Permission denied: 
user=hbase, access=EXECUTE, inode=/data:mapred:hadoop:drwxrwx---
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:265)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:251)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:205)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:168)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5490)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3499)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:764)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:764)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)

at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)

at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)



Re[2]: HBase metadata

2014-07-08 Thread Alexander Alten-Lorenz

http://docs.cascading.org/tutorials/lingual-hbase/

is the most common, I guess. Phoenix is quite new, and Kiji was too much 
overhead in my case.


- Alex
mapredit.blogspot.com

-- Original message --
From: Martin, Nick nimar...@pssd.com
To: user@hadoop.apache.org user@hadoop.apache.org
Sent: 08.07.2014 19:01:27
Subject: RE: HBase metadata

Can’t speak for the rest of the Hadoop community but we use Lingual. 
Not sure if that’s common or not.




Maybe worth posting the same question to @hbase.



From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Tuesday, July 08, 2014 12:57 PM
To:user@hadoop.apache.org
Subject: RE: HBase metadata



Sorry to be rude, but what does everyone actually use now?  We are an 
ISV and need to support the most common access pattern.


john



From: Martin, Nick [mailto:nimar...@pssd.com]
Sent: Tuesday, July 08, 2014 10:53 AM
To:user@hadoop.apache.org
Subject: RE: HBase metadata



Have you looked @ Lingual?



From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Tuesday, July 08, 2014 12:43 PM
To:user@hadoop.apache.org
Subject: RE: HBase metadata



Those look intriguing.  But what do people actually use today?  Is it 
all application-specific coding?  Hive?


John





From: Mirko Kämpf [mailto:mirko.kae...@gmail.com]
Sent: Tuesday, July 08, 2014 10:12 AM
To:user@hadoop.apache.org
Subject: Re: HBase metadata



Hi John,

I suggest the project: http://www.kiji.org/



or even the brand new: http://phoenix.apache.org/

Cheers,

Mirko




2014-07-08 16:05 GMT+00:00 John Lilley john.lil...@redpoint.net:

Greetings!



We would like to support HBase in a general manner, having our software 
connect to any HBase table and read/write it in a row-oriented fashion. 
 However, as we explore HBase, the raw interface is at a very low level 
-- basically a map from binary record keys to named columns.  So my 
question about metadata standards.  What do users mostly do to use 
HBase for row-oriented access?  It is always going through Hive?




Thanks

john






Re: Conversion from MongoDB to hadoop

2014-05-12 Thread Alexander Alten-Lorenz
http://docs.mongodb.org/ecosystem/tutorial/getting-started-with-hadoop/ 
may help.


- Alexander

-- Original message --
From: Ranjini Rathinam ranjinibe...@gmail.com
To: user@hadoop.apache.org
Sent: 12.05.2014 07:48:42
Subject: Conversion from MongoDB to hadoop


Hi,

How to convert the Mapreduce from MongoDB  to Mapreduce Hadoop.

Please suggest .

Thanks in advance.

Ranjini.

Re: running YARN in Production

2014-04-24 Thread Alexander Alten-Lorenz
Matt, 

Apache YARN is quite stable and works well in production - so far. Additionally, 
YARN is backward compatible, so MRv1 jobs run too. 
For vendor-specific questions, contact your distribution vendor. 

- Alex

Sent from my iPad

 On 24 Apr 2014, at 22:17, Matt K matvey1...@gmail.com wrote:
 
 We run a number of mission-critical MapReduce jobs daily in our production 
 cluster, mostly on top of HBase. In the past, we've hit a number of Hadoop 
 bugs, and found it difficult to maintain a solid SLA.
 
 We are now moving to CDH5 and evaluating if we should move to YARN or keep 
 running Hadoop 1. YARN is very compelling, but it's also relatively young. I 
 know that Cloudera recommends YARN over Hadoop 1 in CDH5, but I could use a 
 second opinion :)
 
 Can someone running YARN in a mission-critical Production environment share 
 their experience, specifically as it relates to stability? I realize that 
 this is a question that lends itself to a somewhat subjective answer.
 
 Thanks,
 -Matt


Re: State of Art in Hadoop Log aggregation

2013-10-11 Thread Alexander Alten-Lorenz
Hi,

http://flume.apache.org
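
A minimal sketch of what such an agent could look like on each node - an exec source 
tailing one log, forwarding over Avro to a central collector that feeds 
Elasticsearch/Kibana (agent name, collector host, port and log path below are placeholders):

agent1.sources  = tt-log
agent1.channels = mem
agent1.sinks    = fwd

agent1.sources.tt-log.type     = exec
agent1.sources.tt-log.command  = tail -F /var/log/hadoop/hadoop-mapred-tasktracker.log
agent1.sources.tt-log.channels = mem

agent1.channels.mem.type     = memory
agent1.channels.mem.capacity = 10000

agent1.sinks.fwd.type     = avro
agent1.sinks.fwd.hostname = collector01
agent1.sinks.fwd.port     = 4141
agent1.sinks.fwd.channel  = mem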

- Alex

On Oct 11, 2013, at 7:36 AM, Sagar Mehta sagarme...@gmail.com wrote:

 Hi Guys,
 
 We have fairly decent sized Hadoop cluster of about 200 nodes and was 
 wondering what is the state of art if I want to aggregate and visualize 
 Hadoop ecosystem logs, particularly
 Tasktracker logs
 Datanode logs
 Hbase RegionServer logs
 One way is to use something like a Flume on each node to aggregate the logs 
 and then use something like Kibana - 
 http://www.elasticsearch.org/overview/kibana/ to visualize the logs and make 
 them searchable.
 
 However I don't want to write another ETL for the hadoop/hbase logs  
 themselves. We currently log in to each machine individually to 'tail -F 
 logs' when there is an hadoop problem on a particular node.
 
 We want a better way to look at the hadoop logs themselves in a centralized 
 way when there is an issue without having to login to 100 different machines 
 and was wondering what is the state of are in this regard.
 
 Suggestions/Pointers are very welcome!!
 
 Sagar

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF



Re: Sqoop and Hadoop

2013-07-10 Thread Alexander Alten-Lorenz
Moving to u...@sqoop.apache.org

For the original question - ONE! google search (sqoop postgres hdfs):
http://alexkehayias.tumblr.com/post/44153307024/importing-postgres-data-into-hadoop-hdfs
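
The short version: no Hive or HBase needed, a plain import writes straight into HDFS. A 
sketch with placeholder connection details (it assumes the PostgreSQL JDBC driver jar has 
been dropped into Sqoop's lib directory):

sqoop import \
  --connect jdbc:postgresql://dbhost:5432/mydb \
  --username myuser -P \
  --table mytable \
  --target-dir /user/fatih/mytable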

Regards

On Jul 10, 2013, at 9:59 AM, Fatih Haltas fatih.hal...@nyu.edu wrote:

 Hi Everyone, 
 
 I am trying to import data from PostgreSQL to HDFS via Sqoop; however, all the 
 examples I found on the internet talk about Hive, HBase and similar systems 
 running within Hadoop.
 
 I am not using any of these systems. Isn't it possible to import data without 
 having those kinds of systems running on Hadoop, via Sqoop?
 
 In other words, I am using the Hadoop and MapReduce systems alone; is it 
 possible to import data from PostgreSQL into that basic Hadoop system via Sqoop?

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF



Re: Migration needed when updating within an Hadoop release

2013-06-14 Thread Alexander Alten-Lorenz
Hi Björn,

 has it ever happened that a migration of persistent data has been needed (or 
 automatically executed) when updating a Hadoop installation within a release? 
 If so, where could I find information regarding such needed migration?

Normally, when you change the minor release, you need to upgrade HDFS 
(http://hadoop.apache.org/docs/stable/hdfs_user_guide.html#Upgrade+and+Rollback).
 This will happen when you switch major branches. 

 I would be interested because the runtime of such migration would probably 
 depend on the amount of managed data and had to be planned.
That depends on how much data you've stored. Michael has written an excellent blog post 
about it:
http://www.michael-noll.com/blog/2011/08/23/performing-an-hdfs-upgrade-of-an-hadoop-cluster/
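
For reference, the usual sequence on the 1.x line when a layout upgrade is required looks 
roughly like this (sketch - see the guides above for the details and the rollback path):

stop-dfs.sh
start-dfs.sh -upgrade                      # NameNode converts the on-disk layout
hadoop dfsadmin -upgradeProgress status    # repeat until the upgrade has completed
hadoop dfsadmin -finalizeUpgrade           # only once you are sure you will not roll back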

Cheers,
 Alex

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF



Re: Migration needed when updating within an Hadoop release

2013-06-14 Thread Alexander Alten-Lorenz
Excuse the typo, should be :
Normally, when you change the major release, you need to upgrade HDFS 
(http://hadoop.apache.org/docs/stable/hdfs_user_guide.html#Upgrade+and+Rollback).
 This will happen when you switch major branches. 

On Jun 14, 2013, at 12:10 PM, Alexander Alten-Lorenz wget.n...@gmail.com 
wrote:

 Hi Björn,
 
 has it ever happened that a migration of persistent data has been needed (or 
 automatically executed) when updating a Hadoop installation within a 
 release? If so, where could I find information regarding such needed 
 migration?
 
 Normally, when you change the minor release, you need to upgrade HDFS 
 (http://hadoop.apache.org/docs/stable/hdfs_user_guide.html#Upgrade+and+Rollback).
  This will happen when you switch major branches. 
 
 I would be interested because the runtime of such migration would probably 
 depend on the amount of managed data and had to be planned.
 Depends how much data you've stored. Michael has written a excellent blog 
 post about:
 http://www.michael-noll.com/blog/2011/08/23/performing-an-hdfs-upgrade-of-an-hadoop-cluster/
 
 Cheers,
 Alex
 
 --
 Alexander Alten-Lorenz
 http://mapredit.blogspot.com
 German Hadoop LinkedIn Group: http://goo.gl/N8pCF
 

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF



Re: Migration needed when updating within an Hadoop release

2013-06-14 Thread Alexander Alten-Lorenz
Hi Björn,

 - But what about minor updates, e. g. from 1.0.1 to 1.0.4? Has this ever 
 happened for such updates?

You will probably see log messages like 'RPC version mismatch'; in that case 
you have to upgrade the filesystem. If not - all is well :)

 - What about HBase minor releases in this context? Have data model changes/ 
 conversions/ migrations ever happened between minor releases?

Yes, but not regularly. Please see my next answer.

 - Is there a similar wiki page describing HBase upgrade?


http://hbase.apache.org/upgrading.html


Cheers,
 Alex


--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF



Re:

2013-06-04 Thread Alexander Alten-Lorenz
Hi Matteo,

Are you able to add more space to your test machines? Also, what does the pi 
example say (hadoop jar hadoop-examples pi 10 10)?

- Alex

On Jun 4, 2013, at 4:34 PM, Lanati, Matteo matteo.lan...@lrz.de wrote:

 Hi again,
 
 unfortunately my problem is not solved.
 I downloaded Hadoop v. 1.1.2a and made a basic configuration as suggested in 
 [1].
 No security, no ACLs, default scheduler ... The files are attached.
 I still have the same error message. I also tried another Java version (6u45 
 instead of 7u21).
 How can I increase the debug level to have a deeper look?
 Thanks,
 
 Matteo
 
 
 [1] 
 http://hadoop.apache.org/docs/r1.1.2/cluster_setup.html#Cluster+Restartability
 On Jun 4, 2013, at 3:52 AM, Azuryy Yu azury...@gmail.com wrote:
 
 Hi Harsh,
 
 I need to take care my eyes recently, I mis-read 1.2.0 to 1.0.2, so I said 
 upgrade. Sorry.
 
 
 On Tue, Jun 4, 2013 at 9:46 AM, Harsh J ha...@cloudera.com wrote:
 Azuryy,
 
 1.1.2 < 1.2.0. It's not an upgrade you're suggesting there. If you feel
 there's been a regression, can you comment that on the JIRA?
 
 On Tue, Jun 4, 2013 at 6:57 AM, Azuryy Yu azury...@gmail.com wrote:
 yes. hadoop-1.1.2 was released on Jan. 31st. just download it.
 
 
 On Tue, Jun 4, 2013 at 6:33 AM, Lanati, Matteo matteo.lan...@lrz.de wrote:
 
 Hi Azuryy,
 
 thanks for the update. Sorry for the silly question, but where can I
 download the patched version?
 If I look into the closest mirror (i.e.
 http://mirror.netcologne.de/apache.org/hadoop/common/), I can see that the
 Hadoop 1.1.2 version was last updated on Jan. 31st.
 Thanks in advance,
 
 Matteo
 
 PS: just to confirm that I tried a minimal Hadoop 1.2.0 setup, so without
 any security, and the problem is there.
 
 On Jun 3, 2013, at 3:02 PM, Azuryy Yu azury...@gmail.com wrote:
 
 can you upgrade to 1.1.2, which is also a stable release, and fixed the
 bug you facing now.
 
 --Send from my Sony mobile.
 
 On Jun 2, 2013 3:23 AM, Shahab Yunus shahab.yu...@gmail.com wrote:
 Thanks Harsh for the reply. I was confused too that why security is
 causing this.
 
 Regards,
 Shahab
 
 
 On Sat, Jun 1, 2013 at 12:43 PM, Harsh J ha...@cloudera.com wrote:
 Shahab - I see he has mentioned generally that security is enabled
 (but not that it happens iff security is enabled), and the issue here
 doesn't have anything to do with security really.
 
 Azurry - Lets discuss the code issues on the JIRA (instead of here) or
 on the mapreduce-dev lists.
 
 On Sat, Jun 1, 2013 at 10:05 PM, Shahab Yunus shahab.yu...@gmail.com
 wrote:
 HI Harsh,
 
 Quick question though: why do you think it only happens if the OP
 'uses
 security' as he mentioned?
 
 Regards,
 Shahab
 
 
 On Sat, Jun 1, 2013 at 11:49 AM, Harsh J ha...@cloudera.com wrote:
 
 Does smell like a bug as that number you get is simply
 Long.MAX_VALUE,
 or 8 exbibytes.
 
 Looking at the sources, this turns out to be a rather funny Java
 issue
 (there's a divide by zero happening and [1] suggests Long.MAX_VALUE
 return in such a case). I've logged a bug report for this at
 https://issues.apache.org/jira/browse/MAPREDUCE-5288 with a
 reproducible case.
 
 Does this happen consistently for you?
 
 [1]
 
 http://docs.oracle.com/javase/6/docs/api/java/lang/Math.html#round(double)
 
 On Sat, Jun 1, 2013 at 7:27 PM, Lanati, Matteo matteo.lan...@lrz.de
 wrote:
 Hi all,
 
 I stumbled upon this problem as well while trying to run the
 default
 wordcount shipped with Hadoop 1.2.0. My testbed is made up of 2
 virtual
 machines: Debian 7, Oracle Java 7, 2 GB RAM, 25 GB hard disk. One
 node is
 used as JT+NN, the other as TT+DN. Security is enabled. The input
 file is
 about 600 kB and the error is
 
 2013-06-01 12:22:51,999 WARN
 org.apache.hadoop.mapred.JobInProgress: No
 room for map task. Node 10.156.120.49 has 22854692864 bytes free;
 but we
 expect map to take 9223372036854775807
 
 The logfile is attached, together with the configuration files. The
 version I'm using is
 
 Hadoop 1.2.0
 Subversion
 https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2
 -r
 1479473
 Compiled by hortonfo on Mon May  6 06:59:37 UTC 2013
 From source with checksum 2e0dac51ede113c1f2ca8e7d82fb3405
 This command was run using
 /home/lu95jib/hadoop-exmpl/hadoop-1.2.0/hadoop-core-1.2.0.jar
 
 If I run the default configuration (i.e. no securty), then the job
 succeeds.
 
 Is there something missing in how I set up my nodes? How is it
 possible
 that the envisaged value for the needed space is so big?
 
 Thanks in advance.
 
 Matteo
 
 
 
 Which version of Hadoop are you using. A quick search shows me a
 bug
 https://issues.apache.org/jira/browse/HADOOP-5241 that seems to
 show
 similar symptoms. However, that was fixed a long while ago.
 
 
 On Sat, Mar 23, 2013 at 4:40 PM, Redwane belmaati cherkaoui 
 reduno1...@googlemail.com wrote:
 
 This the content of the jobtracker log file :
 2013-03-23 12:06:48,912 INFO
 org.apache.hadoop.mapred.JobInProgress:
 Input
 size for job job_201303231139_0001 = 

Re: Source code jar for hadoop 1.2.0

2013-05-27 Thread Alexander Alten-Lorenz
Grab a mirror from http://www.apache.org/dyn/closer.cgi/hadoop/common/ and 
download hadoop-1.2.0.tar.gz 

On May 27, 2013, at 10:26 AM, Sznajder ForMailingList 
bs4mailingl...@gmail.com wrote:

 Hi,
 
 Where could I download the jar containing the source code of Hadoop 1.2.0?
 
 Many thanks
 
 Benjamin

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF



Re: Apache Flume Properties File

2013-05-25 Thread Alexander Alten-Lorenz
Apache Flume is Apache Flume. Cloudera's Flume RPM's are Cloudera Flume RPM's. 
Simple, but true.
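
For a first smoke test the single-node example from the Flume user guide is enough; a 
sketch (example.conf is assumed to define the netcat source on port 44444, a memory channel 
and a logger sink, exactly as in the guide):

bin/flume-ng agent --conf conf --conf-file conf/example.conf --name a1 -Dflume.root.logger=INFO,console
# in a second terminal, feed it a line of text:
telnet localhost 44444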

Cheers
- Alex

On May 25, 2013, at 1:19 AM, Raj Hadoop hadoop...@yahoo.com wrote:

 Hi,
 
 When I am reading all the stuff on internet on Flume, everything is mostly on 
 CDH distribution. I am aware that Flume is Cloudera's contribution but I am 
 using a strict Apache version in my research work. When I was reading all 
 this, I wanted to make sure from the forum whether Apache Flume has any issues 
 with installation etc. 
 
 So that is the reason why I had to sent it to the dist lists. My intention is 
 not to get a silver platter. I am not expecting that. Anyways - sorry for 
 inconvenience.
 
 Thanks,
 Raj
 
 
 
 
 From: Stephen Sprague sprag...@gmail.com
 To: u...@hive.apache.org; Raj Hadoop hadoop...@yahoo.com 
 Sent: Friday, May 24, 2013 6:32 PM
 Subject: Re: Apache Flume Properties File
 
 so you spammed three big lists there, eh? with a general question for 
 somebody to serve up a solution on a silver platter for you -- all before you 
 even read any documentation on the subject matter?
 
 nice job and good luck to you.
 
 
 On Fri, May 24, 2013 at 2:13 PM, Raj Hadoop hadoop...@yahoo.com wrote:
 Hi,
  
 I just installed Apache Flume 1.3.1 and trying to run a small example to 
 test. Can any one suggest me how can I do this? I am going through the 
 documentation right now.
  
 Thanks,
 Raj
 
 
 

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF



Re: MS sql server hadoop connector

2013-02-25 Thread Alexander Alten-Lorenz
Hi,

+ us...@sqoop.apache.org
- user@hadoop.apache.org

Hi,

I moved the thread to the sqoop mailing list.
The main error indicates what's going wrong:
 mssqoop-sqlserver: java.io.IOException: the content of connector file must be 
 in form of key=value

From Sqoop 1.4.2 on we support NVARCHAR for import/export, but Sqoop doesn't have a valid 
splitter for this kind of data type. We had such a thread in the past; follow the 
instructions here: 
http://mail-archives.apache.org/mod_mbox/sqoop-user/201210.mbox/%3C20121026221855.GG12835@jarcec-thinkpad%3E
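
For the first exception: a file under conf/managers.d/ has to contain plain key=value 
lines - the ManagerFactory class on the left, the path to the connector jar on the right. A 
sketch with placeholder names (take the exact factory class from the connector's install 
guide):

# conf/managers.d/mssqoop-sqlserver  - illustrative placeholders, not the real class name
com.example.sqlserver.SomeManagerFactory=/opt/connectors/sqoop-sqlserver-1.0.jar

The second stack trace is a separate problem: as the message says, with that JRE you need 
the sqljdbc4.jar class library instead of the older sqljdbc.jar.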

- Alex


On Feb 25, 2013, at 10:33 PM, Swapnil Shinde swapnilushi...@gmail.com wrote:

 Hello
 I am newly trying to work with SQL server hadoop connector. We have installed 
 sqoop and SQL server connector properly. but i m getting below error while 
 running import command.
 I am not sure how to proceed with this so any help will be really great..
 
 13/02/25 16:18:32 ERROR sqoop.ConnFactory: Error loading ManagerFactory 
 information from file 
 /opt/mapr/sqoop/sqoop-1.4.2/bin/../conf/managers.d/mssqoop-sqlserver: 
 java.io.IOException: the content of connector file must be in form of 
 key=value
   at 
 org.apache.sqoop.ConnFactory.addManagersFromFile(ConnFactory.java:219)
   at 
 org.apache.sqoop.ConnFactory.loadManagersFromConfDir(ConnFactory.java:294)
   at 
 org.apache.sqoop.ConnFactory.instantiateFactories(ConnFactory.java:85)
   at org.apache.sqoop.ConnFactory.init(ConnFactory.java:62)
   at com.cloudera.sqoop.ConnFactory.init(ConnFactory.java:36)
   at org.apache.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:201)
   at org.apache.sqoop.tool.ImportTool.init(ImportTool.java:83)
   at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:464)
   at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
   at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
   at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
   at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
   at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
 
 13/02/25 16:18:32 INFO manager.SqlManager: Using default fetchSize of 1000
 13/02/25 16:18:32 INFO tool.CodeGenTool: Beginning code generation
 Feb 25, 2013 4:18:32 PM com.microsoft.sqlserver.jdbc.SQLServerConnection 
 init
 SEVERE: Java Runtime Environment (JRE) version 1.6 is not supported by this 
 driver. Use the sqljdbc4.jar class library, which provides support for JDBC 
 4.0.
 13/02/25 16:18:32 ERROR sqoop.Sqoop: Got exception running Sqoop: 
 java.lang.UnsupportedOperationException: Java Runtime Environment (JRE) 
 version 1.6 is not supported by this driver. Use the sqljdbc4.jar class 
 library, which provides support for JDBC 4.0.
 java.lang.UnsupportedOperationException: Java Runtime Environment (JRE) 
 version 1.6 is not supported by this driver. Use the sqljdbc4.jar class 
 library, which provides support for JDBC 4.0.
   at 
 com.microsoft.sqlserver.jdbc.SQLServerConnection.init(SQLServerConnection.java:238)
   at 
 com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:841)
   at java.sql.DriverManager.getConnection(DriverManager.java:582)
   at java.sql.DriverManager.getConnection(DriverManager.java:207)
   at 
 org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:663)
   at 
 org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
   at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:525)
   at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:548)
   at 
 org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:191)
   at 
 org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:175)
   at 
 org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:262)
   at 
 org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1235)
   at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1060)
   at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:82)
   at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:390)
   at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:476)
   at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
   at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
   at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
   at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
   at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
 
 
 
 

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF



Re: Contribute to Hadoop Community

2013-02-19 Thread Alexander Alten-Lorenz
Hi,

http://wiki.apache.org/hadoop/HowToContribute

I forget to post this before.

Cheers,
 Alex


On Feb 19, 2013, at 8:23 AM, Varsha Raveendran varsha.raveend...@gmail.com 
wrote:

 Thank you very much for  a quick response.
 
 
 On Tue, Feb 19, 2013 at 12:12 PM, Alexander Alten-Lorenz 
 wget.n...@gmail.com wrote:
 Hey,
 
 Thank you for the offer.
 Please open an account at https://issues.apache.org/jira/ and file a JIRA about 
 your work, attach the patches and describe what changes you've made. To have the 
 patches you've submitted reviewed, please open an account at reviews.apache.org 
 and open a review request for your work.
 
 Thanks again,
  Alex
 
 On Feb 19, 2013, at 7:37 AM, Varsha Raveendran varsha.raveend...@gmail.com 
 wrote:
 
  Hello!
 
  I am at present working on supporting GA on Hadoop MapReduce. I have been 
  asked by my advisor to find out the process
   to contribute the project back to the  Hadoop community. I have looked 
  through 
  http://blog.cloudera.com/blog/2012/12/how-to-contribute-to-apache-hadoop-projects-in-24-minutes/
   and a couple of other sites but still I am unclear on how to initiate the 
  process.
 
  Please guide me.
 
 
  Thanks  Regards,
  Varsha
 
 --
 Alexander Alten-Lorenz
 http://mapredit.blogspot.com
 German Hadoop LinkedIn Group: http://goo.gl/N8pCF
 
 
 
 
 -- 
 -Varsha 

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF



Re: Contribute to Hadoop Community

2013-02-18 Thread Alexander Alten-Lorenz
Hey,

Thank you for the offer.
Please open an account at https://issues.apache.org/jira/ and file a JIRA about 
your work, attach the patches and describe what changes you've made. To have the 
patches you've submitted reviewed, please open an account at reviews.apache.org 
and open a review request for your work.

Thanks again,
 Alex

On Feb 19, 2013, at 7:37 AM, Varsha Raveendran varsha.raveend...@gmail.com 
wrote:

 Hello! 
 
 I am at present working on supporting GA on Hadoop MapReduce. I have been 
 asked by my advisor to find out the process
  to contribute the project back to the  Hadoop community. I have looked 
 through 
 http://blog.cloudera.com/blog/2012/12/how-to-contribute-to-apache-hadoop-projects-in-24-minutes/
  and a couple of other sites but still I am unclear on how to initiate the 
 process. 
 
 Please guide me. 
 
 
 Thanks  Regards,
 Varsha

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF



Re: Need Information on Hadoop Cluster Set up

2013-02-13 Thread Alexander Alten-Lorenz
Hi,

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
http://mapredit.blogspot.de/p/get-hadoop-cluster-running-in-20.html

Just two of the many blogs that describe such a setup.

Hardware:
http://www.youtube.com/watch?v=UQJnJvwcsA8
http://books.google.de/books?id=Nff49D7vnJcC&pg=PA259&lpg=PA259&dq=hadoop+commodity+hardware+specs&source=bl&ots=IihqWp8zRq&sig=Dse6D7KO8XS5EcXQnCnShAl5Q70&hl=en&sa=X&ei=0UwbUajSFozJswaXxoH4Aw&ved=0CD4Q6AEwAg#v=onepage&q=hadoop%20commodity%20hardware%20specs&f=false

- Alex

On Feb 13, 2013, at 8:00 AM, Sandeep Jain sandeep_jai...@infosys.com wrote:

 Dear Team,
 
 We are in the initial phase for Hadoop learning and wanted to set up the 
 cluster for Hadoop Administration perspective.
 Kindly help us on all the possible options for a makinga a hadoop cluster up 
 and running.
 Do let us know the mimimum configurations of machines too.
 
 Your help is much appreciated.
 
 Regards,
 Sandeep
 

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF



Re: [OT] MapR m3

2013-02-11 Thread Alexander Alten-Lorenz
Please refer to a MapR mailing list; this is a generic Apache Hadoop users mailing 
list.  

On Feb 11, 2013, at 1:45 PM, Dhanasekaran Anbalagan bugcy...@gmail.com wrote:

 Hi Guys,
 
 I am new to the MapR distribution, please share your guidance. We previously used 
 Cloudera Manager, which has a set limitation: more than 50 nodes is not supported.
 Please give me an idea; we are planning to move the production cluster.
 
 -Dhanasekaran.
 
 -Dhanasekaran. 
 Did I learn something today? If not, I wasted it.

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF



Re: Support of RHEL version

2013-01-24 Thread Alexander Alten-Lorenz
Hi,

Moving the post to cdh-u...@cloudera.org 
(https://groups.google.com/a/cloudera.org/forum/?fromgroups=#!forum/cdh-user)
as it is CDH4 you specifically are asking about. BCC'd user@hadoop
lists, lets carry forward the discussion on the CDH lists. My response
below.

RHEL 5.x and 6.x will be supported, which means RHEL 6.3, too.

Cheers,
 Alex

On Jan 24, 2013, at 6:28 PM, nilesh_sang...@dell.com wrote:

 Hi,
 
 We are working on implementing Cloudera distributed Hadoop (CDH 4.x) on our 
 environment.  Cloudera website talks about supporting RHEL 6.1 version with 
 challenges/issues with the newer version. It also though provides a 
 workaround for it. Wanted to hear from the community on the supported 
 versions of RedHat Linux and any guidance on which version to choose?
 
 -Nilesh
 
 
 https://ccp.cloudera.com/display/CDH4DOC/Known+Issues+and+Work+Arounds+in+CDH4
 
 
 Red Hat Linux (RHEL 6.2 and 6.3)
 - Poor performance running Hadoop on RHEL 6.2 or later when transparent 
 hugepage compaction is enabled
 RHEL 6.2 and 6.3 include a feature called transparent hugepage compaction 
 which interacts poorly with Hadoop workloads. This can cause a serious 
 performance regression compared to other operating system versions on the 
 same hardware.
 Symptom: top and other system monitoring tools show a large percentage of the 
 CPU usage classified as system CPU. If system CPU usage is 30% or more of 
 the total CPU usage, your system may be experiencing this issue.
 Bug: https://bugzilla.redhat.com/show_bug.cgi?id=805593
 Severity: Medium (up to 3x performance loss)
 Anticipated Resolution: Currently working with Red Hat to resolve for a 
 future RHEL update
 Workaround: Add the following command to /etc/rc.local to disable transparent 
 hugepage compaction:
 echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
 
 

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF



Re: Support of RHEL version

2013-01-24 Thread Alexander Alten-Lorenz
Do you mean Transparent Huge Page Defrag 
(https://bugzilla.redhat.com/show_bug.cgi?id=805593)?

Do: echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag

- Alex

On Jan 24, 2013, at 6:46 PM, Dheeren Bebortha dbebor...@salesforce.com wrote:

 I do not think this is a CDH-specific issue. If this is an HBase compaction 
 issue it would be all-pervading as long as it is RHEL 6.2 and above!
 Am I reading it right?
 -Dheeren
 
 -Original Message-
 From: Alexander Alten-Lorenz [mailto:wget.n...@gmail.com] 
 Sent: Thursday, January 24, 2013 9:41 AM
 To: cdh-u...@cloudera.org
 Subject: Re: Support of RHEL version
 
 Hi,
 
 Moving the post to cdh-u...@cloudera.org
 (https://groups.google.com/a/cloudera.org/forum/?fromgroups=#!forum/cdh-user)
 as it is CDH4 you are specifically asking about. BCC'd the user@hadoop list; 
 let's carry forward the discussion on the CDH lists. My response below.
 
 RHEL 5.x and 6.x will be supported, which means RHEL 6.3 as well.
 
 Cheers,
 Alex
 
 On Jan 24, 2013, at 6:28 PM, nilesh_sang...@dell.com wrote:
 
 Hi,
 
 We are working on implementing Cloudera distributed Hadoop (CDH 4.x) on our 
 environment.  Cloudera website talks about supporting RHEL 6.1 version with 
 challenges/issues with the newer version. It also though provides a 
 workaround for it. Wanted to hear from the community on the supported 
 versions of RedHat Linux and any guidance on which version to choose?
 
 -Nilesh
 
 
 https://ccp.cloudera.com/display/CDH4DOC/Known+Issues+and+Work+Arounds+in+CDH4
 
 
 Red Hat Linux (RHEL 6.2 and 6.3)
 - Poor performance running Hadoop on RHEL 6.2 or later when transparent 
 hugepage compaction is enabled
 RHEL 6.2 and 6.3 include a feature called transparent hugepage compaction 
 which interacts poorly with Hadoop workloads. This can cause a serious 
 performance regression compared to other operating system versions on the 
 same hardware.
 Symptom: top and other system monitoring tools show a large percentage of 
 the CPU usage classified as system CPU. If system CPU usage is 30% or more 
 of the total CPU usage, your system may be experiencing this issue.
 Bug: https://bugzilla.redhat.com/show_bug.cgi?id=805593
 Severity: Medium (up to 3x performance loss)
 Anticipated Resolution: Currently working with Red Hat to resolve for a 
 future RHEL update
 Workaround: Add the following command to /etc/rc.local to disable 
 transparent hugepage compaction 
 (https://ccp.cloudera.com/display/CDH4DOC/Known+Issues+and+Work+Arounds+in+CDH4):
 echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
 
 
 
 --
 Alexander Alten-Lorenz
 http://mapredit.blogspot.com
 German Hadoop LinkedIn Group: http://goo.gl/N8pCF
 

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF



Re: On a lighter note

2013-01-19 Thread Alexander Alten-Lorenz
Actually Der Untergang ;)

Alexander Alten-Lorenz
http://mapredit.blogspot.com
Twitter: @mapredit
German Hadoop LinkedIn Group: http://goo.gl/N8pCF

On Jan 18, 2013, at 23:18, Ted Dunning tdunn...@maprtech.com wrote:

 Well, I think the actual name was untergang.  Same meaning. 
 
 Sent from my iPhone
 
 On Jan 17, 2013, at 8:09 PM, Mohammad Tariq donta...@gmail.com wrote:
 
 You are right Michael, as always :)
 
 Warm Regards,
 Tariq
 https://mtariq.jux.com/
 cloudfront.blogspot.com
 
 
 On Fri, Jan 18, 2013 at 6:33 AM, Michael Segel michael_se...@hotmail.com 
 wrote:
 I'm thinking 'Downfall'
 
 But I could be wrong.
 
 On Jan 17, 2013, at 6:56 PM, Yongzhi Wang wang.yongzhi2...@gmail.com 
 wrote:
 
 Who can tell me what is the name of the original film? Thanks!
 
 Yongzhi
 
 
 On Thu, Jan 17, 2013 at 3:05 PM, Mohammad Tariq donta...@gmail.com wrote:
 I am sure you will suffer from severe stomach ache after watching this :)
 http://www.youtube.com/watch?v=hEqQMLSXQlY
 
 Warm Regards,
 Tariq
 https://mtariq.jux.com/
 cloudfront.blogspot.com
 


Re: JobCache directory cleanup

2013-01-09 Thread Alexander Alten-Lorenz
Hi,

By default (and not configurable), the logs are kept for 30 days. This 
will become configurable in the future 
(https://issues.apache.org/jira/browse/MAPREDUCE-4643).

- Alex
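
[Until that is configurable, the cron-based cleanup mentioned in the thread below is a 
common stopgap. A rough sketch only, saved for example under /etc/cron.daily/ -- the paths 
follow the du output quoted below, and the age threshold (+2 days here) must be longer than 
your longest-running job so active job directories are never removed:]

#!/bin/sh
# Sketch: purge per-job cache directories that have not been modified for more than 2 days
for d in /data?/mapred/local/taskTracker/*/jobcache; do
  find "$d" -mindepth 1 -maxdepth 1 -type d -mtime +2 -exec rm -rf {} +
done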

On Jan 9, 2013, at 3:41 PM, Ivan Tretyakov itretya...@griddynamics.com wrote:

 Hello!
 
 I've found that jobcache directory became very large on our cluster, e.g.:
 
 # du -sh /data?/mapred/local/taskTracker/user/jobcache
 465G/data1/mapred/local/taskTracker/user/jobcache
 464G/data2/mapred/local/taskTracker/user/jobcache
 454G/data3/mapred/local/taskTracker/user/jobcache
 
 And it stores information for about 100 jobs:
 
 # ls -1 /data?/mapred/local/taskTracker/persona/jobcache/  | sort | uniq |
 wc -l
 
 I've found that there is following parameter:
 
 property
  namemapreduce.jobtracker.retiredjobs.cache.size/name
  value1000/value
  descriptionThe number of retired job status to keep in the cache.
  /description
 /property
 
 So, if I got it right, it is intended to control the job cache size by limiting
 the number of jobs to store cache for.
 
 Also, I've seen that some Hadoop users use a cron approach to clean up the
 jobcache:
 http://grokbase.com/t/hadoop/common-user/102ax9bze1/cleaning-jobcache-manually
 (
 http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201002.mbox/%3c99484d561002100143s4404df98qead8f2cf687a7...@mail.gmail.com%3E
 )
 
 Are there other approaches to control jobcache size?
 What is the correct way to do it?
 
 Thanks in advance!
 
 P.S. We are using CDH 4.1.1.
 
 -- 
 Best Regards
 Ivan Tretyakov
 
 Deployment Engineer
 Grid Dynamics
 +7 812 640 38 76
 Skype: ivan.tretyakov
 www.griddynamics.com
 itretya...@griddynamics.com

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF



Re: Google+ Hadoop Community

2012-12-15 Thread Alexander Alten-Lorenz
joined

On Dec 12, 2012, at 2:30 PM, anand sharma anand2sha...@gmail.com wrote:

 cool!
 
 
 On Wed, Dec 12, 2012 at 5:40 PM, Mohammad Tariq donta...@gmail.com wrote:
 
 Hello group,
 
   I have created a Google+ community, keeping in mind those folks who are
 comparatively new to Hadoop.
 
 Although everybody, including me, knows that the official mailing lists
 are always the best place to get remedies for all our queries, if
 somebody feels like they need some other place (only after the mailing
 list), they can visit it here:
 https://plus.google.com/communities/113953784510797557961
 
 And I request the rest of you to go there and provide your able guidance
 and valuable help to all the beginners, like you have always done.
 
 Regards,
Mohammad Tariq
 
 

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF



Re: Multiuser setup on Hive

2012-11-22 Thread Alexander Alten-Lorenz
You could use the SASL/Kerberos implementation within HiveServer2. This depends on a 
Kerberized cluster, too. Hive's metastore server provides the same mechanism, 
but isn't fully multi-connection ready.
Here's a link:
http://ben-tech.blogspot.de/2012/10/hive-server-2-in-cdh41.html

- Alex
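
[For illustration, a minimal sketch of connecting to a Kerberos-secured HiveServer2 with 
beeline -- the hostname, port, realm, and user principal are placeholders, and it assumes 
the server has hive.server2.authentication set to KERBEROS:]

# Obtain a Kerberos ticket for the connecting user (placeholder principal)
kinit user1@EXAMPLE.COM
# Connect via JDBC; the principal parameter names HiveServer2's own Kerberos principal
beeline -u "jdbc:hive2://hs2-host.example.com:10000/default;principal=hive/hs2-host.example.com@EXAMPLE.COM"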

On Nov 22, 2012, at 7:46 AM, Austin Chungath austi...@gmail.com wrote:

 Hi,
 
 I had been trying to set up a multi user environment for hive.
 I have set up the hive metastore db in MySQL and hive works.
 
 Consider this scenario:
 
 user1 has created a database data1
 user2 has created a database data2
 
 Now user2 logs into hive and he is able to see and delete database data2
 
 How do I prevent this?
 
 Regards,
 Austin

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF