Re: Has anyone tried accessing TDE using HDFS Java APIs?

2018-01-29 Thread praveenesh kumar
Hi Ajay, did you get a chance to look into this? Thanks. Regards, Prav. On Fri, Jan 26, 2018 at 8:48 AM, praveenesh kumar <praveen...@gmail.com> wrote: > Hi Ajay > > We are using HDP 2.5.5 with HDFS 2.7.1.2.5 > > Thanks > Prav > > On Thu, Jan 25, 2018 a

Re: Has anyone tried accessing TDE using HDFS Java APIs?

2018-01-26 Thread praveenesh kumar
Hi Ajay, we are using HDP 2.5.5 with HDFS 2.7.1.2.5. Thanks, Prav. On Thu, Jan 25, 2018 at 5:47 PM, Ajay Kumar <ajay.ku...@hortonworks.com> wrote: > Hi Praveenesh, > > What version of Hadoop are you using? > > Thanks, > > Ajay > > *Fro

Has anyone tried accessing TDE using HDFS Java APIs?

2018-01-25 Thread praveenesh kumar
Hi, we are trying to access TDE files using the HDFS Java API. The user running the job has access to the TDE zone. We have successfully accessed the file in the Hadoop FS command shell. If we pass the same file to Spark using the same user, it also gets read properly. It's just when we
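
For comparison, a minimal sketch of such a Java read (the NameNode URI, user, KMS address, and file path are all hypothetical); on 2.7.x the DFS client decrypts transparently, provided the client configuration carries dfs.encryption.key.provider.uri pointing at the KMS and the user has DECRYPT_EEK access to the zone key:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class TdeRead {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumption: without this (or an equivalent core-site/hdfs-site entry)
        // the client cannot fetch data encryption keys and the read fails.
        conf.set("dfs.encryption.key.provider.uri", "kms://http@kmshost:9292/kms");
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf, "etluser");
        try (FSDataInputStream in = fs.open(new Path("/secure/zone/part-00000"))) {
          IOUtils.copyBytes(in, System.out, 4096, false); // bytes arrive decrypted
        }
      }
    }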

Re: Teradata to Hadoop migration

2016-08-05 Thread praveenesh kumar
From a TD perspective, have a look at this - https://youtu.be/NTTQdAfZMJA They are planning to open-source it. Perhaps you can get in touch with the team. Let me know if you are interested. If you have TD contacts, ask them about this; they should be able to point you to the right people. Again, this is not

Re: What does FileUtil.fullyDelete do?

2016-07-26 Thread praveenesh kumar
https://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/fs/FileUtil.html#fullyDelete(java.io.File) On Tue, Jul 26, 2016 at 12:09 PM, Divya Gehlot wrote: > Resending to right list > -- Forwarded message -- > From: "Divya Gehlot"
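For anyone landing on this thread: a minimal sketch of what that javadoc describes (the path is hypothetical). Note that fullyDelete targets the local filesystem via java.io.File, so it is not the call for HDFS paths:

    import java.io.File;
    import org.apache.hadoop.fs.FileUtil;

    public class LocalCleanup {
      public static void main(String[] args) {
        // fullyDelete operates on the *local* filesystem (java.io.File), not HDFS:
        // it recursively removes the directory and everything beneath it.
        File dir = new File("/tmp/scratch");        // hypothetical local path
        boolean gone = FileUtil.fullyDelete(dir);   // true once dir no longer exists
        System.out.println("deleted: " + gone);
      }
    }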

Best way to pass project resources in Java MR

2015-12-16 Thread praveenesh kumar
Hello folks, a basic Java MR question: what is the right practice for passing program-specific files (sitting in src/main/resources in your project folder) to your mapper? I tried the following and it didn't work 1. Inside mapper setup() ->
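
One pattern that generally works, sketched below (the resource name is hypothetical): since src/main/resources ends up inside the job jar, each task can read the file off its own classpath in setup(), with no HDFS or distributed-cache step needed:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class ResourceMapper extends Mapper<LongWritable, Text, Text, Text> {
      @Override
      protected void setup(Context context) throws IOException {
        // src/main/resources/lookup.txt (hypothetical) is bundled into the job
        // jar by the build, so every task can read it from its own classpath.
        try (InputStream in = getClass().getResourceAsStream("/lookup.txt")) {
          if (in == null) throw new IOException("lookup.txt not found on classpath");
          BufferedReader r = new BufferedReader(new InputStreamReader(in));
          for (String line; (line = r.readLine()) != null; ) {
            // parse each line into an in-memory structure used by map()
          }
        }
      }
    }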

Re: Falcon usecases

2015-12-04 Thread praveenesh kumar
ort forums. (This is generally true for any > vendor product that differentiates from the Apache distro.) > > I hope this helps. > > --Chris Nauroth > > From: praveenesh kumar <praveen...@gmail.com> > Date: Wednesday, December 2, 2015 at 10:01 AM > To: "u

Falcon usecases

2015-12-02 Thread praveenesh kumar
Hello hadoopers, just curious to understand the current state of Falcon: how much is it currently being adopted in the industry? Is anyone even using it other than the creators? There is not much information on the internet about Falcon examples and use cases, but then it is coming along in

Delete a folder name containing *

2014-08-20 Thread praveenesh kumar
Hi team, I am in a weird situation where I have the following HDFS sample folders: /data/folder/ /data/folder* /data/folder_day /data/folder_day/monday /data/folder/1 /data/folder/2 I want to delete /data/folder* without deleting its sub_folders. If I do hadoop fs -rmr /data/folder* it will delete
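
A workaround sketch using the Java API (the NameNode URI is hypothetical): unlike the FS shell, FileSystem.delete() takes the Path literally and performs no glob expansion, so only the directory actually named /data/folder* is removed:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DeleteLiteral {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"),
                                       new Configuration());
        // The Java API does not glob-expand the '*': this removes only the
        // directory literally named "/data/folder*", nothing else.
        fs.delete(new Path("/data/folder*"), true);  // true = recursive
      }
    }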

Re: Delete a folder name containing *

2014-08-20 Thread praveenesh kumar
With renaming - you would use the mv command hadoop fs -mv /data/folder* /data/new_folder. Won't it move all the sub_dirs along with that ? On Wed, Aug 20, 2014 at 12:00 PM, dileep kumar dileep...@gmail.com wrote: Just Rename the folder. On Wed, Aug 20, 2014 at 6:53 AM, praveenesh kumar

Re: Delete a folder name containing *

2014-08-20 Thread praveenesh kumar
On Wed, Aug 20, 2014 at 4:35 PM, praveenesh kumar praveen...@gmail.com wrote: With renaming - you would use the mv command hadoop fs -mv /data/folder* /data/new_folder. Won't it move all the sub_dirs along with that ? On Wed, Aug 20, 2014 at 12:00 PM, dileep kumar dileep...@gmail.com wrote

Re: Delete a folder name containing *

2014-08-20 Thread praveenesh kumar
-na On Wed, Aug 20, 2014 at 6:22 PM, praveenesh kumar praveen...@gmail.com wrote: No, I have tried all the usual things like single quotes, double quotes, escape characters.. but it is not working. I wonder what the escape character is with the Hadoop FS utility. On Wed, Aug 20, 2014 at 1:26 PM, Ritesh

Re: HDFS multi-tenancy and federation

2014-07-15 Thread praveenesh kumar
, praveenesh kumar praveen...@gmail.com wrote: Hi Shani, I haven't done any implementation on HDFS federation, but as far as I know, 1 namenode can handle only 1 namespace at this time. I hope that helps. Regards Prav On Wed, Feb 5, 2014 at 8:05 AM, Shani Ranasinghe shanir...@gmail.com

Re: Doubt

2014-03-19 Thread praveenesh kumar
Why not? It's just a matter of installing 2 different packages. Depending on what you want to use it for, you need to take care of a few things, but as far as installation is concerned, it should be easily doable. Regards Prav On Wed, Mar 19, 2014 at 3:41 PM, sri harsha rsharsh...@gmail.com

Re: java.lang.OutOfMemoryError: Java heap space

2014-02-07 Thread praveenesh kumar
/ pig.maxCombinedSplitSize (128MB by default). So you can try to lower pig.maxCombinedSplitSize. But I admit Pig internal data types are not memory-efficient, and that is an optimization opportunity. Contribute! On Thu, Feb 6, 2014 at 2:54 PM, praveenesh kumar praveen...@gmail.com wrote: Its a normal join

Re: java.lang.OutOfMemoryError: Java heap space

2014-02-07 Thread praveenesh kumar
of those bags at some point. This trick saved me several times in the past particularly when I dealt with bags of large chararrays. Just a thought. Hope this is helpful. Thanks, Cheolsoo On Fri, Feb 7, 2014 at 7:37 AM, praveenesh kumar praveen...@gmail.com wrote: Thanks Park for sharing

java.lang.OutOfMemoryError: Java heap space

2014-02-06 Thread praveenesh kumar
Hi all, I am running a Pig script which runs fine for small data, but when I scale the data up, I get the following error at my map stage. Please refer to the map logs below. My Pig script does a group by first, followed by a join on the grouped data. Any clues to understand

Re: java.lang.OutOfMemoryError: Java heap space

2014-02-06 Thread praveenesh kumar
iPhone On Feb 6, 2014, at 11:25 AM, praveenesh kumar praveen...@gmail.com wrote: Hi all, I am running a Pig Script which is running fine for small data. But when I scale the data, I am getting the following error at my map stage. Please refer to the map logs as below. My Pig script

Re: HDFS multi-tenancy and federation

2014-02-05 Thread praveenesh kumar
Hi Shani, I haven't done any implementation on HDFS federation, but as far as I know, 1 namenode can handle only 1 namespace at this time. I hope that helps. Regards Prav On Wed, Feb 5, 2014 at 8:05 AM, Shani Ranasinghe shanir...@gmail.com wrote: Hi, Any help on this please? On Mon, Feb

Re: HDFS copyToLocal and get crc option

2014-01-31 Thread praveenesh kumar
Hi Tom, my hint is that your block size should be a multiple of the checksum size. Check your dfs.block.size property - convert it into bytes, then divide it by the checksum value that is set; usually it's the dfs.bytes-per-checksum property that tells you this value, or you can get the checksum value from the error message
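
As a worked example of that check (the numbers are illustrative): dfs.block.size = 134217728 bytes (128 MB) with dfs.bytes-per-checksum = 512 gives 134217728 / 512 = 262144 checksum chunks exactly, so the pair is consistent. A block size of 134217000 bytes would not divide evenly (134217000 / 512 = 262142.58...) and would trigger exactly this kind of checksum complaint.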

Re: DistributedCache deprecated

2014-01-30 Thread praveenesh kumar
on that for my better understanding, it will be a great help. Thanks Amit On Thu, Jan 30, 2014 at 5:35 PM, praveenesh kumar praveen...@gmail.com wrote: Hi Amit, I am not sure how they are linked with DistributedCache.. Job configuration is not uploading any data in memory.. As far as I am aware

Re: DistributedCache deprecated

2014-01-29 Thread praveenesh kumar
I think you can use the Job class. http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html Regards Prav On Wed, Jan 29, 2014 at 9:13 PM, Giordano, Michael michael.giord...@vistronix.com wrote: I noticed that in Hadoop 2.2.0
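
A hedged sketch of what "use the Job class" means in practice on 2.x (the paths are hypothetical): the cache methods moved onto Job itself, and Job.getInstance() is the factory that replaces the deprecated Job() constructors asked about further down this thread:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;

    public class CacheDemo {
      public static void main(String[] args) throws Exception {
        // Job.getInstance() replaces the deprecated Job() constructors.
        Job job = Job.getInstance(new Configuration(), "cache-demo");
        // Equivalents of the old DistributedCache calls (paths hypothetical):
        job.addCacheFile(new URI("hdfs://nn:8020/shared/lookup.txt#lookup"));
        job.addFileToClassPath(new Path("/shared/helper.jar"));
        // Tasks read them back via context.getCacheFiles() in setup().
      }
    }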

Re: DistributedCache deprecated

2014-01-29 Thread praveenesh kumar
that it's present and local on ALL nodes. I didn't know it was deprecated, but there must be some replacement for it - or maybe it just got renamed and moved? So... what is the future of the DistributedCache for mapreduce jobs? On Wed, Jan 29, 2014 at 4:22 PM, praveenesh kumar praveen

Re: DistributedCache deprecated

2014-01-29 Thread praveenesh kumar
the API. I hope that makes sense. Regards Prav On Wed, Jan 29, 2014 at 9:41 PM, praveenesh kumar praveen...@gmail.com wrote: @Jay - I don't know how the Job class is replacing the DistributedCache class, but I remember trying distributed cache functions like void *addArchiveToClassPath http

Re: DistributedCache deprecated

2014-01-29 Thread praveenesh kumar
of the Job() constructors have also been marked deprecated. http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html How do you create a new Job instance? Is there a factory class? Thanks, Mike G. -- *From:* praveenesh kumar praveen

Oozie dynamic action

2013-09-17 Thread praveenesh kumar
Hi, I have a scenario in which I want to trigger a Hive uploading script every day. I have a set of folders created for a set of customer ids every day. My Hive script will read the customer id from the path, check whether the table for the customer id exists, and if not, create the table and will

Hadoop Streaming - how to specify mapper scripts hosting on HDFS

2013-03-28 Thread praveenesh kumar
Hi, I am trying to run a hadoop streaming job where I want to specify my mapper script residing on HDFS. Currently it tries to locate the script on the local FS only. Is there an option available through which I can tell hadoop streaming to look for the mapper script on HDFS, not on the local FS?
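
One hedged approach (the jar location, NameNode host, and paths are illustrative): ship the HDFS copy to every task with the streaming -cacheFile option (or the newer generic -files option), which symlinks the file into each task's working directory so -mapper can refer to it by name:

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
        -input /data/in \
        -output /data/out \
        -mapper mapper.py \
        -cacheFile hdfs://namenode:8020/scripts/mapper.py#mapper.py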

Re: How to find top N values using map-reduce?

2013-02-02 Thread praveenesh kumar
: there's one thing I want to clarify - you can use multiple reducers to sort the data globally and then cat all the parts to get the top n records. The data in all parts are globally in order. Then you may find the problem is much easier. On 2013-02-02 at 3:18 PM, praveenesh kumar praveen...@gmail.com wrote

How to find top N values using map-reduce?

2013-02-01 Thread praveenesh kumar
I am looking for a better solution for this. One way to do this would be to find the top N values from each mapper and then find the top N out of those in 1 reducer. I am afraid that this won't work effectively if my N is larger than the number of values in my input split (or mapper input). The other way
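
A common pattern for the first approach, sketched in Java (N=10 and long-valued records are assumptions): each mapper tracks only its local top N in memory and emits it in cleanup(), and a single reducer merges the per-mapper candidates. Note the worry about N exceeding a split's record count is harmless - such a mapper simply emits everything it saw, which is still correct:

    import java.io.IOException;
    import java.util.TreeMap;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class TopNMapper extends Mapper<LongWritable, Text, NullWritable, LongWritable> {
      private static final int N = 10;                       // hypothetical N
      // NB: a TreeMap collapses duplicate values; use a PriorityQueue instead
      // if duplicates matter for your data.
      private final TreeMap<Long, Long> top = new TreeMap<>();

      @Override
      protected void map(LongWritable key, Text value, Context ctx) {
        long v = Long.parseLong(value.toString().trim());
        top.put(v, v);
        if (top.size() > N) top.remove(top.firstKey());      // drop current minimum
      }

      @Override
      protected void cleanup(Context ctx) throws IOException, InterruptedException {
        for (Long v : top.values()) ctx.write(NullWritable.get(), new LongWritable(v));
      }
    }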

Re: How to find top N values using map-reduce?

2013-02-01 Thread praveenesh kumar
dataset * How many mappers you have * Do input splits correlate with the sorting criterion for top N? Depending on the answers, very different strategies will be optimal. On Fri, Feb 1, 2013 at 9:05 PM, praveenesh kumar praveen...@gmail.com wrote: I am looking for a better solution

Re: How to find top N values using map-reduce?

2013-02-01 Thread praveenesh kumar
://gist.github.com/4696443 https://github.com/linkedin/datafu On Fri, Feb 1, 2013 at 11:17 PM, praveenesh kumar praveen...@gmail.com wrote: Actually what I am trying to find is the top n% of the whole data. This n could be very large if my data is large. Assuming I have uniform rows of equal size

Re: MRBench Maps strange behaviour

2012-08-29 Thread praveenesh kumar
Then the question arises how MRBench is using the parameters. According to the mail he sent... he is running MRBench with the following parameters: hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10. I guess he is assuming MRBench will launch 10 mappers and 10

Measuring Shuffle time for MR job

2012-08-27 Thread praveenesh kumar
Is there a way to know the total shuffle time of a map-reduce job - I mean some command or output that can tell me that? I want to measure total map, total shuffle and total reduce time for my MR job - how can I achieve this? I am using hadoop 0.20.205 Regards, Praveenesh

Re: Dealing with low space cluster

2012-06-14 Thread praveenesh kumar
I don't know whether this will work or not.. but you can give it a shot.. (I am assuming you have 8 nodes in the hadoop cluster) 1. Mount the 1 TB hard disk on one of the DNs. 2. Put the data onto HDFS. I think once it's on HDFS, it will automatically get distributed. Regards, Praveenesh On Thu, Jun 14,

Re: Dealing with low space cluster

2012-06-14 Thread praveenesh kumar
@Harsh --- I was wondering... although it doesn't make much/any sense --- if a person wants to store the files only on HDFS (something like a backup), consider the above hardware scenario --- no MR processing. In that case, it should be possible to have a file with a size of more than 20 GB to be

Hadoop cluster hardware configuration

2012-06-04 Thread praveenesh kumar
Hello all, I am looking to build a 5 node hadoop cluster with the following configuration per machine: 1. Intel Xeon E5-2609 (2.40GHz/4-core) 2. 32 GB RAM (8GB 1Rx4 PC3) 3. 5 x 900GB 6G SAS 10K hard disks (total 4.5 TB storage/machine) 4. 1GbE Ethernet connection I would like the

Re: SecondaryNameNode not connecting to NameNode : PriviledgedActionException

2012-06-04 Thread praveenesh kumar
I am not sure what the exact issue could be, but when configuring a secondary NN against a NN, you need to tell your SNN where the actual NN resides. Try adding dfs.http.address to hdfs-site.xml on your secondary namenode machine, with the value NN:port. The port should be the one on which your NN url opens -
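
A sketch of that entry in the SNN machine's hdfs-site.xml (the hostname is hypothetical; 50070 is the 0.20-era default NN web port):

    <property>
      <name>dfs.http.address</name>
      <value>namenode.example.com:50070</value>
    </property>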

Re: SecondaryNameNode not connecting to NameNode : PriviledgedActionException

2012-06-04 Thread praveenesh kumar
Message- From: praveenesh kumar [mailto:praveen...@gmail.com] Sent: lunes, 04 de junio de 2012 13:15 To: common-user@hadoop.apache.org Subject: Re: SecondaryNameNode not connecting to NameNode : PriviledgedActionException I am not sure what could be the exact issue but when configuring

Re: SecondaryNameNode not connecting to NameNode : PriviledgedActionException

2012-06-04 Thread praveenesh kumar
problem on my real cluster. Will try to explicitly configure starting IP for this SNN. -Original Message- From: praveenesh kumar [mailto:praveen...@gmail.com] Sent: lunes, 04 de junio de 2012 14:02 To: common-user@hadoop.apache.org Subject: Re: SecondaryNameNode not connecting

Re: SecondaryNameNode not connecting to NameNode : PriviledgedActionException

2012-06-04 Thread praveenesh kumar
, 2012 at 5:42 PM, ramon@accenture.com wrote: /etc/hosts 127.0.0.1 localhost.localdomain localhost ::1 localhost6.localdomain6 localhost6 192.168.0.10 hadoop00 192.168.0.11 hadoop01 192.168.0.12 hadoop02 -Original Message- From: praveenesh kumar

Re: Hadoop cluster hardware configuration

2012-06-04 Thread praveenesh kumar
:57 PM, praveenesh kumar praveen...@gmail.com wrote: Hello all, I am looking forward to build a 5 node hadoop cluster with the following configurations per machine. -- 1. Intel Xeon E5-2609 (2.40GHz/4-core) 2. 32 GB RAM (8GB 1Rx4 PC3) 3. 5 x 900GB 6G SAS 10K hard disk ( total 4.5 TB

Re: SecondaryNameNode not connecting to NameNode : PriviledgedActionException

2012-06-04 Thread praveenesh kumar
hadoop02 -Original Message- From: praveenesh kumar [mailto:praveen...@gmail.com] Sent: lunes, 04 de junio de 2012 14:09 To: common-user@hadoop.apache.org Subject: Re: SecondaryNameNode not connecting to NameNode : PriviledgedActionException Also can you share your /etc/hosts file

Re: What happens when I do not output anything from my mapper

2012-06-04 Thread praveenesh kumar
You can control your map outputs based on any condition you want. I have done that - it worked for me. It could be a problem in your code that it's not working for you. Can you please share your map code, or cross-check whether your conditions are correct? Regards, Praveenesh On Mon, Jun 4, 2012 at
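
For reference, a minimal filtering mapper of the kind described (the numeric threshold is a made-up condition); emitting nothing for a given input record is perfectly legal and the framework handles it fine:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class FilterMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
      @Override
      protected void map(LongWritable key, Text value, Context ctx)
          throws IOException, InterruptedException {
        int v = Integer.parseInt(value.toString().trim());
        if (v > 100) {                                   // hypothetical condition
          ctx.write(new Text("above"), new IntWritable(v));
        }                                                // else: emit nothing at all
      }
    }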

Re: Trying to put 16gb file onto hdfs

2012-06-04 Thread praveenesh kumar
Check your Datanode logs, or do hadoop fsck / or hadoop dfsadmin -report to get more details about your HDFS. It seems like a DN is down. Regards, Praveenesh On Tue, Jun 5, 2012 at 12:13 AM, ramon@accenture.com wrote: Hi Sean, It seems your HDFS has not properly started. Go through your

Re: Text Analysis

2012-04-26 Thread praveenesh kumar
RHive uses the Hive Thrift server to connect with Hive. You can execute Hive queries and get results back into R data frames, and then play around with them using R libraries. It's a pretty interesting project, given that you have Hive set up on top of hadoop. Regards, Praveenesh On Thu, Apr 26, 2012 at

Re: Pre-requisites for hadoop 0.23/CDH4

2012-04-19 Thread praveenesh kumar
max container size to 1024M (max given to NM) by setting yarn.scheduler.capacity.maximum-allocation-mb Arun On Apr 18, 2012, at 8:00 PM, praveenesh kumar wrote: Hi, Sweet.. Can you please elaborate how I can tweak my configs to make CDH4/hadoop-0.23 run in a 1.5GB RAM VM? Regards
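
The property Arun refers to lives in the 0.23-era capacity scheduler configuration; a sketch of the snippet for a small VM (the 1024 value is the one from the thread):

    <property>
      <name>yarn.scheduler.capacity.maximum-allocation-mb</name>
      <value>1024</value>
    </property>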

Re: Pre-requisites for hadoop 0.23/CDH4

2012-04-18 Thread praveenesh kumar
be a good value to use for RAM if available (1.0 will do too, if you make sure to tweak your configs to not use too much heap memory). Single processor should do fine for testing purposes. On Tue, Apr 17, 2012 at 8:51 PM, praveenesh kumar praveen...@gmail.com wrote: I am looking to test

Pre-requisites for hadoop 0.23/CDH4

2012-04-17 Thread praveenesh kumar
I am looking to test hadoop 0.23 or CDH4 beta on my local VM. I am looking to execute the sample example codes in new architecture, play around with the containers/resource managers. Is there any pre-requisite on default memory/CPU/core settings I need to keep in mind before setting up the VM.

Re: How can I configure oozie to submit different workflows from different users?

2012-04-02 Thread praveenesh kumar
for the proxyuser (hosts/groups) settings. You have to use explicit hosts/groups. Thxs. Alejandro PS: please follow up this thread in the oozie-us...@incubator.apache.org On Mon, Apr 2, 2012 at 2:15 PM, praveenesh kumar praveen...@gmail.com wrote: Hi all, I want to use oozie to submit

Re: How can I configure oozie to submit different workflows from different users?

2012-04-02 Thread praveenesh kumar
that the valid values for proxyuser groups, as the property name states, are GROUPS, not USERS. thxs. Alejandro On Mon, Apr 2, 2012 at 2:27 PM, praveenesh kumar praveen...@gmail.com wrote: How can I specify multiple users/groups for the proxy user setting? Can I give comma separated values
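
For reference, these knobs sit in the cluster's core-site.xml; a sketch with hypothetical values - and per Alejandro's point, the values name hosts and groups, never individual user names:

    <property>
      <name>hadoop.proxyuser.oozie.hosts</name>
      <value>oozie-host.example.com</value>
    </property>
    <property>
      <name>hadoop.proxyuser.oozie.groups</name>
      <value>hadoop-users,etl</value>
    </property>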

MR job launching is slower

2012-03-20 Thread praveenesh kumar
I have a 10 node cluster (around 24 CPUs, 48 GB RAM, 1 TB HDD, 10 GB ethernet connection). After triggering any MR job, it takes around 3-5 seconds to launch (I mean the time until I can see any MR job completion % on the screen). I know internally it is trying to launch the job, initialize mappers,

Re: Need help on hadoop eclipse plugin

2012-02-28 Thread praveenesh kumar
help me in debugging this issue? Thanks, Praveenesh On Tue, Feb 28, 2012 at 1:12 PM, praveenesh kumar praveen...@gmail.com wrote: Hi all, I am trying to use the hadoop eclipse plugin on my windows machine to connect to my remote hadoop cluster. I am currently using putty to login

Need help on hadoop eclipse plugin

2012-02-27 Thread praveenesh kumar
Hi all, I am trying to use the hadoop eclipse plugin on my windows machine to connect to my remote hadoop cluster. I am currently using putty to log in to the cluster. So ssh is enabled and my windows machine is able to reach my hadoop cluster. I am using hadoop 0.20.205, hadoop-eclipse plugin

Re: Hadoop eclipse plugin on IBM RAD 8.0

2012-02-26 Thread praveenesh kumar
Okay, I figured it out. I need to put the hadoop-eclipse plugin.jar file in the $RAD_INSTALLED_DIR/features directory. Please comment if you feel I am doing something wrong. Thanks, Praveenesh On Mon, Feb 27, 2012 at 11:31 AM, praveenesh kumar praveen...@gmail.com wrote: Is there a way to make IBM

Re: Security at file level in Hadoop

2012-02-22 Thread praveenesh kumar
You can probably use hadoop fs -chmod <permission> <filename> as suggested above. You can provide r/w permissions as you would for general unix files. Can you please share your experience on this? Thanks, Praveenesh On Wed, Feb 22, 2012 at 4:37 PM, Ben Smithers

Re: mapred.map.tasks and mapred.reduce.tasks parameter meaning

2012-02-22 Thread praveenesh kumar
If I am correct: for setting mappers/node - mapred.tasktracker.map.tasks.maximum; for setting reducers/node - mapred.tasktracker.reduce.tasks.maximum; for setting mappers/job - mapred.map.tasks (applicable for the whole cluster); for setting reducers/job - mapred.reduce.tasks (same). You can

Number of Under-Replicated Blocks?

2012-02-19 Thread praveenesh kumar
Hi, I am suddenly seeing some under-replicated blocks on my cluster. Although it's not causing any problems, it seems like a few blocks are not replicated properly. Number of Under-Replicated Blocks : 147 Is this okay behavior on hadoop? If not, how can I know which are the files with

Re: Number of Under-Replicated Blocks ?

2012-02-19 Thread praveenesh kumar
replicas for all MR job submit data), or bit rot of existing blocks on HDDs around the cluster, etc. -- You can mostly spot the pattern of files causing it by running the fsck and obtaining the listing. On Mon, Feb 20, 2012 at 11:43 AM, praveenesh kumar praveen...@gmail.com wrote: Hi, I am
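A hedged example of that fsck-based inspection, plus the usual remedy when the culprit is files written with a higher replication factor than the cluster can satisfy (the path is hypothetical):

    hadoop fsck / -files -blocks | grep -i "Under replicated"
    hadoop fs -setrep -w 2 /data/needs-fewer-replicas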

Regression on Hadoop?

2012-02-08 Thread praveenesh kumar
Guys, is there any regression API/tool developed on top of hadoop (APART from mahout)? Thanks, Praveenesh

Re: Retail receipt analysis

2012-02-03 Thread praveenesh kumar
You can also use R-hadoop package that allows you to run R statistical algos on hadoop. Thanks, Praveenesh On Fri, Feb 3, 2012 at 10:54 PM, Harsh J ha...@cloudera.com wrote: You may want to check out Apache Mahout: http://mahout.apache.org On Fri, Feb 3, 2012 at 10:31 PM, Fabio Pitzolu

How to convert sequence file into normal text file

2012-02-01 Thread praveenesh kumar
I am running the SimpleKmeansClustering sample code from Mahout in Action. How can I convert a sequence file written using SequenceFile.Writer into a plain HDFS file so that I can read it properly? I know mahout has the seqdumper tool to read it, but I want to create a normal text file rather than a sequence file
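
A small reader along the lines of what seqdumper does internally, as a hedged sketch (the input path is hypothetical); the alternative noted below in the thread - an identity MR job with sequence-file input and text output - works too:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.util.ReflectionUtils;

    // Dumps a sequence file as "key<TAB>value" text lines, relying on the
    // key/value classes' toString().
    public class SeqToText {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
        Path in = new Path("/mahout/out/clusteredPoints/part-m-00000");
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, in, conf);
        Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
        Writable val = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
        while (reader.next(key, val)) {
          System.out.println(key + "\t" + val);   // redirect to a file, or write
        }                                         // via FSDataOutputStream instead
        reader.close();
      }
    }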

Re: How to convert sequence file into normal text file

2012-02-01 Thread praveenesh kumar
be an identity job with sequence-file input format and text output format. On Wed, Feb 1, 2012 at 3:28 PM, praveenesh kumar praveen...@gmail.com wrote: I am running SimpleKmeansClustering sample code from mahout in action. How can I convert sequence file written using SequenceFile.Writer into plain

Re: Why $HADOOP_PREFIX?

2012-02-01 Thread praveenesh kumar
: the warning $HADOOP_HOME is deprecated is always there, whether the variable is set or not. Why? Because hadoop-config is sourced in all scripts, and all it does is set HADOOP_PREFIX from HADOOP_HOME. I think this can be reported as a bug. -P On Wed, Feb 1, 2012 at 5:46 PM, praveenesh kumar praveen...@gmail.com

Re: Best practices for hadoop shuffling/tuning?

2012-01-31 Thread praveenesh kumar
Can anyone please eyeball the config parameters as defined below and share their thoughts on this? Thanks, Praveenesh On Mon, Jan 30, 2012 at 6:20 PM, praveenesh kumar praveen...@gmail.com wrote: Hey guys, Just wanted to ask, are there any sort of best practices to be followed for hadoop

Re: Killing hadoop jobs automatically

2012-01-30 Thread praveenesh kumar
{mapred.task.timeout} of Reporter to your desired value. Good Luck. On 01/30/2012 04:14 PM, praveenesh kumar wrote: Yeah, I am aware of that, but it needs you to explicity monitor the job and look for jobid and then hadoop job -kill command. What I want to know - Is there anyway to do all

Best practices for hadoop shuffling/tuning?

2012-01-30 Thread praveenesh kumar
Hey guys, just wanted to ask, are there any best practices to be followed for hadoop shuffling improvements? I am running Hadoop 0.20.205 on an 8 node cluster. Each node has 24 cores/CPUs with 48 GB RAM. I have set the following parameters: fs.inmemory.size.mb=2000 io.sort.mb=2000

Re: Namenode service not running on the Configured IP address

2012-01-30 Thread praveenesh kumar
Have you configured your hostname and localhost with your IP in the /etc/hosts file? Thanks, Praveenesh On Tue, Jan 31, 2012 at 3:18 AM, anil gupta anilgupt...@gmail.com wrote: Hi All, I am using hadoop-0.20.2 and doing a fresh installation of a distributed Hadoop cluster along with Hbase. I am

Re: Any info on R+Hadoop

2012-01-29 Thread praveenesh kumar
not like being engrossed with the hassles that hadoop streaming can bring. -P P.S. I am not endorsing anyone. It's just my view. On Sun, Jan 29, 2012 at 12:54 PM, praveenesh kumar praveen...@gmail.com wrote: Has anyone done any work with R + Hadoop? I know there are some flavors of R

Killing hadoop jobs automatically

2012-01-29 Thread praveenesh kumar
Is there any way through which we can kill hadoop jobs that are taking too long to execute? What I want to achieve is - if some job is running more than _some_predefined_timeout_limit_, it should be killed automatically. Is it possible to achieve this through shell scripts or any other way?
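
A hedged sketch of one way to do it with the 0.20-era JobClient API: a small watchdog, run from cron or a loop, that kills any job running longer than a timeout. The 30-minute limit is hypothetical; note that mapred.task.timeout (mentioned later in the thread) only covers tasks that stop reporting progress, not slow-but-alive jobs:

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobStatus;
    import org.apache.hadoop.mapred.RunningJob;

    public class JobWatchdog {
      public static void main(String[] args) throws Exception {
        long timeoutMs = 30L * 60 * 1000;          // hypothetical limit
        JobClient jc = new JobClient(new JobConf());
        for (JobStatus js : jc.getAllJobs()) {
          boolean running = js.getRunState() == JobStatus.RUNNING;
          if (running && System.currentTimeMillis() - js.getStartTime() > timeoutMs) {
            RunningJob job = jc.getJob(js.getJobID());
            if (job != null) job.killJob();        // same effect as: hadoop job -kill <id>
          }
        }
      }
    }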

Re: Killing hadoop jobs automatically

2012-01-29 Thread praveenesh kumar
get killed automatically Thanks, Praveenesh On Mon, Jan 30, 2012 at 12:38 PM, Prashant Kommireddi prash1...@gmail.com wrote: You might want to take a look at the kill command: hadoop job -kill jobid. Prashant On Sun, Jan 29, 2012 at 11:06 PM, praveenesh kumar praveen...@gmail.com wrote

Any info on R+Hadoop

2012-01-28 Thread praveenesh kumar
Has anyone done any work with R + Hadoop? I know there are some flavors of R+Hadoop available such as rmr, rhdfs, RHIPE, R-hive. But as far as I know, submitting jobs using Hadoop Streaming is the best way right now available. Am I right? Any info on R on Hadoop? Thanks, Praveenesh

Reduce copy at 0.00 MB/s

2012-01-25 Thread praveenesh kumar
Hey, can anyone explain to me what the reduce copy phase in the reducer section is? The (K, List(V)) is passed to the reducer. Does reduce copy represent copying of (K, List(V)) to the reducer from all mappers? I am monitoring my jobs on the cluster using the Jobtracker url. I am seeing for most of my

Understanding fair schedulers

2012-01-25 Thread praveenesh kumar
Understanding Fair Schedulers better: can we create multiple pools in the Fair Scheduler? I guess yes - please correct me. Suppose I have 2 pools in my fair-scheduler.xml: 1. Hadoop-users: Min map: 10, Max map: 50, Min Reduce: 10, Max Reduce: 50 2. Admin-users: Min map: 20, Max map: 80, Min
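
A sketch of those two pools as a fair-scheduler.xml allocation file; element names follow the fair scheduler's allocation-file format (maxMaps/maxReduces exist in the newer fair scheduler versions), the numbers are the ones from the question, and the admin-users max-reduce value is truncated in the archive, so it is omitted:

    <allocations>
      <pool name="hadoop-users">
        <minMaps>10</minMaps>
        <maxMaps>50</maxMaps>
        <minReduces>10</minReduces>
        <maxReduces>50</maxReduces>
      </pool>
      <pool name="admin-users">
        <minMaps>20</minMaps>
        <maxMaps>80</maxMaps>
        <minReduces>20</minReduces>
      </pool>
    </allocations>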

Re: Understanding fair schedulers

2012-01-25 Thread praveenesh kumar
name while running the job. By default, mapred.fairscheduler.poolnameproperty is set to user.name (each job run by a user is allocated to his named pool) and you can also change this property to group.name. Srinivas -- Also, you can set On Wed, Jan 25, 2012 at 6:24 AM, praveenesh kumar praveen

Re: Reduce copy at 0.00 MB/s

2012-01-25 Thread praveenesh kumar
, instead of the default 5%. This helps your MR performance overall, if you run multiple jobs at a time, as the reduce slots aren't wasted. On Wed, Jan 25, 2012 at 3:34 PM, praveenesh kumar praveen...@gmail.com wrote: Hey, can anyone explain to me what the reduce copy phase in the reducer section is

Re: Understanding fair schedulers

2012-01-25 Thread praveenesh kumar
config described in http://hadoop.apache.org/common/docs/current/fair_scheduler.html#Allocation+File+%28fair-scheduler.xml%29 On Wed, Jan 25, 2012 at 7:01 PM, praveenesh kumar praveen...@gmail.com wrote: I am running pig jobs; how can I specify which pool they should run in? Also, do you

Re: Understanding fair schedulers

2012-01-25 Thread praveenesh kumar
On Wed, Jan 25, 2012 at 7:55 PM, praveenesh kumar praveen...@gmail.com wrote: I am looking for a solution where we can do it permanently without specifying these things inside jobs. I want to keep these things hidden from the end-user. The end-user would just write pig scripts and all the jobs submitted

Re: Understanding fair schedulers

2012-01-25 Thread praveenesh kumar
into groups, and use the group.name identifier as the poolnameproperty. Would this work for you instead? On Wed, Jan 25, 2012 at 8:00 PM, praveenesh kumar praveen...@gmail.com wrote: Also, with the above-mentioned method, my problem is I am having one pool/user (that's obviously not a good way

How to set file permissions internally on hadoop

2012-01-22 Thread praveenesh kumar
Hey guys, how can I configure HDFS so that I can internally set permissions on the data? I know there is a parameter called dfs.permissions that needs to be true, otherwise permissions won't work. Actually I had set it to true previously, so that any user can use the HDFS data to run jobs on it.

Re: Best practices to recover from Corrupt Namenode

2012-01-20 Thread praveenesh kumar
a replica completely. HDFS should be very robust. With Yahoo's r=3, for a large cluster, the probability of losing a block during one year is less than 0.005 - Sameer On Wed, Jan 18, 2012 at 11:19 PM, praveenesh kumar praveen...@gmail.com wrote: Hi everyone, Any ideas on how to tackle

Re: Best practices to recover from Corrupt Namenode

2012-01-18 Thread praveenesh kumar
Hi everyone, any ideas on how to tackle this kind of situation? Thanks, Praveenesh On Tue, Jan 17, 2012 at 1:02 PM, praveenesh kumar praveen...@gmail.com wrote: I have a replication factor of 2, because I cannot afford 3 replicas on my cluster. fsck output was saying block

Best practices to recover from Corrupt Namenode

2012-01-16 Thread praveenesh kumar
Hi guys, I just faced a weird situation in which one of my hard disks on a DN went down. Due to that, when I restarted the namenode, some of the blocks went missing, and it was saying my namenode is CORRUPT and in safe mode, which doesn't allow you to add or delete any files on HDFS. I know we can
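
For later readers, the usual sequence in this situation (0.20-era commands): inspect first, then either salvage what remains of the affected files or drop them, and only then let the namenode out of safe mode:

    hadoop fsck /                    # lists files with missing/corrupt blocks
    hadoop fsck / -move              # salvage: move affected files to /lost+found
    hadoop fsck / -delete            # ...or remove them outright
    hadoop dfsadmin -safemode leave  # once the report is clean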

Re: Best practices to recover from Corrupt Namenode

2012-01-16 Thread praveenesh kumar
generally refers to the fsimage or edits getting corrupted). Did your files not have adequate replication that they could not withstand the loss of one DN's disk? What exactly did fsck output? Did all block replicas go missing for your files? On 17-Jan-2012, at 12:08 PM, praveenesh kumar wrote

JobTracker webUI stopped showing suddenly

2012-01-11 Thread praveenesh kumar
The Jobtracker webUI suddenly stopped showing. It was working fine before. What could be the issue? Can anyone guide me on how I can recover my WebUI? Thanks, Praveenesh

Re: JobTracker webUI stopped showing suddenly

2012-01-11 Thread praveenesh kumar
. Please guide me.. why is it happening like this? Thanks, Praveenesh On Wed, Jan 11, 2012 at 7:32 PM, praveenesh kumar praveen...@gmail.com wrote: It's running. I am running jobs on hadoop; they are running fine. Thanks, Praveenesh On Wed, Jan 11, 2012 at 7:20 PM, hadoop hive hadooph

Does tuning require re-formatting the Namenode?

2012-01-08 Thread praveenesh kumar
Hey guys, do I need to format the namenode again if I am changing some HDFS configurations like blocksize, checksum, compression codec, etc., or is there any other way to enforce these new changes in the present cluster setup? Thanks, Praveenesh

Hive starting error

2011-12-30 Thread praveenesh kumar
Hi, I am using Hive 0.7.1 on hadoop 0.20.205. While running hive, it's giving me the following error: Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.security.UserGroupInformation.login(Lorg/apache/hadoop/conf/Configuration;)Lorg/apache/hadoop/security/UserGroupInformation;

Re: Multi-user Hadoop 0.20.205?

2011-12-29 Thread praveenesh kumar
On Dec 29, 2011, at 2:49, praveenesh kumar praveen...@gmail.com wrote: Guys, Did someone try this thing? Thanks On Tue, Dec 27, 2011 at 4:36 PM, praveenesh kumar praveen...@gmail.com wrote: Hey guys, How can we make hadoop multiuser? One way to think of it: whatever group

Unable to build pig from Trunk

2011-12-29 Thread praveenesh kumar
Hi everyone, I am trying to build Pig from the SVN trunk on hadoop 0.20.205. While doing that, I am getting the following error - any idea why it's happening? Thanks, Praveenesh root@lxe [/usr/local/hadoop/pig/new/trunk] $ -- ant jar-withouthadoop -verbose Apache Ant version 1.6.5 compiled on June

Re: Unable to build pig from Trunk

2011-12-29 Thread praveenesh kumar
.jar to see if your server can connect to that URL. If not you have some kind of connection issue with outgoing requests. --Joey On Thu, Dec 29, 2011 at 11:28 PM, praveenesh kumar praveen...@gmail.com wrote: Hi everyone, I am trying to build Pig from SVN trunk on hadoop 0.20.205. While

Re: Unable to build pig from Trunk

2011-12-29 Thread praveenesh kumar
, Dec 30, 2011 at 11:11 AM, praveenesh kumar praveen...@gmail.com wrote: When I am pinging, it's saying Unknown host... Is there any kind of proxy setting we need to do when building from ant? Thanks, Praveenesh On Fri, Dec 30, 2011 at 11:02 AM, Joey Krabacher jkrabac...@gmail.com wrote: Try
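
One way to give Ant a proxy (the host and port are hypothetical) is the standard JVM proxy properties via ANT_OPTS; on Ant 1.7+ the -autoproxy switch, which picks up system proxy settings, is another option (the thread's Ant is 1.6.5, so the ANT_OPTS route applies):

    export ANT_OPTS="-Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=8080"
    ant jar-withouthadoop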

Re: Unable to build pig from Trunk

2011-12-29 Thread praveenesh kumar
at 11:54 PM, praveenesh kumar praveen...@gmail.com wrote: I set up proxy, Now I am getting the following error : root@lxe9700 [/usr/local/hadoop/pig/new/trunk] $ -- ant jar-withouthadoop -verbose Apache Ant version 1.6.5 compiled on June 5 2007 Buildfile: build.xml Detected Java

Re: Multi-user Hadoop 0.20.205?

2011-12-28 Thread praveenesh kumar
Guys, did someone try this thing? Thanks On Tue, Dec 27, 2011 at 4:36 PM, praveenesh kumar praveen...@gmail.com wrote: Hey guys, How can we make hadoop multiuser? One way to think of it: whatever group we currently assigned to use hadoop, add users to the same group and change permissions

Multi-user Hadoop 0.20.205?

2011-12-27 Thread praveenesh kumar
Hey guys, how can we make hadoop multiuser? One way to think of it: whatever group we currently assigned to use hadoop, add users to the same group and change permissions on hadoop.tmp.dir, mapred.system.dir, dfs.data.dir, and what not. I was playing on hadoop 0.20.205 and I observed we can't change

Secondary Namenode on hadoop 0.20.205?

2011-12-26 Thread praveenesh kumar
Hey people, how can we set up another machine in the cluster as the Secondary Namenode in hadoop 0.20.205? Can a DN also act as the SNN? Any pros and cons of having this configuration? Thanks, Praveenesh

Re: Secondary Namenode on hadoop 0.20.205?

2011-12-26 Thread praveenesh kumar
the SNN automatically as it starts DN and NN as well. also you can see http://www.cloudera.com/blog/2009/02/multi-host-secondarynamenode-configuration/ Regards, Uma From: praveenesh kumar [praveen...@gmail.com] Sent: Monday, December 26, 2011 5:05 PM

Weird problem installing hadoop on 2 machines.

2011-12-23 Thread praveenesh kumar
Hello people, so I am trying to install hadoop .20.205 on 2 machines. Individually I am able to run hadoop on each machine. Now when I am configuring one machine as slave and the other as master, and trying to start hadoop, it's not able to even execute hadoop-run commands on the slave machine. I am getting
