UserGroupInformation.getLoginUser: failure to login.

2016-06-16 Thread jay vyas
for UserGroupInformation or anything like that?)... -- jay vyas

Re: Use of hadoop in AWS - Build it from scratch on a EC2 instance / MapR hadoop distribution / Amazon hadoop distribution

2015-10-19 Thread jay vyas
l.com> wrote: Hi all! I started to use Hadoop with AWS, and a big question appeared in front of me! I'm using a MapR distribution, for Hadoop 2.4.0 in AWS. I already tried some trivial examples, and before moving forward I have one question. What is the better option for using Hadoop on AWS? - Build it from scratch on an EC2 instance - Use MapR distribution of Hadoop - Use Amazon distribution of Hadoop Sorry if my question is too broad. Bye! Jose -- jay vyas

Re: spark

2015-08-17 Thread Jay Vyas
For a start, compare Spark's word count with MapReduce word count. Then compare Spark SQL with Hive. If you get that far, for the final exercise, find out for yourself by running bigpetstore-mapreduce and bigpetstore-spark side by side :). They are two similar applications which generate data
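As a rough illustration of the first comparison (not taken from either bigpetstore project), here is a plain-Python sketch of the map, shuffle, and reduce phases of word count. The function names are made up for this example, and no Hadoop or Spark API is used.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    """Run the three phases in order on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))
```

In MapReduce each phase is a separate class or framework step, whereas in Spark the same pipeline is usually written as a short chain of flatMap, map, and reduceByKey calls, which is the difference the comparison is meant to show.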

Re: How to test DFS?

2015-05-26 Thread jay vyas
been distributed successfully to all datanodes? I would like to demonstrate this capability in a short briefing for my colleagues. Can I access the file from the datanode itself (to date I can only access the files from the master node, not the slaves)? Thank you, Caesar. -- jay

Re: What skills to Learn to become Hadoop Admin

2015-03-07 Thread jay vyas
using Ambari, Cloudera Manager and Apache Hadoop. I have installed the services like hive, oozie, zookeeper etc. I have done a web log integration using flume and twitter sentiment analysis. I wanted to understand what are the other skills I should learn ? Thanks Krish -- jay vyas

Re: Interview Questions asked for Hadoop Admin

2015-02-12 Thread jay vyas
interview questions which were asked during their interview for a Hadoop admin role? I found a few on the internet, but if somebody who has attended the interview can give us an idea, that would be great. Thanks Krish -- jay vyas

Re: Home for Apache Big Data Solutions?

2015-02-09 Thread Jay Vyas
Bigtop.. Yup! Mr Asanjar: why don't you post an email about what you're doing on the Apache Bigtop list, we'd love to hear from you. There could possibly be some overlap and our goal is to plumb the Hadoop ecosystem as well. On Feb 9, 2015, at 4:41 PM, Artem Ervits artemerv...@gmail.com

Re: Any working VM of Apache Hadoop ?

2015-01-18 Thread Jay Vyas
Also BigTop has a very flexible vagrant infrastructure: https://github.com/apache/bigtop/tree/master/bigtop-deploy/vm/vagrant-puppet On Jan 18, 2015, at 3:37 PM, Andre Kelpe ake...@concurrentinc.com wrote: Try our vagrant setup: https://github.com/Cascading/vagrant-cascading-hadoop-cluster

Re: HDFS-based database for Big and Small data?

2015-01-03 Thread Jay Vyas
1) Phoenix can be used on top of HBase for richer querying semantics. That combo might be good for complex workloads. 2) SolrCloud also might fit the bill here? Solr can be backed by any Hadoop-compatible FS including HDFS, and it's resilient by that mechanism, and offers sophisticated

Re: New to this group.

2015-01-02 Thread Jay Vyas
Many demos out there are for the business community... For a demonstration of hadoop at a finer grained level, how it's deployed, packaged, installed and used, for a developer who wants to learn hadoop the hard way, I'd suggest : 1 - Getting Apache bigtop stood up on VMs, and 2 - running

Re: hadoop / hive / pig setup directions

2014-12-16 Thread Jay Vyas
Hi bhupendra, The Apache BigTop project was born to solve the general problem of dealing with and verifying the functionality of various components in the hadoop ecosystem. Also, it creates rpm , apt repos for installing hadoop and puppet recipes for initializing the file system and

Re: Hadoop Learning Environment

2014-11-04 Thread jay vyas
.. easy enough to do). Failing that, what are some other free/cheap solutions for setting up a hadoop learning environment? Thanks, Tim -- GPG me!! gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B -- jay vyas

Re: Hadoop Learning Environment

2014-11-04 Thread jay vyas
-- jay vyas

Re: TestDFSIO with FS other than defaultFS

2014-10-02 Thread Jay Vyas
Hi Jeff. "Wrong FS" means that your configuration doesn't know how to bind ofs to the OrangeFS file system class. You can debug the configuration using Configuration.dumpConfiguration(), and you will likely see references to hdfs in there. By the way, have you tried our bigtop hcfs tests yet? We now
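To make the binding idea concrete, here is a small Python sketch of a scheme-to-class lookup. It assumes the convention that the implementation class for a scheme is read from the `fs.<scheme>.impl` key and that paths without a scheme fall back to `fs.defaultFS`. The `resolve_fs_class` helper and the dict-based configuration are hypothetical stand-ins, not Hadoop code.

```python
from urllib.parse import urlparse


def resolve_fs_class(conf, path):
    """Return the class name bound to the scheme of `path`.

    `conf` is a plain dict standing in for a Hadoop Configuration.
    Paths without a scheme use the scheme of fs.defaultFS.
    """
    scheme = urlparse(path).scheme
    if not scheme:
        scheme = urlparse(conf["fs.defaultFS"]).scheme
    key = "fs.%s.impl" % scheme
    if key not in conf:
        # This is the situation described above: no binding for the scheme.
        raise LookupError("no file system class configured for %r (%s)" % (scheme, key))
    return conf[key]
```

With a configuration that only mentions hdfs, a lookup for an `ofs://` path fails, which is why dumping the configuration and searching it for the scheme is a useful first step.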

Re:

2014-09-26 Thread jay vyas
, Naga Huawei Technologies Co., Ltd. Phone: Fax: Mobile: +91 9980040283 Email: naganarasimh...@huawei.com Huawei Technologies Co., Ltd. http://www.huawei.com -- jay vyas

Re: To Generate Test Data in HDFS (PDGF)

2014-09-22 Thread Jay Vyas
While on the subject, you can also use the bigpetstore application to do this, in Apache Bigtop. This data is well suited for HBase (semi-structured, transactional, and features some global patterns which can make for meaningful queries and so on). Clone apache/bigtop cd bigtop-bigpetstore

Re: how to setup Kerberozed Hadoop ?

2014-09-15 Thread jay vyas
share any guidelines or instructions on how to set up a Kerberized Hadoop env? Thanks. Sophia -- jay vyas

Re: Tez and MapReduce

2014-09-01 Thread jay vyas
is faster especially on complex queries. On Aug 31, 2014 10:33 PM, Adaryl Bob Wakefield, MBA adaryl.wakefi...@hotmail.com wrote: Can Tez and MapReduce live together and get along in the same cluster? B. -- jay vyas

Re: hadoop/yarn and task parallelization on non-hdfs filesystems

2014-08-15 Thread jay vyas
-- Harsh J -- jay vyas

Re: Started learning Hadoop. Which distribution is best for native install in pseudo distributed mode?

2014-08-12 Thread Jay Vyas
Also, consider Apache Bigtop. That is the Apache upstream Hadoop initiative, and it comes with smoke tests + Puppet recipes for setting up your own Hadoop distro from scratch. IMHO ... If learning or building your own tooling around Hadoop, Bigtop is ideal. If interested in purchasing support

Re: Bench-marking Hadoop Performance

2014-07-22 Thread jay vyas
options? Also, I have searched safari books online including rough cuts, but not seeing books for the 2.4 release. If you know of a book for this release, please share. Thank you. -- jay vyas

Re: Hadoop 2.4 test jar files.

2014-07-22 Thread jay vyas
/2.4.0/share/hadoop/tools/sources/hadoop-distcp-2.4.0-test-sources.jar /a01/hadoop/2.4.0/share/hadoop/tools/sources/hadoop-rumen-2.4.0-test-sources.jar -- jay vyas

Re: clarification on HBASE functionality

2014-07-15 Thread Jay Vyas
HBase is not hardcoded to HDFS: it works on any file system that implements the FileSystem interface; we've run it on GlusterFS, for example. I assume some have also run it on S3 and other alternative file systems. ** However ** for best performance, direct block I/O hooks on HDFS can boost

Re: Hadoop virtual machine

2014-07-06 Thread jay vyas
helpful material are appreciated. Manar, -- -- André Kelpe an...@concurrentinc.com http://concurrentinc.com -- jay vyas

Re: Hadoop with SAN

2014-06-15 Thread Jay Vyas
You can either use san to back your datanodes, or implement a custom FileSystem over your san storage. Either would have different drawbacks depending on your requirements.

Re: No job can run in YARN (Hadoop-2.2)

2014-05-11 Thread Jay Vyas
Sounds odd... So (1) you got a FileNotFoundException and (2) you fixed it by commenting out memory-specific config parameters? Not sure how that would work... Any other details or am I missing something else? On May 11, 2014, at 4:16 AM, Tao Xiao xiaotao.cs@gmail.com wrote: I'm sure

Yarn hangs @Scheduled

2014-04-24 Thread Jay Vyas
attempt.RMAppAttemptImpl: appattempt_1398370674313_0004_01 State change from SUBMITTED to SCHEDULED -- Jay Vyas http://jayunit100.blogspot.com

Re: Yarn hangs @Scheduled

2014-04-24 Thread Jay Vyas
appattempt_1398370674313_0004_01 to scheduler from user: yarn 14/04/24 16:20:33 INFO attempt.RMAppAttemptImpl: appattempt_1398370674313_0004_01 State change from SUBMITTED to SCHEDULED -- Jay Vyas http://jayunit100.blogspot.com

Re: Strange error in Hadoop 2.2.0: FileNotFoundException: file:/tmp/hadoop-hadoop/mapred/

2014-04-22 Thread Jay Vyas
of? Perhaps some permissions issues? Thank you, Natalia -- Jay Vyas http://jayunit100.blogspot.com

Re: Shuffle Error after enabling Kerberos authentication

2014-04-19 Thread Jay Vyas
works fine if Kerberos authentication is disabled. Any idea what the problem could be? Thanks, Terance. -- Jay Vyas http://jayunit100.blogspot.com

Re: MapReduce for complex key/value pairs?

2014-04-08 Thread Jay Vyas
using Java, btw. Thank you, Natalia Connolly -- Harsh J -- Jay Vyas http://jayunit100.blogspot.com

Re: Hadoop Serialization mechanisms

2014-03-31 Thread Jay Vyas
see a gain in using a more efficient data serialisation format for data files. On Sun, Mar 30, 2014 at 9:09 PM, Jay Vyas jayunit...@gmail.com wrote: Those are all great questions, and mostly difficult to answer. I haven't played with serialization APIs in some time, but let me try to give some

Jobs fail immediately in local mode ?

2014-03-29 Thread Jay Vyas
. -- Jay Vyas http://jayunit100.blogspot.com

Re: Doubt

2014-03-19 Thread Jay Vyas
of few things, but as far as installation is concerned, it should be easily doable. Regards Prav On Wed, Mar 19, 2014 at 3:41 PM, sri harsha rsharsh...@gmail.com wrote: Hi all, is it possible to install Mongodb on the same VM which consists hadoop? -- amiable harsha -- Jay Vyas http

Re: What if file format is dependent upon first few lines?

2014-02-27 Thread Jay Vyas
is a must to parse each log line. It means the log file could NOT simply be split; otherwise the second split would lose the file format information. How could each mapper get the first few lines in the file? -- Harsh J -- Jay Vyas http://jayunit100.blogspot.com

YARN job exits fast without failure, but does nothing

2014-02-17 Thread Jay Vyas
-- Jay Vyas http://jayunit100.blogspot.com

Re: How to ascertain why LinuxContainer dies?

2014-02-14 Thread Jay Vyas
that you can check under the container's work directory after it fails? On Fri, Feb 14, 2014 at 9:46 AM, Jay Vyas jayunit...@gmail.com wrote: I have a linux container that dies. The nodemanager logs only say: WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception

Re: How to ascertain why LinuxContainer dies?

2014-02-14 Thread Jay Vyas
Feb 4 17:27 .. drwx--x--- 2 htf htf 4096 Feb 4 17:27 . -rw-rw-r-- 1 htf htf 50471 Feb 4 17:31 syslog Regards ./g -Original Message- From: Jay Vyas [mailto:jayunit...@gmail.com] Sent: Friday, February 14, 2014 7:02 AM To: user@hadoop.apache.org Cc: user

How to ascertain why LinuxContainer dies?

2014-02-13 Thread Jay Vyas
) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Where can I find the root cause of the non-zero exit code? -- Jay Vyas http://jayunit100

Re: Test hadoop code on the cloud

2014-02-12 Thread Jay Vyas
is the simplest way to do this on the cloud? Is there any way to do it for free? Thanks in advance -- Jay Vyas http://jayunit100.blogspot.com

YARN FSDownload: How did Mr1 do it ?

2014-02-11 Thread Jay Vyas
I'm noticing that resource localization is much more complex in YARN than MR1; in particular, the timestamps need to be identical, or else an exception is thrown. I never saw that in MR1. How did MR1 JobTrackers handle resource localization differently than MR2 App Masters? -- Jay Vyas http
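The timestamp requirement can be pictured with a short Python sketch: the modification time recorded when the resource was registered is compared with the file's current modification time, and any difference is treated as an error. This is only an illustration of the check described above, not the YARN FSDownload code; `verify_unchanged` and its millisecond convention are assumptions made for the example.

```python
import os


def verify_unchanged(path, expected_mtime_ms):
    """Raise IOError if `path` was modified after its timestamp was recorded.

    `expected_mtime_ms` is the modification time in milliseconds that was
    captured when the resource was registered for localization.
    """
    actual_ms = os.stat(path).st_mtime_ns // 1_000_000
    if actual_ms != expected_mtime_ms:
        raise IOError(
            "resource %s changed on source file system (expected %d, was %d)"
            % (path, expected_mtime_ms, actual_ms)
        )
    return True
```

A strict equality check like this is what makes clock and file system semantics matter: a copy, a touch, or a file system that rounds timestamps differently is enough to make the comparison fail.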

Re: performance of hadoop fs -put

2014-01-29 Thread Jay Vyas
No, I'm using a glob pattern; it's all done in one put statement. On Tue, Jan 28, 2014 at 9:22 PM, Harsh J ha...@cloudera.com wrote: Are you calling one command per file? That's bound to be slow as it invokes a new JVM each time. On Jan 29, 2014 7:15 AM, Jay Vyas jayunit...@gmail.com wrote
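The difference between the two approaches discussed here can be sketched in Python by building the command lines without running them. The `put_commands` helper is hypothetical; it only relies on `hadoop fs -put` accepting several local sources followed by one destination.

```python
import glob


def put_commands(pattern, dest, batch=True):
    """Build argv lists for uploading every local file matching `pattern`.

    batch=True  -> one `hadoop fs -put` call for all files (one JVM start)
    batch=False -> one call per file (one JVM start per file)
    """
    files = sorted(glob.glob(pattern))
    if not files:
        return []
    if batch:
        return [["hadoop", "fs", "-put", *files, dest]]
    return [["hadoop", "fs", "-put", name, dest] for name in files]
```

Each returned list could be handed to `subprocess.run`; the batched form pays the JVM startup cost once, so any remaining slowness has to come from somewhere else, which is the point of the question in this thread.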

Re: DistributedCache deprecated

2014-01-29 Thread Jay Vyas
) Add a file to be localized and it works fine. The same way you were using DC before.. Well I am not sure what would be the best answer, but if you are trying to use DC , I was able to do it with Job class itself. Regards Prav On Wed, Jan 29, 2014 at 9:27 PM, Jay Vyas jayunit

Re: Passing data from Client to AM

2014-01-29 Thread Jay Vyas
-- Jay Vyas http://jayunit100.blogspot.com

Re: Hadoop-2.2.0 and Pig-0.12.0 - error IBM_JAVA

2014-01-28 Thread Jay Vyas
issue with hadoop and pig. I'm using Java version - *1.6.0_31* Please help me out. -- Regards, Viswa.J -- Jay Vyas http://jayunit100.blogspot.com

performance of hadoop fs -put

2014-01-28 Thread Jay Vyas
into this issue. ** Is hadoop fs -put inherently slower than a Unix cp action, regardless of filesystem -- and if so, why? ** -- Jay Vyas http://jayunit100.blogspot.com

Strange rpc exception in Yarn

2014-01-27 Thread Jay Vyas
Hi folks: At the **end** of a successful job, I'm getting some strange stack traces when using Pig; however, it doesn't seem to be Pig-specific from the stack trace. Rather, it appears that the job client is attempting to do something funny. Anyone ever see this sort of exception in

Re: Shutdown hook for FileSystems

2014-01-21 Thread Jay Vyas
What is happening when you remove the shutdown hook? Is that supposed to trigger an exception -

Re: What is the difference between Hdfs and DistributedFileSystem?

2014-01-13 Thread Jay Vyas
印 liyin.lian...@aliyun-inc.com wrote: What is the difference between Hdfs.java and DistributedFileSystem.java in Hadoop2? Best Regards, Liyin Liang Tel: 78233 Email: liyin.lian...@alibaba-inc.com -- Jay Vyas http://jayunit100.blogspot.com

Re: Ways to manage user accounts on hadoop cluster when using kerberos security

2014-01-08 Thread Jay Vyas
I recently found a pretty simple and easy way to set LDAP up for my machines on RHEL and wrote it up using jumpbox and authconfig. If you are in the cloud and only need a quick easy ldap idh and nsswitch setup, this is I think the easiest / cheapest way to do it. I know RHEL and Fedora come

Re: Debug Hadoop Junit Test in Eclipse

2013-12-16 Thread Jay Vyas
-- Jay Vyas http://jayunit100

Re: Debug Hadoop Junit Test in Eclipse

2013-12-16 Thread Jay Vyas
it uploads files. So I am only looking to trace fs commands through the DFS shell. I believe this should require less work in debugging than actually going to mapred VMs! -- Best Regards, Karim Ahmed Awara On Mon, Dec 16, 2013 at 5:57 PM, Jay Vyas jayunit...@gmail.com wrote: Excellent

Pluggable distribute cache impl

2013-12-15 Thread Jay Vyas
Are there any ways to plug in an alternate distributed cache implementation (i.e., when nodes of a cluster already have an NFS mount or other local data service...)?

Re: multiusers in hadoop through LDAP

2013-12-10 Thread Jay Vyas
OS? So that if a user is authenticated by the LDAP ,who will also access the HDFS directory? Regards -- Jay Vyas http://jayunit100.blogspot.com

Write a file to local disks on all nodes of a YARN cluster.

2013-12-08 Thread Jay Vyas
/a); So that afterwards, all nodes in the cluster have a file a in /tmp. -- Jay Vyas http://jayunit100.blogspot.com

FSMainOperations FSContract tests?

2013-12-05 Thread Jay Vyas
Mainly @steveloughran Is it safe to say that *old* fs semantics are in FSContract test, and *new* fs semantics in FSMainOps tests ? I ask this because it seems that you had tests in your swift filesystem tests which used the FSContract libs, as well as the FSMainOps.. Not sure why you need

Re: how to prevent JAVA HEAP OOM happen in shuffle process in a MR job?

2013-12-02 Thread Jay Vyas
version is really important here.. - If 1.x, then where (NN, JT, TT?) - If 2.x, then where? (AM, NM, ...?) -- probably less likely here, since the resources are ephemeral. I know that some older 1.x versions had an issue with the jobtracker having an ever-expanding hashmap or something like

Re: Hadoop Test libraries: Where did they go ?

2013-11-25 Thread Jay Vyas
On 21 November 2013 23:28, Jay Vyas jayunit...@gmail.com wrote: It appears to me that http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-test Is no longer updated Where does hadoop now package the test libraries? Looking in the .//hadoop-common-project/hadoop-common/pom.xml

Hadoop Test libraries: Where did they go ?

2013-11-21 Thread Jay Vyas
is packaged into a jar anymore... but i fear it is not. -- Jay Vyas http://jayunit100.blogspot.com

Re: how to stream the video from hdfs

2013-11-13 Thread Jay Vyas
I believe there is a FUSE mount for HDFS which will allow you to open files normally in your streaming app rather than requiring use of the Java API. Also consider that for media and highly available binary data for a front end, I would guess that HDFS might be overkill because of the

YARN And NTP

2013-10-24 Thread Jay Vyas
, but depending on underlying filesystem the semantics of this last modified time might vary. Any thoughts on this? -- Jay Vyas http://jayunit100.blogspot.com

Re: Uploading a file to HDFS

2013-10-01 Thread Jay Vyas
-- Jay Vyas http

Re: Retrieve and compute input splits

2013-09-27 Thread Jay Vyas
the input splits. Any help please. Thanks Sai -- Jay Vyas http://jayunit100.blogspot.com

Re: Extending DFSInputStream class

2013-09-26 Thread Jay Vyas
inaccessible for developers, or am I missing something? regards tmp -- Jay Vyas http://jayunit100.blogspot.com

Re: Extending DFSInputStream class

2013-09-26 Thread Jay Vyas
The way we have gotten around this in the past is extending and then copying the private code and creating a brand new implementation. On Thu, Sep 26, 2013 at 10:50 AM, Jay Vyas jayunit...@gmail.com wrote: This is actually somewhat common in some of the hadoop core classes : Private

Re: Concatenate multiple sequence files into 1 big sequence file

2013-09-10 Thread Jay Vyas
IIRC sequence files can be concatenated as is and read as one large file, but maybe I'm forgetting something.

RawLocalFileSystem, getPos and NullPointerException

2013-09-09 Thread Jay Vyas
-rawlocalfilesystem-and-getpos -- Jay Vyas http://jayunit100.blogspot.com

Re: hadoop cares about /etc/hosts ?

2013-09-09 Thread Jay Vyas
-- Jay Vyas http://jayunit100.blogspot.com

MultiFileLineRecodrReader vs CombineFileRecordReader

2013-09-07 Thread Jay Vyas
major difference between these classes, and why the redundancy ? I'm thinking maybe it was retro added at some point, based on some git detective work which I tried... But I figured it might just be easier to ask here :) -- Jay Vyas http://jayunit100.blogspot.com

examples of HADOOP REST API

2013-08-20 Thread Jay Vyas
between this and the ambari REST services, but not sure where to start digging. I want to run some rest calls at the end of some jobs to query how many tasks failed, etc... Hopefully, I could get this in JSON rather than scraping HTML. Thanks! -- Jay Vyas http://jayunit100.blogspot.com

Re: e-Science app on Hadoop

2013-08-16 Thread Jay Vyas
an e-Science application to run on Hadoop? Thanks. Felipe -- *-- -- Felipe Oliveira Gutierrez -- felipe.o.gutier...@gmail.com -- https://sites.google.com/site/lipe82/Home/diaadia* -- Jay Vyas http://jayunit100.blogspot.com

Mapred.system.dir: should JT start without it?

2013-08-15 Thread Jay Vyas
Is there a startup contract for MapReduce making its own mapred.system.dir? Also, it seems that the jobtracker can start up even if this directory was not created / doesn't exist - I'm thinking that if that's the case, JT should fail up front.

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Jay Vyas
Then is this a bug? Synchronization in the absence of any race condition is normally considered bad. In any case I'd like to know why this writer is synchronized whereas the other ones are not.. That is, I think, the point at issue: either other writers should be synchronized or else this one

Re: solr -Reg

2013-07-28 Thread Jay Vyas
True that it deserves some posting on Solr, but I think it's still partially relevant... The SolrInputFormat and SolrOutputFormat handle this for you and will be used in your MapReduce jobs. They will output one core per reducer, where each reducer corresponds to a core.. This is

Re: Staging directory ENOTDIR error.

2013-07-12 Thread Jay Vyas
” configuration in client with the HDFS. Thanks Devaraj k From: Jay Vyas [mailto:jayunit...@gmail.com] Sent: 12 July 2013 04:12 To: common-u...@hadoop.apache.org Subject: Staging directory ENOTDIR error. Hi, I'm getting an ungoogleable

Re: CompositeInputFormat

2013-07-11 Thread Jay Vyas
someone help me out? It would be much appreciated. Thanks in advance, Andrew -- Jay Vyas http://jayunit100.blogspot.com

Staging directory ENOTDIR error.

2013-07-11 Thread Jay Vyas
) at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116) -- Jay Vyas http://jayunit100.blogspot.com

Data node EPERM not permitted.

2013-07-06 Thread Jay Vyas
(DataNode.java:1575) at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1598) at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1751) at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1772) -- Jay Vyas http

starting Hadoop, the new way

2013-07-05 Thread Jay Vyas
is and if I need to set any particular env variables when doing so. -- Jay Vyas http://jayunit100.blogspot.com

Re: HDFS interfaces

2013-06-04 Thread Jay Vyas
the data is located. * What are these interfaces and where they are in the source code? Is there any manual for the interfaces? Regards, Mahmood -- Jay Vyas http://jayunit100.blogspot.com

Re: Install hadoop on multiple VMs in 1 laptop like a cluster

2013-05-31 Thread Jay Vyas
...@yahoo.in wrote: Just wondering if anyone has any documentation or references to any articles how to simulate a multi node cluster setup in 1 laptop with hadoop running on multiple ubuntu VMs. any help is appreciated. Thanks Sai -- Jay Vyas http://jayunit100.blogspot.com

Re: What else can be built on top of YARN.

2013-05-30 Thread Jay Vyas
efficiently using MRv2 jobs. thanks, Rahul -- Jay Vyas http://jayunit100.blogspot.com

Re: understanding souce code structure

2013-05-27 Thread Jay Vyas
Hi! a few weeks ago I had the same question... Tried a first iteration at documenting this by going through the classes starting with key/value pairs in the blog post below. http://jayunit100.blogspot.com/2013/04/the-kv-pair-salmon-run-in-mapreduce-hdfs.html Note it's not perfect yet but I

Re: Configuring SSH - is it required? for a psedo distriburted mode?

2013-05-16 Thread Jay Vyas
to implement on a single node: cat ~/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys On Thu, May 16, 2013 at 11:31 AM, Jay Vyas jayunit...@gmail.com wrote: Yes it is required -- in pseudo-distributed mode the jobtracker is not necessarily aware that the task trackers / data nodes are on the same

partition as block?

2013-04-30 Thread Jay Vyas
traffic... but maybe (1) or (2) will be a precise way to use partitions as a poor man's block. Just a thought - not sure if anyone has tried (1) or (2) before in order to simulate blocks and increase locality by utilizing the partition API. -- Jay Vyas http://jayunit100.blogspot.com

Re: partition as block?

2013-04-30 Thread Jay Vyas
/ cloudfront.blogspot.com On Wed, May 1, 2013 at 12:16 AM, Jay Vyas jayunit...@gmail.com wrote: Hi guys: I'm wondering - if I'm running MapReduce jobs on a cluster with large block sizes - can I increase performance with either: 1) A custom FileInputFormat 2) A custom partitioner 3) -DnumReducers Clearly

Re: partition as block?

2013-04-30 Thread Jay Vyas
it be a bottleneck from a 'disk' point of view? Are you not going away from the distributed paradigm? Am I taking it in the correct way? Please correct me if I am getting it wrong. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Wed, May 1, 2013 at 12:34 AM, Jay Vyas jayunit

Re: partition as block?

2013-04-30 Thread Jay Vyas
it considerably high will definitely give you some boost. But it'll require a high level tinkering. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Wed, May 1, 2013 at 1:29 AM, Jay Vyas jayunit...@gmail.com wrote: Yes it is a problem at the first stage. What I'm wondering

Re: Maven dependency

2013-04-24 Thread Jay Vyas
reduce dependencies and I am not sure which to pick. Are there other dependencies needed (such as JobConf)? What are the imports needed? During the construction of the configuration what heuristics are used to find the configuration for the Hadoop cluster? Thank you. -- Jay Vyas

Re: Append MR output file to an exitsted HDFS file

2013-04-21 Thread Jay Vyas
OutputFormat is going to have to find the corresponding file to append. On Sun, Apr 21, 2013 at 10:54 PM, YouPeng Yang yypvsxf19870...@gmail.com wrote: Hi All, can I append an MR output file to an existing file on HDFS? I'm using CDH4.1.2 vs MRv2 Regards -- Jay Vyas http

Re: Writing intermediate key,value pairs to file and read it again

2013-04-20 Thread Jay Vyas
How many intermediate keys? If small enough, you can keep them in memory. If large, you can just wait for the job to finish and siphon them into your job as input with the MultipleInputs API. On Apr 20, 2013, at 10:43 AM, Vikas Jadhav vikascjadha...@gmail.com wrote: Hello, Can anyone help

JobSubmissionFiles: past , present, and future?

2013-04-12 Thread Jay Vyas
Hi guys: I'm curious about the changes and future of the JobSubmissionFiles class. Grepping around on the web I'm finding some code snippets that suggest that hadoop security is not handled the same way on the staging directory as before:

Re: JobSubmissionFiles: past , present, and future?

2013-04-12 Thread Jay Vyas
Something to keep in mind - if you see the "fixing staging permissions" error message a lot, then there might be a more systemic problem in your fs... At least, that was the case for us. On Apr 12, 2013, at 6:11 AM, Jay Vyas jayunit...@gmail.com wrote: Hi guys: I'm curious about the changes and future

Re: No bulid.xml when to build FUSE

2013-04-10 Thread Jay Vyas
the wrong codes, or are there any other ways to build fuse-dfs? Please guide me. Thanks, regards -- Jay Vyas http://jayunit100.blogspot.com

Re: Copy Vs DistCP

2013-04-10 Thread Jay Vyas
/some_location/file /new_location/ Thanks, your responses are appreciated. -- Kay -- Jay Vyas http://jayunit100.blogspot.com

Re: Distributed cache: how big is too big?

2013-04-09 Thread Jay Vyas
Hmmm.. maybe I'm missing something.. but (@bjorn) why would you use HDFS as a replacement for the distributed cache? After all - the distributed cache is just a file with replication over the whole cluster, which isn't in HDFS. Can't you just make the cache size big and store the file there?

The Job.xml file

2013-04-09 Thread Jay Vyas
into individual tasks So, my (related) questions are: Is there a way to start a job directly from a job.xml file? What components depend on and read the job.xml file? Where is the job.xml defined/documented (if anywhere)? -- Jay Vyas http://jayunit100.blogspot.com

MVN repository for hadoop trunk

2013-04-06 Thread Jay Vyas
Hi guys: Is there a mvn repo for Hadoop's 3.0.0 trunk build? Clearly the Hadoop pom.xml allows us to build Hadoop from scratch and installs it as 3.0.0-SNAPSHOT -- but it's not clear whether there is a published version of this snapshot jar somewhere. -- Jay Vyas http://jayunit100.blogspot.com

Re: MVN repository for hadoop trunk

2013-04-06 Thread Jay Vyas
: https://repository.apache.org/content/groups/snapshots -Giri On Sat, Apr 6, 2013 at 2:00 PM, Harsh J ha...@cloudera.com wrote: I don't think we publish nightly or rolling jars anywhere on maven central from trunk builds. On Sun, Apr 7, 2013 at 2:17 AM, Jay Vyas jayunit

cannot find /usr/lib/hadoop/mapred/

2013-03-06 Thread Jay Vyas
:1522) at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3821) -- Jay Vyas http://jayunit100.blogspot.com
