for UserGrouInfo or anything like that?)...
--
jay vyas
>>> Hi all !
>>>
>>> I started to use hadoop with aws, and a big question appears in front of
>>> me!
>>>
>>> I'm using a MapR distribution, for hadoop 2.4.0 in AWS. I already tried
>>> some trivial examples, and before moving forward i have one question.
>>>
>>> What is the best option for using Hadoop on AWS?
>>> - Build it from scratch on an EC2 instance
>>> - Use MapR distribution of Hadoop
>>> - Use Amazon distribution of Hadoop
>>>
>>> Sorry if my question is too broad.
>>>
>>> Bye!
>>> Jose
--
jay vyas
For a start, compare Spark's word count with MapReduce's word count.
Then compare SparkSQL with Hive.
If you get that far, for the final exercise, find out for yourself by running
bigpetstore-mapreduce and bigpetstore-spark side by side :). They are two
similar applications which generate data
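To make the first comparison concrete, here is a hedged, plain-Python sketch of the pipeline both frameworks implement for word count (this stands in for Spark's RDD version; the function name and sample input are illustrative, not from either project):

```python
from collections import Counter
from itertools import chain

# Word count expressed as the same three steps Spark's RDD version uses:
# flatMap (split lines into words) -> map (pair each word with 1)
# -> reduceByKey (sum per word). Plain Python stands in for the cluster,
# so this shows the shape of the computation, not the distribution.
def word_count(lines):
    words = chain.from_iterable(line.split() for line in lines)  # flatMap
    pairs = ((w, 1) for w in words)                              # map
    counts = Counter()                                           # reduceByKey
    for w, n in pairs:
        counts[w] += n
    return dict(counts)

print(word_count(["to be or not to be"]))  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

In MapReduce the same three steps are split across a Mapper class, the shuffle, and a Reducer class, which is much of what the side-by-side comparison makes visible.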
been distributed
successfully to all datanodes?
I would like to demonstrate this capability in a short briefing for my
colleagues.
Can I access the file from the datanode itself (to date I can only access
the files from the master node, not the slaves)?
Thank you, Caesar.
--
jay
using Ambari, Cloudera Manager and Apache Hadoop.
I have installed the services like hive, oozie, zookeeper etc.
I have done a web log integration using flume and twitter sentiment
analysis.
I wanted to understand what are the other skills I should learn ?
Thanks
Krish
--
jay vyas
interview questions which was asked during their
interview on Hadoop admin role?
I found few on internet but if somebody who has attended the interview
can give us an idea, that will be great.
Thanks
Krish
--
jay vyas
Bigtop.. Yup!
Mr Asanjar: why don't you post an email about what you're doing on the Apache
bigtop list, we'd love to hear from you.
There could possibly be some overlap and our goal is to plumb the hadoop
ecosystem as well
On Feb 9, 2015, at 4:41 PM, Artem Ervits artemerv...@gmail.com
Also BigTop has a very flexible vagrant infrastructure:
https://github.com/apache/bigtop/tree/master/bigtop-deploy/vm/vagrant-puppet
On Jan 18, 2015, at 3:37 PM, Andre Kelpe ake...@concurrentinc.com wrote:
Try our vagrant setup:
https://github.com/Cascading/vagrant-cascading-hadoop-cluster
1) Phoenix can be used on top of hbase for richer querying semantics. That
combo might be good for complex workloads.
2) SolrCloud also might fit the bill here ?
Solr can be backed by any Hadoop-compatible FS including HDFS, and it's
resilient by that mechanism, and offers sophisticated
Many demos out there are for the business community...
For a demonstration of hadoop at a finer grained level, how it's deployed,
packaged, installed and used, for a developer who wants to learn hadoop the
hard way,
I'd suggest :
1 - Getting Apache bigtop stood up on VMs, and
2 - running
Hi bhupendra,
The Apache BigTop project was born to solve the general problem of dealing with
and verifying the functionality of various components in the hadoop ecosystem.
Also, it creates rpm and apt repos for installing hadoop, and puppet recipes for
initializing the file system and
.. easy enough to do).
Failing that, what are some other free/cheap solutions for setting up a
hadoop learning environment?
Thanks,
Tim
--
GPG me!!
gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
--
jay vyas
--
jay vyas
Hi jeff. Wrong fs means that your configuration doesn't know how to bind ofs
to the OrangeFS file system class.
You can debug the configuration using fs.dumpConfiguration(), and you will
likely see references to hdfs in there.
By the way, have you tried our bigtop hcfs tests yet? We now
Naga
Huawei Technologies Co., Ltd.
Mobile: +91 9980040283
Email: naganarasimh...@huawei.com
http://www.huawei.com
--
jay vyas
While on the subject,
You can also use the bigpetstore application to do this, in apache bigtop.
This data is suited well for hbase ( semi structured, transactional, and
features some global patterns which can make for meaningful queries and so on).
Clone apache/bigtop
cd bigtop-bigpetstore
share any guidelines or instructions on how to set up a
Kerberized hadoop env?
Thanks.
Sophia
--
jay vyas
is faster especially on complex queries.
On Aug 31, 2014 10:33 PM, Adaryl Bob Wakefield, MBA
adaryl.wakefi...@hotmail.com wrote:
Can Tez and MapReduce live together and get along in the same cluster?
B.
--
jay vyas
--
Harsh J
--
jay vyas
also, consider apache bigtop. That is the apache upstream Hadoop initiative,
and it comes with smoke tests+ Puppet recipes for setting up your own Hadoop
distro from scratch.
IMHO ... if you're learning or building your own tooling around Hadoop, Bigtop is
ideal. If interested in purchasing support
options?
Also, I have searched safari books online including rough cuts, but not
seeing books for the 2.4 release. If you know of a book for this release,
please share.
Thank you.
--
jay vyas
/2.4.0/share/hadoop/tools/sources/hadoop-distcp-2.4.0-test-sources.jar
/a01/hadoop/2.4.0/share/hadoop/tools/sources/hadoop-rumen-2.4.0-test-sources.jar
--
jay vyas
HBase is not hardcoded to HDFS: it works on any file system that implements the
file system interface; we've run it on GlusterFS, for example. I assume some
have also run it on S3 and other alternative file systems.
** However **
For best performance, direct block io hooks on hdfs can boost
helpful material are appreciated.
Manar,
--
--
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com
--
jay vyas
You can either use san to back your datanodes, or implement a custom FileSystem
over your san storage. Either would have different drawbacks depending on your
requirements.
Sounds odd. So (1) you got a FileNotFound exception and (2) you fixed it by
commenting out memory-specific config parameters?
Not sure how that would work... Any other details, or am I missing something
else?
On May 11, 2014, at 4:16 AM, Tao Xiao xiaotao.cs@gmail.com wrote:
I'm sure
attempt.RMAppAttemptImpl:
appattempt_1398370674313_0004_01 State change from SUBMITTED to
SCHEDULED
--
Jay Vyas
http://jayunit100.blogspot.com
appattempt_1398370674313_0004_01 to scheduler from user: yarn
14/04/24 16:20:33 INFO attempt.RMAppAttemptImpl:
appattempt_1398370674313_0004_01 State change from SUBMITTED to SCHEDULED
--
Jay Vyas
http://jayunit100.blogspot.com
of? Perhaps some permissions issues?
Thank you,
Natalia
--
Jay Vyas
http://jayunit100.blogspot.com
works fine if Kerberos
authentication is disabled. Any idea what what the problem could be?
Thanks,
Terance.
--
Jay Vyas
http://jayunit100.blogspot.com
using Java, btw.
Thank you,
Natalia Connolly
--
Harsh J
--
Jay Vyas
http://jayunit100.blogspot.com
see a gain in using a more efficient data
serialisation format for data files.
On Sun, Mar 30, 2014 at 9:09 PM, Jay Vyas jayunit...@gmail.com wrote:
Those are all great questions, and mostly difficult to answer. I haven't
played with serialization APIs in some time, but let me try to give some
.
--
Jay Vyas
http://jayunit100.blogspot.com
of few
things, but as far as installation is concerned, it should be easily doable.
Regards
Prav
On Wed, Mar 19, 2014 at 3:41 PM, sri harsha rsharsh...@gmail.com wrote:
Hi all,
is it possible to install Mongodb on the same VM which consists hadoop?
--
amiable harsha
--
Jay Vyas
http
is a must to parse
each
log line. It means the log file could NOT simply be split; otherwise the
second split would lose the file format information.
How could each mapper get the first few lines in the file?
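A minimal sketch of the problem being described (the log format here is hypothetical, chosen only to illustrate a file whose first line carries the field layout every record needs):

```python
# Hypothetical log format: the first line names the fields; every later
# line is a record. A mapper handed a byte-range split that starts past
# line 1 has no field names, so its records are unparseable on their own -
# which is why such a file cannot be naively split.
def parse_log(lines):
    header = lines[0].split(",")          # field layout, e.g. ts,level,msg
    return [dict(zip(header, line.split(","))) for line in lines[1:]]

log = ["ts,level,msg",
       "1,INFO,start",
       "2,WARN,slow disk"]
print(parse_log(log)[1]["level"])  # WARN
```

One common workaround is to mark the whole file unsplittable (one mapper per file), or to ship the header to every mapper out of band (e.g. via job configuration or the distributed cache) so splits become self-describing.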
--
Harsh J
--
Jay Vyas
http://jayunit100.blogspot.com
--
Jay Vyas
http://jayunit100.blogspot.com
that
you can check under the container's work directory after it fails?
On Fri, Feb 14, 2014 at 9:46 AM, Jay Vyas jayunit...@gmail.com wrote:
I have a linux container that dies. The nodemanager logs only say:
WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor:
Exception
Feb 4 17:27 ..
drwx--x--- 2 htf htf 4096 Feb 4 17:27 .
-rw-rw-r-- 1 htf htf 50471 Feb 4 17:31 syslog
Regards
./g
-Original Message-
From: Jay Vyas [mailto:jayunit...@gmail.com]
Sent: Friday, February 14, 2014 7:02 AM
To: user@hadoop.apache.org
Cc: user
)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Where can I find the root cause of the non-zero exit code?
--
Jay Vyas
http://jayunit100
is the simplest way to do this on the cloud?
Is there any way to do it for free?
Thanks in advance
--
Jay Vyas
http://jayunit100.blogspot.com
I'm noticing that resource localization is much more complex in YARN than
MR1; in particular, the timestamps need to be identical, or else an
exception is thrown.
I never saw that in MR1.
How did MR1 JobTrackers handle resource localization differently than MR2
App Masters?
--
Jay Vyas
http
No, I'm using a glob pattern; it's all done in one put statement
On Tue, Jan 28, 2014 at 9:22 PM, Harsh J ha...@cloudera.com wrote:
Are you calling one command per file? That's bound to be slow as it
invokes a new JVM each time.
On Jan 29, 2014 7:15 AM, Jay Vyas jayunit...@gmail.com wrote
)
Add a file to be localized
and it works fine, the same way you were using DC before. Well, I am not
sure what would be the best answer, but if you are trying to use DC, I was
able to do it with the Job class itself.
Regards
Prav
On Wed, Jan 29, 2014 at 9:27 PM, Jay Vyas jayunit
--
Jay Vyas
http://jayunit100.blogspot.com
issue with hadoop and
pig.
I'm using Java version - *1.6.0_31*
Please help me out.
--
Regards,
Viswa.J
--
Jay Vyas
http://jayunit100.blogspot.com
into this issue.
** Is hadoop fs -put inherently slower than a unix cp action, regardless
of filesystem -- and if so , why? **
--
Jay Vyas
http://jayunit100.blogspot.com
Hi folks:
At the **end** of a successful job, I'm getting some strange stack traces.
I see this when using Pig; however, it doesn't seem to be Pig-specific from
the stack trace. Rather, it appears that the job client is attempting to do
something funny.
Anyone ever see this sort of exception in
What is happening when you remove the shutdown hook? Is that supposed
to trigger an exception -
印 liyin.lian...@aliyun-inc.com wrote:
What is the difference between Hdfs.java and DistributedFileSystem.java
in Hadoop2?
Best Regards,
Liyin Liang
Tel: 78233
Email: liyin.lian...@alibaba-inc.com
--
Jay Vyas
http://jayunit100.blogspot.com
I recently found a pretty simple and easy way to set ldap up for my machines on
rhel and wrote it up using jumpbox and authconfig.
If you are in the cloud and only need a quick, easy LDAP id and nsswitch
setup, this is, I think, the easiest / cheapest way to do it.
I know rhel and fedora come
--
Jay Vyas
http://jayunit100
it uploads
files. So I am only looking to trace fs commands through the DFS shell. I
believe this should require less work in debugging than actually going
to mapred VMs!
--
Best Regards,
Karim Ahmed Awara
On Mon, Dec 16, 2013 at 5:57 PM, Jay Vyas jayunit...@gmail.com wrote:
Excellent
are there any ways to plug in an alternate distributed cache implementation (i.e.
when nodes of a cluster already have an nfs mount or other local data
service...)?
OS? So that if a user is authenticated by the LDAP,
who will also access the HDFS directory?
Regards
--
Jay Vyas
http://jayunit100.blogspot.com
/a);
So that afterwards, all nodes in the cluster have a file a in /tmp.
--
Jay Vyas
http://jayunit100.blogspot.com
Mainly @steveloughran: is it safe to say that *old* fs semantics are in
FSContract test, and *new* fs semantics in FSMainOps tests ?
I ask this because it seems that you had tests in your swift filesystem tests
which used the FSContract libs, as well as the FSMainOps..
Not sure why you need
version is really important here..
- If 1.x, then where? (NN, JT, TT?)
- If 2.x, then where? (AM, NM, ...?) -- probably less likely here, since
the resources are ephemeral.
I know that some older 1.x versions had an issue with the jobtracker having
an ever-expanding hashmap or something like
On 21 November 2013 23:28, Jay Vyas jayunit...@gmail.com wrote:
It appears to me that
http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-test
is no longer updated.
Where does hadoop now package the test libraries?
Looking in the .//hadoop-common-project/hadoop-common/pom.xml
is packaged into
a jar anymore... but i fear it is not.
--
Jay Vyas
http://jayunit100.blogspot.com
I believe there is a FUSE mount for HDFS which will allow you to open files
normally in your streaming app rather than using the Java API.
Also consider that For Media and highly available binary data for a front end I
would guess that hdfs might be overkill because of the
, but depending on underlying
filesystem the semantics of this last modified time might vary.
Any thoughts on this?
--
Jay Vyas
http://jayunit100.blogspot.com
--
Jay Vyas
http
the input splits.
Any help please.
Thanks
Sai
--
Jay Vyas
http://jayunit100.blogspot.com
inaccessible for developers, or am I missing something?
regards
tmp
--
Jay Vyas
http://jayunit100.blogspot.com
The way we have gotten around this in the past is extending and then
copying the private code and creating a brand new implementation.
On Thu, Sep 26, 2013 at 10:50 AM, Jay Vyas jayunit...@gmail.com wrote:
This is actually somewhat common in some of the hadoop core classes :
Private
IIRC sequence files can be concatenated as-is and read as one large file,
but maybe I'm forgetting something.
-rawlocalfilesystem-and-getpos
--
Jay Vyas
http://jayunit100.blogspot.com
--
Jay Vyas
http://jayunit100.blogspot.com
major difference between these classes, and why the redundancy?
I'm thinking maybe it was retro-added at some point, based on some git
detective work which I tried...
But I figured it might just be easier to ask here :)
--
Jay Vyas
http://jayunit100.blogspot.com
between this and the
ambari REST services, but not sure where to start digging.
I want to run some rest calls at the end of some jobs to query how many
tasks failed, etc...
Hopefully, I could get this in JSON rather than scraping HTML.
Thanks!
--
Jay Vyas
http://jayunit100.blogspot.com
an e-Science application to run on Hadoop?
Thanks.
Felipe
--
*--
-- Felipe Oliveira Gutierrez
-- felipe.o.gutier...@gmail.com
-- https://sites.google.com/site/lipe82/Home/diaadia*
--
Jay Vyas
http://jayunit100.blogspot.com
Is there a startup contract for MapReduce making its own mapred.system.dir?
Also, it seems that the jobtracker can start up even if this directory was not
created / doesn't exist - I'm thinking that if that's the case, the JT should fail
up front.
Then is this a bug? Synchronization in the absence of any race condition is
normally considered bad.
In any case I'd like to know why this writer is synchronized whereas the other
ones are not. That is, I think, the point at issue: either the other writers
should be synchronized, or else this one
True that it deserves some posting on Solr, but I think it's still partially
relevant...
The SolrInputFormat and SolrOutputFormat handle this for you and will be used
in your map reduce jobs.
They will output one core per reducer, where each reducer corresponds to a
core. This is
configuration in client with the HDFS.
Thanks
Devaraj k
From: Jay Vyas [mailto:jayunit...@gmail.com]
Sent: 12 July 2013 04:12
To: common-u...@hadoop.apache.org
Subject: Staging directory ENOTDIR error.
Hi, I'm getting an ungoogleable
someone help me out? It would be much appreciated.
Thanks in advance,
Andrew
--
Jay Vyas
http://jayunit100.blogspot.com
)
at
org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
--
Jay Vyas
http://jayunit100.blogspot.com
(DataNode.java:1575)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1598)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1751)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1772)
--
Jay Vyas
http
is and if I need to set any
particular env variables when doing so.
--
Jay Vyas
http://jayunit100.blogspot.com
the data is located.
What are these interfaces and where they are in the source code? Is
there any manual for the interfaces?
Regards,
Mahmood
--
Jay Vyas
http://jayunit100.blogspot.com
...@yahoo.in wrote:
Just wondering if anyone has any documentation or references to any
articles how to simulate a multi node cluster setup in 1 laptop with
hadoop
running on multiple ubuntu VMs. any help is appreciated.
Thanks
Sai
--
Jay Vyas
http://jayunit100.blogspot.com
efficiently using MRv2 jobs.
thanks,
Rahul
--
Jay Vyas
http://jayunit100.blogspot.com
Hi! a few weeks ago I had the same question... Tried a first iteration at
documenting this by going through the classes starting with key/value pairs in
the blog post below.
http://jayunit100.blogspot.com/2013/04/the-kv-pair-salmon-run-in-mapreduce-hdfs.html
Note it's not perfect yet but I
to implement on a single node:
cat ~/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
On Thu, May 16, 2013 at 11:31 AM, Jay Vyas jayunit...@gmail.com wrote:
Yes it is required -- in pseudo-distributed mode the jobtracker is not
necessarily aware that the task trackers / data nodes are on the same
traffic... but maybe (1) or (2) will be a precise way to use
partitions as a poor man's block.
Just a thought - not sure if anyone has tried (1) or (2) before in order to
simulate blocks and increase locality by utilizing the partition API.
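For reference, this is roughly what the partition API gives you to work with; a plain-Python sketch (the masking mirrors Hadoop's default HashPartitioner, but Python's hash() differs from Java's hashCode(), and the prefix scheme below is an illustrative assumption, not a Hadoop API):

```python
# Rough analogue of Hadoop's default HashPartitioner.getPartition():
# (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks.
def get_partition(key, num_reducers):
    return (hash(key) & 0x7FFFFFFF) % num_reducers

# A locality-minded custom partitioner could instead route keys by a
# shared prefix, so related records land in the same reducer "block" -
# the kind of simulated-block trick suggested above.
def prefix_partition(key, num_reducers):
    return (hash(key.split("/")[0]) & 0x7FFFFFFF) % num_reducers

print(prefix_partition("user1/click", 4) == prefix_partition("user1/view", 4))  # True
```

The custom variant guarantees co-location of related keys at the cost of possible skew if one prefix dominates.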
--
Jay Vyas
http://jayunit100.blogspot.com
/
cloudfront.blogspot.com
On Wed, May 1, 2013 at 12:16 AM, Jay Vyas jayunit...@gmail.com wrote:
Hi guys:
Im wondering - if I'm running mapreduce jobs on a cluster with large
block sizes - can i increase performance with either:
1) A custom FileInputFormat
2) A custom partitioner
3) -DnumReducers
Clearly
it be a bottleneck from a 'disk'
point of view? Are you not going away from the distributed paradigm?
Am I taking it in the correct way? Please correct me if I am getting it
wrong.
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Wed, May 1, 2013 at 12:34 AM, Jay Vyas jayunit
it considerably high will definitely give you some boost.
But it'll require a high level of tinkering.
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Wed, May 1, 2013 at 1:29 AM, Jay Vyas jayunit...@gmail.com wrote:
Yes it is a problem at the first stage. What I'm wondering
reduce
dependencies and I am not sure which to pick. Are there other dependencies
need (such as JobConf)? What are the imports needed? During the
construction of the configuration what heuristics are used to find the
configuration for the Hadoop cluster?
Thank you.
--
Jay Vyas
OutputFormat is going to have to
find the corresponding file to append.
On Sun, Apr 21, 2013 at 10:54 PM, YouPeng Yang yypvsxf19870...@gmail.com wrote:
Hi All
Can I append a MR output file to an existing file on HDFS?
I'm using CDH4.1.2 vs MRv2
Regards
--
Jay Vyas
http
How many intermediate keys? If small enough, you can keep them in memory. If
large, you can just wait for the job to finish and siphon them into your job as
input with the MultipleInputs API.
On Apr 20, 2013, at 10:43 AM, Vikas Jadhav vikascjadha...@gmail.com wrote:
Hello,
Can anyone help
Hi guys:
I'm curious about the changes and future of the JobSubmissionFiles class.
Grepping around on the web I'm finding some code snippets that suggest that
hadoop security is not handled the same way on the staging directory as before:
Something to keep in mind - if you see the "fixing staging permissions" error
message a lot,
Then there might be a more systemic problem in your fs... At least, that was
the case for us.
On Apr 12, 2013, at 6:11 AM, Jay Vyas jayunit...@gmail.com wrote:
Hi guys:
I'm curious about the changes and future
the wrong code, or is there any other way to build
fuse-dfs?
Please guide me.
Thanks
regards
--
Jay Vyas
http://jayunit100.blogspot.com
/some_location/file /new_location/
Thanks, your responses are appreciated.
-- Kay
--
Jay Vyas
http://jayunit100.blogspot.com
Hmmm... maybe I'm missing something, but (@bjorn) why would you use HDFS as a
replacement for the distributed cache?
After all, the distributed cache is just a file with replication over the
whole cluster, which isn't in HDFS. Can't you just make the cache size big and
store the file there?
into individual tasks
So, my (related) questions are:
Is there a way to start a job directly from a job.xml file?
What components depend on and read the job.xml file?
Where is the job.xml defined/documented (if anywhere)?
--
Jay Vyas
http://jayunit100.blogspot.com
Hi guys:
Is there a Maven repo for hadoop's 3.0.0 trunk build?
Clearly the hadoop pom.xml allows us to build hadoop from scratch and
installs it as 3.0.0-SNAPSHOT -- but it's not clear whether there is a
published version of this
snapshot jar somewhere.
--
Jay Vyas
http://jayunit100.blogspot.com
:
https://repository.apache.org/content/groups/snapshots
-Giri
On Sat, Apr 6, 2013 at 2:00 PM, Harsh J ha...@cloudera.com wrote:
I don't think we publish nightly or rolling jars anywhere on maven
central from trunk builds.
On Sun, Apr 7, 2013 at 2:17 AM, Jay Vyas jayunit
:1522)
at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3821)
--
Jay Vyas
http://jayunit100.blogspot.com