John Lilley
Data Management Chief Architect, Redpoint Global Inc.
888 Worcester Street, Suite 200 Wellesley, MA 02482
M: +1 720 938 5761
john.lil...@redpointglobal.com
by trial-and-error, and then needing to figure out how to get the
thread-context-classloader set at the appropriate place and time.
John Lilley
-----Original Message-----
From: Sean Busbey
Sent: Tuesday, February 5, 2019 10:42 AM
To: John Lilley
Cc: user@hadoop.apache.org
Subject: Re: Multiple
ere a list of such classes?
Thanks,
John Lilley
and MapReduce do
this somehow, but we haven't been able to dig out the specifics from the code.
John Lilley
, coupling your input readers directly
with output writers. Instead, put the writers in separate threads and push
byte arrays to be written to them via a queue.
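The decoupling described above can be sketched with plain java.util.concurrent primitives. This is an illustrative sketch, not code from any Hadoop API: the class name, the POISON end-of-stream marker, and the StringBuilder standing in for the real output stream are all made up for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueuedWriter {
    private static final byte[] POISON = new byte[0]; // end-of-stream marker

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(64);
        StringBuilder sink = new StringBuilder(); // stands in for the real writer

        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    byte[] chunk = queue.take();    // blocks until data arrives
                    if (chunk == POISON) break;     // reader signalled completion
                    sink.append(new String(chunk)); // real code would write to HDFS here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        // The "reader" side pushes chunks without waiting on the writer,
        // except when the bounded queue applies back-pressure.
        for (String s : new String[] {"a", "b", "c"}) {
            queue.put(s.getBytes());
        }
        queue.put(POISON);
        writer.join();
        System.out.println(sink); // prints "abc"
    }
}
```

The bounded queue capacity (64 here) is the knob: large enough to absorb bursts, small enough that a slow writer throttles the reader instead of exhausting memory.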
John Lilley
From: Dmitry Goldenberg [mailto:dgoldenberg...@gmail.com]
Sent: Wednesday, May 25, 2016 9:12 PM
To: user@hadoop.apache.org
That's it! Thanks.
John Lilley
From: Chris Nauroth [mailto:cnaur...@hortonworks.com]
Sent: Tuesday, May 24, 2016 10:24 AM
To: John Lilley <john.lil...@redpoint.net>; 'user@hadoop.apache.org'
<user@hadoop.apache.org>
Subject: Re: Filing a JIRA
Something is definitely odd
ere:
https://issues.apache.org/jira/browse/HADOOP/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel
But still, creating a bug using the "Bug -> Create Detailed" menu files it
against Atlas.
John Lilley
From: Chris Nauroth [mailto:cnaur...@hortonworks.com]
Sent: Monday, May 9,
I am having trouble filing a bug report. I was trying this:
https://issues.apache.org/jira/servicedesk/customer/portal/5/create/27
But this doesn't seem right and refuses my request. Can someone point me to
the right place?
Thanks
John Lilley
2048MB. We are using the default Capacity
Scheduler.
Is this a configuration error on our part or has the Resource Manager somehow
returned the wrong size container allocation? Should we simply reject small
containers and wait for the RM to find a larger one for us?
John Lilley
haps?
John Lilley
-----Original Message-----
From: John Lilley
Sent: Thursday, October 1, 2015 10:22 AM
To: Varun Vasudev <vvasu...@apache.org>
Subject: RE: Ubuntu open file limits
That's the frustrating thing. Apparently on Ubuntu (maybe just 12.04?),
services do not get their limits
Greetings,
We are starting to support Ubuntu 12.04 LTS servers and HDP. But we are hitting
the "open file limits" problem. Unfortunately setting this system-wide for
Ubuntu seems difficult -- no matter what we try, YARN tasks always show the
result of ulimit -n as 1024 (or if we attempt to
(),
which works just fine.
John Lilley
From: Zheng, Kai [mailto:kai.zh...@intel.com]
Sent: Monday, August 17, 2015 5:40 PM
To: user@hadoop.apache.org
Subject: RE: UserGroupInformation and login with password
Hi John,
Login from keytab is mostly expected for services. For end users, yes they use
this should
be something simple. We do need to support password, because many of our
customers do not allow keytabs.
Thanks
John Lilley
Follow-up, this is indeed a YARN bug and I've filed a JIRA, which has garnered
a lot of attention and a patch.
john
From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Friday, April 17, 2015 1:01 PM
To: 'user@hadoop.apache.org'
Subject: Error in YARN localization with Active Directory user
...@hotmail.com wrote:
So…
If I understand, you’re saying you have a one way trust set up so that the
cluster’s AD trusts the Enterprise AD?
And by AD you really mean KDC?
On Mar 17, 2015, at 2:22 PM, John Lilley
john.lil...@redpoint.net wrote:
AD
\.COM$/office\\$1/g
Thanks
John Lilley
Greetings,
We encountered the issue reported here:
https://issues.cloudera.org/browse/DISTRO-526
It seems that someone should have figured out a workaround by now, does anyone
know what it is?
The problem is, the Hadoop installation uses an Edge AD controller as its
KDC, and that Edge AD
We are seeing a warning deep in HDFS code, I was wondering if anyone knows of
this or if a JIRA has been filed or fixed? Searching on the warning text
didn't crop up anything. It is not harmful AFAIK.
John
[Dynamic-linking native method java.net.NetworkInterface.init ... JNI]
WARNING in
The C++ thread calling AttachCurrentThread() must then
fetch the system class loader and set it to be the thread's context class
loader:
java.lang.Thread.currentThread().setContextClassLoader(java.lang.ClassLoader.getSystemClassLoader());
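A minimal runnable sketch of that workaround follows. The JNI attach itself can't be shown from pure Java, so an ordinary thread with its context class loader cleared stands in for a natively-attached thread; the class name is made up for the example.

```java
public class ContextClassLoaderFix {
    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(() -> {
            // The fix described above: install the system class loader
            // before touching any code that loads classes reflectively.
            Thread.currentThread().setContextClassLoader(
                    ClassLoader.getSystemClassLoader());
            System.out.println(Thread.currentThread().getContextClassLoader()
                    == ClassLoader.getSystemClassLoader());
        });
        t.setContextClassLoader(null); // mimic a thread attached via AttachCurrentThread()
        t.start();
        t.join();
    }
}
```

Without the setContextClassLoader call, the thread's context loader stays null and reflective class loading falls back to the bootstrap loader, which cannot see application classes.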
john
From: John Lilley [mailto:john.lil...@redpoint.net]
Sent
and see if changing this value has any
impact.
Correction: Default balancer bandwidth is 1MB/s, not 1KB/s as I mentioned in my
previous post. Sorry for the typo.
From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: 03 September 2014 17:38
To: user@hadoop.apache.org
Can you run the load from an edge node that is not a DataNode?
john
John Lilley
Chief Architect, RedPoint Global Inc.
1515 Walnut Street | Suite 300 | Boulder, CO 80302
T: +1 303 541 1516 | M: +1 720 938 5761 | F: +1 781-705-2077
Skype: jlilley.redpoint | john.lil...@redpoint.net
I have also found that neither
dfsadmin -setBalancerBandwidth
nor
dfs.datanode.balance.bandwidthPerSec
has any notable effect on the apparent balancer rate. This is on Hadoop 2.2.0.
john
From: cho ju il [mailto:tjst...@kgrid.co.kr]
Sent: Wednesday, September 03, 2014 12:55 AM
To:
We have a YARN task that is core-dumping, and the JVM error log says:
# Core dump written. Default location:
/data2/hadoop/yarn/local/usercache/jlilley/appcache/application_1405724043176_2453/container_1405724043176_2453_01_02/core
or core.14801
However when I look at the node, everything
, 2014 2:13 PM
To: user@hadoop.apache.org
Subject: Re: YARN userapp cache lifetime: can't find core dump
Perhaps the following? I get the application logs from here after job
completion. This is path on hdfs.
yarn.nodemanager.remote-app-log-dir
Regards,
Shahab
On Tue, Sep 2, 2014 at 4:02 PM, John
addDelegationTokens() to extract delegated Credentials for HDFS
and keep them around.
Once this has been done, it appears that all is well. We can use those
Credentials in the YARN application master launch context.
john
From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Sunday, August 24, 2014 11:05 AM
Following up on this, I was able to extract winutils.exe and hadoop.dll from
a Hadoop install for Windows, and set up HADOOP_HOME and PATH to find them. It
makes no difference to security, apparently.
John
From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Saturday, August 23, 2014 2
Up to this point, we've been able to run as a Hadoop client application (HDFS +
YARN) from Windows without winutils.exe, despite always seeing messages
complaining about it in the logs. However, we are now integrating with secure
clusters and are having some mysterious errors. Before these
We are encountering a really strange issue accessing Hadoop securely as a
client. We go through the motions of setting the security
configuration:
YarnConfiguration conf = new YarnConfiguration();
conf.set(DFSConfigKeys.DFS_NAMENODE_USER_NAME_KEY, nnPrincipal);
The only way we've found to write an application master is by example. However,
the distributed-shell example in 2.2 was not very good; it is much improved in
2.4. I would start with that and edit, rather than create from scratch.
john
-----Original Message-----
From: Никитин Константин
Greetings!
We would like to support HBase in a general manner, having our software connect
to any HBase table and read/write it in a row-oriented fashion. However, as we
explore HBase, the raw interface is at a very low level -- basically a map from
binary record keys to named columns. So my
http://www.kiji.org/
or even the brand new: http://phoenix.apache.org/
Cheers,
Mirko
2014-07-08 16:05 GMT+00:00 John Lilley
john.lil...@redpoint.net:
Greetings!
We would like to support HBase in a general manner, having our software connect
to any HBase table
?
From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Tuesday, July 08, 2014 12:43 PM
To: user@hadoop.apache.org
Subject: RE: HBase metadata
Those look intriguing. But what do people actually use today? Is it all
application-specific coding? Hive?
John
From
Nick
nimar...@pssd.com wrote:
Can’t speak for the rest of the Hadoop community but we use Lingual. Not sure
if that’s common or not.
Maybe worth posting the same question to @hbase.
From: John Lilley
[mailto:john.lil...@redpoint.net]
Sent
that helps.
thanks,
Arun
On Wed, Jun 25, 2014 at 1:48 PM, John Lilley
john.lil...@redpoint.net wrote:
We are an ISV that currently ships a data-quality/integration suite running as
a native YARN application. We are finding several use cases that would benefit
We are an ISV that currently ships a data-quality/integration suite running as
a native YARN application. We are finding several use cases that would benefit
from being able to manage a per-node persistent service. MapReduce has its
shuffle auxiliary service, but it isn't straightforward to
Hadoop performs best when each node has exclusive use of its disk. Therefore,
if you have a choice, try to provision exclusive use of individual spindles on
your SAN and map each one to a separate mount on your Hadoop nodes. Anything
other than that will tend to produce poor performance due
We've found that much of the Hadoop sample code assumes it is being run
from a cluster node, and that the connection information can be gleaned
directly from a configuration object. However, we always run our client from a
remote computer, and our users must manually specify the NN/RM
To second Mirko, HDFS isn’t concerned with content or formats. That would be
analogous to asking specific content to end up on specific disk sectors in a
normal file. If you want to partition data by content, use MapReduce/Pig/Hive
etc to segregate the data into files, perhaps naming the
Hi, can anyone help me with this?
From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Sunday, April 20, 2014 3:40 PM
To: user@hadoop.apache.org
Subject: HDFS and YARN security and interface impacts
We have an application that interfaces directly to HDFS and YARN (no
MapReduce). It does
We have an application that interfaces directly to HDFS and YARN (no
MapReduce). It does not currently support any Hadoop security other than the
insecure trust-the-client defaults. I've been doing some reading about
Hadoop security, but it mostly assumes that applications will be MapReduce.
Does netstat -an | grep LISTEN show these ports being listened on?
Can you stat hdfs from the command line e.g.:
hdfs dfsadmin -report
hdfs fsck /
hdfs dfs -ls /
Also, check out /var/log/hadoop or /var/log/hdfs for more details.
john
From: Mahmood Naderan [mailto:nt_mahm...@yahoo.com]
Sent:
Could you have a pmem-vs-vmem issue as in:
http://stackoverflow.com/questions/8017500/specifying-memory-limits-with-hadoop
john
From: praveenesh kumar [mailto:praveen...@gmail.com]
Sent: Tuesday, March 25, 2014 7:38 AM
To: user@hadoop.apache.org
Subject: Re: Hadoop Takes 6GB Memory to run one
This discussion may also be relevant to your question:
http://stackoverflow.com/questions/21005643/container-is-running-beyond-memory-limits
Do you actually need to specify that -Xmx6000m for java heap or could it be one
of the other issues discussed?
John
From: John Lilley [mailto:john.lil
It seems to take a very long time to timeout a connection to an invalid NN URI.
Our application is interactive so the defaults of taking many minutes don't
work well. I've tried setting:
conf.set("ipc.client.connect.max.retries", "2");
conf.set("ipc.client.connect.timeout", "7000");
before calling
Running a non-MapReduce YARN application, one of the containers launched by the
AM is failing with an error message I've never seen. Any ideas? I'm not sure
who exactly is running nice or why its argument list would be too long.
Thanks
john
Container for appattempt_1395755163053_0030_01
We do have a fairly long container command-line. Not huge, around 200
characters.
John
From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Wednesday, March 26, 2014 4:38 PM
To: user@hadoop.apache.org
Subject: Getting error message from AM container launch
Running a non-MapReduce YARN
On further examination they appear to be 369 characters long. I've read about
similar issues showing when the environment exceeds 132KB, but we aren't
putting anything significant in the environment.
John
From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Wednesday, March 26, 2014 4:41
On 27 Mar 2014, at 6:55, John Lilley
john.lil...@redpoint.net wrote:
On further examination they appear to be 369 characters long. I’ve read about
similar issues showing when the environment exceeds 132KB, but we aren’t
putting anything significant in the environment.
John
Lilley
john.lil...@redpoint.net wrote:
On further examination they appear to be 369 characters long. I’ve read about
similar issues showing when the environment exceeds 132KB, but we aren’t
putting anything significant in the environment.
John
From: John Lilley
We have this erratic behavior where every so often the RM will shutdown with an
UnknownHostException. The odd thing is, the host it complains about has been
in use for days at that point without problem. Any ideas?
Thanks,
John
2014-03-13 14:38:14,746 INFO rmapp.RMAppImpl
Never mind... we figured out its DNS entry was going missing.
john
From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Thursday, March 13, 2014 2:52 PM
To: user@hadoop.apache.org
Subject: ResourceManager shutting down
We have this erratic behavior where every so often the RM will shutdown
Stanley Shi
On Fri, Mar 7, 2014 at 1:46 AM, John Lilley
john.lil...@redpoint.net wrote:
How would I go about fetching configuration values (e.g. yarn-site.xml) from
the cluster via the API
How would I go about fetching configuration values (e.g. yarn-site.xml) from
the cluster via the API from an application not running on a cluster node?
Thanks
John
Our cluster has a node that reboots randomly. So I've gone to Ambari,
decommissioned its HDFS service, stopped all services, and deleted the node
from the cluster. I expected an fsck to immediately show under-replicated
blocks, but everything comes up fine. How do I tell the cluster that
OK, restarting all services now fsck shows under-replication. Was it the
NameNode restart?
John
From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Tuesday, March 04, 2014 5:47 AM
To: user@hadoop.apache.org
Subject: decommissioning a node
Our cluster has a node that reboots randomly. So
I have a file system with some missing/corrupt blocks. However, running hdfs
fsck -delete also fails with errors. How do I get around this?
Thanks
John
[hdfs@metallica yarn]$ hdfs fsck -delete
/rpdm/tmp/ProjectTemp_461_40/TempFolder_4/data00012_00.dld
Connecting to namenode via
(SelectChannelEndPoint.java:410)
at
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Tuesday, March 04, 2014 6:08 AM
To: user@hadoop.apache.org
Subject: Need help: fsck FAILs, refuses to clean up corrupt fs
I have
Ah... found the answer. I had to manually leave safe mode to delete the
corrupt files.
john
From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Tuesday, March 04, 2014 9:33 AM
To: user@hadoop.apache.org
Subject: RE: Need help: fsck FAILs, refuses to clean up corrupt fs
More information
We had a DN/NM node that went offline for a while and been removed from the
cluster via Ambari without decommissioning (because it was offline).
When the node came back up, its NM attempted connection to the RM.
Later the RM failed with this exception (dokken is the errant node):
2014-03-03
Actually it does show in Ambari, but the only time I've seen it is when adding
a new host it shows in the other registered hosts list.
john
From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Saturday, March 01, 2014 12:32 PM
To: user@hadoop.apache.org
Subject: how to remove a dead node
of underlying ipc retry is
controlled by ipc.client.connect.max.retries.
Did you try setting both ?
Jian
On Wed, Feb 12, 2014 at 8:36 AM, John Lilley
john.lil...@redpoint.net wrote:
Setting
conf.set(yarn.resourcemanager.connect.max-wait.ms
I would like to explore Call Data Record (CDR aka Call Detail Record) analysis,
and to that end I'm looking for a large (GB+) CDR file or a program to
synthesize a somewhat-realistic sample file. Does anyone know where to find
such a thing?
Thanks
John
Subject: Re: large CDR data samples
Have you looked at
http://www.gedis-studio.com/online-call-detail-records-cdr-generator.html ?
On Sat, Mar 1, 2014 at 7:39 AM, John Lilley
john.lil...@redpoint.net wrote:
I would like to explore Call Data Record (CDR aka Call Detail
We have a node that died and had to be rebuilt. However, its status is still
showing in the dfsadmin report
hdfs dfsadmin -report
[...]
Dead datanodes:
Name: 192.168.57.104:50010 (192.168.57.104)
Hostname: tarantula.office.datalever.com
Decommission Status : Decommissioned
Configured Capacity:
that can be set?
Thanks
John
From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Monday, February 10, 2014 12:12 PM
To: user@hadoop.apache.org
Subject: RE: very long timeout on failed RM connect
I tried:
conf.set("yarn.resourcemanager.connect.max-wait.ms", "1");
conf.set
Our application (running outside the Hadoop cluster) connects to the RM through
YarnClient. This works fine, except we've found that if the RM address or port
is misconfigured in our software, or a firewall blocks access, the first call
into the client (in this case getNodeReports) hangs for a
.
yarn.resourcemanager.connect.retry-interval.ms controls how often to try
connecting to the ResourceManager.
Jian
On Mon, Feb 10, 2014 at 6:44 AM, John Lilley
john.lil...@redpoint.net wrote:
Our application
DataInputStream}
You can utilize this method:
@InterfaceAudience.LimitedPrivate({"HDFS"})
public InputStream getWrappedStream() {
  return in;
}
and cast the return value to DFSInputStream.
Cheers
On Mon, Jan 27, 2014 at 11:07 AM, John Lilley
john.lil...@redpoint.net wrote
DistributedFileSystem ignores it though.
On Sat, Jan 25, 2014 at 6:09 AM, John Lilley
john.lil...@redpoint.net wrote:
There is this in FileSystem.java, which would appear to use the default buffer
size of 4096 in the create() call unless otherwise specified in
io.file.buffer.size
that is holding the specific block for any
errors?
On Jan 27, 2014 8:37 PM, John Lilley
john.lil...@redpoint.net wrote:
I am getting this perplexing error. Our YARN application launches tasks that
attempt to simultaneously open a large number of files for merge. There seems
I am getting this perplexing error. Our YARN application launches tasks that
attempt to simultaneously open a large number of files for merge. There seems
to be a load threshold in terms of number of simultaneous tasks attempting to
open a set of HDFS files on a four-node cluster. The
limitations.
On Jan 26, 2014 11:03 PM, John Lilley
john.lil...@redpoint.net wrote:
I have an application that wants to open a large set of files in HDFS
simultaneously. Are there hard or practical limits to what can be opened at
once by a single process
...@hadoop.apache.org
Subject: Re: HDFS read stats
Please take a look at DFSInputStream#ReadStatistics which contains four metrics
including local bytes read.
You can obtain ReadStatistics through getReadStatistics()
Cheers
On Sun, Jan 26, 2014 at 4:00 PM, John Lilley
john.lil
access to the same set of files.
3) Replication=1 having an influence.
Any ideas? I am not seeing any errors in the datanode logs.
I will run some other tests with replication=3 to see what happens.
John
From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Monday, January 27, 2014 8:41
buffer sizes
I don't think that value is used either except in the legacy block reader which
is turned off by default.
On Fri, Jan 24, 2014 at 6:34 AM, John Lilley
john.lil...@redpoint.net wrote:
Ah, I see... it is a constant in
CommonConfigurationKeysPublic.java
There are no short-circuit writes, only reads, AFAIK.
Is it necessary to transfer from HDFS to local disk? Can you read from HDFS
directly using the FileSystem interface?
john
From: Shekhar Sharma [mailto:shekhar2...@gmail.com]
Sent: Saturday, January 25, 2014 3:44 AM
To: user@hadoop.apache.org
To: user@hadoop.apache.org
Subject: Re: HDFS buffer sizes
HDFS does not appear to use dfs.stream-buffer-size.
On Thu, Jan 23, 2014 at 6:57 AM, John Lilley
john.lil...@redpoint.net wrote:
What is the interaction between dfs.stream-buffer-size and
dfs.client-write-packet
What is the interaction between dfs.stream-buffer-size and
dfs.client-write-packet-size?
I see that the default for dfs.stream-buffer-size is 4K. Does anyone have
experience using larger buffers to optimize large writes?
Thanks
John
I am writing temp files to HDFS with replication=1, so I expect the blocks to
be stored on the writing node. Are there any tips, in general, for optimizing
write performance to HDFS? I use 128K buffers in the write() calls. Are there
any parameters that can be set on the connection or in
on why this is desired?
+Vinod
On Jan 11, 2014, at 11:37 AM, John Lilley
john.lil...@redpoint.net wrote:
We have a YARN application that we want to automatically terminate if the YARN
client disconnects or crashes. Is it possible to configure the YarnClient-RM
be able to
make it work by giving it a config with the values it needs.
On Mon, Oct 21, 2013 at 8:49 PM, John Lilley john.lil...@redpoint.net wrote:
Thanks again. This gives me a lot of options; we will see what works.
Do you know if there are any permissions issues if we directly access
Isn't this is what you'd normally do in the Mapper?
My understanding of the combiner is that it is like a mapper-side pre-reducer
and operates on blocks of data that have already been sorted by key, so mucking
with the keys doesn't *seem* like a good idea.
john
From: Amit Sela
: Saturday, January 11, 2014 4:58 PM
To: common-u...@hadoop.apache.org
Subject: Re: FileSystem iterative or limited alternative to listStatus()
Can you utilize the following API ?
public FileStatus[] listStatus(Path f, PathFilter filter)
Cheers
On Sat, Jan 11, 2014 at 3:52 PM, John Lilley
We have a YARN application that we want to automatically terminate if the YARN
client disconnects or crashes. Is it possible to configure the YarnClient-RM
connection so that if the client terminates the RM automatically terminates the
AM? Or do we need to build our own logic (e.g. a direct
Is there an HDFS file system method for listing a directory contents
iteratively, or at least stopping at some limit? We have an application in
which the user can type wildcards, which our app expands. However, during
interactive phases we only want the first matching file to check its
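For illustration, here is a local-filesystem analog of that early-terminating wildcard scan using java.nio. On HDFS the same role would be played by iterating FileSystem.listStatusIterator() (or listStatus with a PathFilter) and breaking after the first hit; the class name and file names below are made up for the example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class FirstMatch {
    // Return the first directory entry matching the glob, without
    // materializing the full listing.
    static Path firstMatch(Path dir, String glob) throws IOException {
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                return p; // stop at the first match; the rest is never listed
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("scan");
        Files.createFile(dir.resolve("part-00000.dat"));
        Files.createFile(dir.resolve("part-00001.dat"));
        Path hit = firstMatch(dir, "part-*.dat");
        System.out.println(hit != null
                && hit.getFileName().toString().startsWith("part-"));
    }
}
```

The point is the early return: for an interactive existence check, listing stops after one entry instead of paying for the whole directory.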
From yarn-default.xml:
yarn.nodemanager.localizer.cache.target-size-mb = 10240
Target size of localizer cache in MB, per local directory.
On Sat, Jan 4, 2014 at 12:02 PM, John Lilley
john.lil...@redpoint.net wrote:
Are there any limits on the total size of LocalResources
at 2.4.0 release and YARN-1524 is Open.
From the link you posted, I see:
amContainerLogs The URL of the application master container logs
On Sat, Jan 4, 2014 at 9:18 AM, John Lilley
john.lil...@redpoint.net wrote:
Ted,
Thanks for the pointer. But when I read about
Are there any limits on the total size of LocalResources that a YARN app
requests? Do the PUBLIC ones age out of cache over time? Are there settable
controls?
Thanks
John
of task container logs?
Thanks
john
From: Ted Yu [mailto:yuzhih...@gmail.com]
See:
https://issues.apache.org/jira/browse/YARN-649
https://issues.apache.org/jira/browse/YARN-1524
On Fri, Jan 3, 2014 at 8:50 AM, John Lilley john.lil...@redpoint.net wrote:
Is there a programmatic or HTTP interface
Is there a programmatic or HTTP interface for querying the logs of a YARN
application run? Ideally I would start with the AppID, query the AppMaster
log, and then descend into the task logs.
Thanks
John
17, 2013 at 6:39 AM, John Lilley
john.lil...@redpoint.net wrote:
Thanks! I do call FileSytem.getFileBlockLocations() now to map tasks to local
data blocks; is there any advantage to using listLocatedStatus() instead? I
guess one call instead of two...
John
Chris Nauroth
Hortonworks
http://hortonworks.com/
On Mon, Dec 16, 2013 at 4:21 PM, John Lilley
john.lil...@redpoint.net wrote:
Our YARN application would benefit from maximal bandwidth on HDFS reads.
But I'm unclear on how short-circuit reads are enabled
Our YARN application would benefit from maximal bandwidth on HDFS reads.
But I'm unclear on how short-circuit reads are enabled.
Are they on by default?
Can our application check programmatically to see if the short-circuit read is
enabled?
Thanks,
john
there will be some external
communication between the AM and container tasks. I am trying to implement a
Hadoop-like system on YARN and I wanted to draw the high-level steps before
starting the work. Thanks,
On Thu, Nov 21, 2013 at 3:27 PM, John Lilley
john.lil...@redpoint.net
MapReduce also communicates outside of what is directly supported by YARN.
In a YARN application, there is very little direct communication between the
client and the AM, and between the AM and container tasks.
I think that an AM can update to the client two pieces of information --
state and
We've found that using either
$VARIABLE
or
`command`
as part of a YARN command line doesn't really work, because those two things
are replaced in the application master context. Is there a way to escape these
elements so that they are passed through to the container launch?
Thanks
John
The distributed shell example is the best one that I know of.
John
From: Bill Q [mailto:bill.q@gmail.com]
Sent: Tuesday, November 12, 2013 12:51 PM
To: user@hadoop.apache.org
Subject: Re: Writing an application based on YARN 2.2
Hi Lohit,
Thanks a lot. Is there any updated docs that would
Thanks!
john
From: Arun C Murthy [mailto:a...@hortonworks.com]
Sent: Monday, November 11, 2013 7:25 PM
To: user@hadoop.apache.org
Subject: Re: worker affinity and YARN scheduling
On Nov 11, 2013, at 6:28 AM, John Lilley
john.lil...@redpoint.net wrote:
I would
We are trying to get the appropriate jars into our AM's CLASSPATH, but running
into an issue. We are building on the distributed shell sample code. Feel
free to direct me to the right way to do this, if our approach is incorrect
or the best practice has been revised. All we need are the
First, this documentation:
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FSDataInputStream.html
claims that FSDataInputStream has a seek() method, but javap doesn't show one:
$ javap -classpath [hadoopjars] org.apache.hadoop.fs.FSDataInputStream
Compiled from