unsubscribe

2023-02-02 Thread John Lilley
John Lilley Data Management Chief Architect, Redpoint Global Inc. 888 Worcester Street, Suite 200 Wellesley, MA 02482 M: +1 7209385761 | john.lil...@redpointglobal.com

RE: Multiple classloaders and Hadoop APIs

2019-02-15 Thread John Lilley
by trial-and-error, and then needing to figure out how to get the thread-context-classloader set at the appropriate place and time. John Lilley -Original Message- From: Sean Busbey Sent: Tuesday, February 5, 2019 10:42 AM To: John Lilley Cc: user@hadoop.apache.org Subject: Re: Multiple

Multiple classloaders and Hadoop APIs

2019-02-01 Thread John Lilley
ere a list of such classes? Thanks, John Lilley

Example of YARN delegation token for hiveserver2

2018-09-19 Thread John Lilley
and MapReduce do this somehow, but we haven't been able to dig out the specifics from the code. John Lilley

RE: What is the recommended way to append to files on hdfs?

2016-05-31 Thread John Lilley
, coupling your input readers directly with output writers. Instead, put the writers in separate threads and push byte arrays to be written to them via a queue. John Lilley From: Dmitry Goldenberg [mailto:dgoldenberg...@gmail.com] Sent: Wednesday, May 25, 2016 9:12 PM To: user@hadoop.apache.org

RE: Filing a JIRA

2016-05-24 Thread John Lilley
That's it! Thanks. John Lilley From: Chris Nauroth [mailto:cnaur...@hortonworks.com] Sent: Tuesday, May 24, 2016 10:24 AM To: John Lilley <john.lil...@redpoint.net>; 'user@hadoop.apache.org' <user@hadoop.apache.org> Subject: Re: Filing a JIRA Something is definitely odd

RE: Filing a JIRA

2016-05-24 Thread John Lilley
ere: https://issues.apache.org/jira/browse/HADOOP/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel But still, creating a bug using the "Bug -> Create Detailed" menu files it against Atlas. John Lilley From: Chris Nauroth [mailto:cnaur...@hortonworks.com] Sent: Monday, May 9,

Filing a JIRA

2016-05-09 Thread John Lilley
I am having trouble filing a bug report. I was trying this: https://issues.apache.org/jira/servicedesk/customer/portal/5/create/27 But this doesn't seem right and refuses my request. Can someone point me to the right place? Thanks John Lilley

Can a YARN container request be fulfilled by a container with less memory?

2015-12-11 Thread John Lilley
2048MB. We are using the default Capacity Scheduler. Is this a configuration error on our part or has the Resource Manager somehow returned the wrong size container allocation? Should we simply reject small containers and wait for the RM to find a larger one for us? John Lilley

RE: Ubuntu open file limits

2015-10-02 Thread John Lilley
haps? John Lilley -Original Message----- From: John Lilley Sent: Thursday, October 1, 2015 10:22 AM To: Varun Vasudev <vvasu...@apache.org> Subject: RE: Ubuntu open file limits That's the frustrating thing. Apparently on Ubuntu (maybe just 12.04?), services do not get their limits

Ubuntu open file limits

2015-09-30 Thread John Lilley
Greetings, We are starting to support Ubuntu 12.04 LTS servers and HDP. But we are hitting the "open file limits" problem. Unfortunately setting this system-wide for Ubuntu seems difficult -- no matter what we try, YARN tasks always show the result of ulimit -n as 1024 (or if we attempt to
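A common starting point for raising the limit (a sketch only; the user names and values here are assumptions, and as this thread reports, upstart-launched services on Ubuntu 12.04 may ignore this file entirely, so the limit may also need to be set in each service's init/upstart script):

```
# /etc/security/limits.conf -- raise the open-file limit for the Hadoop service users
hdfs    -   nofile   32768
yarn    -   nofile   32768
mapred  -   nofile   32768
```

Verify the effective limit from inside a YARN task (e.g. log the output of `ulimit -n`), since the shell that launched the daemon is what determines what the tasks inherit.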

RE: UserGroupInformation and login with password

2015-08-19 Thread John Lilley
(), which works just fine. John Lilley From: Zheng, Kai [mailto:kai.zh...@intel.com] Sent: Monday, August 17, 2015 5:40 PM To: user@hadoop.apache.org Subject: RE: UserGroupInformation and login with password Hi John, Login from keytab is mostly expected for services. For end users, yes they use

UserGroupInformation and login with password

2015-08-17 Thread John Lilley
this should be something simple. We do need to support password, because many of our customers do not allow keytabs. Thanks John Lilley

RE: Error in YARN localization with Active Directory user -- inconsistent directory name escapement

2015-05-04 Thread John Lilley
Follow-up, this is indeed a YARN bug and I've filed a JIRA, which has garnered a lot of attention and a patch. john From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Friday, April 17, 2015 1:01 PM To: 'user@hadoop.apache.org' Subject: Error in YARN localization with Active Directory user

RE: Trusted-realm vs default-realm kerberos issue

2015-04-19 Thread John Lilley
...@hotmail.com wrote: So… If I understand, you’re saying you have a one way trust set up so that the cluster’s AD trusts the Enterprise AD? And by AD you really mean KDC? On Mar 17, 2015, at 2:22 PM, John Lilley john.lil...@redpoint.net wrote: AD The opinions

Error in YARN localization with Active Directory user -- inconsistent directory name escapement

2015-04-17 Thread John Lilley
\.COM$/office\\$1/g Thanks John Lilley

Trusted-realm vs default-realm kerberos issue

2015-03-17 Thread John Lilley
Greetings, We encountered the issue reported here: https://issues.cloudera.org/browse/DISTRO-526 It seems that someone should have figured out a workaround by now, does anyone know what it is? The problem is, the Hadoop installation uses an Edge AD controller as its KDC, and that Edge AD

JNI native-method warning in HDP 2.0

2014-10-16 Thread John Lilley
We are seeing a warning deep in HDFS code, I was wondering if anyone knows of this or if a JIRA has been filed or fixed? Searching on the warning text didn't crop up anything. It is not harmful AFAIK. John [Dynamic-linking native method java.net.NetworkInterface.init ... JNI] WARNING in

RE: Issue with Hadoop/Kerberos security as client

2014-09-18 Thread John Lilley
++ thread calling AttachCurrentThread() must then fetch the system class loader and set it to be the thread's context class loader: java.lang.Thread.currentThread().setContextClassLoader(java.lang.ClassLoader.getSystemClassLoader()); john From: John Lilley [mailto:john.lil...@redpoint.net] Sent
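The workaround quoted above boils down to a single JDK call. A minimal, self-contained sketch (assuming a thread that arrived via JNI AttachCurrentThread without a usable context class loader):

```java
public class ContextClassLoaderFix {
    public static void main(String[] args) {
        // Threads attached from native code via JNI AttachCurrentThread may have
        // a null or bootstrap-only context class loader, which breaks libraries
        // (including Hadoop client code) that resolve classes through it.
        Thread current = Thread.currentThread();
        current.setContextClassLoader(ClassLoader.getSystemClassLoader());
        // After the fix, the thread resolves classes via the system class loader.
        System.out.println(current.getContextClassLoader()
                == ClassLoader.getSystemClassLoader());  // prints true
    }
}
```

The same call would be placed immediately after AttachCurrentThread() in the embedding application, before any Hadoop API is touched on that thread.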

RE: How can I increase the speed balancing?

2014-09-16 Thread John Lilley
and see if changing this value has any impact. Correction: Default balancer bandwidth is 1MB/s, not 1KB/s as I mentioned in my previous post. Sorry for the typo. From: John Lilley [mailto:john.lil...@redpoint.net] Sent: 03 September 2014 17:38 To: user@hadoop.apache.org

RE: HDFS balance

2014-09-03 Thread John Lilley
Can you run the load from an edge node that is not a DataNode? john John Lilley Chief Architect, RedPoint Global Inc. 1515 Walnut Street | Suite 300 | Boulder, CO 80302 T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077 Skype: jlilley.redpoint | john.lil...@redpoint.net

RE: How can I increase the speed balancing?

2014-09-03 Thread John Lilley
I have also found that neither dfsadmin -setBalancerBandwidth nor dfs.datanode.balance.bandwidthPerSec have any notable effect on apparent balancer rate. This is on Hadoop 2.2.0 john From: cho ju il [mailto:tjst...@kgrid.co.kr] Sent: Wednesday, September 03, 2014 12:55 AM To:

YARN userapp cache lifetime: can't find core dump

2014-09-02 Thread John Lilley
We have a YARN task that is core-dumping, and the JVM error log says: # Core dump written. Default location: /data2/hadoop/yarn/local/usercache/jlilley/appcache/application_1405724043176_2453/container_1405724043176_2453_01_02/core or core.14801 However when I look at the node, everything

RE: YARN userapp cache lifetime: can't find core dump

2014-09-02 Thread John Lilley
, 2014 2:13 PM To: user@hadoop.apache.org Subject: Re: YARN userapp cache lifetime: can't find core dump Perhaps the following? I get the application logs from here after job completion. This is path on hdfs. yarn.nodemanager.remote-app-log-dir Regards, Shahab On Tue, Sep 2, 2014 at 4:02 PM, John

RE: winutils and security

2014-08-26 Thread John Lilley
addDelegationTokens() to extract delegated Credentials for HDFS and keep them around. Once this has been done, it appears that all is well. We can use those Credentials in the YARN application master launch context. john From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Sunday, August 24, 2014 11:05 AM

RE: winutils and security

2014-08-24 Thread John Lilley
Following up on this, I was able to extract a winutils.exe and Hadoop.dll from a Hadoop install for Windows, and set up HADOOP_HOME and PATH to find them. It makes no difference to security, apparently. John From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Saturday, August 23, 2014 2

winutils and security

2014-08-23 Thread John Lilley
Up to this point, we've been able to run as a Hadoop client application (HDFS + YARN) from Windows without winutils.exe, despite always seeing messages complaining about it in the logs. However, we are now integrating with secure clusters and are having some mysterious errors. Before these

Issue with Hadoop/Kerberos security as client

2014-08-19 Thread John Lilley
We are encountering a really strange issue accessing Hadoop securely as a client. We go through the motions of calling setting the security configuration: YarnConfiguration conf = new YarnConfiguration(); conf.set(DFSConfigKeys.DFS_NAMENODE_USER_NAME_KEY, nnPrincipal);

RE: Issues with documentation on YARN

2014-07-27 Thread John Lilley
The only way we've found to write an application master is by example. However, the distributed-shell example in 2.2 was not very good; it is much improved in 2.4. I would start with that and edit, rather than create from scratch. john -Original Message- From: Никитин Константин

HBase metadata

2014-07-08 Thread John Lilley
Greetings! We would like to support HBase in a general manner, having our software connect to any HBase table and read/write it in a row-oriented fashion. However, as we explore HBase, the raw interface is at a very low level -- basically a map from binary record keys to named columns. So my

RE: HBase metadata

2014-07-08 Thread John Lilley
://www.kiji.org/ or even the brand new: http://phoenix.apache.org/ Cheers, Mirko 2014-07-08 16:05 GMT+00:00 John Lilley john.lil...@redpoint.net: Greetings! We would like to support HBase in a general manner, having our software connect to any HBase table

RE: HBase metadata

2014-07-08 Thread John Lilley
? From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Tuesday, July 08, 2014 12:43 PM To: user@hadoop.apache.org Subject: RE: HBase metadata Those look intriguing. But what do people actually use today? Is it all application-specific coding? Hive? John From

RE: HBase metadata

2014-07-08 Thread John Lilley
, Nick nimar...@pssd.com wrote: Can’t speak for the rest of the Hadoop community but we use Lingual. Not sure if that’s common or not. Maybe worth posting the same question to @hbase. From: John Lilley [mailto:john.lil...@redpoint.net] Sent

RE: persistent services in Hadoop

2014-06-27 Thread John Lilley
that helps. thanks, Arun On Wed, Jun 25, 2014 at 1:48 PM, John Lilley john.lil...@redpoint.net wrote: We are an ISV that currently ships a data-quality/integration suite running as a native YARN application. We are finding several use cases that would benefit

persistent services in Hadoop

2014-06-25 Thread John Lilley
We are an ISV that currently ships a data-quality/integration suite running as a native YARN application. We are finding several use cases that would benefit from being able to manage a per-node persistent service. MapReduce has its shuffle auxiliary service, but it isn't straightforward to

RE: Hadoop SAN Storage reuse

2014-06-12 Thread John Lilley
Hadoop performs best when each node has exclusive use of its disk. Therefore, if you have a choice, try to provision exclusive use of individual spindles on your SAN and map each one to a separate mount on your Hadoop nodes. Anything other than that will tend to produce poor performance due

Gathering connection information

2014-06-04 Thread John Lilley
We've found that much of the Hadoop samples assume that running is being done form a cluster node, and that the connection information can be gleaned directly from a configuration object. However, we always run our client from a remote computer, and our users must manually specify the NN/RM

RE: partition file by content based through HDFS

2014-05-11 Thread John Lilley
To second Mirko, HDFS isn’t concerned with content or formats. That would be analogous to asking specific content to end up on specific disk sectors in a normal file. If you want to partition data by content, use MapReduce/Pig/Hive etc to segregate the data into files, perhaps naming the

HDFS and YARN security and interface impacts

2014-04-25 Thread John Lilley
Hi, can anyone help me with this? From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Sunday, April 20, 2014 3:40 PM To: user@hadoop.apache.org Subject: HDFS and YARN security and interface impacts We have an application that interfaces directly to HDFS and YARN (no MapReduce). It does

HDFS and YARN security and interface impacts

2014-04-20 Thread John Lilley
We have an application that interfaces directly to HDFS and YARN (no MapReduce). It does not currently support any Hadoop security other than the insecure trust the client defaults. I've been doing some reading about Hadoop security, but it mostly assumes that applications will be MapReduce.

RE: ipc.Client: Retrying connect to server

2014-03-27 Thread John Lilley
Does netstat -an | grep LISTEN show these ports being listened on? Can you stat hdfs from the command line e.g.: hdfs dfsadmin -report hdfs fsck / hdfs dfs -ls / Also, check out /var/log/hadoop or /var/log/hdfs for more details. john From: Mahmood Naderan [mailto:nt_mahm...@yahoo.com] Sent:

RE: Hadoop Takes 6GB Memory to run one mapper

2014-03-27 Thread John Lilley
Could you have a pmem-vs-vmem issue as in: http://stackoverflow.com/questions/8017500/specifying-memory-limits-with-hadoop john From: praveenesh kumar [mailto:praveen...@gmail.com] Sent: Tuesday, March 25, 2014 7:38 AM To: user@hadoop.apache.org Subject: Re: Hadoop Takes 6GB Memory to run one

RE: Hadoop Takes 6GB Memory to run one mapper

2014-03-27 Thread John Lilley
This discussion may also be relevant to your question: http://stackoverflow.com/questions/21005643/container-is-running-beyond-memory-limits Do you actually need to specify that -Xmx6000m for java heap or could it be one of the other issues discussed? John From: John Lilley [mailto:john.lil

reducing HDFS FS connection timeouts

2014-03-27 Thread John Lilley
It seems to take a very long time to time out a connection to an invalid NN URI. Our application is interactive, so the defaults of taking many minutes don't work well. I've tried setting: conf.set("ipc.client.connect.max.retries", "2"); conf.set("ipc.client.connect.timeout", "7000"); before calling
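The two keys named in this thread can also be set declaratively on the client. A sketch of the corresponding core-site.xml fragment (the values are illustrative, taken from the thread, not recommendations):

```xml
<!-- core-site.xml (client side) -->
<property>
  <name>ipc.client.connect.max.retries</name>
  <value>2</value>
</property>
<property>
  <name>ipc.client.connect.timeout</name>
  <value>7000</value><!-- milliseconds per connection attempt -->
</property>
```

Note that these settings must be in effect before the FileSystem/IPC client is first created, or they may be ignored for cached connections.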

Getting error message from AM container launch

2014-03-26 Thread John Lilley
Running a non-MapReduce YARN application, one of the containers launched by the AM is failing with an error message I've never seen. Any ideas? I'm not sure who exactly is running nice or why its argument list would be too long. Thanks john Container for appattempt_1395755163053_0030_01

RE: Getting error message from AM container launch

2014-03-26 Thread John Lilley
We do have a fairly long container command-line. Not huge, around 200 characters. John From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Wednesday, March 26, 2014 4:38 PM To: user@hadoop.apache.org Subject: Getting error message from AM container launch Running a non-MapReduce YARN

RE: Getting error message from AM container launch

2014-03-26 Thread John Lilley
On further examination they appear to be 369 characters long. I've read about similar issues showing when the environment exceeds 132KB, but we aren't putting anything significant in the environment. John From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Wednesday, March 26, 2014 4:41

RE: Getting error message from AM container launch

2014-03-26 Thread John Lilley
年3月27日, at 6:55, John Lilley john.lil...@redpoint.net wrote: On further examination they appear to be 369 characters long. I’ve read about similar issues showing when the environment exceeds 132KB, but we aren’t putting anything significant in the environment. John

RE: Getting error message from AM container launch

2014-03-26 Thread John Lilley
Lilley john.lil...@redpoint.net wrote: On further examination they appear to be 369 characters long. I’ve read about similar issues showing when the environment exceeds 132KB, but we aren’t putting anything significant in the environment. John From: John Lilley

ResourceManager shutting down

2014-03-13 Thread John Lilley
We have this erratic behavior where every so often the RM will shut down with an UnknownHostException. The odd thing is, the host it complains about has been in use for days at that point without problem. Any ideas? Thanks, John 2014-03-13 14:38:14,746 INFO rmapp.RMAppImpl

RE: ResourceManager shutting down

2014-03-13 Thread John Lilley
Never mind... we figured out its DNS entry was going missing. john From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Thursday, March 13, 2014 2:52 PM To: user@hadoop.apache.org Subject: ResourceManager shutting down We have this erratic behavior where every so often the RM will shutdown

RE: Fetching configuration values from cluster

2014-03-07 Thread John Lilley
, Stanley Shi, On Fri, Mar 7, 2014 at 1:46 AM, John Lilley john.lil...@redpoint.net wrote: How would I go about fetching configuration values (e.g. yarn-site.xml) from the cluster via the API

Fetching configuration values from cluster

2014-03-06 Thread John Lilley
How would I go about fetching configuration values (e.g. yarn-site.xml) from the cluster via the API from an application not running on a cluster node? Thanks John

decommissioning a node

2014-03-04 Thread John Lilley
Our cluster has a node that reboots randomly. So I've gone to Ambari, decommissioned its HDFS service, stopped all services, and deleted the node from the cluster. I expected an fsck to immediately show under-replicated blocks, but everything comes up fine. How do I tell the cluster that

RE: decommissioning a node

2014-03-04 Thread John Lilley
OK, restarting all services now fsck shows under-replication. Was it the NameNode restart? John From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Tuesday, March 04, 2014 5:47 AM To: user@hadoop.apache.org Subject: decommissioning a node Our cluster has a node that reboots randomly. So

Need help: fsck FAILs, refuses to clean up corrupt fs

2014-03-04 Thread John Lilley
I have a file system with some missing/corrupt blocks. However, running hdfs fsck -delete also fails with errors. How do I get around this? Thanks John [hdfs@metallica yarn]$ hdfs fsck -delete /rpdm/tmp/ProjectTemp_461_40/TempFolder_4/data00012_00.dld Connecting to namenode via

RE: Need help: fsck FAILs, refuses to clean up corrupt fs

2014-03-04 Thread John Lilley
(SelectChannelEndPoint.java:410) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Tuesday, March 04, 2014 6:08 AM To: user@hadoop.apache.org Subject: Need help: fsck FAILs, refuses to clean up corrupt fs I have

RE: Need help: fsck FAILs, refuses to clean up corrupt fs

2014-03-04 Thread John Lilley
Ah... found the answer. I had to manually leave safe mode to delete the corrupt files. john From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Tuesday, March 04, 2014 9:33 AM To: user@hadoop.apache.org Subject: RE: Need help: fsck FAILs, refuses to clean up corrupt fs More information
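The sequence described in this resolution, as a command sketch (the file path is illustrative; `-safemode get` is included just to confirm the state first):

```
hdfs dfsadmin -safemode get       # confirm the NameNode is in safe mode
hdfs dfsadmin -safemode leave     # manually leave safe mode
hdfs fsck -delete /path/to/corrupt/file   # now the corrupt file(s) can be deleted
```

The NameNode refuses mutations (including fsck -delete) while in safe mode, which is why the earlier attempts failed with errors.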

ResourceManager crash on deleted NM node back from the dead

2014-03-03 Thread John Lilley
We had a DN/NM node that went offline for a while and had been removed from the cluster via Ambari without decommissioning (because it was offline). When the node came back up, its NM attempted connection to the RM. Later the RM failed with this exception (dokken is the errant node): 2014-03-03

RE: how to remove a dead node?

2014-03-02 Thread John Lilley
Actually it does show in Ambari, but the only time I've seen it is when adding a new host it shows in the other registered hosts list. john From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Saturday, March 01, 2014 12:32 PM To: user@hadoop.apache.org Subject: how to remove a dead node

RE: very long timeout on failed RM connect

2014-03-01 Thread John Lilley
of underlying ipc retry is controlled by ipc.client.connect.max.retries. Did you try setting both ? Jian On Wed, Feb 12, 2014 at 8:36 AM, John Lilley john.lil...@redpoint.net wrote: Setting conf.set(yarn.resourcemanager.connect.max-wait.ms

large CDR data samples

2014-03-01 Thread John Lilley
I would like to explore Call Data Record (CDR aka Call Detail Record) analysis, and to that end I'm looking for a large (GB+) CDR file or a program to synthesize a somewhat-realistic sample file. Does anyone know where to find such a thing? Thanks John

RE: large CDR data samples

2014-03-01 Thread John Lilley
Subject: Re: large CDR data samples Have you looked at http://www.gedis-studio.com/online-call-detail-records-cdr-generator.html ? On Sat, Mar 1, 2014 at 7:39 AM, John Lilley john.lil...@redpoint.net wrote: I would like to explore Call Data Record (CDR aka Call Detail

how to remove a dead node?

2014-03-01 Thread John Lilley
We have a node that died and had to be rebuilt. However, its status is still showing in the dfsadmin report hdfs dfsadmin -report [...] Dead datanodes: Name: 192.168.57.104:50010 (192.168.57.104) Hostname: tarantula.office.datalever.com Decommission Status : Decommissioned Configured Capacity:

RE: very long timeout on failed RM connect

2014-02-12 Thread John Lilley
that can be set? Thanks John From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Monday, February 10, 2014 12:12 PM To: user@hadoop.apache.org Subject: RE: very long timeout on failed RM connect I tried: conf.set("yarn.resourcemanager.connect.max-wait.ms", "1"); conf.set

very long timeout on failed RM connect

2014-02-10 Thread John Lilley
Our application (running outside the Hadoop cluster) connects to the RM through YarnClient. This works fine, except we've found that if the RM address or port is misconfigured in our software, or a firewall blocks access, the first call into the client (in this case getNodeReports) hangs for a

RE: very long timeout on failed RM connect

2014-02-10 Thread John Lilley
. yarn.resourcemanager.connect.retry-interval.ms controls how often to try connecting to the ResourceManager. Jian On Mon, Feb 10, 2014 at 6:44 AM, John Lilley john.lil...@redpoint.net wrote: Our application

RE: HDFS read stats

2014-02-09 Thread John Lilley
DataInputStream} You can utilize this method: @InterfaceAudience.LimitedPrivate({"HDFS"}) public InputStream getWrappedStream() { return in; And cast the return value to DFSInputStream Cheers On Mon, Jan 27, 2014 at 11:07 AM, John Lilley john.lil...@redpoint.net wrote

RE: HDFS buffer sizes

2014-02-09 Thread John Lilley
DistributedFileSystem ignores it though. On Sat, Jan 25, 2014 at 6:09 AM, John Lilley john.lil...@redpoint.net wrote: There is this in FileSystem.java, which would appear to use the default buffer size of 4096 in the create() call unless otherwise specified in io.file.buffer.size

RE: BlockMissingException reading HDFS file, but the block exists and fsck shows OK

2014-01-27 Thread John Lilley
that is holding the specific block for any errors? On Jan 27, 2014 8:37 PM, John Lilley john.lil...@redpoint.net wrote: I am getting this perplexing error. Our YARN application launches tasks that attempt to simultaneously open a large number of files for merge. There seems

BlockMissingException reading HDFS file, but the block exists and fsck shows OK

2014-01-27 Thread John Lilley
I am getting this perplexing error. Our YARN application launches tasks that attempt to simultaneously open a large number of files for merge. There seems to be a load threshold in terms of number of simultaneous tasks attempting to open a set of HDFS files on a four-node cluster. The

RE: HDFS open file limit

2014-01-27 Thread John Lilley
limitations. On Jan 26, 2014 11:03 PM, John Lilley john.lil...@redpoint.net wrote: I have an application that wants to open a large set of files in HDFS simultaneously. Are there hard or practical limits to what can be opened at once by a single process

RE: HDFS read stats

2014-01-27 Thread John Lilley
...@hadoop.apache.org Subject: Re: HDFS read stats Please take a look at DFSInputStream#ReadStatistics which contains four metrics including local bytes read. You can obtain ReadStatistics through getReadStatistics() Cheers On Sun, Jan 26, 2014 at 4:00 PM, John Lilley john.lil

RE: BlockMissingException reading HDFS file, but the block exists and fsck shows OK

2014-01-27 Thread John Lilley
access to the same set of files. 3) Replication=1 having an influence. Any ideas? I am not seeing any errors in the datanode logs. I will run some other tests with replication=3 to see what happens. John From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Monday, January 27, 2014 8:41

RE: HDFS buffer sizes

2014-01-25 Thread John Lilley
buffer sizes I don't think that value is used either except in the legacy block reader which is turned off by default. On Fri, Jan 24, 2014 at 6:34 AM, John Lilley john.lil...@redpoint.net wrote: Ah, I see... it is a constant CommonConfigurationKeysPublic.java

RE: HDFS data transfer is faster than SCP based transfer?

2014-01-25 Thread John Lilley
There are no short-circuit writes, only reads, AFAIK. Is it necessary to transfer from HDFS to local disk? Can you read from HDFS directly using the FileSystem interface? john From: Shekhar Sharma [mailto:shekhar2...@gmail.com] Sent: Saturday, January 25, 2014 3:44 AM To: user@hadoop.apache.org

RE: HDFS buffer sizes

2014-01-24 Thread John Lilley
To: user@hadoop.apache.org Subject: Re: HDFS buffer sizes HDFS does not appear to use dfs.stream-buffer-size. On Thu, Jan 23, 2014 at 6:57 AM, John Lilley john.lil...@redpoint.net wrote: What is the interaction between dfs.stream-buffer-size and dfs.client-write-packet

HDFS buffer sizes

2014-01-23 Thread John Lilley
What is the interaction between dfs.stream-buffer-size and dfs.client-write-packet-size? I see that the default for dfs.stream-buffer-size is 4K. Does anyone have experience using larger buffers to optimize large writes? Thanks John

optimizing HDFS writes with replication=1

2014-01-21 Thread John Lilley
I am writing temp files to HDFS with replication=1, so I expect the blocks to be stored on the writing node. Are there any tips, in general, for optimizing write performance to HDFS? I use 128K buffers in the write() calls. Are there any parameters that can be set on the connection or in

RE: How to make AM terminate if client crashes?

2014-01-17 Thread John Lilley
on why this is desired? +Vinod On Jan 11, 2014, at 11:37 AM, John Lilley john.lil...@redpoint.net wrote: We have a YARN application that we want to automatically terminate if the YARN client disconnects or crashes. Is it possible to configure the YarnClient-RM

RE: temporary file locations for YARN applications

2014-01-14 Thread John Lilley
be able to make it work by giving it a config with the values it needs. On Mon, Oct 21, 2013 at 8:49 PM, John Lilley john.lil...@redpoint.net wrote: Thanks again. This gives me a lot of options; we will see what works. Do you know if there are any permissions issues if we directly access

RE: manipulating key in combine phase

2014-01-12 Thread John Lilley
Isn't this is what you'd normally do in the Mapper? My understanding of the combiner is that it is like a mapper-side pre-reducer and operates on blocks of data that have already been sorted by key, so mucking with the keys doesn't *seem* like a good idea. john From: Amit Sela

RE: FileSystem iterative or limited alternative to listStatus()

2014-01-12 Thread John Lilley
: Saturday, January 11, 2014 4:58 PM To: common-u...@hadoop.apache.org Subject: Re: FileSystem iterative or limited alternative to listStatus() Can you utilize the following API ? public FileStatus[] listStatus(Path f, PathFilter filter) Cheers On Sat, Jan 11, 2014 at 3:52 PM, John Lilley

How to make AM terminate if client crashes?

2014-01-11 Thread John Lilley
We have a YARN application that we want to automatically terminate if the YARN client disconnects or crashes. Is it possible to configure the YarnClient-RM connection so that if the client terminates the RM automatically terminates the AM? Or do we need to build our own logic (e.g. a direct

FileSystem iterative or limited alternative to listStatus()

2014-01-11 Thread John Lilley
Is there an HDFS file system method for listing a directory contents iteratively, or at least stopping at some limit? We have an application in which the user can type wildcards, which our app expands. However, during interactive phases we only want the first matching file to check its

RE: LocalResource size/time limits

2014-01-05 Thread John Lilley
-default.xml yarn.nodemanager.localizer.cache.target-size-mb 10240 Target size of localizer cache in MB, per local directory. On Sat, Jan 4, 2014 at 12:02 PM, John Lilley john.lil...@redpoint.net wrote: Are there any limits on the total size of LocalResources

RE: YARN log access

2014-01-05 Thread John Lilley
at 2.4.0 release and YARN-1524 is Open. From the link you posted, I see: amContainerLogs The URL of the application master container logs On Sat, Jan 4, 2014 at 9:18 AM, John Lilley john.lil...@redpoint.net wrote: Ted, Thanks for the pointer. But when I read about

LocalResource size/time limits

2014-01-04 Thread John Lilley
Are there any limits on the total size of LocalResources that a YARN app requests? Do the PUBLIC ones age out of cache over time? Are there settable controls? Thanks John

RE: YARN log access

2014-01-04 Thread John Lilley
of task container logs? Thanks john From: Ted Yu [mailto:yuzhih...@gmail.com] See: https://issues.apache.org/jira/browse/YARN-649 https://issues.apache.org/jira/browse/YARN-1524 On Fri, Jan 3, 2014 at 8:50 AM, John Lilley john.lil...@redpoint.net wrote: Is there a programmatic or HTTP interface

YARN log access

2014-01-03 Thread John Lilley
Is there a programmatic or HTTP interface for querying the logs of a YARN application run? Ideally I would start with the AppID, query the AppMaster log, and then descend into the task logs. Thanks John

RE: HDFS short-circuit reads

2013-12-19 Thread John Lilley
17, 2013 at 6:39 AM, John Lilley john.lil...@redpoint.net wrote: Thanks! I do call FileSystem.getFileBlockLocations() now to map tasks to local data blocks; is there any advantage to using listLocatedStatus() instead? I guess one call instead of two... John

RE: HDFS short-circuit reads

2013-12-17 Thread John Lilley
, Chris Nauroth Hortonworks http://hortonworks.com/ On Mon, Dec 16, 2013 at 4:21 PM, John Lilley john.lil...@redpoint.net wrote: Our YARN application would benefit from maximal bandwidth on HDFS reads. But I'm unclear on how short-circuit reads are enabled

HDFS short-circuit reads

2013-12-16 Thread John Lilley
Our YARN application would benefit from maximal bandwidth on HDFS reads. But I'm unclear on how short-circuit reads are enabled. Are they on by default? Can our application check programmatically to see if the short-circuit read is enabled? Thanks, john
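For Hadoop 2.x, short-circuit reads are not enabled by default; enabling them typically means setting the following in hdfs-site.xml on both DataNodes and clients (a sketch: the socket path shown is a common convention, not a requirement, and its parent directory must exist and be writable by the DataNode user):

```xml
<!-- hdfs-site.xml on DataNodes and clients -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
```

Whether short-circuit reads actually occurred at runtime can be checked via the per-stream read statistics discussed in the "HDFS read stats" thread above (DFSInputStream#getReadStatistics()).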

RE: In YARN, how does a task tracker knows the address of a job tracker?

2013-11-23 Thread John Lilley
there will be some external communication between AM and container tasks. I am trying to implement a Hadoop-like system on Yarn and I wanted to draw the high-level steps before starting the work. thanks, On Thu, Nov 21, 2013 at 3:27 PM, John Lilley john.lil...@redpoint.net

RE: In YARN, how does a task tracker knows the address of a job tracker?

2013-11-21 Thread John Lilley
MapReduce also communicates outside of what is directly supported by YARN. In a YARN application, there is very little direct communication between the client and the AM, and between the AM and container tasks. I think that an AM can report to the client two pieces of information -- state and

$VARIABLE and `command` in YARN command lines

2013-11-21 Thread John Lilley
We've found that using either $VARIABLE or `command` as part of a YARN command-line doesn't really work, because those two things are replaced in the application master context. Is there a way to escape these elements so that they are passed through to the container launch? Thanks John

RE: Writing an application based on YARN 2.2

2013-11-14 Thread John Lilley
The distributed shell example is the best one that I know of. John From: Bill Q [mailto:bill.q@gmail.com] Sent: Tuesday, November 12, 2013 12:51 PM To: user@hadoop.apache.org Subject: Re: Writing an application based on YARN 2.2 Hi Lohit, Thanks a lot. Is there any updated docs that would

RE: worker affinity and YARN scheduling

2013-11-13 Thread John Lilley
Thanks! john From: Arun C Murthy [mailto:a...@hortonworks.com] Sent: Monday, November 11, 2013 7:25 PM To: user@hadoop.apache.org Subject: Re: worker affinity and YARN scheduling On Nov 11, 2013, at 6:28 AM, John Lilley john.lil...@redpoint.net wrote: I would

Issue with CLASSPATH environment wildcard expansion and application master

2013-11-13 Thread John Lilley
We are trying to get the appropriate jars into our AM's CLASSPATH, but running into an issue. We are building on the distributed shell sample code. Feel free to direct me to the right way to do this, if our approach is incorrect or the best practice has been revised. All we need are the

Two questions about FSDataInputStream

2013-11-12 Thread John Lilley
First, this documentation: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FSDataInputStream.html claims that FSDataInputStream has a seek() method, but javap doesn't show one: $ javap -classpath [hadoopjars] org.apache.hadoop.fs.FSDataInputStream Compiled from
