Re: HDFS buffer sizes
HDFS does not appear to use dfs.stream-buffer-size.

On Thu, Jan 23, 2014 at 6:57 AM, John Lilley wrote:
> What is the interaction between dfs.stream-buffer-size and
> dfs.client-write-packet-size?
>
> I see that the default for dfs.stream-buffer-size is 4K. Does anyone have
> experience using larger buffers to optimize large writes?
>
> Thanks
> John

--
CONFIDENTIALITY NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
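For anyone tuning this later: the client-side default under discussion comes from io.file.buffer.size, which can be overridden in core-site.xml. A minimal fragment follows; the 64 KB value is illustrative, not a recommendation:

```xml
<property>
  <name>io.file.buffer.size</name>
  <!-- Buffer size in bytes used by sequence files and some stream wrappers;
       the shipped default is 4096. -->
  <value>65536</value>
</property>
```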
RE: HDFS buffer sizes
Ah, I see... it is a constant in CommonConfigurationKeysPublic.java:

    public static final int IO_FILE_BUFFER_SIZE_DEFAULT = 4096;

Are there benefits to increasing this for large reads or writes?

john

From: Arpit Agarwal [mailto:aagar...@hortonworks.com]
Sent: Thursday, January 23, 2014 3:31 PM
To: user@hadoop.apache.org
Subject: Re: HDFS buffer sizes

> HDFS does not appear to use dfs.stream-buffer-size.
Re: HDFS buffer sizes
I don't think that value is used either, except in the legacy block reader, which is turned off by default.

On Fri, Jan 24, 2014 at 6:34 AM, John Lilley wrote:
> Ah, I see... it is a constant in CommonConfigurationKeysPublic.java:
>
>     public static final int IO_FILE_BUFFER_SIZE_DEFAULT = 4096;
>
> Are there benefits to increasing this for large reads or writes?
RE: HDFS buffer sizes
There is this in FileSystem.java, which would appear to use the default buffer size of 4096 in the create() call unless otherwise specified in io.file.buffer.size:

    public FSDataOutputStream create(Path f, short replication,
        Progressable progress) throws IOException {
      return create(f, true,
          getConf().getInt(
              CommonConfigurationKeysPublic.IO_FILE_BUFFER_SIZE_KEY,
              CommonConfigurationKeysPublic.IO_FILE_BUFFER_SIZE_DEFAULT),
          replication, getDefaultBlockSize(f), progress);
    }

But this discussion is missing the point; what I really want to know is: is there any benefit to setting a larger bufferSize in FileSystem.create() and FileSystem.append()?

From: Arpit Agarwal [mailto:aagar...@hortonworks.com]
Sent: Friday, January 24, 2014 9:35 AM
To: user@hadoop.apache.org
Subject: Re: HDFS buffer sizes

> I don't think that value is used either, except in the legacy block reader,
> which is turned off by default.
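The getInt(key, default) fallback in the create() snippet can be sketched with plain java.util.Properties standing in for Hadoop's Configuration; the class and method names here are made up for illustration:

```java
import java.util.Properties;

// Sketch of a Configuration.getInt-style lookup with a default fallback,
// mirroring how create() falls back to IO_FILE_BUFFER_SIZE_DEFAULT (4096)
// when io.file.buffer.size is not set.
public class ConfDefaultDemo {

    // Returns the configured integer, or the supplied default if the key is absent.
    static int getInt(Properties conf, String key, int defaultValue) {
        String raw = conf.getProperty(key);
        return (raw == null) ? defaultValue : Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Key absent: falls back to the 4096-byte default.
        System.out.println(getInt(conf, "io.file.buffer.size", 4096));
        conf.setProperty("io.file.buffer.size", "131072");
        // Key present: the configured value wins.
        System.out.println(getInt(conf, "io.file.buffer.size", 4096));
    }
}
```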
Re: HDFS buffer sizes
Looks like DistributedFileSystem ignores it though.

On Sat, Jan 25, 2014 at 6:09 AM, John Lilley wrote:
> But this discussion is missing the point; what I really want to know is:
> is there any benefit to setting a larger bufferSize in FileSystem.create()
> and FileSystem.append()?
RE: HDFS buffer sizes
Thanks. Experimentally, I have found that changing the buffer sizes has no effect, so that makes sense.

John

From: Arpit Agarwal [mailto:aagar...@hortonworks.com]
Sent: Tuesday, January 28, 2014 12:35 AM
To: user@hadoop.apache.org
Subject: Re: HDFS buffer sizes

> Looks like DistributedFileSystem ignores it though.
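To see in general why a client-side buffer size can matter, and why it stops mattering once the stream underneath does its own buffering (as the thread concludes the HDFS client stream does), here is a self-contained java.io sketch. This is not HDFS code; the class and method names are invented for the example. It counts how many write() calls reach the underlying stream when 1 MB is written in small chunks through buffers of different sizes:

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Demonstrates that a larger client-side buffer coalesces many small
// writes into far fewer calls on the underlying stream.
public class BufferDemo {

    static long underlyingWrites(int bufferSize, int chunkSize, int totalBytes) {
        final long[] calls = {0};
        // Sink that just counts how many write() calls reach it.
        OutputStream sink = new OutputStream() {
            @Override public void write(int b) { calls[0]++; }
            @Override public void write(byte[] b, int off, int len) { calls[0]++; }
        };
        try (OutputStream out = new BufferedOutputStream(sink, bufferSize)) {
            byte[] chunk = new byte[chunkSize];
            for (int written = 0; written < totalBytes; written += chunkSize) {
                out.write(chunk, 0, chunkSize);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return calls[0];
    }

    public static void main(String[] args) {
        // 1 MB written in 100-byte chunks; larger buffers mean fewer
        // calls into the underlying stream.
        System.out.println("64 B buffer:  " + underlyingWrites(64, 100, 1_000_000));
        System.out.println("4 KB buffer:  " + underlyingWrites(4096, 100, 1_000_000));
        System.out.println("64 KB buffer: " + underlyingWrites(65536, 100, 1_000_000));
    }
}
```

If the layer underneath already coalesces writes into fixed-size packets, as the HDFS client does with dfs.client-write-packet-size, the extra buffering layer adds little, which is consistent with John's observation that varying the buffer size had no measurable effect.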