Anatomy of a read in HDFS
Hi Genies,

I have a small doubt: is the HDFS read operation a parallel or a sequential process? From my understanding it should be parallel, but the "Anatomy of a File Read" section of "Hadoop: The Definitive Guide" (4th edition) says: "Data is streamed from the datanode back to the client, which calls read() repeatedly on the stream (step 4). When the end of the block is reached, DFSInputStream will close the connection to the datanode, then find the best datanode for the next block (step 5). This happens transparently to the client, which from its point of view is just reading a continuous stream." So can you kindly explain how the read operation exactly happens?

Thanks for your help in advance,
Sidharth
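The behaviour the book describes can be sketched in a few lines of plain Java. This is a toy model, not the real DFSClient code: the class name, method names, and datanode names are all invented. It only shows the key point: a single open stream reads one block at a time (step 4) and only then asks for the best datanode of the next block (step 5). Parallelism in Hadoop comes from many tasks each reading a different block through their own stream, not from one stream fetching blocks concurrently.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of the HDFS read path described in the book. The client reads
// ONE block at a time, but consecutive blocks may come from different
// datanodes, so the stream looks continuous to the caller.
public class SequentialReadSketch {

    // Stand-in for the namenode telling us the "best" datanode for a block.
    static String bestDatanodeFor(int blockIndex) {
        List<String> nodes = Arrays.asList("dn1", "dn2", "dn3");
        return nodes.get(blockIndex % nodes.size());
    }

    public static String readFile(int numBlocks) {
        StringBuilder log = new StringBuilder();
        for (int b = 0; b < numBlocks; b++) {
            // Step 5: after finishing a block, pick the next block's datanode.
            String dn = bestDatanodeFor(b);
            // Step 4: stream the whole block before moving on -- sequential.
            log.append("block").append(b).append("<-").append(dn).append(";");
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(readFile(4));
    }
}
```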
Customize Sqoop default property
Hi,

I am importing data from an RDBMS into Hadoop using Sqoop, but my RDBMS data is multi-valued and contains the "," special character. While importing, Sqoop by default separates columns with the "," character. Is there a property through which we can change this delimiter from "," to "|" or some other special character that is not part of the data?

Thanks,
Sidharth
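Sqoop's output line formatting arguments cover this: `--fields-terminated-by` overrides the default "," delimiter, and `--escaped-by` / `--optionally-enclosed-by` help when the data itself may contain the delimiter. A sketch of the invocation, where the connection string, credentials, table, and target directory are placeholders to substitute with your own:

```shell
# Placeholder connection details -- substitute your own.
sqoop import \
  --connect jdbc:mysql://dbhost/mydb \
  --username myuser -P \
  --table mytable \
  --fields-terminated-by '|' \
  --escaped-by '\\' \
  --target-dir /user/sidharth/mytable
```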
Re: Physical memory (bytes) snapshot counter question - how to get maximum memory used in reduce task
There are two new counters, MAP_PHYSICAL_MEMORY_BYTES_MAX and REDUCE_PHYSICAL_MEMORY_BYTES_MAX, that give you the max value for map and reduce tasks respectively.

Thanks,
Miklos

On Wed, Apr 5, 2017 at 6:37 PM, Aaron Eng wrote:
> An important consideration is the difference between the RSS of the JVM process vs. the used heap size. Which of those are you looking for? And also, importantly, why/what do you plan to do with that info?
>
> A second important consideration is the length of time you are at/around your max RSS/Java heap. Holding X MB of memory for 100 ms is very different from holding X MB of memory for 100 seconds. Are you looking for that info? And if so, how do you plan to use it?
>
> > On Apr 5, 2017, at 6:15 PM, Nico Pappagianis <nico.pappagia...@salesforce.com> wrote:
> >
> > Hi all,
> >
> > I've made some memory optimizations on the reduce task and I would like to compare the old reducer vs. the new reducer in terms of maximum memory consumption.
> >
> > I have a question regarding the description of the following counter:
> >
> > PHYSICAL_MEMORY_BYTES | Physical memory (bytes) snapshot | Total physical memory used by all tasks including spilled data.
> >
> > I'm assuming this means the aggregate of memory used throughout the entire reduce task (if viewing at the reduce task-level). Please correct me if I'm wrong on this assumption (the description seems pretty straightforward).
> >
> > Is there a way to get the maximum (not total) memory used by a reduce task from the default counters?
> >
> > Thanks!

-
To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
For additional commands, e-mail: user-h...@hadoop.apache.org
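The difference between a snapshot-style counter and a max-style counter can be sketched in plain Java. This is an illustration only (the sample values and method name are invented, and the real counters are maintained by the task's own progress reporting): a snapshot counter ends up holding whatever value was last reported, while a max counter keeps the peak observed over the task's lifetime.

```java
// Toy illustration of snapshot vs. max semantics for a memory counter.
// Feed in a series of RSS samples; the snapshot tracks the latest value,
// the max tracks the peak.
public class CounterSketch {

    public static long[] track(long[] rssSamples) {
        long snapshot = 0;
        long max = 0;
        for (long rss : rssSamples) {
            snapshot = rss;               // last reported value
            max = Math.max(max, rss);     // peak across the task's lifetime
        }
        return new long[] { snapshot, max };
    }

    public static void main(String[] args) {
        long[] r = track(new long[] { 100, 400, 250 });
        System.out.println("snapshot=" + r[0] + " max=" + r[1]);
    }
}
```

Note that a final snapshot (250 here) can be well below the peak (400), which is exactly why the dedicated *_MAX counters are useful for Nico's comparison.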
Re: Physical memory (bytes) snapshot counter question - how to get maximum memory used in reduce task
An important consideration is the difference between the RSS of the JVM process vs. the used heap size. Which of those are you looking for? And also, importantly, why/what do you plan to do with that info?

A second important consideration is the length of time you are at/around your max RSS/Java heap. Holding X MB of memory for 100 ms is very different from holding X MB of memory for 100 seconds. Are you looking for that info? And if so, how do you plan to use it?

> On Apr 5, 2017, at 6:15 PM, Nico Pappagianis wrote:
>
> Hi all,
>
> I've made some memory optimizations on the reduce task and I would like to compare the old reducer vs. the new reducer in terms of maximum memory consumption.
>
> I have a question regarding the description of the following counter:
>
> PHYSICAL_MEMORY_BYTES | Physical memory (bytes) snapshot | Total physical memory used by all tasks including spilled data.
>
> I'm assuming this means the aggregate of memory used throughout the entire reduce task (if viewing at the reduce task-level). Please correct me if I'm wrong on this assumption (the description seems pretty straightforward).
>
> Is there a way to get the maximum (not total) memory used by a reduce task from the default counters?
>
> Thanks!