[jira] Commented: (MAPREDUCE-834) When TaskTracker config uses old memory management values its memory monitoring is disabled.
[ https://issues.apache.org/jira/browse/MAPREDUCE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744915#action_12744915 ] Hemanth Yamijala commented on MAPREDUCE-834:

A few comments:
- Memory allotted for a slot based on the old configuration should not be based on getMaxVirtualMemoryForTask(), but on JobConf.MAPRED_TASK_DEFAULT_MAXVMEM_PROPERTY. Also note that this value will be in bytes, while the system maintains everything else in MB. So, it should be converted to MB.
- testTaskMemoryMonitoringWithDeprecatedConfiguration should also set the TT configuration for JobConf.MAPRED_TASK_DEFAULT_MAXVMEM_PROPERTY in bytes instead of MAPRED_TASK_MAXVMEM_PROPERTY.

> When TaskTracker config uses old memory management values its memory
> monitoring is disabled.
> --
>
> Key: MAPREDUCE-834
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-834
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Karam Singh
> Attachments: mapreduce-834-1.patch
>
>
> TaskTracker memory config values:
> mapred.tasktracker.vmem.reserved=8589934592
> mapred.task.default.maxvmem=2147483648
> mapred.task.limit.maxvmem=4294967296
> mapred.tasktracker.pmem.reserved=2147483648
> TaskTracker starts as:
> 2009-08-05 12:39:03,308 WARN org.apache.hadoop.mapred.TaskTracker: The variable mapred.tasktracker.vmem.reserved is no longer used
> 2009-08-05 12:39:03,308 WARN org.apache.hadoop.mapred.TaskTracker: The variable mapred.tasktracker.pmem.reserved is no longer used
> 2009-08-05 12:39:03,308 WARN org.apache.hadoop.mapred.TaskTracker: The variable mapred.task.default.maxvmem is no longer used
> 2009-08-05 12:39:03,308 WARN org.apache.hadoop.mapred.TaskTracker: The variable mapred.task.limit.maxvmem is no longer used
> 2009-08-05 12:39:03,308 INFO org.apache.hadoop.mapred.TaskTracker: Starting thread: Map-events fetcher for all reduce tasks on
> 2009-08-05 12:39:03,309 INFO org.apache.hadoop.mapred.TaskTracker: Using MemoryCalculatorPlugin : org.apache.hadoop.util.linuxmemorycalculatorplu...@19be4777
> 2009-08-05 12:39:03,311 WARN org.apache.hadoop.mapred.TaskTracker: TaskTracker's totalMemoryAllottedForTasks is -1. TaskMemoryManager is disabled.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
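The byte-to-MB conversion requested in the review comment above is simple but easy to get wrong around the "disabled" sentinel. A minimal sketch, assuming a -1 sentinel as in the TaskTracker log above (the constant and method names here are illustrative, not the actual JobConf/TaskTracker API):

```java
// Hypothetical sketch: the deprecated mapred.task.default.maxvmem value is
// in bytes, while the TaskTracker tracks per-slot memory in MB.
public class DeprecatedMemoryConfig {
    // Illustrative sentinel, mirroring totalMemoryAllottedForTasks == -1 above.
    static final long DISABLED_MEMORY_LIMIT = -1L;

    /** Convert a byte-denominated limit to MB, preserving the disabled sentinel. */
    static long bytesToMegabytes(long bytes) {
        if (bytes == DISABLED_MEMORY_LIMIT) {
            return DISABLED_MEMORY_LIMIT; // keep monitoring disabled
        }
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        // 2147483648 bytes (the mapred.task.default.maxvmem value above) is 2048 MB
        System.out.println(bytesToMegabytes(2147483648L)); // prints 2048
    }
}
```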
[jira] Updated: (MAPREDUCE-777) A method for finding and tracking jobs from the new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-777: -- Status: Patch Available (was: Open) > A method for finding and tracking jobs from the new API > --- > > Key: MAPREDUCE-777 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-777 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: client >Reporter: Owen O'Malley >Assignee: Amareshwari Sriramadasu > Fix For: 0.21.0 > > Attachments: patch-777-1.txt, patch-777-2.txt, patch-777.txt > > > We need to create a replacement interface for the JobClient API in the new > interface. In particular, the user needs to be able to query and track jobs > that were launched by other processes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-777) A method for finding and tracking jobs from the new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-777: -- Attachment: patch-777-2.txt Patch incorporating review comments except comment (4). bq. Move Counters(org.apache.hadoop.mapred.Counters counters) to a method in old api This needs the CounterGroup constructor(s) to be made public, so this method was not moved to the old API. > A method for finding and tracking jobs from the new API > --- > > Key: MAPREDUCE-777 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-777 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: client >Reporter: Owen O'Malley >Assignee: Amareshwari Sriramadasu > Fix For: 0.21.0 > > Attachments: patch-777-1.txt, patch-777-2.txt, patch-777.txt > > > We need to create a replacement interface for the JobClient API in the new > interface. In particular, the user needs to be able to query and track jobs > that were launched by other processes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-861) Modify queue configuration format and parsing to support a hierarchy of queues.
[ https://issues.apache.org/jira/browse/MAPREDUCE-861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744910#action_12744910 ] rahul k singh commented on MAPREDUCE-861: - We had an offline discussion with Owen and Eric. There was an agreement in principle to use option 2 with a slight modification. So all the configuration still remains the same except part. would change to Hence the new configuration would look like: {code:xml} queue1 subQueue1 alice,bob running {code} > Modify queue configuration format and parsing to support a hierarchy of > queues. > --- > > Key: MAPREDUCE-861 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-861 > Project: Hadoop Map/Reduce > Issue Type: Sub-task >Reporter: Hemanth Yamijala >Assignee: rahul k singh > > MAPREDUCE-853 proposes to introduce a hierarchy of queues into the Map/Reduce > framework. This JIRA is for defining changes to the configuration related to > queues. > The current format for defining a queue and its properties is as follows: > mapred.queue... For e.g. > mapred.queue..acl-submit-job. The reason for using this verbose > format was to be able to reuse the Configuration parser in Hadoop. However, > administrators currently using the queue configuration have already indicated > a very strong desire for a more manageable format. Since this becomes more > unwieldy with hierarchical queues, the time may be good to introduce a new > format for representing queue configuration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-777) A method for finding and tracking jobs from the new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-777: -- Status: Open (was: Patch Available) Cancelling patch to incorporate offline comments from Amar Comments include: 1. Introduce Counters.downgrade() instead of constructor 2. {code} + org.apache.hadoop.mapreduce.JobClient.TaskStatusFilter newFilter = +getNewFilter(filter); + printTaskEvents(events, newFilter, profiling, mapRanges, reduceRanges); {code} Use getNewFilter directly. 3. deprecate public methods in jobtracker, that got changed for new JobSubmissionProtocol 4. Move Counters(org.apache.hadoop.mapred.Counters counters) to a method in old api > A method for finding and tracking jobs from the new API > --- > > Key: MAPREDUCE-777 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-777 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: client >Reporter: Owen O'Malley >Assignee: Amareshwari Sriramadasu > Fix For: 0.21.0 > > Attachments: patch-777-1.txt, patch-777.txt > > > We need to create a replacement interface for the JobClient API in the new > interface. In particular, the user needs to be able to query and track jobs > that were launched by other processes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-880) TestRecoveryManager times out
[ https://issues.apache.org/jira/browse/MAPREDUCE-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744883#action_12744883 ] Amar Kamat commented on MAPREDUCE-880: -- Looked into this. Looks like the problem is with the case where the jobtracker is dead and the tasktrackers have some tasks running. In such cases MiniMRCluster.shutdown() waits forever for the task to finish (tracker to be idle). Somehow earlier the tasks were not scheduled and it used to work fine. Continuing with the debugging. > TestRecoveryManager times out > - > > Key: MAPREDUCE-880 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-880 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Reporter: Amar Kamat > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-832) Too many WARN messages about deprecated memory config variables in JobTracker log
[ https://issues.apache.org/jira/browse/MAPREDUCE-832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hemanth Yamijala updated MAPREDUCE-832: --- Assignee: rahul k singh Status: Patch Available (was: Open) > Too many WARN messages about deprecated memory config variables in JobTracker > log > - > > Key: MAPREDUCE-832 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-832 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.20.1 >Reporter: Karam Singh >Assignee: rahul k singh > Attachments: mapreduce-832-20.patch, mapreduce-832.patch > > > When a user submits a mapred job using the old memory config variable > (mapred.task.maxvmem), the following message appears too many times in the JobTracker logs: > [ > WARN org.apache.hadoop.mapred.JobConf: The variable mapred.task.maxvmem is no > longer used instead use mapred.job.map.memory.mb and > mapred.job.reduce.memory.mb > ] -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-832) Too many WARN messages about deprecated memory config variables in JobTracker log
[ https://issues.apache.org/jira/browse/MAPREDUCE-832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hemanth Yamijala updated MAPREDUCE-832: --- Attachment: mapreduce-832.patch Attached a new patch that works for trunk. It is the same as what Rahul uploaded, except I modified the method checkAndWarnDeprecation to not require a Configuration instance. Instead, it uses the current object's values itself. Running this through Hudson. > Too many WARN messages about deprecated memory config variables in JobTracker > log > - > > Key: MAPREDUCE-832 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-832 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.20.1 >Reporter: Karam Singh > Attachments: mapreduce-832-20.patch, mapreduce-832.patch > > > When a user submits a mapred job using the old memory config variable > (mapred.task.maxvmem), the following message appears too many times in the JobTracker logs: > [ > WARN org.apache.hadoop.mapred.JobConf: The variable mapred.task.maxvmem is no > longer used instead use mapred.job.map.memory.mb and > mapred.job.reduce.memory.mb > ] -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
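The flood of identical WARN lines described in this issue comes from re-checking deprecated keys on every job submission. A common complementary pattern, sketched here with hypothetical names (this is not the actual checkAndWarnDeprecation code), is to log each deprecated key at most once per process:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative warn-once helper; not the actual JobConf implementation.
public class DeprecationWarner {
    // Keys already reported in this JVM; concurrent set so multiple
    // job-submission threads can share it safely.
    private static final Set<String> warned = ConcurrentHashMap.newKeySet();

    /** Logs the deprecation warning and returns true only on first sight of a key. */
    static boolean warnOnce(String deprecatedKey) {
        if (warned.add(deprecatedKey)) {
            System.err.println("The variable " + deprecatedKey + " is no longer used");
            return true;
        }
        return false; // already warned; stay silent
    }

    public static void main(String[] args) {
        warnOnce("mapred.task.maxvmem"); // logs the warning
        warnOnce("mapred.task.maxvmem"); // silent on repeat
    }
}
```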
[jira] Assigned: (MAPREDUCE-336) The logging level of the tasks should be configurable by the job
[ https://issues.apache.org/jira/browse/MAPREDUCE-336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy reassigned MAPREDUCE-336: --- Assignee: Arun C Murthy > The logging level of the tasks should be configurable by the job > > > Key: MAPREDUCE-336 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-336 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Owen O'Malley >Assignee: Arun C Murthy > Fix For: 0.21.0 > > Attachments: MAPREDUCE-336_0_20090818.patch > > > It would be nice to be able to configure the logging level of the Task JVM's > separately from the server JVM's. Reducing logging substantially increases > performance and reduces the consumption of local disk on the task trackers. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-336) The logging level of the tasks should be configurable by the job
[ https://issues.apache.org/jira/browse/MAPREDUCE-336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated MAPREDUCE-336: Fix Version/s: 0.21.0 Status: Patch Available (was: Open) > The logging level of the tasks should be configurable by the job > > > Key: MAPREDUCE-336 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-336 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Owen O'Malley >Assignee: Arun C Murthy > Fix For: 0.21.0 > > Attachments: MAPREDUCE-336_0_20090818.patch > > > It would be nice to be able to configure the logging level of the Task JVM's > separately from the server JVM's. Reducing logging substantially increases > performance and reduces the consumption of local disk on the task trackers. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-336) The logging level of the tasks should be configurable by the job
[ https://issues.apache.org/jira/browse/MAPREDUCE-336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated MAPREDUCE-336: Attachment: MAPREDUCE-336_0_20090818.patch Straight-forward fix. > The logging level of the tasks should be configurable by the job > > > Key: MAPREDUCE-336 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-336 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Owen O'Malley >Assignee: Arun C Murthy > Fix For: 0.21.0 > > Attachments: MAPREDUCE-336_0_20090818.patch > > > It would be nice to be able to configure the logging level of the Task JVM's > separately from the server JVM's. Reducing logging substantially increases > performance and reduces the consumption of local disk on the task trackers. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-875) Make DBRecordReader execute queries lazily
[ https://issues.apache.org/jira/browse/MAPREDUCE-875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Kimball updated MAPREDUCE-875: Attachment: MAPREDUCE-875.2.patch Attaching new patch after resync'ing with trunk. Just realized that avro was already added to sqoop's ivy.xml > Make DBRecordReader execute queries lazily > -- > > Key: MAPREDUCE-875 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-875 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Aaron Kimball >Assignee: Aaron Kimball > Attachments: MAPREDUCE-875.2.patch, MAPREDUCE-875.patch > > > DBInputFormat's DBRecordReader executes the user's SQL query in the > constructor. If the query is long-running, this can cause task timeout. The > user is unable to spawn a background thread (e.g., in a MapRunnable) to > inform Hadoop of on-going progress. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
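The lazy-execution idea behind this patch can be sketched independently of JDBC. This is an illustrative model of the pattern only (hypothetical names, not the actual DBRecordReader code): the constructor merely records state, and the potentially long-running query fires on the first call to next(), after the framework has begun reporting progress.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Sketch of lazy query execution; names are illustrative.
public class LazyReader<T> {
    static int queriesRun = 0; // instrumentation for this example only

    private final Supplier<Iterator<T>> queryRunner;
    private Iterator<T> results; // null until first next()

    LazyReader(Supplier<Iterator<T>> queryRunner) {
        // No query here -- constructing the reader stays cheap,
        // so the task cannot time out before progress reporting starts.
        this.queryRunner = queryRunner;
    }

    boolean next() {
        if (results == null) {
            results = queryRunner.get(); // the query executes on first use
        }
        return results.hasNext();
    }

    public static void main(String[] args) {
        LazyReader<Integer> r = new LazyReader<>(() -> {
            queriesRun++; // stands in for stmt.executeQuery()
            return List.of(1, 2, 3).iterator();
        });
        System.out.println(queriesRun); // nothing ran in the constructor
        r.next();
        System.out.println(queriesRun); // the query ran lazily on first next()
    }
}
```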
[jira] Updated: (MAPREDUCE-875) Make DBRecordReader execute queries lazily
[ https://issues.apache.org/jira/browse/MAPREDUCE-875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Kimball updated MAPREDUCE-875: Status: Patch Available (was: Open) > Make DBRecordReader execute queries lazily > -- > > Key: MAPREDUCE-875 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-875 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Aaron Kimball >Assignee: Aaron Kimball > Attachments: MAPREDUCE-875.patch > > > DBInputFormat's DBRecordReader executes the user's SQL query in the > constructor. If the query is long-running, this can cause task timeout. The > user is unable to spawn a background thread (e.g., in a MapRunnable) to > inform Hadoop of on-going progress. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-875) Make DBRecordReader execute queries lazily
[ https://issues.apache.org/jira/browse/MAPREDUCE-875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744816#action_12744816 ] Aaron Kimball commented on MAPREDUCE-875: - The failing Sqoop tests claim that it's failing because it can't find Avro. Not sure why this is happening -- Sqoop doesn't make use of Avro anywhere. Recycling patch status in case this was transient. If not, do I have to put some more random libraries in ivy.xml? According to {{git-blame}}, this was added to the root ivy.xml earlier that day: {code} 9e58f6fc (Sharad Agarwal 2009-08-14 05:10:40 + 276) 9e58f6fc (Sharad Agarwal 2009-08-14 05:10:40 + 280) 9e58f6fc (Sharad Agarwal 2009-08-14 05:10:40 + 284) {code} > Make DBRecordReader execute queries lazily > -- > > Key: MAPREDUCE-875 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-875 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Aaron Kimball >Assignee: Aaron Kimball > Attachments: MAPREDUCE-875.patch > > > DBInputFormat's DBRecordReader executes the user's SQL query in the > constructor. If the query is long-running, this can cause task timeout. The > user is unable to spawn a background thread (e.g., in a MapRunnable) to > inform Hadoop of on-going progress. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-875) Make DBRecordReader execute queries lazily
[ https://issues.apache.org/jira/browse/MAPREDUCE-875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Kimball updated MAPREDUCE-875: Status: Open (was: Patch Available) > Make DBRecordReader execute queries lazily > -- > > Key: MAPREDUCE-875 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-875 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Aaron Kimball >Assignee: Aaron Kimball > Attachments: MAPREDUCE-875.patch > > > DBInputFormat's DBRecordReader executes the user's SQL query in the > constructor. If the query is long-running, this can cause task timeout. The > user is unable to spawn a background thread (e.g., in a MapRunnable) to > inform Hadoop of on-going progress. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-885) More efficient SQL queries for DBInputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744811#action_12744811 ] Aaron Kimball commented on MAPREDUCE-885: - I think this patch won't apply until MAPREDUCE-875 is in. > More efficient SQL queries for DBInputFormat > > > Key: MAPREDUCE-885 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-885 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Aaron Kimball >Assignee: Aaron Kimball > Attachments: MAPREDUCE-885.patch > > > DBInputFormat generates InputSplits by counting the available rows in a > table, and selecting subsections of the table via the "LIMIT" and "OFFSET" > SQL keywords. These are only meaningful in an ordered context, so the query > also includes an "ORDER BY" clause on an index column. The resulting queries > are often inefficient and require full table scans. Actually using multiple > mappers with these queries can lead to O(n^2) behavior in the database, where > n is the number of splits. Attempting to use parallelism with these queries > is counter-productive. > A better mechanism is to organize splits based on data values themselves, > which can be performed in the WHERE clause, allowing for index range scans of > tables, and can better exploit parallelism in the database. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-885) More efficient SQL queries for DBInputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744809#action_12744809 ] Hadoop QA commented on MAPREDUCE-885: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12416936/MAPREDUCE-885.patch against trunk revision 805324. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/490/console This message is automatically generated. > More efficient SQL queries for DBInputFormat > > > Key: MAPREDUCE-885 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-885 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Aaron Kimball >Assignee: Aaron Kimball > Attachments: MAPREDUCE-885.patch > > > DBInputFormat generates InputSplits by counting the available rows in a > table, and selecting subsections of the table via the "LIMIT" and "OFFSET" > SQL keywords. These are only meaningful in an ordered context, so the query > also includes an "ORDER BY" clause on an index column. The resulting queries > are often inefficient and require full table scans. Actually using multiple > mappers with these queries can lead to O(n^2) behavior in the database, where > n is the number of splits. Attempting to use parallelism with these queries > is counter-productive. > A better mechanism is to organize splits based on data values themselves, > which can be performed in the WHERE clause, allowing for index range scans of > tables, and can better exploit parallelism in the database. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-885) More efficient SQL queries for DBInputFormat
More efficient SQL queries for DBInputFormat Key: MAPREDUCE-885 URL: https://issues.apache.org/jira/browse/MAPREDUCE-885 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Aaron Kimball Assignee: Aaron Kimball Attachments: MAPREDUCE-885.patch DBInputFormat generates InputSplits by counting the available rows in a table, and selecting subsections of the table via the "LIMIT" and "OFFSET" SQL keywords. These are only meaningful in an ordered context, so the query also includes an "ORDER BY" clause on an index column. The resulting queries are often inefficient and require full table scans. Actually using multiple mappers with these queries can lead to O(n^2) behavior in the database, where n is the number of splits. Attempting to use parallelism with these queries is counter-productive. A better mechanism is to organize splits based on data values themselves, which can be performed in the WHERE clause, allowing for index range scans of tables, and can better exploit parallelism in the database. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
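The contrast between the two query styles described above can be sketched as follows (illustrative SQL strings; the exact queries DBInputFormat generates may differ). The LIMIT/OFFSET form makes every split re-sort and skip rows, while the bounded WHERE form lets each split become an index range scan:

```java
// Illustrative contrast between the two split-query styles; the exact
// SQL that DBInputFormat emits may differ from these strings.
public class SplitQueries {
    /** Old style: each split re-orders the table and skips 'offset' rows. */
    static String limitOffsetQuery(String table, String orderCol, long limit, long offset) {
        return "SELECT * FROM " + table + " ORDER BY " + orderCol
             + " LIMIT " + limit + " OFFSET " + offset;
    }

    /** Data-driven style: each split scans a bounded interval of the split column. */
    static String boundedQuery(String table, String splitCol, long lo, long hi) {
        return "SELECT * FROM " + table
             + " WHERE " + splitCol + " >= " + lo + " AND " + splitCol + " < " + hi;
    }

    public static void main(String[] args) {
        // 'employees' and 'id' are hypothetical table/column names.
        System.out.println(limitOffsetQuery("employees", "id", 1000, 2000));
        System.out.println(boundedQuery("employees", "id", 10, 20));
    }
}
```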
[jira] Updated: (MAPREDUCE-885) More efficient SQL queries for DBInputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Kimball updated MAPREDUCE-885: Status: Patch Available (was: Open) > More efficient SQL queries for DBInputFormat > > > Key: MAPREDUCE-885 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-885 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Aaron Kimball >Assignee: Aaron Kimball > Attachments: MAPREDUCE-885.patch > > > DBInputFormat generates InputSplits by counting the available rows in a > table, and selecting subsections of the table via the "LIMIT" and "OFFSET" > SQL keywords. These are only meaningful in an ordered context, so the query > also includes an "ORDER BY" clause on an index column. The resulting queries > are often inefficient and require full table scans. Actually using multiple > mappers with these queries can lead to O(n^2) behavior in the database, where > n is the number of splits. Attempting to use parallelism with these queries > is counter-productive. > A better mechanism is to organize splits based on data values themselves, > which can be performed in the WHERE clause, allowing for index range scans of > tables, and can better exploit parallelism in the database. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-885) More efficient SQL queries for DBInputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Kimball updated MAPREDUCE-885: Attachment: MAPREDUCE-885.patch This patch introduces DataDrivenDBInputFormat. This class extends DBInputFormat and reuses much of its common logic (e.g., setting up and tearing down connections, configuration, DBWritable, etc). But it adds a DataDrivenDBInputSplit class which splits queries based on data values, e.g. {{"id >= 10 AND id < 20"}} for one split, and {{"id >= 20 AND id < 30"}} for the next one. The resulting queries run significantly faster and parallelise properly. Instead of requiring a counting query like DBInputFormat, this InputFormat requires a query that returns the min and max values of the split column on the data to import. DataDrivenDBInputSplit is a subclass of DBInputSplit; the original DBRecordReader family of classes has been modified to discriminate between the new InputSplit class vs. the old one; if it detects a new one, it submits the newer WHERE-based query rather than the LIMIT/OFFSET-based query to the database. The min and max values of the column are used to generate splits via linear interpolation between the values. A DBSplitter interface has been added, which takes the min and max values for the column, as well as the number of splits to use. It then generates about this many splits, which subdivide the range of values into roughly-even intervals. Several DBSplitter implementations are provided which are applicable to different data types. For example, there is an IntegerSplitter which can split INTEGER, BIGINT, TINYINT, LONG, etc. columns. The FloatSplitter implementation works on DECIMAL, NUMBER, and REAL datatypes. A TextSplitter implementation is provided, but its utility is database-dependent. Databases may choose to sort strings via a number of algorithms (e.g., case-sensitive vs. case-insensitive). 
The TextSplitter assumes that strings are sorted in Unicode codepoint order. (e.g., "AAA" < "BBB" < "aaa".) A warning will be logged if the TextSplitter is used. Explicit tests have been added for some of the splitters. Sqoop has been modified to use the new InputFormat with encouraging performance results. Sqoop's existing regression test suite exercises the code paths for all the splitters and isolated several bugs which were fixed prior to submitting this patch. I will post the Sqoop patch separately after this JIRA issue is committed. > More efficient SQL queries for DBInputFormat > > > Key: MAPREDUCE-885 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-885 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Aaron Kimball >Assignee: Aaron Kimball > Attachments: MAPREDUCE-885.patch > > > DBInputFormat generates InputSplits by counting the available rows in a > table, and selecting subsections of the table via the "LIMIT" and "OFFSET" > SQL keywords. These are only meaningful in an ordered context, so the query > also includes an "ORDER BY" clause on an index column. The resulting queries > are often inefficient and require full table scans. Actually using multiple > mappers with these queries can lead to O(n^2) behavior in the database, where > n is the number of splits. Attempting to use parallelism with these queries > is counter-productive. > A better mechanism is to organize splits based on data values themselves, > which can be performed in the WHERE clause, allowing for index range scans of > tables, and can better exploit parallelism in the database. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
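The linear-interpolation splitting described above might look roughly like this (a simplified sketch, not the actual IntegerSplitter code; the real implementation handles remainders and degenerate ranges more carefully). Given the min and max of the split column, it subdivides the value range into roughly even half-open intervals:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of integer split generation by linear interpolation;
// not the actual DataDrivenDBInputFormat/IntegerSplitter code.
public class IntegerSplitSketch {
    /** Split [min, max] into about numSplits clauses "col >= lo AND col < hi". */
    static List<String> split(String col, long min, long max, int numSplits) {
        List<String> clauses = new ArrayList<>();
        long step = Math.max(1, (max - min + 1) / numSplits);
        long lo = min;
        while (lo <= max) {
            long hi = Math.min(lo + step, max + 1); // last interval closes past max
            clauses.add(col + " >= " + lo + " AND " + col + " < " + hi);
            lo = hi;
        }
        return clauses;
    }

    public static void main(String[] args) {
        // min=10, max=29, 2 splits -> "id >= 10 AND id < 20", "id >= 20 AND id < 30"
        split("id", 10, 29, 2).forEach(System.out::println);
    }
}
```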
[jira] Updated: (MAPREDUCE-883) harchive: Document how to unarchive
[ https://issues.apache.org/jira/browse/MAPREDUCE-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated MAPREDUCE-883: --- Attachment: mapreduce-883-0.patch Simple doc suggesting to use cp/distcp for unarchiving. > harchive: Document how to unarchive > --- > > Key: MAPREDUCE-883 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-883 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: documentation, harchive >Reporter: Koji Noguchi >Priority: Minor > Attachments: mapreduce-883-0.patch > > > I was thinking of implementing harchive's 'unarchive' feature, but realized > it has been implemented already ever since harchive was introduced. > It just needs to be documented. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-237) Runtimes of TestJobTrackerRestart* testcases are high again
[ https://issues.apache.org/jira/browse/MAPREDUCE-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744661#action_12744661 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-237: -- Got a TestJobTrackerRestart timeout on Hudson [build #487|http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/487/testReport/org.apache.hadoop.mapred/TestJobTrackerRestart/testJobTrackerRestart/]. > Runtimes of TestJobTrackerRestart* testcases are high again > --- > > Key: MAPREDUCE-237 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-237 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Amar Kamat >Assignee: Amar Kamat > > [junit] Running org.apache.hadoop.mapred.TestJobTrackerRestart > [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 575.887 sec > [junit] Running org.apache.hadoop.mapred.TestJobTrackerRestartWithLostTracker > [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 864.319 sec > Something I saw on trunk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-476) extend DistributedCache to work locally (LocalJobRunner)
[ https://issues.apache.org/jira/browse/MAPREDUCE-476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744516#action_12744516 ] Hadoop QA commented on MAPREDUCE-476: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12416836/MAPREDUCE-476-20090818.txt against trunk revision 805324. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 14 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 2 new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/489/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/489/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/489/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/489/console This message is automatically generated. 
> extend DistributedCache to work locally (LocalJobRunner) > > > Key: MAPREDUCE-476 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-476 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: sam rash >Assignee: Philip Zeyliger >Priority: Minor > Attachments: HADOOP-2914-v1-full.patch, > HADOOP-2914-v1-since-4041.patch, HADOOP-2914-v2.patch, HADOOP-2914-v3.patch, > MAPREDUCE-476-20090814.1.txt, MAPREDUCE-476-20090818.txt, > MAPREDUCE-476-v2-vs-v3.patch, MAPREDUCE-476-v2-vs-v3.try2.patch, > MAPREDUCE-476-v2-vs-v4.txt, MAPREDUCE-476-v2.patch, MAPREDUCE-476-v3.patch, > MAPREDUCE-476-v3.try2.patch, MAPREDUCE-476-v4-requires-MR711.patch, > MAPREDUCE-476-v5-requires-MR711.patch, MAPREDUCE-476.patch > > > The DistributedCache does not work locally when using the outlined recipe at > http://hadoop.apache.org/core/docs/r0.16.0/api/org/apache/hadoop/filecache/DistributedCache.html > > Ideally, LocalJobRunner would take care of populating the JobConf and copying > remote files to the local file system (http; assume hdfs = default fs = local > fs) when doing local development. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-861) Modify queue configuration format and parsing to support a hierarchy of queues.
[ https://issues.apache.org/jira/browse/MAPREDUCE-861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744505#action_12744505 ] rahul k singh commented on MAPREDUCE-861: - There is a small error in the xsd mentioned above for option 2: {code:xml} {code} > Modify queue configuration format and parsing to support a hierarchy of > queues. > --- > > Key: MAPREDUCE-861 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-861 > Project: Hadoop Map/Reduce > Issue Type: Sub-task >Reporter: Hemanth Yamijala >Assignee: rahul k singh > > MAPREDUCE-853 proposes to introduce a hierarchy of queues into the Map/Reduce > framework. This JIRA is for defining changes to the configuration related to > queues. > The current format for defining a queue and its properties is as follows: > mapred.queue... For e.g. > mapred.queue..acl-submit-job. The reason for using this verbose > format was to be able to reuse the Configuration parser in Hadoop. However, > administrators currently using the queue configuration have already indicated > a very strong desire for a more manageable format. Since, this becomes more > unwieldy with hierarchical queues, the time may be good to introduce a new > format for representing queue configuration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-711) Move Distributed Cache from Common to Map/Reduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hemanth Yamijala resolved MAPREDUCE-711. Resolution: Fixed Fix Version/s: 0.21.0 Release Note: - Removed distributed cache classes and package from the Common project. - Added the same to the mapreduce project. - This will mean that users using Distributed Cache will now necessarily need the mapreduce jar in Hadoop 0.21. - Modified the package name to o.a.h.mapreduce.filecache from o.a.h.filecache and deprecated the old package name. Hadoop Flags: [Incompatible change, Reviewed] HDFS tests have also passed. Now, all the projects are sync'ed up. I committed this to trunk. Thanks, Vinod ! > Move Distributed Cache from Common to Map/Reduce > > > Key: MAPREDUCE-711 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-711 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Owen O'Malley >Assignee: Vinod K V > Fix For: 0.21.0 > > Attachments: MAPREDUCE-711-20090709-common.txt, > MAPREDUCE-711-20090709-mapreduce.1.txt, MAPREDUCE-711-20090709-mapreduce.txt, > MAPREDUCE-711-20090710.txt > > > Distributed Cache logically belongs as part of map/reduce and not Common. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-181) mapred.system.dir should be accessible only to hadoop daemons
[ https://issues.apache.org/jira/browse/MAPREDUCE-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744467#action_12744467 ] Devaraj Das commented on MAPREDUCE-181: --- I wonder whether it makes sense to have the jobclient write two files per split file:
1) the splits info (the actual bytes), written to a secure location on HDFS (with permissions 700)
2) the split metadata, which is a set of entries like {:.., } for each map-id. This is serialized over RPC, and the JobTracker writes it to the well-known mapred system directory (which the JobTracker owns with perms 700).
The JobTracker just reads/loads the metadata and creates the TIP cache. The TaskTracker is handed a split object that looks something like {}. As part of task localization, the TT copies the specific bytes from the split file (securely) and launches the task, which then reads the split; or the TT could simply stream it over RPC to the child. The replication factor could be set to a high number for the splits info file. Doing it this way should reduce the size of the split file information considerably (and we can have a cap on the metadata size as well), and also provide security for the content of user-generated split files. For the JobConf, passing the basic and minimum info to the JobTracker, as Hong suggested on MAPREDUCE-841, seems to make sense. All other conf properties the Task can load directly from HDFS. The max size (in terms of #bytes) of the basic information could be easily derived, and we could have a cap on that for the RPC communication. Thoughts? 
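The two-file layout sketched above could be captured by a per-map metadata record along these lines. This is a hypothetical illustration only; the class and field names are assumptions, not taken from any attached patch:

```java
// Hypothetical sketch of one split-metadata entry as described above.
// The raw split bytes live in a separate HDFS file (perms 700); each
// entry only records the hosts and where this task's split starts/ends.
public class SplitMetaInfo {
    private final String[] locations; // hosts, used for locality-aware scheduling
    private final long startOffset;   // byte offset into the raw split file
    private final long length;        // number of bytes for this task's split

    public SplitMetaInfo(String[] locations, long startOffset, long length) {
        this.locations = locations;
        this.startOffset = startOffset;
        this.length = length;
    }

    public String[] getLocations() { return locations; }
    public long getStartOffset() { return startOffset; }
    public long getLength() { return length; }
}
```

Keeping entries this small is what bounds the metadata the JobTracker must hold, while the (larger) raw split bytes stay in the secured file.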
> mapred.system.dir should be accessible only to hadoop daemons > -- > > Key: MAPREDUCE-181 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-181 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Amar Kamat >Assignee: Amar Kamat > Attachments: hadoop-3578-branch-20-example-2.patch, > hadoop-3578-branch-20-example.patch, HADOOP-3578-v2.6.patch, > HADOOP-3578-v2.7.patch > > > Currently the jobclient accesses the {{mapred.system.dir}} to add job > details. Hence the {{mapred.system.dir}} has the permissions of > {{rwx-wx-wx}}. This could be a security loophole where the job files might > get overwritten/tampered after the job submission. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-711) Move Distributed Cache from Common to Map/Reduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744464#action_12744464 ] Hudson commented on MAPREDUCE-711: -- Integrated in Hadoop-Hdfs-trunk #53 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/53/]) . Updated common and mapreduce jars from rev 804918 & 805081 resp. > Move Distributed Cache from Common to Map/Reduce > > > Key: MAPREDUCE-711 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-711 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Owen O'Malley >Assignee: Vinod K V > Attachments: MAPREDUCE-711-20090709-common.txt, > MAPREDUCE-711-20090709-mapreduce.1.txt, MAPREDUCE-711-20090709-mapreduce.txt, > MAPREDUCE-711-20090710.txt > > > Distributed Cache logically belongs as part of map/reduce and not Common. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-849) Renaming of configuration property names in mapreduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744462#action_12744462 ] Amareshwari Sriramadasu commented on MAPREDUCE-849: --- bq. I am assuming that configuration related to sub-components should start with a prefix of the parent component. For e.g., mapred.healthChecker.script.args will be mapreduce.tasktracker.healthChecker.script-args . Right? Yes. I will post a document which contains complete change-list of old name to new name. > Renaming of configuration property names in mapreduce > - > > Key: MAPREDUCE-849 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-849 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Amareshwari Sriramadasu >Assignee: Amareshwari Sriramadasu > Fix For: 0.21.0 > > > In-line with HDFS-531, property names in configuration files should be > standardized in MAPREDUCE. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-849) Renaming of configuration property names in mapreduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744459#action_12744459 ] Vinod K V commented on MAPREDUCE-849: - These names look a lot cleaner. +1 for the overall direction. But we should also think of ways to keep doing this going forward, even after this issue gets committed. While doing this, if we can create the corresponding java.lang.String property names, à la HADOOP-3583, and use them everywhere, it will be really good. For e.g.,
{code}
static final String MAPREDUCE_CLUSTER_EXAMPLE_CONFIG_PROPERTY = "mapreduce.cluster.example.config";
{code}
Also, I think usage of strings like _mapreduce.map.max.attempts_ and _mapreduce.jobtracker.maxtasks.per.job_ should be discouraged in favour of _mapreduce.map.max-attempts_ and _mapreduce.jobtracker.maxtasks-per-job_ respectively. Thoughts about this? I am assuming that configuration related to sub-components should start with a prefix of the parent component. For e.g., _mapred.healthChecker.script.args_ will be _mapreduce.tasktracker.healthChecker.script-args_ . Right? > Renaming of configuration property names in mapreduce > - > > Key: MAPREDUCE-849 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-849 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Amareshwari Sriramadasu >Assignee: Amareshwari Sriramadasu > Fix For: 0.21.0 > > > In-line with HDFS-531, property names in configuration files should be > standardized in MAPREDUCE. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-861) Modify queue configuration format and parsing to support a hierarchy of queues.
[ https://issues.apache.org/jira/browse/MAPREDUCE-861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744452#action_12744452 ] rahul k singh commented on MAPREDUCE-861: - As mentioned above, we had an internal agreement that we would go ahead with an XML-based configuration for hierarchical queues. In terms of how the configuration would be structured for hierarchical queues, we had 2 options in mind.

Option 1:
--
mapred-queues.xml would contain the queue hierarchy. A typical hierarchical queue configuration would look like:
{code:xml} q1 q1q1 u1,u2,u3 u1,u2 stop/running {code}
The configuration above defines a queue "q1" and a single child "q1q1". The tag would act as a black-box kind of section for the mapred-based parsers. The xsd definition of would be
{code:xml} {code}
By defining as we can extend this section of the configuration to add any kind of tags to the .
Advantages:
1. This approach allows a single configuration file.
2. It is generic enough, in that it allows users to declare scheduler properties the way they want.
Disadvantages:
1. This would result in having the parsing logic in different places: framework-level parsing would be done in the framework, and scheduler-specific parsing would be done in the scheduler.
2. More cumbersome to implement.

Option 2:
-
Same as option 1, except that the definition of would change. It would have child tags and which would define the key-value mappings of the various properties required by schedulers. For example:
{code:xml} q1 q1q1 u1,u2,u3 u1,u2 stop/running capacity maxCapacity {code}
The new xsd for would look like
{code:xml} {code}
Advantages:
1. Allows a single configuration file.
2. Provides a consistent way to specify scheduling properties.
3. Easier to implement; the parsing logic now resides in one common place.
Disadvantages:
1. Doesn't allow nested settings for scheduler properties.
2. Assumes that scheduler properties will always be in key-value format. 
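To make Option 2 concrete, here is a hedged sketch of what such a mapred-queues.xml might look like — all element and attribute names below are assumptions for illustration, not the agreed format:

{code:xml}
<queues>
  <queue>
    <name>q1</name>
    <acl-submit-job>u1,u2,u3</acl-submit-job>
    <acl-administer-jobs>u1,u2</acl-administer-jobs>
    <state>running</state>
    <queue>
      <name>q1q1</name>
      <properties>
        <property key="capacity" value="30"/>
        <property key="maxCapacity" value="50"/>
      </properties>
    </queue>
  </queue>
</queues>
{code}

In this shape the scheduler reads only the flat key/value pairs under the properties element, which is what makes the single common parser possible.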
> Modify queue configuration format and parsing to support a hierarchy of > queues. > --- > > Key: MAPREDUCE-861 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-861 > Project: Hadoop Map/Reduce > Issue Type: Sub-task >Reporter: Hemanth Yamijala >Assignee: rahul k singh > > MAPREDUCE-853 proposes to introduce a hierarchy of queues into the Map/Reduce > framework. This JIRA is for defining changes to the configuration related to > queues. > The current format for defining a queue and its properties is as follows: > mapred.queue... For e.g. > mapred.queue..acl-submit-job. The reason for using this verbose > format was to be able to reuse the Configuration parser in Hadoop. However, > administrators currently using the queue configuration have already indicated > a very strong desire for a more manageable format. Since, this becomes more > unwieldy with hierarchical queues, the time may be good to introduce a new > format for representing queue configuration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-849) Renaming of configuration property names in mapreduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1279#action_1279 ] Amareshwari Sriramadasu commented on MAPREDUCE-849: --- Configuration properties in the Mapreduce project can be categorized as follows, with a suggested name for each category.
||Category||Suggested Name||
|Cluster config|mapreduce.*|
|JobTracker config|mapreduce.jobtracker.*|
|TaskTracker config|mapreduce.tasktracker.*|
|Job-level config|mapreduce.job.*|
|Task-level config|mapreduce.task.*|
|Map task config|mapreduce.map.*|
|Reduce task config|mapreduce.reduce.*|
|Job client config|mapreduce.jobclient.*|
|Pipes config|mapreduce.pipes.*|
|Lib config|mapreduce..*|
|Example config|mapreduce..*|
|Test config|mapreduce.test.*|
|Streaming config|mapreduce.streaming.* or streaming.*|
|Contrib project config|mapreduce..* or .*|
Thoughts? > Renaming of configuration property names in mapreduce > - > > Key: MAPREDUCE-849 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-849 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Amareshwari Sriramadasu >Assignee: Amareshwari Sriramadasu > Fix For: 0.21.0 > > > In-line with HDFS-531, property names in configuration files should be > standardized in MAPREDUCE. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-430) Task stuck in cleanup with OutOfMemoryErrors
[ https://issues.apache.org/jira/browse/MAPREDUCE-430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1271#action_1271 ] Hadoop QA commented on MAPREDUCE-430: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12416767/MAPREDUCE-430-v1.7.patch against trunk revision 805081. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/488/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/488/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/488/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/488/console This message is automatically generated. > Task stuck in cleanup with OutOfMemoryErrors > > > Key: MAPREDUCE-430 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-430 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Amareshwari Sriramadasu >Assignee: Amar Kamat > Fix For: 0.20.1 > > Attachments: MAPREDUCE-430-v1.6-branch-0.20.patch, > MAPREDUCE-430-v1.6.patch, MAPREDUCE-430-v1.7.patch > > > Obesrved a task with OutOfMemory error, stuck in cleanup. -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-862) Modify UI to support a hierarchy of queues
[ https://issues.apache.org/jira/browse/MAPREDUCE-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sreekanth Ramakrishnan updated MAPREDUCE-862: - Attachment: subqueue.png > Modify UI to support a hierarchy of queues > -- > > Key: MAPREDUCE-862 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-862 > Project: Hadoop Map/Reduce > Issue Type: Sub-task >Reporter: Hemanth Yamijala > Attachments: clustersummarymodification.png, detailspage.png, > initialscreen.png, subqueue.png > > > MAPREDUCE-853 proposes to introduce a hierarchy of queues into the Map/Reduce > framework. This JIRA is for defining changes to the UI related to queues. > This includes the hadoop queue CLI and the web UI on the JobTracker. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-862) Modify UI to support a hierarchy of queues
[ https://issues.apache.org/jira/browse/MAPREDUCE-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sreekanth Ramakrishnan updated MAPREDUCE-862: - Attachment: initialscreen.png detailspage.png clustersummarymodification.png Attaching screenshots of how the UI would look for the modified queue design. The cluster summary would be modified to introduce a new column showing the number of queues, linked to the modified queue details page described in initialscreen.png. From initialscreen.png we can click through the queue hierarchy, which would have two pages: for {{ContainerQueues}} we would not have a job list, and for {{JobQueue}} we have a job list in addition to the scheduling information. > Modify UI to support a hierarchy of queues > -- > > Key: MAPREDUCE-862 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-862 > Project: Hadoop Map/Reduce > Issue Type: Sub-task >Reporter: Hemanth Yamijala > Attachments: clustersummarymodification.png, detailspage.png, > initialscreen.png, subqueue.png > > > MAPREDUCE-853 proposes to introduce a hierarchy of queues into the Map/Reduce > framework. This JIRA is for defining changes to the UI related to queues. > This includes the hadoop queue CLI and the web UI on the JobTracker. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-711) Move Distributed Cache from Common to Map/Reduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744426#action_12744426 ] Giridharan Kesavan commented on MAPREDUCE-711: -- Updated hdfs/lib with common and mapreduce jars from rev 804918 & 805081 resp. Triggered a hdfs trunk build (build added to build queue, as vesta is still running a patch build). http://hudson.zones.apache.org/hudson/view/Hdfs/job/Hadoop-Hdfs-trunk/52/ > Move Distributed Cache from Common to Map/Reduce > > > Key: MAPREDUCE-711 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-711 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Owen O'Malley >Assignee: Vinod K V > Attachments: MAPREDUCE-711-20090709-common.txt, > MAPREDUCE-711-20090709-mapreduce.1.txt, MAPREDUCE-711-20090709-mapreduce.txt, > MAPREDUCE-711-20090710.txt > > > Distributed Cache logically belongs as part of map/reduce and not Common. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-773) LineRecordReader can report non-zero progress while it is processing a compressed stream
[ https://issues.apache.org/jira/browse/MAPREDUCE-773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated MAPREDUCE-773: -- Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) I just committed this. > LineRecordReader can report non-zero progress while it is processing a > compressed stream > > > Key: MAPREDUCE-773 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-773 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: task >Reporter: Devaraj Das >Assignee: Devaraj Das > Fix For: 0.21.0 > > Attachments: 773.2.patch, 773.3.patch, 773.patch, 773.patch > > > Currently, the LineRecordReader returns 0.0 from getProgress() for most > inputs (since the "end" of the filesplit is set to Long.MAX_VALUE for > compressed inputs). This can be improved to return a non-zero progress even > for compressed streams (though it may not be very reflective of the actual > progress). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
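The getProgress() behaviour described above can be sketched as follows. This is a simplified, hypothetical illustration rather than the committed patch: with the split end set to Long.MAX_VALUE, the computed fraction is effectively zero for any realistic position, which is why compressed inputs used to report 0.0; clamping the ratio against the bytes actually consumed from the underlying stream yields a usable, if approximate, value.

```java
// Simplified sketch of split progress reporting. Dividing by an
// effectively unbounded (end - start) yields ~0 for compressed splits;
// using the consumed stream position gives an approximate but non-zero
// progress, clamped to [0, 1].
public class SplitProgress {
    public static float progress(long start, long pos, long end) {
        if (end <= start) {
            return 0.0f;
        }
        return Math.min(1.0f, (pos - start) / (float) (end - start));
    }
}
```

For example, progress(0, 50, 100) reports 0.5, while progress(0, 1000000, Long.MAX_VALUE) is indistinguishable from zero — the situation the issue describes.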
[jira] Updated: (MAPREDUCE-284) Improvements to RPC between Child and TaskTracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Gummadi updated MAPREDUCE-284: --- Attachment: MR-284.v1.patch Attaching patch that sets ipc.client.tcpnodelay to true in core-default.xml > Improvements to RPC between Child and TaskTracker > - > > Key: MAPREDUCE-284 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-284 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Arun C Murthy >Assignee: Ravi Gummadi > Fix For: 0.21.0 > > Attachments: MR-284.patch, MR-284.v1.patch > > > We could improve the RPC between the Child and TaskTracker: >* Set the ping interval lower by default, to 5s >* Disable Nagle's algorithm (tcp no-delay) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
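The change described above would amount to a core-default.xml entry along these lines (a sketch only; the description text is assumed, not copied from the attached patch):

{code:xml}
<property>
  <name>ipc.client.tcpnodelay</name>
  <value>true</value>
  <description>Turn off Nagle's algorithm (set TCP_NODELAY) on the
  IPC client socket, so small RPC messages are sent without
  batching delay.</description>
</property>
{code}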
[jira] Commented: (MAPREDUCE-157) Job History log file format is not friendly for external tools.
[ https://issues.apache.org/jira/browse/MAPREDUCE-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744415#action_12744415 ] Jothi Padmanabhan commented on MAPREDUCE-157: - Regarding the interface for readers, we could support two kinds of users:
# Users who want fine-grained control and would handle the individual events themselves.
# Users who want coarser-grained, summary-level information.
Users of type 1, who want finer-grained information, could use Event Readers to iterate through events and do the necessary processing.
For users of type 2, we could provide summary information through a JobHistoryParser class. This class would internally build the Job-Task-Attempt hierarchy/information by consuming all events using an event reader, and make the summary information available for users to access. Users could do something like
{code}
parser.init(history file or stream)
JobInfo jobInfo = parser.getJobInfo();
// use the getters to get jobinfo (example: start time, finish time,
// counters, id, user name, conf, total maps, total reds, among others)
List taskInfoList = jobInfo.getAllTasks();
// Iterate through the list and do necessary processing. Getters for
// taskinfo would include taskid, task type, status, splits, counters, etc
List attemptsList = taskinfo.getAllAttempts();
// Attempt info would have getters for attempt id, errors, status, state,
// start time, finish time, tracker name, port etc.
{code}
Comments/Suggestions/Thoughts? > Job History log file format is not friendly for external tools. > --- > > Key: MAPREDUCE-157 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-157 > Project: Hadoop Map/Reduce > Issue Type: Sub-task >Reporter: Owen O'Malley >Assignee: Jothi Padmanabhan > > Currently, parsing the job history logs with external tools is very difficult > because of the format. The most critical problem is that newlines aren't > escaped in the strings. 
That makes using tools like grep, sed, and awk very > tricky. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-711) Move Distributed Cache from Common to Map/Reduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744414#action_12744414 ] Hemanth Yamijala commented on MAPREDUCE-711: bq. Can you please run tests on Hudson (Giridharan could help with it I suppose) and commit the changes to HDFS when the tests pass. I have already run the tests with the updated jars locally. There does not appear to be a way to run these off Hudson. So, we are planning to commit the jars and then trigger a Hudson HDFS build to make sure things work still. If something breaks, we will revert the commit and check again. (But given they pass locally, I am hoping we won't get to it). Also, the MapReduce build failure in the tests is being tracked in MAPREDUCE-880 and is unrelated to this commit. Giri, can you please commit the common and Map/Reduce jars to HDFS and trigger a build ? > Move Distributed Cache from Common to Map/Reduce > > > Key: MAPREDUCE-711 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-711 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Owen O'Malley >Assignee: Vinod K V > Attachments: MAPREDUCE-711-20090709-common.txt, > MAPREDUCE-711-20090709-mapreduce.1.txt, MAPREDUCE-711-20090709-mapreduce.txt, > MAPREDUCE-711-20090710.txt > > > Distributed Cache logically belongs as part of map/reduce and not Common. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.