[jira] Updated: (HIVE-1570) referencing an added file by its name in a transform script does not work in hive local mode
[ https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1570: Attachment: 1570.5.patch also adding a trivial patch for HIVE-1473. filed separate patches for 1473 and 1520 as well - but folded everything in here. > referencing an added file by its name in a transform script does not work in > hive local mode > - > > Key: HIVE-1570 > URL: https://issues.apache.org/jira/browse/HIVE-1570 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma > Attachments: 1570.1.patch, 1570.2.patch, 1570.3.patch, 1570.4.patch, > 1570.5.patch > > > Yongqiang tried this and it fails in local mode: > add file ../data/scripts/dumpdata_script.py; > select count(distinct subq.key) from > (FROM src MAP src.key USING 'python dumpdata_script.py' AS key WHERE src.key > = 10) subq; > this needs to be fixed because it means we cannot choose local mode > automatically in case of transform scripts (since different paths need to be > used for cluster vs. local mode execution)
[jira] Updated: (HIVE-1520) hive.mapred.local.mem should only be used in case of local mode job submissions
[ https://issues.apache.org/jira/browse/HIVE-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1520: Attachment: 1520.1.patch > hive.mapred.local.mem should only be used in case of local mode job > submissions > --- > > Key: HIVE-1520 > URL: https://issues.apache.org/jira/browse/HIVE-1520 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Joydeep Sen Sarma > Attachments: 1520.1.patch > > > Currently - whenever we submit a map-reduce job via a child jvm process, hive > sets HADOOP_HEAPSIZE to hive.mapred.local.mem (thereby limiting the max heap > memory of the child jvm), the assumption being that we are submitting a job > for local mode execution and that different memory limits apply for that. > however - one can submit jobs via a child jvm for non-local mode execution as > well. This is useful, for example, if hive wants to submit jobs via different > hadoop clients (for sending jobs to different hadoop clusters). in such a case, > we can use 'hive.exec.submitviachild' and 'hadoop.bin.path' to dispatch the > job via an alternate hadoop client install point. however, in that case we > don't need to set HADOOP_HEAPSIZE. all we are using the child jvm for is to run > the small bit of hive code that submits the job (and not for local mode > execution). > in this case - we shouldn't be setting the child jvm's memory limit and > should leave it at whatever the parent's value is.
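A minimal sketch of the behavior this asks for, with illustrative names (not the actual Hive code): the heap cap from hive.mapred.local.mem is applied to the child JVM's environment only when that JVM will execute the job locally, and is left alone when the child merely submits to a remote cluster.

    import java.util.Map;

    public class ChildJvmLauncher {
        // localModeExecution: true only when the child JVM will run the job itself
        static void configureChildEnv(Map<String, String> env,
                                      boolean localModeExecution,
                                      int localMemMb) {
            if (localModeExecution && localMemMb > 0) {
                // cap the child's heap for local-mode execution only
                env.put("HADOOP_HEAPSIZE", String.valueOf(localMemMb));
            }
            // otherwise the child inherits the parent's HADOOP_HEAPSIZE:
            // it only runs the small job-submission code path
        }
    }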
[jira] Updated: (HIVE-1473) plan file should have a high replication factor
[ https://issues.apache.org/jira/browse/HIVE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1473: Attachment: 1473.1.patch > plan file should have a high replication factor > --- > > Key: HIVE-1473 > URL: https://issues.apache.org/jira/browse/HIVE-1473 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.6.0 >Reporter: Joydeep Sen Sarma >Priority: Minor > Attachments: 1473.1.patch > > > it should be set to 10 or something like that (just like job.xml).
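A sketch of what such a fix amounts to, under assumed names (the config key below is hypothetical; the default of 10 mirrors what the MapReduce framework uses for job.xml via mapred.submit.replication):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PlanFileUtil {
        // raise the plan file's replication so many tasks reading it at job
        // startup don't all hit the three datanodes holding its blocks
        static void raisePlanReplication(FileSystem fs, Path planPath,
                                         Configuration conf) throws IOException {
            // "hive.exec.plan.replication" is an illustrative key, not a real one
            short repl = (short) conf.getInt("hive.exec.plan.replication", 10);
            fs.setReplication(planPath, repl);
        }
    }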
[jira] Updated: (HIVE-1570) referencing an added file by its name in a transform script does not work in hive local mode
[ https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1570: Attachment: 1570.4.patch added fix for HIVE-1520 - don't reset HADOOP_HEAPSIZE unless the child jvm is being launched for local mode execution. it's a one-liner - simpler to get it all in one shot. > referencing an added file by its name in a transform script does not work in > hive local mode > - > > Key: HIVE-1570 > URL: https://issues.apache.org/jira/browse/HIVE-1570 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma > Attachments: 1570.1.patch, 1570.2.patch, 1570.3.patch, 1570.4.patch > > > Yongqiang tried this and it fails in local mode: > add file ../data/scripts/dumpdata_script.py; > select count(distinct subq.key) from > (FROM src MAP src.key USING 'python dumpdata_script.py' AS key WHERE src.key > = 10) subq; > this needs to be fixed because it means we cannot choose local mode > automatically in case of transform scripts (since different paths need to be > used for cluster vs. local mode execution)
[jira] Commented: (HIVE-1620) Patch to write directly to S3 from Hive
[ https://issues.apache.org/jira/browse/HIVE-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918611#action_12918611 ] Joydeep Sen Sarma commented on HIVE-1620: - if we are overwriting a table ('insert overwrite table') - do we make sure that if the query/job fails in between - then some of the files (from maps/reduces that did succeed) are not left in the table's directory? > Patch to write directly to S3 from Hive > --- > > Key: HIVE-1620 > URL: https://issues.apache.org/jira/browse/HIVE-1620 > Project: Hadoop Hive > Issue Type: New Feature >Reporter: Vaibhav Aggarwal >Assignee: Vaibhav Aggarwal > Attachments: HIVE-1620.patch > > > We want to submit a patch to Hive which allows users to write files directly > to S3. > This patch allows the user to specify an S3 location as the table output location > and hence eliminates the need to copy data from HDFS to S3. > Users can run Hive queries directly over the data stored in S3. > This patch helps integrate hive with S3 better and faster.
[jira] Commented: (HIVE-1675) SAXParseException on plan.xml during local mode.
[ https://issues.apache.org/jira/browse/HIVE-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918609#action_12918609 ] Joydeep Sen Sarma commented on HIVE-1675: - what version of hadoop is this happening against? > SAXParseException on plan.xml during local mode. > > > Key: HIVE-1675 > URL: https://issues.apache.org/jira/browse/HIVE-1675 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.7.0 >Reporter: Bennie Schut > Attachments: local_10005_plan.xml, local_10006_plan.xml > > > When hive switches to local mode (hive.exec.mode.local.auto=true) I receive a > sax parser exception on the plan.xml > If I set hive.exec.mode.local.auto=false I get the correct results.
[jira] Commented: (HIVE-1695) MapJoin followed by ReduceSink should be done as single MapReduce Job
[ https://issues.apache.org/jira/browse/HIVE-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918607#action_12918607 ] Joydeep Sen Sarma commented on HIVE-1695: - i think there's already a jira for this > MapJoin followed by ReduceSink should be done as single MapReduce Job > - > > Key: HIVE-1695 > URL: https://issues.apache.org/jira/browse/HIVE-1695 > Project: Hadoop Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Amareshwari Sriramadasu > > Currently MapJoin followed by ReduceSink runs as two MapReduce jobs: one map-only > job followed by a Map-Reduce job. It can be combined into a single > MapReduce job.
[jira] Resolved: (HIVE-1685) scriptfile1.1 in minimr failing intermittently
[ https://issues.apache.org/jira/browse/HIVE-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma resolved HIVE-1685. - Resolution: Duplicate the test and the output are not in sync. it should fail every time. fixing as part of HIVE-1570 - it's a small change. > scriptfile1.1 in minimr failing intermittently > - > > Key: HIVE-1685 > URL: https://issues.apache.org/jira/browse/HIVE-1685 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Namit Jain >Assignee: Joydeep Sen Sarma > >
> [junit] Begin query: scriptfile1.q
> [junit] diff -a -I file: -I pfile: -I hdfs: -I /tmp/ -I invalidscheme: -I lastUpdateTime -I lastAccessTime -I [Oo]wner -I CreateTime -I LastAccessTime -I Location -I transient_lastDdlTime -I last_modified_ -I java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I Caused by: -I [.][.][.] [0-9]* more /data/users/njain/hive_commit1/hive_commit1/build/ql/test/logs/clientpositive/scriptfile1.q.out /data/users/njain/hive_commit1/hive_commit1/ql/src/test/results/clientpositive/scriptfile1.q.out
> [junit] 1c1
> [junit] < PREHOOK: query: CREATE TABLE scriptfile1_dest1(key INT, value STRING)
> [junit] ---
> [junit] > PREHOOK: query: CREATE TABLE dest1(key INT, value STRING)
> [junit] 3c3
> [junit] < POSTHOOK: query: CREATE TABLE scriptfile1_dest1(key INT, value STRING)
> [junit] ---
> [junit] > POSTHOOK: query: CREATE TABLE dest1(key INT, value STRING)
> [junit] 5c5
> [junit] < POSTHOOK: Output: defa...@scriptfile1_dest1
> [junit] ---
> [junit] > POSTHOOK: Output: defa...@dest1
> [junit] 12c12
> [junit] < INSERT OVERWRITE TABLE scriptfile1_dest1 SELECT tmap.tkey, tmap.tvalue
> [junit] ---
> [junit] > INSERT OVERWRITE TABLE dest1 SELECT tmap.tkey, tmap.tvalue
> [junit] 15c15
> [junit] < PREHOOK: Output: defa...@scriptfile1_dest1
> [junit] ---
> [junit] > PREHOOK: Output: defa...@dest1
> [junit] 22c22
> [junit] < INSERT OVERWRITE TABLE scriptfile1_dest1 SELECT tmap.tkey, tmap.tvalue
> [junit] ---
> [junit] > INSERT OVERWRITE TABLE dest1 SELECT tmap.tkey, tmap.tvalue
> [junit] 25,28c25,28
> [junit] < POSTHOOK: Output: defa...@scriptfile1_dest1
> [junit] < POSTHOOK: Lineage: scriptfile1_dest1.key SCRIPT [(src)src.FieldSchema(name:key, type:string, comment:default), (src)src.FieldSchema(name:value, type:string, comment:default), ]
> [junit] < POSTHOOK: Lineage: scriptfile1_dest1.value SCRIPT [(src)src.FieldSchema(name:key, type:string, comment:default), (src)src.FieldSchema(name:value, type:string, comment:default), ]
> [junit] < PREHOOK: query: SELECT scriptfile1_dest1.* FROM scriptfile1_dest1
> [junit] ---
> [junit] > POSTHOOK: Output: defa...@dest1
> [junit] > POSTHOOK: Lineage: dest1.key SCRIPT [(src)src.FieldSchema(name:key, type:string, comment:default), (src)src.FieldSchema(name:value, type:string, comment:default), ]
> [junit] > POSTHOOK: Lineage: dest1.value SCRIPT [(src)src.FieldSchema(name:key, type:str
> [junit] junit.framework.AssertionFailedError: Client execution results failed with error code = 1
> [junit] See build/ql/tmp/hive.log, or try "ant test ... -Dtest.silent=false" to get more logs.
> [junit] at junit.framework.Assert.fail(Assert.java:47)
> [junit] at org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_scriptfile1(TestMinimrCliDriver.java:522)
> [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> [junit] at java.lang.reflect.Method.invoke(Method.java:597)
> [junit] at junit.framework.TestCase.runTest(TestCase.java:154)
> [junit] at junit.framework.TestCase.runBare(TestCase.java:127)
> [junit] at junit.framework.TestResult$1.protect(TestResult.java:106)
> [junit] at junit.framework.TestResult.runProtected(TestResult.java:124)
> [junit] at junit.framework.TestResult.run(TestResult.java:109)
> [junit] at junit.framework.TestCase.run(TestCase.java:118)
> [junit] at junit.framework.TestSuite.runTest(TestSuite.java:208)
> [junit] at junit.framework.TestSuite.run(TestSuite.java:203)
> [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
> [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
> [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768)
[jira] Updated: (HIVE-1570) referencing an added file by its name in a transform script does not work in hive local mode
[ https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1570: Attachment: 1570.3.patch added console output for local mapred jobs giving the location of the execution log, for debugging. > referencing an added file by its name in a transform script does not work in > hive local mode > - > > Key: HIVE-1570 > URL: https://issues.apache.org/jira/browse/HIVE-1570 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma > Attachments: 1570.1.patch, 1570.2.patch, 1570.3.patch > > > Yongqiang tried this and it fails in local mode: > add file ../data/scripts/dumpdata_script.py; > select count(distinct subq.key) from > (FROM src MAP src.key USING 'python dumpdata_script.py' AS key WHERE src.key > = 10) subq; > this needs to be fixed because it means we cannot choose local mode > automatically in case of transform scripts (since different paths need to be > used for cluster vs. local mode execution)
[jira] Updated: (HIVE-1570) referencing an added file by its name in a transform script does not work in hive local mode
[ https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1570: Status: Patch Available (was: Open) > referencing an added file by its name in a transform script does not work in > hive local mode > - > > Key: HIVE-1570 > URL: https://issues.apache.org/jira/browse/HIVE-1570 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma > Attachments: 1570.1.patch, 1570.2.patch > > > Yongqiang tried this and it fails in local mode: > add file ../data/scripts/dumpdata_script.py; > select count(distinct subq.key) from > (FROM src MAP src.key USING 'python dumpdata_script.py' AS key WHERE src.key > = 10) subq; > this needs to be fixed because it means we cannot choose local mode > automatically in case of transform scripts (since different paths need to be > used for cluster vs. local mode execution)
[jira] Updated: (HIVE-1570) referencing an added file by its name in a transform script does not work in hive local mode
[ https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1570: Attachment: 1570.2.patch working patch. no need for a new test. had to modify some other tests to use 'add file'. > referencing an added file by its name in a transform script does not work in > hive local mode > - > > Key: HIVE-1570 > URL: https://issues.apache.org/jira/browse/HIVE-1570 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma > Attachments: 1570.1.patch, 1570.2.patch > > > Yongqiang tried this and it fails in local mode: > add file ../data/scripts/dumpdata_script.py; > select count(distinct subq.key) from > (FROM src MAP src.key USING 'python dumpdata_script.py' AS key WHERE src.key > = 10) subq; > this needs to be fixed because it means we cannot choose local mode > automatically in case of transform scripts (since different paths need to be > used for cluster vs. local mode execution)
[jira] Updated: (HIVE-1570) referencing an added file by its name in a transform script does not work in hive local mode
[ https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1570: Attachment: 1570.1.patch before running a map-reduce job in local mode we: 1. set a new working directory 2. symlink all added files from that working directory. this is pretty much identical to how hadoop sets up the task execution environment. all references to scripts and added files using only their names now resolve correctly in local mode. there was some hacky code in SemanticAnalyzer.java to deal with this that doesn't work in all cases (when the referenced file is not the first item on the command line, or in automatic local mode). i have deleted it. duplicated one of the tests so that we get coverage against a real cluster (scriptfile1.q executed against minimr) and local mode (scriptfile2.q). still running tests. > referencing an added file by its name in a transform script does not work in > hive local mode > - > > Key: HIVE-1570 > URL: https://issues.apache.org/jira/browse/HIVE-1570 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma > Attachments: 1570.1.patch > > > Yongqiang tried this and it fails in local mode: > add file ../data/scripts/dumpdata_script.py; > select count(distinct subq.key) from > (FROM src MAP src.key USING 'python dumpdata_script.py' AS key WHERE src.key > = 10) subq; > this needs to be fixed because it means we cannot choose local mode > automatically in case of transform scripts (since different paths need to be > used for cluster vs. local mode execution)
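A rough illustration of steps 1 and 2 above, with illustrative names (the actual change lives in Hive's execution code, not in a standalone class like this): give the local-mode job its own working directory and symlink every added resource into it by basename, mirroring how Hadoop populates a task's working directory on a real cluster.

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.List;

    public class LocalModeSetup {
        static Path prepareWorkingDir(Path scratchDir, List<String> addedFiles) throws Exception {
            // 1. a fresh working directory per local-mode job
            Path workDir = Files.createDirectories(scratchDir.resolve("local_" + System.nanoTime()));
            // 2. symlink each added file into it by basename, so
            //    'ADD FILE ../data/scripts/foo.py' is reachable as workDir/foo.py
            for (String resource : addedFiles) {
                Path target = Paths.get(resource).toAbsolutePath();
                Files.createSymbolicLink(workDir.resolve(target.getFileName()), target);
            }
            return workDir; // the local-mode child runs with this as its cwd
        }
    }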
[jira] Commented: (HIVE-1651) ScriptOperator should not forward any output to downstream operators if an exception happens
[ https://issues.apache.org/jira/browse/HIVE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910837#action_12910837 ] Joydeep Sen Sarma commented on HIVE-1651: - yeah - but then the directory itself should be created as a tmp directory. and we should promote the directory to its final name only when closing successfully. > ScriptOperator should not forward any output to downstream operators if an > exception happens > > > Key: HIVE-1651 > URL: https://issues.apache.org/jira/browse/HIVE-1651 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Ning Zhang >Assignee: Ning Zhang > Attachments: HIVE-1651.patch > > > ScriptOperator spawns 2 threads for getting the stdout and stderr from the > script and then forwards the output from stdout to downstream operators. In > case of any exception in the script (e.g., it got killed), the ScriptOperator > gets an exception and throws it to upstream operators until MapOperator gets it > and calls close(abort). Before ScriptOperator.close() is called, the script > output stream can still forward output to downstream operators. We should > terminate it immediately.
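A minimal sketch of the promote-on-success pattern suggested in that comment, with illustrative names (not the actual ScriptOperator/FileSink code): write into a tmp directory and rename it to the final name only if the task closes cleanly.

    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class TmpDirCommitter {
        static void closeOutput(FileSystem fs, Path tmpDir, Path finalDir,
                                boolean abort) throws IOException {
            if (abort) {
                fs.delete(tmpDir, true);      // a failed task leaves nothing behind
            } else {
                fs.rename(tmpDir, finalDir);  // promote to the final name on success
            }
        }
    }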
[jira] Commented: (HIVE-1651) ScriptOperator should not forward any output to downstream operators if an exception happens
[ https://issues.apache.org/jira/browse/HIVE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910786#action_12910786 ] Joydeep Sen Sarma commented on HIVE-1651: - if a hadoop task is failed - how is it that any side-effect files created by hive code running in that task are getting promoted to the final output? i think the forwarding is a red herring. we should not commit output files from a failed task. > ScriptOperator should not forward any output to downstream operators if an > exception happens > > > Key: HIVE-1651 > URL: https://issues.apache.org/jira/browse/HIVE-1651 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Ning Zhang >Assignee: Ning Zhang > Attachments: HIVE-1651.patch > > > ScriptOperator spawns 2 threads for getting the stdout and stderr from the > script and then forwards the output from stdout to downstream operators. In > case of any exception in the script (e.g., it got killed), the ScriptOperator > gets an exception and throws it to upstream operators until MapOperator gets it > and calls close(abort). Before ScriptOperator.close() is called, the script > output stream can still forward output to downstream operators. We should > terminate it immediately.
[jira] Commented: (HIVE-1570) referencing an added file by its name in a transform script does not work in hive local mode
[ https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909950#action_12909950 ] Joydeep Sen Sarma commented on HIVE-1570: - sure. confused - because the tests were all passing earlier when i added the minimr tests. > referencing an added file by its name in a transform script does not work in > hive local mode > - > > Key: HIVE-1570 > URL: https://issues.apache.org/jira/browse/HIVE-1570 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma > > Yongqiang tried this and it fails in local mode: > add file ../data/scripts/dumpdata_script.py; > select count(distinct subq.key) from > (FROM src MAP src.key USING 'python dumpdata_script.py' AS key WHERE src.key > = 10) subq; > this needs to be fixed because it means we cannot choose local mode > automatically in case of transform scripts (since different paths need to be > used for cluster vs. local mode execution)
[jira] Updated: (HIVE-1580) cleanup ExecDriver.progress
[ https://issues.apache.org/jira/browse/HIVE-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1580: Attachment: hive-1580.1.patch cleaned up multiple calls to getCounters (which turns out to be a really expensive call in the JT) and don't print non-fatal stack traces to the console. > cleanup ExecDriver.progress > --- > > Key: HIVE-1580 > URL: https://issues.apache.org/jira/browse/HIVE-1580 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma > Attachments: hive-1580.1.patch > > > a few problems: > - if a job is retired - then counters cannot be obtained and a stack trace is > printed out (from history code). this confuses users > - too many calls to getCounters. after a job has been detected to be finished > - there are quite a few more calls to get the job status and the counters. we > need to figure out a way to curtail this - in busy clusters the gap between > the job getting finished and the hive client noticing is very perceptible and > impacts user experience. > calls to getCounters are very expensive in 0.20 as they grab a jobtracker > global lock (something we have fixed internally at FB)
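A minimal sketch of the polling shape the patch aims for (illustrative names, not the committed code): fetch counters at most once per poll interval, and exactly once after completion is detected, since RunningJob.getCounters() takes a JobTracker-wide lock on 0.20-era Hadoop.

    import org.apache.hadoop.mapred.Counters;
    import org.apache.hadoop.mapred.RunningJob;

    public class ProgressLoop {
        static void progress(RunningJob rj) throws Exception {
            while (!rj.isComplete()) {
                Counters ctrs = rj.getCounters();  // one expensive JT call per interval
                System.out.println("progress counters: " + ctrs);
                Thread.sleep(1000);                // poll interval
            }
            // after completion: a single final fetch, not a burst of
            // getJob/getCounters calls
            Counters finalCtrs = rj.getCounters();
            System.out.println("final counters: " + finalCtrs);
        }
    }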
[jira] Commented: (HIVE-1602) List Partitioning
[ https://issues.apache.org/jira/browse/HIVE-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903692#action_12903692 ] Joydeep Sen Sarma commented on HIVE-1602: - yeah. but i have been asking how you are planning to make the grouping of partitions transparent. to me that sounds like a very risky and big change and there are no details here. why would we do this at the hive layer given we have HAR already? i really don't understand why we wouldn't start with hive-1467 and then add HAR as an optimization to reduce the number of files for small partitions. this doesn't address the skew case. it doesn't address the fact that we still have to partition by dynamic partitioning columns - and that requires the same partition-only map-reduce operator that 1467 requires. at which point - we can just do 1467. what am i missing? > List Partitioning > - > > Key: HIVE-1602 > URL: https://issues.apache.org/jira/browse/HIVE-1602 > Project: Hadoop Hive > Issue Type: New Feature >Affects Versions: 0.7.0 >Reporter: Ning Zhang > > Dynamic partition inserts create partitions based on the dynamic partition > column values. Currently it creates one partition for each distinct DP column > value. This could result in skews in the created dynamic partitions in that > some partitions are large but there could be a large number of small partitions > as well. This results in burdens on HDFS as well as the metastore. A list > partitioning scheme that aggregates a number of small partitions into one big > one is preferable for skewed partitions.
[jira] Commented: (HIVE-1602) List Partitioning
[ https://issues.apache.org/jira/browse/HIVE-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903672#action_12903672 ] Joydeep Sen Sarma commented on HIVE-1602: - > combining small partitions into one large partition seems to be a natural > way. sure - but i am worried that this is a fundamental change to hive's data model and may not be the quickest/safest solution to what is a pretty urgent problem. also - HAR already solves packing small files into a big file. and it doesn't require changes to hive's data model. so in that sense it seems like an easy win. you are still left with the large partition (skew) problem. this doesn't solve that either (assuming you are using reducers). > How can the user manually cluster event=s, event=m, event=l into one insert overwrite table xxx partition (event_class) select a,b,c,event, case event when 's' then 'sml' when 'm' then 'sml' when 'l' then 'sml' else 'g' end from ... > List Partitioning > - > > Key: HIVE-1602 > URL: https://issues.apache.org/jira/browse/HIVE-1602 > Project: Hadoop Hive > Issue Type: New Feature >Affects Versions: 0.7.0 >Reporter: Ning Zhang > > Dynamic partition inserts create partitions based on the dynamic partition > column values. Currently it creates one partition for each distinct DP column > value. This could result in skews in the created dynamic partitions in that > some partitions are large but there could be a large number of small partitions > as well. This results in burdens on HDFS as well as the metastore. A list > partitioning scheme that aggregates a number of small partitions into one big > one is preferable for skewed partitions.
[jira] Commented: (HIVE-1602) List Partitioning
[ https://issues.apache.org/jira/browse/HIVE-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903655#action_12903655 ] Joydeep Sen Sarma commented on HIVE-1602: - how will this be made transparent from a queryability perspective? i think i still don't understand the details. i agree - if the user does it themselves they have to duplicate the column. but this doesn't seem like a big deal to me (we compress everything anyway and partitioning columns are highly compressible since they will be repeated like crazy). my worry is that this change might be a very big one (representing multiple partitions in one storage container). it seems to me a much more fundamental change than just fixing dynamic partitioning. > List Partitioning > - > > Key: HIVE-1602 > URL: https://issues.apache.org/jira/browse/HIVE-1602 > Project: Hadoop Hive > Issue Type: New Feature >Affects Versions: 0.7.0 >Reporter: Ning Zhang > > Dynamic partition inserts create partitions based on the dynamic partition > column values. Currently it creates one partition for each distinct DP column > value. This could result in skews in the created dynamic partitions in that > some partitions are large but there could be a large number of small partitions > as well. This results in burdens on HDFS as well as the metastore. A list > partitioning scheme that aggregates a number of small partitions into one big > one is preferable for skewed partitions.
[jira] Commented: (HIVE-1602) List Partitioning
[ https://issues.apache.org/jira/browse/HIVE-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903630#action_12903630 ] Joydeep Sen Sarma commented on HIVE-1602: - yikes. how is this queried afterwards? the user can do this by doing the transformation namit listed in the select clause (on the partitioning column). the user can do a one-time analysis of the data (for size distribution on different partitioning columns) and then generate the clumping logic manually. because this does not result in queryable data sets - it doesn't seem useful/reusable to me. > List Partitioning > - > > Key: HIVE-1602 > URL: https://issues.apache.org/jira/browse/HIVE-1602 > Project: Hadoop Hive > Issue Type: New Feature >Affects Versions: 0.7.0 >Reporter: Ning Zhang > > Dynamic partition inserts create partitions based on the dynamic partition > column values. Currently it creates one partition for each distinct DP column > value. This could result in skews in the created dynamic partitions in that > some partitions are large but there could be a large number of small partitions > as well. This results in burdens on HDFS as well as the metastore. A list > partitioning scheme that aggregates a number of small partitions into one big > one is preferable for skewed partitions.
[jira] Commented: (HIVE-1602) List Partitioning
[ https://issues.apache.org/jira/browse/HIVE-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903606#action_12903606 ] Joydeep Sen Sarma commented on HIVE-1602: - hmmm - not sure i understand. how can we collapse partitions? we have to generate one directory per distinct DP column value - no? (or are you thinking of jumping straight to har?) > List Partitioning > - > > Key: HIVE-1602 > URL: https://issues.apache.org/jira/browse/HIVE-1602 > Project: Hadoop Hive > Issue Type: New Feature >Affects Versions: 0.7.0 >Reporter: Ning Zhang > > Dynamic partition inserts create partitions based on the dynamic partition > column values. Currently it creates one partition for each distinct DP column > value. This could result in skews in the created dynamic partitions in that > some partitions are large but there could be a large number of small partitions > as well. This results in burdens on HDFS as well as the metastore. A list > partitioning scheme that aggregates a number of small partitions into one big > one is preferable for skewed partitions.
[jira] Commented: (HIVE-1467) dynamic partitioning should cluster by partitions
[ https://issues.apache.org/jira/browse/HIVE-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903562#action_12903562 ] Joydeep Sen Sarma commented on HIVE-1467: - @Ning - what about skew? > dynamic partitioning should cluster by partitions > - > > Key: HIVE-1467 > URL: https://issues.apache.org/jira/browse/HIVE-1467 > Project: Hadoop Hive > Issue Type: Improvement >Reporter: Joydeep Sen Sarma >Assignee: Namit Jain > > (based on internal discussion with Ning). Dynamic partitioning should offer a > mode where it clusters data by partition before writing out to each > partition. This will reduce the number of files. Details: > 1. always use a reducer stage > 2. mapper sends to reducer based on partitioning column, ie. reducer = > f(partition-cols) > 3. f() can be made somewhat smart to: >a. spread large partitions across multiple reducers - each mapper can > maintain the row count seen per partition - and then apply (whenever it sees a > new row for a partition): >* reducer = (row-count / 64k) % numReducers >Small partitions always go to one reducer. the larger the partition, > the more the reducers. this prevents one reducer from becoming a bottleneck writing > out one partition >b. this still leaves the issue of a very large number of splits. (64K rows > from 10K mappers is pretty large). for this one can apply one slight > modification: >* reducer = (mapper-id/1024 + row-count/64k) % numReducers >ie. - the first 1000 mappers always send the first 64K rows for one > partition to the same reducer. the next 1000 send it to the next one. and so > on. > the constants 1024 and 64k are used just as an example. i don't know what the > right numbers are. it's also clear that this is a case where we need hadoop > to do only partitioning (and no sorting). this will be a useful feature to > have in hadoop. that will reduce the overhead due to reducers.
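A sketch of the reducer-selection function f() described in that issue, using the example constants from the description (this is an illustration of the formula, not a real Hive partitioner): each mapper tracks the rows it has seen per partition value, so small partitions stay on one reducer while large ones fan out as their row counts grow.

    import java.util.HashMap;
    import java.util.Map;

    public class DynamicPartitionRouter {
        private final Map<String, Long> rowsPerPartition = new HashMap<>();
        private final int mapperId;      // this mapper's index
        private final int numReducers;

        DynamicPartitionRouter(int mapperId, int numReducers) {
            this.mapperId = mapperId;
            this.numReducers = numReducers;
        }

        int reducerFor(String partitionValue) {
            long rowCount = rowsPerPartition.merge(partitionValue, 1L, Long::sum);
            // reducer = (mapper-id/1024 + row-count/64k) % numReducers:
            // mappers are grouped in blocks of 1024 to bound the number of
            // output files, and every 64K rows of a partition shift to the
            // next reducer so one reducer never owns a huge partition alone
            return (int) ((mapperId / 1024 + rowCount / 65536) % numReducers);
        }
    }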
[jira] Updated: (HIVE-1601) Hadoop 0.17 ant test broken by HIVE-1523
[ https://issues.apache.org/jira/browse/HIVE-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1601: Attachment: 1601.1.patch, ant-contrib-1.0b3.jar - fixed the jsp include - don't run minimr in 0.17 (it doesn't work) - added the ant-contrib jar (attached as a separate file); very useful for writing ant conditions (we simplify a bunch of other stuff with it) > Hadoop 0.17 ant test broken by HIVE-1523 > > > Key: HIVE-1601 > URL: https://issues.apache.org/jira/browse/HIVE-1601 > Project: Hadoop Hive > Issue Type: Bug > Components: Testing Infrastructure >Affects Versions: 0.7.0 >Reporter: John Sichi >Assignee: Joydeep Sen Sarma > Fix For: 0.7.0 > > Attachments: 1601.1.patch, ant-contrib-1.0b3.jar > > > compile-test: >[javac] /data/users/jsichi/open/hive-trunk/build-common.xml:304: warning: > 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set > to false for repeatable builds >[javac] Compiling 33 source files to > /data/users/jsichi/open/hive-trunk/build/ql/test/classes > BUILD FAILED > /data/users/jsichi/open/hive-trunk/build.xml:168: The following error > occurred while executing this line: > /data/users/jsichi/open/hive-trunk/build.xml:105: The following error > occurred while executing this line: > /data/users/jsichi/open/hive-trunk/build-common.xml:304: > /data/users/jsichi/open/hive-trunk/build/hadoopcore/hadoop-0.17.2.1/lib/jsp-2.1 > does not exist.
[jira] Updated: (HIVE-1601) Hadoop 0.17 ant test broken by HIVE-1523
[ https://issues.apache.org/jira/browse/HIVE-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1601: Status: Patch Available (was: Open) > Hadoop 0.17 ant test broken by HIVE-1523 > > > Key: HIVE-1601 > URL: https://issues.apache.org/jira/browse/HIVE-1601 > Project: Hadoop Hive > Issue Type: Bug > Components: Testing Infrastructure >Affects Versions: 0.7.0 >Reporter: John Sichi >Assignee: Joydeep Sen Sarma > Fix For: 0.7.0 > > Attachments: 1601.1.patch, ant-contrib-1.0b3.jar > > > compile-test: >[javac] /data/users/jsichi/open/hive-trunk/build-common.xml:304: warning: > 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set > to false for repeatable builds >[javac] Compiling 33 source files to > /data/users/jsichi/open/hive-trunk/build/ql/test/classes > BUILD FAILED > /data/users/jsichi/open/hive-trunk/build.xml:168: The following error > occurred while executing this line: > /data/users/jsichi/open/hive-trunk/build.xml:105: The following error > occurred while executing this line: > /data/users/jsichi/open/hive-trunk/build-common.xml:304: > /data/users/jsichi/open/hive-trunk/build/hadoopcore/hadoop-0.17.2.1/lib/jsp-2.1 > does not exist.
[jira] Commented: (HIVE-1487) parallelize test query runs
[ https://issues.apache.org/jira/browse/HIVE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902977#action_12902977 ] Joydeep Sen Sarma commented on HIVE-1487: - yeah - that would be my gut feel too (just ditch junit). however - we are going to lose the junit-style test outputs etc. a long time back Ashish did all the velocity stuff to have junit tests. i don't remember the exact thinking at that time - but a majority of people wanted to use junit. threading would actually be good though (we have a separate multithreaded test right now that we could happily obsolete) > parallelize test query runs > --- > > Key: HIVE-1487 > URL: https://issues.apache.org/jira/browse/HIVE-1487 > Project: Hadoop Hive > Issue Type: Improvement > Components: Testing Infrastructure >Reporter: Joydeep Sen Sarma > > HIVE-1464 sped up serial runs somewhat - but it looks like it's still too > slow. we should use parallel junit or some similar setup to run test queries > in parallel. this should be really easy as we'll just need to use a separate > warehouse/metadb and potentially a mapred system dir location.
[jira] Commented: (HIVE-1487) parallelize test query runs
[ https://issues.apache.org/jira/browse/HIVE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902958#action_12902958 ] Joydeep Sen Sarma commented on HIVE-1487: - can people with experience running java tests in parallel comment on this? So far these seem to be the choices: * upgrade to junit4 and use a custom runner that runs in parallel. the downside here is that junit does not seem to ship with this parallel runner (but there's additional code on the web from the junit authors that does the same) * use parallel-junit. this seems the least disruptive - but it looks like an old/dead project * use TestNG - a replacement for junit that has built-in parallel execution support. but we would not be using junit at all anymore. any other thoughts on a better test setup are welcome as well. > parallelize test query runs > --- > > Key: HIVE-1487 > URL: https://issues.apache.org/jira/browse/HIVE-1487 > Project: Hadoop Hive > Issue Type: Improvement > Components: Testing Infrastructure >Reporter: Joydeep Sen Sarma > > HIVE-1464 sped up serial runs somewhat - but it looks like it's still too > slow. we should use parallel junit or some similar setup to run test queries > in parallel. this should be really easy as we'll just need to use a separate > warehouse/metadb and potentially a mapred system dir location.
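Independent of which of the three options gets picked, a bare-bones sketch of what the parallel run amounts to (runQueryFile is a hypothetical driver hook; the dir and JDBC values are illustrative): the main requirement is per-worker isolation of the warehouse, metastore db, and mapred system dir.

    import java.util.Arrays;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class ParallelQueryRunner {
        public static void main(String[] args) throws InterruptedException {
            List<String> qfiles = Arrays.asList("scriptfile1.q", "null_column.q", "skewjoin.q");
            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (int i = 0; i < qfiles.size(); i++) {
                final int worker = i;
                final String qfile = qfiles.get(i);
                pool.submit(() -> {
                    // per-worker isolation is the whole trick: give each run
                    // its own warehouse and its own derby metastore
                    String warehouse = "/tmp/test_warehouse_" + worker;
                    String metadb = "jdbc:derby:;databaseName=/tmp/test_metadb_"
                            + worker + ";create=true";
                    // runQueryFile(qfile, warehouse, metadb);  // hypothetical hook
                    System.out.println("worker " + worker + " would run " + qfile
                            + " against " + warehouse + " / " + metadb);
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }
    }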
[jira] Commented: (HIVE-1523) ql tests no longer work in miniMR mode
[ https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902560#action_12902560 ] Joydeep Sen Sarma commented on HIVE-1523: - there's already a jira on running tests in parallel. i think i can cover it there. > ql tests no longer work in miniMR mode > -- > > Key: HIVE-1523 > URL: https://issues.apache.org/jira/browse/HIVE-1523 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma > Attachments: hive-1523.1.patch, hive-1523.2.patch, hive-1523.3.patch, > hive-1523.4.patch > > > as per title. here's the first exception i see: > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: > java.io.FileNotFoundException File file:/build/ql/test/data/warehouse/dest_j1 does not exist. > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(746)) - > java.io.FileNotFoundException: File file:/build/ql/test/data/warehouse/dest_j1 does not exist. > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245) > at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677)
[jira] Updated: (HIVE-1523) ql tests no longer work in miniMR mode
[ https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1523: Attachment: hive-1523.4.patch - added exclude tests - minimr tests are excluded from the regular clientpositive tests - made some subtle changes in how fs.default.name and mapred.job.tracker are specified, to allow testing against external hadoop clusters > ql tests no longer work in miniMR mode > -- > > Key: HIVE-1523 > URL: https://issues.apache.org/jira/browse/HIVE-1523 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma > Attachments: hive-1523.1.patch, hive-1523.2.patch, hive-1523.3.patch, > hive-1523.4.patch > > > as per title. here's the first exception i see: > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: > java.io.FileNotFoundException File file:/build/ql/test/data/warehouse/dest_j1 does not exist. > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(746)) - > java.io.FileNotFoundException: File file:/build/ql/test/data/warehouse/dest_j1 does not exist. > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245) > at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677)
[jira] Updated: (HIVE-1523) ql tests no longer work in miniMR mode
[ https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1523: Status: Patch Available (was: Open) > ql tests no longer work in miniMR mode > -- > > Key: HIVE-1523 > URL: https://issues.apache.org/jira/browse/HIVE-1523 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma > Attachments: hive-1523.1.patch, hive-1523.2.patch, hive-1523.3.patch, > hive-1523.4.patch > > > as per title. here's the first exception i see: > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: > java.io.FileNotFoundException File file:/build/ql/test/data/warehouse/dest_j1 does not exist. > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(746)) - > java.io.FileNotFoundException: File file:/build/ql/test/data/warehouse/dest_j1 does not exist. > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245) > at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677)
[jira] Commented: (HIVE-1583) Hive should not override Hadoop specific system properties
[ https://issues.apache.org/jira/browse/HIVE-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902149#action_12902149 ] Joydeep Sen Sarma commented on HIVE-1583: - +1 on HADOOP_CLASSPATH. i am not sure about CLASSPATH. Hadoop itself does not allow users to supply a pre-existing value for CLASSPATH. here's a snippet from the 0.20 conf: # CLASSPATH initially contains $HADOOP_CONF_DIR CLASSPATH="${HADOOP_CONF_DIR}" > Hive should not override Hadoop specific system properties > -- > > Key: HIVE-1583 > URL: https://issues.apache.org/jira/browse/HIVE-1583 > Project: Hadoop Hive > Issue Type: Bug > Components: Configuration >Reporter: Amareshwari Sriramadasu >Assignee: Thiruvel Thirumoolan > Attachments: HIVE-1583.patch > > > Currently Hive overrides Hadoop specific system properties such as > HADOOP_CLASSPATH. > It does the following in the bin/hive script: > {code} > # pass classpath to hadoop > export HADOOP_CLASSPATH=${CLASSPATH} > {code} > Instead, it should honor the value of HADOOP_CLASSPATH set by the client, by > appending CLASSPATH to it.
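A minimal sketch of the append behavior the issue proposes for bin/hive (a suggestion in shell, not the committed fix): keep whatever HADOOP_CLASSPATH the client already exported and add Hive's classpath after it.

    # preserve a client-supplied HADOOP_CLASSPATH instead of overwriting it
    if [ -n "$HADOOP_CLASSPATH" ]; then
      export HADOOP_CLASSPATH="${HADOOP_CLASSPATH}:${CLASSPATH}"
    else
      export HADOOP_CLASSPATH="${CLASSPATH}"
    fi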
[jira] Commented: (HIVE-1523) ql tests no longer work in miniMR mode
[ https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901687#action_12901687 ] Joydeep Sen Sarma commented on HIVE-1523: - can someone review/commit this? i don't think i am going to make more changes to this. will work on the long regression framework separately. > ql tests no longer work in miniMR mode > -- > > Key: HIVE-1523 > URL: https://issues.apache.org/jira/browse/HIVE-1523 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma > Attachments: hive-1523.1.patch, hive-1523.2.patch, hive-1523.3.patch > > > as per title. here's the first exception i see: > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: > java.io.FileNotFoundException File file:/build/ql/test/data/warehouse/dest_j1 does not exist. > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(746)) - > java.io.FileNotFoundException: File file:/build/ql/test/data/warehouse/dest_j1 does not exist. > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245) > at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677)
[jira] Created: (HIVE-1580) cleanup ExecDriver.progress
cleanup ExecDriver.progress --- Key: HIVE-1580 URL: https://issues.apache.org/jira/browse/HIVE-1580 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Reporter: Joydeep Sen Sarma Assignee: Joydeep Sen Sarma a few problems: - if a job is retired - then counters cannot be obtained and a stack trace is printed out (from history code). this confuses users - too many calls to getCounters. after a job has been detected to be finished - there are quite a few more calls to get the job status and the counters. we need to figure out a way to curtail this - in busy clusters the gap between the job getting finished and the hive client noticing is very perceptible and impacts user experience. calls to getCounters are very expensive in 0.20 as they grab a jobtracker global lock (something we have fixed internally at FB)
[jira] Commented: (HIVE-1578) Add conf. property hive.exec.show.job.failure.debug.info to enable/disable displaying link to the task with most failures
[ https://issues.apache.org/jira/browse/HIVE-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900991#action_12900991 ] Joydeep Sen Sarma commented on HIVE-1578: - looks like the CHANGES.txt message of this commit and the merge commit got mixed up. > Add conf. property hive.exec.show.job.failure.debug.info to enable/disable > displaying link to the task with most failures > - > > Key: HIVE-1578 > URL: https://issues.apache.org/jira/browse/HIVE-1578 > Project: Hadoop Hive > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Paul Yang >Assignee: Paul Yang > Fix For: 0.7.0 > > Attachments: HIVE-1578.1.patch > > > If a job fails, Hive currently displays a link to the task with the most > number of failures for easy access to the error logs. However, generating the > link may require many RPCs to get all the task completion events, adding a > delay of up to 30 minutes. This patch adds a configuration variable to > control whether the link is generated. Turning off this feature would also > disable the automatic debugging tips generated by heuristics reading from the > error logs.
[jira] Created: (HIVE-1579) showJobFailDebugInfo fails job if tasktracker does not respond
showJobFailDebugInfo fails job if tasktracker does not respond -- Key: HIVE-1579 URL: https://issues.apache.org/jira/browse/HIVE-1579 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Reporter: Joydeep Sen Sarma Assignee: Paul Yang here's the stack trace: java.lang.RuntimeException: Error while reading from task log url at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130) at org.apache.hadoop.hive.ql.exec.ExecDriver.showJobFailDebugInfo(ExecDriver.java:844) at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:624) at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:120) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:609) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:478) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:356) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:140) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:199) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:316) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: java.io.FileNotFoundException: http://hadoop0062.snc3.facebook.com.:50060/tasklog?taskid=attempt_201008191557_26566_m_01_3&all=true at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1239) at java.net.URL.openStream(URL.java:1009) at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120) ... 16 more Ended Job = job_201008191557_26566 with exception 'java.lang.RuntimeException(Error while reading from task log url)' FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask this failed a multi-hour script.
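A sketch of one defensive shape such a fix could take (assumed names, not necessarily the actual patch): an unreadable tasktracker log URL should degrade to missing debug info rather than failing the whole query.

    import java.io.InputStream;
    import java.net.URL;

    public class TaskLogFetcher {
        // best-effort: a dead tasktracker must not turn into a query failure
        static String fetchTaskLog(String taskLogUrl) {
            try (InputStream in = new URL(taskLogUrl).openStream()) {
                return new String(in.readAllBytes(), "UTF-8");
            } catch (Exception e) {
                // swallow and report: debug info is optional, the job result is not
                System.err.println("could not read task log url " + taskLogUrl
                        + ": " + e.getMessage());
                return null; // caller skips the log-based debugging heuristics
            }
        }
    }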
[jira] Created: (HIVE-1574) symlink_text_input_format.q needs fixes for minimr
symlink_text_input_format.q needs fixes for minimr -- Key: HIVE-1574 URL: https://issues.apache.org/jira/browse/HIVE-1574 Project: Hadoop Hive Issue Type: Bug Reporter: Joydeep Sen Sarma fails in minimr. these lines are problematic: dfs -cp ../data/files/symlink1.txt ../build/ql/test/data/warehouse/symlink_text_input_format/symlink1.txt; dfs -cp ../data/files/symlink2.txt ../build/ql/test/data/warehouse/symlink_text_input_format/symlink2.txt; we should just use load commands.
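For instance, the dfs -cp lines could be replaced with load commands along these lines (table and file names taken from the test above; the exact form is a suggestion, not the committed fix):

    LOAD DATA LOCAL INPATH '../data/files/symlink1.txt' INTO TABLE symlink_text_input_format;
    LOAD DATA LOCAL INPATH '../data/files/symlink2.txt' INTO TABLE symlink_text_input_format;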
[jira] Created: (HIVE-1572) skewjoin.q output in minimr differs from local mode
skewjoin.q output in minimr differs from local mode --- Key: HIVE-1572 URL: https://issues.apache.org/jira/browse/HIVE-1572 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Reporter: Joydeep Sen Sarma checked-in results: POSTHOOK: query: SELECT sum(hash(src1.key)), sum(hash(src1.val)), sum(hash(src2.key)) FROM T1 src1 JOIN T2 src2 ON src1.key+1 = src2.key JOIN T2 src3 ON src2.key = src3.key 370 11003 377 in minimr mode: POSTHOOK: query: SELECT sum(hash(src1.key)), sum(hash(src1.val)), sum(hash(src2.key)) FROM T1 src1 JOIN T2 src2 ON src1.key+1 = src2.key JOIN T2 src3 ON src2.key = src3.key 150 4707 153 it seems that the query is deterministic - so filing a bug.
[jira] Commented: (HIVE-1569) groupby_bigdata.q fails in minimr mode
[ https://issues.apache.org/jira/browse/HIVE-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900521#action_12900521 ] Joydeep Sen Sarma commented on HIVE-1569: - can you look at scriptfile1.q? it works with add file and refers to the script by name. it works for both local mode and minimr > groupby_bigdata.q fails in minimr mode > -- > > Key: HIVE-1569 > URL: https://issues.apache.org/jira/browse/HIVE-1569 > Project: Hadoop Hive > Issue Type: Bug > Components: Testing Infrastructure >Reporter: Namit Jain >Assignee: He Yongqiang >
[jira] Commented: (HIVE-1570) referencing an added file by its name in a transform script does not work in hive local mode
[ https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900501#action_12900501 ] Joydeep Sen Sarma commented on HIVE-1570: - hmmm - how come scriptfile1.q works then? CREATE TABLE dest1(key INT, value STRING); ADD FILE src/test/scripts/testgrep; FROM ( FROM src SELECT TRANSFORM(src.key, src.value) USING 'testgrep' AS (tkey, tvalue) CLUSTER BY tkey ) tmap INSERT OVERWRITE TABLE dest1 SELECT tmap.tkey, tmap.tvalue; > referencing an added file by its name in a transform script does not work in > hive local mode > - > > Key: HIVE-1570 > URL: https://issues.apache.org/jira/browse/HIVE-1570 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma > > Yongqiang tried this and it fails in local mode: > add file ../data/scripts/dumpdata_script.py; > select count(distinct subq.key) from > (FROM src MAP src.key USING 'python dumpdata_script.py' AS key WHERE src.key > = 10) subq; > this needs to be fixed because it means we cannot choose local mode > automatically in case of transform scripts (since different paths need to be > used for cluster vs. local mode execution)
[jira] Created: (HIVE-1570) referencing an added file by its name in a transform script does not work in hive local mode
referencing an added file by its name in a transform script does not work in hive local mode - Key: HIVE-1570 URL: https://issues.apache.org/jira/browse/HIVE-1570 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Reporter: Joydeep Sen Sarma Yongqiang tried this and it fails in local mode: add file ../data/scripts/dumpdata_script.py; select count(distinct subq.key) from (FROM src MAP src.key USING 'python dumpdata_script.py' AS key WHERE src.key = 10) subq; this needs to be fixed because it means we cannot choose local mode automatically in case of transform scripts (since different paths need to be used for cluster vs. local mode execution)
[jira] Commented: (HIVE-1568) null_column.q fails in minimr mode
[ https://issues.apache.org/jira/browse/HIVE-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900490#action_12900490 ] Joydeep Sen Sarma commented on HIVE-1568: - also, ppd_multi_insert.q fails with the same error. > null_column.q fails in minimr mode > -- > > Key: HIVE-1568 > URL: https://issues.apache.org/jira/browse/HIVE-1568 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Joydeep Sen Sarma > > followup from hive-1523: > ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=null_column.q test > [junit] Begin query: null_column.q > [junit] Exception: Client Execution failed with error code = 9 > [junit] See build/ql/tmp/hive.log, or try "ant test ... > -Dtest.silent=false" to get more logs. > [junit] junit.framework.AssertionFailedError: Client Execution failed > with error code = 9 > [junit] See build/ql/tmp/hive.log, or try "ant test ... > -Dtest.silent=false" to get more logs. > [junit] at junit.framework.Assert.fail(Assert.java:47) > [junit] at > org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_null_column(TestCliDriver.java:108) > i lost the hive.log - but it's happening in the MoveTask that corresponds to > this statement: > insert overwrite directory "../build/ql/test/data/warehouse/null_columns.out" > select null, null from temp_null; -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1568) null_column.q fails in minimr mode
null_column.q fails in minimr mode -- Key: HIVE-1568 URL: https://issues.apache.org/jira/browse/HIVE-1568 Project: Hadoop Hive Issue Type: Bug Reporter: Joydeep Sen Sarma followup from hive-1523: ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=null_column.q test [junit] Begin query: null_column.q [junit] Exception: Client Execution failed with error code = 9 [junit] See build/ql/tmp/hive.log, or try "ant test ... -Dtest.silent=false" to get more logs. [junit] junit.framework.AssertionFailedError: Client Execution failed with error code = 9 [junit] See build/ql/tmp/hive.log, or try "ant test ... -Dtest.silent=false" to get more logs. [junit] at junit.framework.Assert.fail(Assert.java:47) [junit] at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_null_column(TestCliDriver.java:108) i lost the hive.log - but it's happening in the MoveTask that corresponds to this statement: insert overwrite directory "../build/ql/test/data/warehouse/null_columns.out" select null, null from temp_null; -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1562) default hadoop version should be 0.20.2 or newer
[ https://issues.apache.org/jira/browse/HIVE-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1562: Summary: default hadoop version should be 0.20.2 or newer (was: CombineHiveInputFormat issues in minimr mode) Description: The following failure report in CombineHiveInputFormat can be resolved by revising the hadoop-20 version being used: followup from HIVE-1523. This is probably because of CombineHiveInputFormat: ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=sample10.q test insert overwrite table srcpartbucket partition(ds, hr) select * from srcpart where ds is not null and key < 10 2010-08-18 15:13:54,378 ERROR SessionState (SessionState.java:printError(277)) - PREHOOK: query: insert overwrite table srcpartbucket partition(ds, hr) select *\ from srcpart where ds is not null and key < 10 2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) - PREHOOK: type: QUERY 2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) - PREHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=11 2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) - PREHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=12 2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) - PREHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=11 2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) - PREHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=12 2010-08-18 15:13:54,704 WARN mapred.JobClient (JobClient.java:configureCommandLineOptions(539)) - Use GenericOptionsParser for parsing the arguments. Applicati\ ons should implement Tool for the same. 2010-08-18 15:13:55,642 ERROR mapred.EagerTaskInitializationListener (EagerTaskInitializationListener.java:run(83)) - Job initialization failed: java.lang.IllegalArgumentException: Network location name contains /: /default-rack at org.apache.hadoop.net.NodeBase.set(NodeBase.java:75) at org.apache.hadoop.net.NodeBase.(NodeBase.java:57) at org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2326) at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2320) at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:343) at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:440) at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:81) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) 2010-08-18 15:13:56,566 ERROR exec.MapRedTask (SessionState.java:printError(277)) - Ended Job = job_201008181513_0001 with errors 2010-08-18 15:13:56,597 ERROR ql.Driver (SessionState.java:printError(277)) - FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedT\ ask See also:combine2.q was: followup from HIVE-1523. 
[jira] Commented: (HIVE-1562) CombineHiveInputFormat issues in minimr mode
[ https://issues.apache.org/jira/browse/HIVE-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900398#action_12900398 ] Joydeep Sen Sarma commented on HIVE-1562: - cool - thanks for the tip. looks like this is fixed in 0.20.2 - so if we can make that the default hadoop version dependency for hive - that would be enough. seems like ivy/maven are not finding 0.20.2 (whereas there are hadoop jiras saying that these versions have been published - so looks like some fix needed in our dependency management stuff) will change title. > CombineHiveInputFormat issues in minimr mode > > > Key: HIVE-1562 > URL: https://issues.apache.org/jira/browse/HIVE-1562 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Joydeep Sen Sarma > > followup from HIVE-1523. This is probably because of CombineHiveInputFormat: > ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=sample10.q test > insert overwrite table srcpartbucket partition(ds, hr) select * from srcpart > where ds is not null and key < 10 > 2010-08-18 15:13:54,378 ERROR SessionState > (SessionState.java:printError(277)) - PREHOOK: query: insert overwrite table > srcpartbucket partition(ds, hr) select *\ > from srcpart where ds is not null and key < 10 > 2010-08-18 15:13:54,379 ERROR SessionState > (SessionState.java:printError(277)) - PREHOOK: type: QUERY > 2010-08-18 15:13:54,379 ERROR SessionState > (SessionState.java:printError(277)) - PREHOOK: Input: > defa...@srcpart@ds=2008-04-08/hr=11 > 2010-08-18 15:13:54,379 ERROR SessionState > (SessionState.java:printError(277)) - PREHOOK: Input: > defa...@srcpart@ds=2008-04-08/hr=12 > 2010-08-18 15:13:54,379 ERROR SessionState > (SessionState.java:printError(277)) - PREHOOK: Input: > defa...@srcpart@ds=2008-04-09/hr=11 > 2010-08-18 15:13:54,379 ERROR SessionState > (SessionState.java:printError(277)) - PREHOOK: Input: > defa...@srcpart@ds=2008-04-09/hr=12 > 2010-08-18 15:13:54,704 WARN mapred.JobClient > (JobClient.java:configureCommandLineOptions(539)) - Use GenericOptionsParser > for parsing the arguments. Applicati\ > ons should implement Tool for the same. > 2010-08-18 15:13:55,642 ERROR mapred.EagerTaskInitializationListener > (EagerTaskInitializationListener.java:run(83)) - Job initialization failed: > java.lang.IllegalArgumentException: Network location name contains /: > /default-rack > at org.apache.hadoop.net.NodeBase.set(NodeBase.java:75) > at org.apache.hadoop.net.NodeBase.(NodeBase.java:57) > at > org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2326) > at > org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2320) > at > org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:343) > at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:440) > at > org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:81) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) > at java.lang.Thread.run(Thread.java:619) > 2010-08-18 15:13:56,566 ERROR exec.MapRedTask > (SessionState.java:printError(277)) - Ended Job = job_201008181513_0001 with > errors > 2010-08-18 15:13:56,597 ERROR ql.Driver (SessionState.java:printError(277)) - > FAILED: Execution Error, return code 2 from > org.apache.hadoop.hive.ql.exec.MapRedT\ > ask > See also:combine2.q -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
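For context on the failure above: the trace shows NodeBase.set() rejecting the node name "/default-rack". A minimal Java paraphrase of that check, reconstructed from the exception message rather than copied from the Hadoop source, looks like this - the mini-MR JobTracker ends up passing a rack path where a plain host name is expected, which the 0.20.2 fix avoids:

    // Hedged paraphrase of the check in the stack trace above (not copied
    // from Hadoop source): a node *name* may not contain '/'.
    public class NodeNameCheck {
      static void set(String name, String location) {
        if (name != null && name.contains("/")) {
          throw new IllegalArgumentException(
              "Network location name contains /: " + name);
        }
        // ... the real method goes on to store the name and location ...
      }

      public static void main(String[] args) {
        set("/default-rack", ""); // reproduces the IllegalArgumentException
      }
    }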
[jira] Commented: (HIVE-1564) bucketizedhiveinputformat.q fails in minimr mode
[ https://issues.apache.org/jira/browse/HIVE-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900209#action_12900209 ] Joydeep Sen Sarma commented on HIVE-1564: - also, groupby_bigdata.q shows the same stack trace. > bucketizedhiveinputformat.q fails in minimr mode > > > Key: HIVE-1564 > URL: https://issues.apache.org/jira/browse/HIVE-1564 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Joydeep Sen Sarma > > followup to HIVE-1523: > ant -Dtestcase=TestCliDriver -Dqfile=bucketizedhiveinputformat.q > -Dclustermode=miniMR clean-test test > [junit] Begin query: bucketizedhiveinputformat.q > [junit] Exception: null > [junit] java.lang.AssertionError > [junit] at > org.apache.hadoop.hive.ql.exec.ExecDriver.showJobFailDebugInfo(ExecDriver.java:788) > [junit] at > org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:624) > [junit] at > org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:120) > ExecDriver.java:788 > // These tasks should have come from the same job. > > assert(ti.getJobId() == jobId); -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
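One hedged aside on the assert quoted above: if getJobId() returns a String, '==' compares object references rather than values, so two identical job IDs materialized separately can still trip the assertion. This is only an illustration of the comparison pitfall, not a claim about the actual root cause:

    // Run with 'java -ea' so assertions are enabled.
    public class JobIdCompare {
      public static void main(String[] args) {
        String a = new String("job_201008181513_0001");
        String b = new String("job_201008181513_0001");
        assert a.equals(b);  // value equality holds
        assert a != b;       // distinct references, so '==' would be false
      }
    }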
[jira] Created: (HIVE-1566) diff_part_input_formats.q failure in minimr mode
diff_part_input_formats.q failure in minimr mode Key: HIVE-1566 URL: https://issues.apache.org/jira/browse/HIVE-1566 Project: Hadoop Hive Issue Type: Bug Reporter: Joydeep Sen Sarma followup from HIVE-1523: [junit] Begin query: diff_part_input_formats.q [junit] java.lang.AssertionError [junit] at org.apache.hadoop.hive.ql.io.RCFileOutputFormat.setColumnNumber(RCFileOutputFormat.java:59) [junit] at org.apache.hadoop.hive.ql.io.RCFileOutputFormat.getHiveRecordWriter(RCFileOutputFormat.java:136) [junit] at org.apache.hadoop.hive.ql.exec.ExecDriver.addInputPath(ExecDriver.java:1165) [junit] at org.apache.hadoop.hive.ql.exec.ExecDriver.addInputPaths(ExecDriver.java:1222) [junit] at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:590) public static void setColumnNumber(Configuration conf, int columnNum) { assert columnNum > 0; -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
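For context, the assert quoted above fires when no positive column count has been recorded in the job configuration. A hedged sketch of the expected call pattern (the count of 3 is a made-up example):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hive.ql.io.RCFileOutputFormat;

    public class ColumnCountSetup {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Must be set to a value > 0 before the format hands out a record
        // writer; the minimr failure suggests this step was skipped or got
        // 0 on that code path.
        RCFileOutputFormat.setColumnNumber(conf, 3);
      }
    }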
[jira] Updated: (HIVE-1562) CombineHiveInputFormat issues in minimr mode
[ https://issues.apache.org/jira/browse/HIVE-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1562: Summary: CombineHiveInputFormat issues in minimr mode (was: sample10.q fails in minimr mode) Description: followup from HIVE-1523. This is probably because of CombineHiveInputFormat: ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=sample10.q test insert overwrite table srcpartbucket partition(ds, hr) select * from srcpart where ds is not null and key < 10 2010-08-18 15:13:54,378 ERROR SessionState (SessionState.java:printError(277)) - PREHOOK: query: insert overwrite table srcpartbucket partition(ds, hr) select *\ from srcpart where ds is not null and key < 10 2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) - PREHOOK: type: QUERY 2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) - PREHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=11 2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) - PREHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=12 2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) - PREHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=11 2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) - PREHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=12 2010-08-18 15:13:54,704 WARN mapred.JobClient (JobClient.java:configureCommandLineOptions(539)) - Use GenericOptionsParser for parsing the arguments. Applicati\ ons should implement Tool for the same. 2010-08-18 15:13:55,642 ERROR mapred.EagerTaskInitializationListener (EagerTaskInitializationListener.java:run(83)) - Job initialization failed: java.lang.IllegalArgumentException: Network location name contains /: /default-rack at org.apache.hadoop.net.NodeBase.set(NodeBase.java:75) at org.apache.hadoop.net.NodeBase.(NodeBase.java:57) at org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2326) at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2320) at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:343) at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:440) at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:81) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) 2010-08-18 15:13:56,566 ERROR exec.MapRedTask (SessionState.java:printError(277)) - Ended Job = job_201008181513_0001 with errors 2010-08-18 15:13:56,597 ERROR ql.Driver (SessionState.java:printError(277)) - FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedT\ ask See also:combine2.q was: followup from HIVE-1523. 
[jira] Created: (HIVE-1565) archive.q fails in minimr mode
archive.q fails in minimr mode -- Key: HIVE-1565 URL: https://issues.apache.org/jira/browse/HIVE-1565 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Reporter: Joydeep Sen Sarma followup to hive-1523. in this case the results seem different ant -Doffline=true -Dtestcase=TestCliDriver -Dqfile=archive.q -Dclustermode=miniMR clean-test test [junit] Begin query: archive.q [junit] junit.framework.AssertionFailedError: Client execution results failed with error code = 1 [junit] See build/ql/tmp/hive.log, or try "ant test ... -Dtest.silent=false" to get more logs. [junit] at junit.framework.Assert.fail(Assert.java:47) [junit] at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive(TestCliDriver.java:128) [junit] diff -a -I file: -I pfile: -I hdfs: -I /tmp/ -I invalidscheme: -I lastUpdateTime -I lastAccessTime -I owner -I \ transient_lastDdlTime -I java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I Caused by: -I [.][.][.] [\ 0-9]* more /data/users/jssarma/hive_testing/build/ql/test/logs/clientpositive/archive.q.out /data/users/jssarma/hive_testin\ g/ql/src/test/results/clientpositive/archive.q.out [junit] 489c489 [junit] < NULL [junit] --- [junit] > 48656137 here's the query with differing output: POSTHOOK: query: SELECT SUM(hash(col)) FROM (SELECT transform(*) using 'tr "\t" "_"' AS col FROM (SELECT * FROM old_name WHERE ds='1') subq1) subq2 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1564) bucketizedhiveinputformat.q fails in minimr mode
bucketizedhiveinputformat.q fails in minimr mode Key: HIVE-1564 URL: https://issues.apache.org/jira/browse/HIVE-1564 Project: Hadoop Hive Issue Type: Bug Reporter: Joydeep Sen Sarma followup to HIVE-1523: ant -Dtestcase=TestCliDriver -Dqfile=bucketizedhiveinputformat.q -Dclustermode=miniMR clean-test test [junit] Begin query: bucketizedhiveinputformat.q [junit] Exception: null [junit] java.lang.AssertionError [junit] at org.apache.hadoop.hive.ql.exec.ExecDriver.showJobFailDebugInfo(ExecDriver.java:788) [junit] at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:624) [junit] at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:120) ExecDriver.java:788 // These tasks should have come from the same job. assert(ti.getJobId() == jobId); -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1523) ql tests no longer work in miniMR mode
[ https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1523: Attachment: hive-1523.3.patch small change - fix 0.20 version match to pick the right jetty version. > ql tests no longer work in miniMR mode > -- > > Key: HIVE-1523 > URL: https://issues.apache.org/jira/browse/HIVE-1523 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma > Attachments: hive-1523.1.patch, hive-1523.2.patch, hive-1523.3.patch > > > as per title. here's the first exception i see: > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: > java.io.FileNotFoun\ > dException File file:/build/ql/test/data/warehouse/dest_j1 does not exist. > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(746)) - > java.io.FileNotFoundException: Fil\ > e file:/build/ql/test/data/warehouse/dest_j1 does not exist. > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245) > at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1523) ql tests no longer work in miniMR mode
[ https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1523: Attachment: hive-1523.2.patch with a modified list of minimr tests: + i took the ones that worked from John's list. also added a couple of tests that had 'add jar' and 'add file' commands (since their interaction with a real cluster is quite different). > ql tests no longer work in miniMR mode > -- > > Key: HIVE-1523 > URL: https://issues.apache.org/jira/browse/HIVE-1523 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma > Attachments: hive-1523.1.patch, hive-1523.2.patch > > > as per title. here's the first exception i see: > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: > java.io.FileNotFoun\ > dException File file:/build/ql/test/data/warehouse/dest_j1 does not exist. > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(746)) - > java.io.FileNotFoundException: Fil\ > e file:/build/ql/test/data/warehouse/dest_j1 does not exist. > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245) > at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1562) sample10.q fails in minimr mode
sample10.q fails in minimr mode --- Key: HIVE-1562 URL: https://issues.apache.org/jira/browse/HIVE-1562 Project: Hadoop Hive Issue Type: Bug Reporter: Joydeep Sen Sarma followup from HIVE-1523. This is probably because of CombineHiveInputFormat: ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test insert overwrite table srcpartbucket partition(ds, hr) select * from srcpart where ds is not null and key < 10 2010-08-18 15:13:54,378 ERROR SessionState (SessionState.java:printError(277)) - PREHOOK: query: insert overwrite table srcpartbucket partition(ds, hr) select *\ from srcpart where ds is not null and key < 10 2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) - PREHOOK: type: QUERY 2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) - PREHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=11 2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) - PREHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=12 2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) - PREHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=11 2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) - PREHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=12 2010-08-18 15:13:54,704 WARN mapred.JobClient (JobClient.java:configureCommandLineOptions(539)) - Use GenericOptionsParser for parsing the arguments. Applicati\ ons should implement Tool for the same. 2010-08-18 15:13:55,642 ERROR mapred.EagerTaskInitializationListener (EagerTaskInitializationListener.java:run(83)) - Job initialization failed: java.lang.IllegalArgumentException: Network location name contains /: /default-rack at org.apache.hadoop.net.NodeBase.set(NodeBase.java:75) at org.apache.hadoop.net.NodeBase.(NodeBase.java:57) at org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2326) at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2320) at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:343) at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:440) at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:81) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) 2010-08-18 15:13:56,566 ERROR exec.MapRedTask (SessionState.java:printError(277)) - Ended Job = job_201008181513_0001 with errors 2010-08-18 15:13:56,597 ERROR ql.Driver (SessionState.java:printError(277)) - FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedT\ ask -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode
smb_mapjoin_8.q returns different results in miniMr mode Key: HIVE-1561 URL: https://issues.apache.org/jira/browse/HIVE-1561 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Reporter: Joydeep Sen Sarma follow on to HIVE-1523: ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer join smb_bucket4_2 b on a.key = b.key official results: 4 val_356 NULL NULL NULL NULL 484 val_169 2000 val_169 NULL NULL NULL NULL 3000 val_169 4000 val_125 NULL NULL in minimr mode: 2000 val_169 NULL NULL 4 val_356 NULL NULL 2000 val_169 NULL NULL 4000 val_125 NULL NULL NULL NULL 5000 val_125 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1523) ql tests no longer work in miniMR mode
[ https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900048#action_12900048 ] Joydeep Sen Sarma commented on HIVE-1523: - i am running through the above qfiles to see what executes successfully on minimr (because many don't). one concern is the length of the tests. i think we need to divide our tests into a short and a long regression; otherwise the development cycle is severely impacted if everything has to be tested on every iteration. > ql tests no longer work in miniMR mode > -- > > Key: HIVE-1523 > URL: https://issues.apache.org/jira/browse/HIVE-1523 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma > Attachments: hive-1523.1.patch > > > as per title. here's the first exception i see: > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: > java.io.FileNotFoun\ > dException File file:/build/ql/test/data/warehouse/dest_j1 does not exist. > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(746)) - > java.io.FileNotFoundException: Fil\ > e file:/build/ql/test/data/warehouse/dest_j1 does not exist. > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245) > at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1560) binaryoutputformat.q failure in minimr mode
binaryoutputformat.q failure in minimr mode --- Key: HIVE-1560 URL: https://issues.apache.org/jira/browse/HIVE-1560 Project: Hadoop Hive Issue Type: Bug Components: Testing Infrastructure Reporter: Joydeep Sen Sarma this is a followup to HIVE-1523. ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=binary_output_format.q test fails in a significant manner. all the rows are flattened out into one row: ntimeException -I at org -I at sun -I at java -I at junit -I Caused by: -I [.][.][.] [0-9]* more /data/users/jssarma/hive_testing/build/ql/test/logs/clientposit\ ive/binary_output_format.q.out /data/users/jssarma/hive_testing/ql/src/test/results/clientpositive/binary_output_format.q.out [junit] 313c313,812 [junit] < 238 val_23886 val_86311 val_31127 val_27165 val_165409 ... [junit] --- [junit] > 238 val_238 [junit] > 86 val_86 [junit] > 311 val_311 [junit] > 27 val_27 [junit] > 165 val_165 ... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1523) ql tests no longer work in miniMR mode
[ https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1523: Status: Patch Available (was: Open) Assignee: Joydeep Sen Sarma > ql tests no longer work in miniMR mode > -- > > Key: HIVE-1523 > URL: https://issues.apache.org/jira/browse/HIVE-1523 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma > Attachments: hive-1523.1.patch > > > as per title. here's the first exception i see: > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: > java.io.FileNotFoun\ > dException File file:/build/ql/test/data/warehouse/dest_j1 does not exist. > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(746)) - > java.io.FileNotFoundException: Fil\ > e file:/build/ql/test/data/warehouse/dest_j1 does not exist. > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245) > at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1523) ql tests no longer work in miniMR mode
[ https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1523: Attachment: hive-1523.1.patch fixed minimr test mode. enabled a couple of queries to always run (additionally) in minimr mode (like hbase-handler tests) when running standard tests. we should probably expand this to a larger number of queries (especially those requiring multiple reducers). i don't have good insight into this part - if people have ideas - we can expand the list easily. > ql tests no longer work in miniMR mode > -- > > Key: HIVE-1523 > URL: https://issues.apache.org/jira/browse/HIVE-1523 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma > Attachments: hive-1523.1.patch > > > as per title. here's the first exception i see: > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: > java.io.FileNotFoun\ > dException File file:/build/ql/test/data/warehouse/dest_j1 does not exist. > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(746)) - > java.io.FileNotFoundException: Fil\ > e file:/build/ql/test/data/warehouse/dest_j1 does not exist. > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245) > at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1293) Concurrency Model for Hive
[ https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899647#action_12899647 ] Joydeep Sen Sarma commented on HIVE-1293: - can you check the getLockObjects() routine? it seemed to me that even when called with a partition in X mode, it would add the table to the list of objects to be locked as well (in the same X mode). i think we should, at least as a follow-on, make the optimization to not lock write entities for the duration of the query. > Concurrency Model for Hive > - > > Key: HIVE-1293 > URL: https://issues.apache.org/jira/browse/HIVE-1293 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Namit Jain >Assignee: Namit Jain > Fix For: 0.7.0 > > Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, > hive.1293.4.patch, hive.1293.5.patch, hive_leases.txt > > > Concurrency model for Hive: > Currently, hive does not provide a good concurrency model. The only > guarantee provided in case of concurrent readers and writers is that a > reader will not see partial data from the old version (before the write) and > partial data from the new version (after the write). > This has come across as a big problem, especially for background processes > performing maintenance operations. > The following possible solutions come to mind. > 1. Locks: Acquire read/write locks - they can be acquired at the beginning of > the query or the write locks can be delayed till move > task (when the directory is actually moved). Care needs to be taken for > deadlocks. > 2. Versioning: The writer can create a new version if the current version is > being read. Note that, it is not equivalent to snapshots, > the old version can only be accessed by the current readers, and will be > deleted when all of them have finished. > Comments. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1293) Concurrency Model for Hive
[ https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899347#action_12899347 ] Joydeep Sen Sarma commented on HIVE-1293: - also - i am missing something here: + for (WriteEntity output : plan.getOutputs()) { +lockObjects.addAll(getLockObjects(output.getTable(), output.getPartition(), HiveLockMode.EXCLUSIVE)); + } getLockObjects(): +if (p != null) { ... + locks.add(new LockObject(new HiveLockObject(p.getTable()), mode)); +} doesn't this end up locking the table in exclusive mode if a partition is being written to? (whereas the design talks about locking the table in shared mode only?) > Concurrency Model for Hive > - > > Key: HIVE-1293 > URL: https://issues.apache.org/jira/browse/HIVE-1293 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Namit Jain >Assignee: Namit Jain > Fix For: 0.7.0 > > Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, > hive.1293.4.patch, hive.1293.5.patch, hive_leases.txt > > > Concurrency model for Hive: > Currently, hive does not provide a good concurrency model. The only > guarantee provided in case of concurrent readers and writers is that a > reader will not see partial data from the old version (before the write) and > partial data from the new version (after the write). > This has come across as a big problem, especially for background processes > performing maintenance operations. > The following possible solutions come to mind. > 1. Locks: Acquire read/write locks - they can be acquired at the beginning of > the query or the write locks can be delayed till move > task (when the directory is actually moved). Care needs to be taken for > deadlocks. > 2. Versioning: The writer can create a new version if the current version is > being read. Note that, it is not equivalent to snapshots, > the old version can only be accessed by the current readers, and will be > deleted when all of them have finished. > Comments. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
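To make the question concrete, a minimal sketch of what the design seems to call for - exclusive on the partition, shared on its parent table - using the class names visible in the patch snippet plus an assumed HiveLockMode.SHARED; this is illustrative, not the patch's actual code:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hive.ql.metadata.Partition;

    // LockObject, HiveLockObject, and HiveLockMode come from the patch.
    class PartitionLocking {
      List<LockObject> getLockObjects(Partition p, HiveLockMode mode) {
        List<LockObject> locks = new ArrayList<LockObject>();
        // the partition being written gets the requested (exclusive) mode
        locks.add(new LockObject(new HiveLockObject(p), mode));
        // the parent table gets only a shared lock, so writers to sibling
        // partitions are not serialized against each other
        locks.add(new LockObject(new HiveLockObject(p.getTable()),
                                 HiveLockMode.SHARED));
        return locks;
      }
    }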
[jira] Commented: (HIVE-1293) Concurrency Model for Hive
[ https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899320#action_12899320 ] Joydeep Sen Sarma commented on HIVE-1293: - a little bummed that locks need to be held for the entire query execution. that could mean a writer blocking readers for hours. hive's query plans seem to consist of two distinct stages: 1. read a bunch of stuff, compute intermediate/final data 2. move final data into output locations i.e. - a single query never reads what it writes (into a final output location). even if #1 and #2 are mingled today - they can easily be put in order. in that sense - we only need to get shared locks for all read entities involved in #1 to begin with. once phase #1 is done, we can drop all the read locks and get the exclusive locks for all the write entities in #2, perform #2 and quit. that way exclusive locks are held for a very short duration. i think this scheme is similarly deadlock-free (now there are two independent lock acquire/release phases - and each of them can lock stuff in lex. order). > Concurrency Model for Hive > - > > Key: HIVE-1293 > URL: https://issues.apache.org/jira/browse/HIVE-1293 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Namit Jain >Assignee: Namit Jain > Fix For: 0.7.0 > > Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, > hive.1293.4.patch, hive.1293.5.patch, hive_leases.txt > > > Concurrency model for Hive: > Currently, hive does not provide a good concurrency model. The only > guarantee provided in case of concurrent readers and writers is that a > reader will not see partial data from the old version (before the write) and > partial data from the new version (after the write). > This has come across as a big problem, especially for background processes > performing maintenance operations. > The following possible solutions come to mind. > 1. Locks: Acquire read/write locks - they can be acquired at the beginning of > the query or the write locks can be delayed till move > task (when the directory is actually moved). Care needs to be taken for > deadlocks. > 2. Versioning: The writer can create a new version if the current version is > being read. Note that, it is not equivalent to snapshots, > the old version can only be accessed by the current readers, and will be > deleted when all of them have finished. > Comments. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
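A hedged sketch of the two-phase scheme proposed above, with hypothetical helper names (lockShared, lockExclusive, unlock): each phase sorts its lock set lexicographically before acquiring, which is what keeps each phase deadlock-free on its own.

    import java.util.Collections;
    import java.util.List;

    abstract class TwoPhaseLocking {
      // Hypothetical primitives -- stand-ins for whatever lock manager the
      // patch introduces.
      abstract void lockShared(String name);
      abstract void lockExclusive(String name);
      abstract void unlock(String name);
      abstract void runQueryStages();       // phase 1: compute results
      abstract void moveOutputsIntoPlace(); // phase 2: short move step

      void execute(List<String> readEntities, List<String> writeEntities) {
        // Phase 1: shared locks, acquired in lexicographic order, held
        // only while the intermediate/final data is computed.
        Collections.sort(readEntities);
        for (String name : readEntities) lockShared(name);
        runQueryStages();
        for (String name : readEntities) unlock(name);

        // Phase 2: exclusive locks, again in lexicographic order, held
        // only for the final move.
        Collections.sort(writeEntities);
        for (String name : writeEntities) lockExclusive(name);
        moveOutputsIntoPlace();
        for (String name : writeEntities) unlock(name);
      }
    }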
[jira] Commented: (HIVE-1540) Read-only, columnar data file for nested data structures
[ https://issues.apache.org/jira/browse/HIVE-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899199#action_12899199 ] Joydeep Sen Sarma commented on HIVE-1540: - are there a lot of use cases for nested data structures? Google's approach is motivated by the widespread use of Protocol Buffers. At Facebook, thrift-serialized data sets (which motivated the initial support for nested data types) haven't taken off. I think what's much more common is json-serialized data (or, more restrictively, map types). it would be much more worthwhile, to begin with, to have optimized codecs and deserializers for map types. > Read-only, columnar data file for nested data structures > > > Key: HIVE-1540 > URL: https://issues.apache.org/jira/browse/HIVE-1540 > Project: Hadoop Hive > Issue Type: New Feature >Reporter: Jeff Hammerbacher > > RCFile is a great start on an optimized layout for working with structured > data with Hive. Given that Hive's data model supports nested lists and maps, > and taking inspiration from the recent work by Google on Dremel, it may be > useful for the Hive community to think about how to improve the RCFile format > for nested data structures. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1530) Include hive-default.xml and hive-log4j.properties in hive-common JAR
[ https://issues.apache.org/jira/browse/HIVE-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898118#action_12898118 ] Joydeep Sen Sarma commented on HIVE-1530: - don't disallow hive.* options that are not specified in HiveConf. the reason is that hive is extensible at various points via custom code; that code has access to the config object, and installs may want to set variables specific to their plugins etc. (we shouldn't be in the business of telling them what they can and cannot name) > Include hive-default.xml and hive-log4j.properties in hive-common JAR > - > > Key: HIVE-1530 > URL: https://issues.apache.org/jira/browse/HIVE-1530 > Project: Hadoop Hive > Issue Type: Improvement > Components: Configuration >Reporter: Carl Steinbach > > hive-common-*.jar should include hive-default.xml and hive-log4j.properties, > and similarly hive-exec-*.jar should include hive-exec-log4j.properties. The > hive-default.xml file that currently sits in the conf/ directory should be > removed. > Motivations for this change: > * We explicitly tell users that they should never modify hive-default.xml yet > give them the opportunity to do so by placing the file in the conf dir. > * Many users are familiar with the Hadoop configuration mechanism that does > not require *-default.xml files to be present in the HADOOP_CONF_DIR, and > assume that the same is true for HIVE_CONF_DIR. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
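As a concrete (hypothetical) example of why this matters - the key name and plugin below are invented - custom code reads its own hive.* setting straight off the configuration, which only works if unknown hive.* names are let through:

    import org.apache.hadoop.conf.Configuration;

    public class PluginConfigExample {
      // hive.myplugin.audit.tag is a made-up, site-specific key that
      // HiveConf cannot know about in advance; a site might set it in
      // hive-site.xml or via 'set hive.myplugin.audit.tag=...;'.
      public static String auditTag(Configuration conf) {
        return conf.get("hive.myplugin.audit.tag", "none");
      }
    }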
[jira] Commented: (HIVE-1530) Include hive-default.xml and hive-log4j.properties in hive-common JAR
[ https://issues.apache.org/jira/browse/HIVE-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898080#action_12898080 ] Joydeep Sen Sarma commented on HIVE-1530: - ok - that makes sense. leave a hive-site.xml.sample and hive-log4j.properties.example in the conf/ directory. i agree with Ed's point about how difficult it is to figure out hadoop config variables now, and hadoop is worse off for it. commands are nice - but having a template is better. it's easy to clone an example file and append/modify the default description to add site-specific notes. and one can grep. we could autogenerate the hive-site.xml.sample from config variable metadata in the source code. that would keep us in sync with the code. > Include hive-default.xml and hive-log4j.properties in hive-common JAR > - > > Key: HIVE-1530 > URL: https://issues.apache.org/jira/browse/HIVE-1530 > Project: Hadoop Hive > Issue Type: Improvement > Components: Configuration >Reporter: Carl Steinbach > > hive-common-*.jar should include hive-default.xml and hive-log4j.properties, > and similarly hive-exec-*.jar should include hive-exec-log4j.properties. The > hive-default.xml file that currently sits in the conf/ directory should be > removed. > Motivations for this change: > * We explicitly tell users that they should never modify hive-default.xml yet > give them the opportunity to do so by placing the file in the conf dir. > * Many users are familiar with the Hadoop configuration mechanism that does > not require *-default.xml files to be present in the HADOOP_CONF_DIR, and > assume that the same is true for HIVE_CONF_DIR. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
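A minimal sketch of that autogeneration idea, assuming the HiveConf.ConfVars enum exposes the variable name and default value (field names here may differ from the actual source):

    import org.apache.hadoop.hive.conf.HiveConf;

    public class GenerateSampleConf {
      public static void main(String[] args) {
        // One <property> block per known variable; descriptions could be
        // emitted the same way once the metadata carries them.
        for (HiveConf.ConfVars v : HiveConf.ConfVars.values()) {
          System.out.printf(
              "<property>%n  <name>%s</name>%n  <value>%s</value>%n</property>%n",
              v.varname, v.defaultVal);
        }
      }
    }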
[jira] Commented: (HIVE-1530) Include hive-default.xml and hive-log4j.properties in hive-common JAR
[ https://issues.apache.org/jira/browse/HIVE-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897993#action_12897993 ] Joydeep Sen Sarma commented on HIVE-1530: - removing the .xml files makes sense. but users may want to modify the log4j.properties files. how would they do that in the new arrangement? > Include hive-default.xml and hive-log4j.properties in hive-common JAR > - > > Key: HIVE-1530 > URL: https://issues.apache.org/jira/browse/HIVE-1530 > Project: Hadoop Hive > Issue Type: Improvement > Components: Configuration >Reporter: Carl Steinbach > > hive-common-*.jar should include hive-default.xml and hive-log4j.properties, > and similarly hive-exec-*.jar should include hive-exec-log4j.properties. The > hive-default.xml file that currently sits in the conf/ directory should be > removed. > Motivations for this change: > * We explicitly tell users that they should never modify hive-default.xml yet > give them the opportunity to do so by placing the file in the conf dir. > * Many users are familiar with the Hadoop configuration mechanism that does > not require *-default.xml files to be present in the HADOOP_CONF_DIR, and > assume that the same is true for HIVE_CONF_DIR. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1524) parallel execution failed if mapred.job.name is set
[ https://issues.apache.org/jira/browse/HIVE-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1524: Status: Resolved (was: Patch Available) Resolution: Fixed committed - thanks Ning. > parallel execution failed if mapred.job.name is set > --- > > Key: HIVE-1524 > URL: https://issues.apache.org/jira/browse/HIVE-1524 > Project: Hadoop Hive > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Ning Zhang >Assignee: Ning Zhang > Fix For: 0.7.0 > > Attachments: HIVE-1524.2.patch, HIVE-1524.patch > > > The plan file name was generated based on mapred.job.name. If the user > specify mapred.job.name before the query, two parallel queries will have > conflict plan file name. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1524) parallel execution failed if mapred.job.name is set
[ https://issues.apache.org/jira/browse/HIVE-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1524: Affects Version/s: 0.5.0 (was: 0.7.0) > parallel execution failed if mapred.job.name is set > --- > > Key: HIVE-1524 > URL: https://issues.apache.org/jira/browse/HIVE-1524 > Project: Hadoop Hive > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Ning Zhang >Assignee: Ning Zhang > Fix For: 0.7.0 > > Attachments: HIVE-1524.2.patch, HIVE-1524.patch > > > The plan file name was generated based on mapred.job.name. If the user > specify mapred.job.name before the query, two parallel queries will have > conflict plan file name. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1524) parallel execution failed if mapred.job.name is set
[ https://issues.apache.org/jira/browse/HIVE-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897037#action_12897037 ] Joydeep Sen Sarma commented on HIVE-1524: - will commit once tests pass. > parallel execution failed if mapred.job.name is set > --- > > Key: HIVE-1524 > URL: https://issues.apache.org/jira/browse/HIVE-1524 > Project: Hadoop Hive > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Ning Zhang >Assignee: Ning Zhang > Fix For: 0.7.0 > > Attachments: HIVE-1524.2.patch, HIVE-1524.patch > > > The plan file name was generated based on mapred.job.name. If the user > specify mapred.job.name before the query, two parallel queries will have > conflict plan file name. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1524) parallel execution failed if mapred.job.name is set
[ https://issues.apache.org/jira/browse/HIVE-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896992#action_12896992 ] Joydeep Sen Sarma commented on HIVE-1524: - looks good to me. one comment: getJobID is a very confusing name (sounds like we are getting the hadoop jobid or something like that). it would be nice to make it more explicit (getHiveJobID perhaps?). > parallel execution failed if mapred.job.name is set > --- > > Key: HIVE-1524 > URL: https://issues.apache.org/jira/browse/HIVE-1524 > Project: Hadoop Hive > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Ning Zhang >Assignee: Ning Zhang > Fix For: 0.7.0 > > Attachments: HIVE-1524.patch > > > The plan file name was generated based on mapred.job.name. If the user > specify mapred.job.name before the query, two parallel queries will have > conflict plan file name. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
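For context, a hedged sketch of the general shape of such a fix (not the attached patch itself): derive the plan file name from a per-query unique ID instead of the user-settable mapred.job.name, so parallel queries cannot collide.

    import java.io.File;
    import java.util.UUID;

    public class PlanPath {
      // One unique plan file per query, independent of mapred.job.name.
      static File newPlanFile(File scratchDir) {
        return new File(scratchDir, "plan-" + UUID.randomUUID() + ".xml");
      }
    }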
[jira] Created: (HIVE-1523) ql tests no longer work in miniMR mode
ql tests no longer work in miniMR mode -- Key: HIVE-1523 URL: https://issues.apache.org/jira/browse/HIVE-1523 Project: Hadoop Hive Issue Type: Bug Components: Build Infrastructure Reporter: Joydeep Sen Sarma as per title. here's the first exception i see: 2010-08-09 18:05:11,259 ERROR hive.log (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: java.io.FileNotFoun\ dException File file:/build/ql/test/data/warehouse/dest_j1 does not exist. 2010-08-09 18:05:11,259 ERROR hive.log (MetaStoreUtils.java:logAndThrowMetaException(746)) - java.io.FileNotFoundException: Fil\ e file:/build/ql/test/data/warehouse/dest_j1 does not exist. at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245) at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1523) ql tests no longer work in miniMR mode
[ https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1523: Component/s: Query Processor (was: Build Infrastructure) > ql tests no longer work in miniMR mode > -- > > Key: HIVE-1523 > URL: https://issues.apache.org/jira/browse/HIVE-1523 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma > > as per title. here's the first exception i see: > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: > java.io.FileNotFoun\ > dException File file:/build/ql/test/data/warehouse/dest_j1 does not exist. > 2010-08-09 18:05:11,259 ERROR hive.log > (MetaStoreUtils.java:logAndThrowMetaException(746)) - > java.io.FileNotFoundException: Fil\ > e file:/build/ql/test/data/warehouse/dest_j1 does not exist. > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245) > at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1521) compiling/testing against custom hadoop tree is broken
compiling/testing against custom hadoop tree is broken -- Key: HIVE-1521 URL: https://issues.apache.org/jira/browse/HIVE-1521 Project: Hadoop Hive Issue Type: Bug Components: Build Infrastructure Reporter: Joydeep Sen Sarma see: http://wiki.apache.org/hadoop/Hive/DeveloperGuide#Advanced_Mode compiling with a specific value of hadoop.root no longer works because of the shims stuff. we should deprecate/fix this. it is still _very_ desirable to be able to test against a custom hadoop build (to test hive/hadoop integration). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1520) hive.mapred.local.mem should only be used in case of local mode job submissions
hive.mapred.local.mem should only be used in case of local mode job submissions --- Key: HIVE-1520 URL: https://issues.apache.org/jira/browse/HIVE-1520 Project: Hadoop Hive Issue Type: Bug Reporter: Joydeep Sen Sarma Currently - whenever we submit a map-reduce job via a child jvm process, hive sets HADOOP_HEAPSIZE to hive.mapred.local.mem (thereby limiting the max heap memory of the child jvm), the assumption being that we are submitting a job for local mode execution and different memory limits apply for that. however - one can submit jobs via a child jvm for non-local mode execution as well. This is useful, for example, if hive wants to submit jobs via different hadoop clients (for sending jobs to different hadoop clusters). in such a case, we can use 'hive.exec.submitviachild' and 'hadoop.bin.path' to dispatch the job via an alternate hadoop client install point. however, in such a case we don't need to set HADOOP_HEAPSIZE. all we are using the child jvm for is to run the small bit of hive code that submits the job (and not for local mode execution). in this case - we shouldn't be setting the child jvm's memory limit and should leave it at the parent's value. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
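A hedged Java sketch of the proposed behavior when hive spawns the child jvm (illustrative names; not ExecDriver's actual code): copy hive.mapred.local.mem into HADOOP_HEAPSIZE only when the job will really execute locally, and otherwise inherit the parent's environment.

    import java.util.Map;
    import org.apache.hadoop.conf.Configuration;

    public class ChildEnvSetup {
      static void setHeap(ProcessBuilder pb, Configuration conf) {
        Map<String, String> env = pb.environment();
        boolean localMode = "local".equals(conf.get("mapred.job.tracker"));
        String localMem = conf.get("hive.mapred.local.mem");
        if (localMode && localMem != null) {
          env.put("HADOOP_HEAPSIZE", localMem); // cap heap only for local runs
        }
        // non-local submission via a child jvm: leave the heap at the
        // parent's value by not touching the inherited environment
      }
    }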
[jira] Commented: (HIVE-1432) Create a test case for case sensitive comparison done during field comparison
[ https://issues.apache.org/jira/browse/HIVE-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896423#action_12896423 ] Joydeep Sen Sarma commented on HIVE-1432: - a few comments:
- don't need a custom script to convert \t to ^B. use 'tr' (see archive.q for an example)
- instead of directly referring to the script path, please use an 'add file ...' command and refer to the script directly by its name in the transform clause - otherwise i think this test will run only in local mode (and may not pass against real/minimr clusters). (I think this is a problem with some other tests as well - but have to start somewhere)
- no need for drop tables at the beginning and end of the test anymore. the test harness now takes care of this (cleaning up non-src tables before and after test queries)
> Create a test case for case sensitive comparison done during field comparison > - > > Key: HIVE-1432 > URL: https://issues.apache.org/jira/browse/HIVE-1432 > Project: Hadoop Hive > Issue Type: Task > Components: Query Processor >Reporter: Arvind Prabhakar >Assignee: Arvind Prabhakar > Fix For: 0.7.0 > > Attachments: HIVE-1432.patch > > > See HIVE-1271. This jira tracks the creation of a test case to test this fix > specifically. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1513) hive starter scripts should load admin/user supplied script for configurability
[ https://issues.apache.org/jira/browse/HIVE-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1513: Attachment: hive-1513.3.patch Ning - you are right - i didn't notice the HEAP setting. in the internal tree - the heap setting is done differently (and i thought that the internal tree does not override these scripts). so i have incorporated both suggestions (don't set opts/heap). also - the build script needed a slight change to make the .template file part of the distribution. > hive starter scripts should load admin/user supplied script for > configurability > --- > > Key: HIVE-1513 > URL: https://issues.apache.org/jira/browse/HIVE-1513 > Project: Hadoop Hive > Issue Type: Improvement > Components: CLI >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma > Attachments: 1513.1.patch, 1513.2.patch, hive-1513.3.patch > > > it's difficult to add environment variables to Hive starter scripts except by > modifying the scripts directly. this is undesirable (since they are source > code). Hive starter scripts should load an admin-supplied shell script for > configurability. This would be similar to what hadoop does with hadoop-env.sh -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1513) hive starter scripts should load admin/user supplied script for configurability
[ https://issues.apache.org/jira/browse/HIVE-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896231#action_12896231 ] Joydeep Sen Sarma commented on HIVE-1513: - HADOOP_HEAPSIZE: hive-config.sh only supplies a default. if the admin has specified a value in hive-env - it will be used instead. HADOOP_OPTS: seems like it's appending a specific JVM flag. i agree this doesn't make sense (the admin should choose whether they want that flag or not). i will post another patch after taking it out. not sure whether we should rename the template to .sh. hadoop-20 seems to have template files only. > hive starter scripts should load admin/user supplied script for > configurability > --- > > Key: HIVE-1513 > URL: https://issues.apache.org/jira/browse/HIVE-1513 > Project: Hadoop Hive > Issue Type: Improvement > Components: CLI >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma > Attachments: 1513.1.patch, 1513.2.patch > > > it's difficult to add environment variables to Hive starter scripts except by > modifying the scripts directly. this is undesirable (since they are source > code). Hive starter scripts should load an admin-supplied shell script for > configurability. This would be similar to what hadoop does with hadoop-env.sh -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1513) hive starter scripts should load admin/user supplied script for configurability
[ https://issues.apache.org/jira/browse/HIVE-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1513: Attachment: 1513.2.patch forgot to add one file > hive starter scripts should load admin/user supplied script for > configurability > --- > > Key: HIVE-1513 > URL: https://issues.apache.org/jira/browse/HIVE-1513 > Project: Hadoop Hive > Issue Type: Improvement > Components: CLI >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma > Attachments: 1513.1.patch, 1513.2.patch > > > it's difficult to add environment variables to Hive starter scripts except by > modifying the scripts directly. this is undesirable (since they are source > code). Hive starter scripts should load an admin-supplied shell script for > configurability. This would be similar to what hadoop does with hadoop-env.sh -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1516) optimize split sizes automatically taking into account amount and nature of map tasks
optimize split sizes automatically taking into account amount and nature of map tasks Key: HIVE-1516 URL: https://issues.apache.org/jira/browse/HIVE-1516 Project: Hadoop Hive Issue Type: Improvement Components: Query Processor Reporter: Joydeep Sen Sarma two immediate cases come to mind: - pure filter job (i.e. no map-side sort required) - full aggregate computations only (like count(1)). in these cases - the amount of data to be sorted is zero or negligible. so mapper parallelism (and split size) should be dictated by the size of the cluster. there's no point running a huge number of mappers on a 500 node cluster for a pure filter job. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
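To make the heuristic concrete, here is a sketch of one possible split-size computation. It is entirely illustrative - the formula and all names are assumptions, not anything from a patch:

// Illustrative heuristic: for jobs whose mappers sort little or no data
// (pure filters, full aggregates), size splits so that roughly one wave
// of map tasks covers the cluster, instead of using a fixed split size.
public class SplitSizeHeuristic {
  static long chooseSplitSize(long totalInputBytes, int clusterMapSlots,
                              long minSplitBytes, long maxSplitBytes) {
    if (clusterMapSlots <= 0) {
      return minSplitBytes;
    }
    long oneWave = totalInputBytes / clusterMapSlots;
    return Math.max(minSplitBytes, Math.min(maxSplitBytes, oneWave));
  }
}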
[jira] Updated: (HIVE-1513) hive starter scripts should load admin/user supplied script for configurability
[ https://issues.apache.org/jira/browse/HIVE-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1513: Attachment: 1513.1.patch simple change to let the hive starter script include conf/hive-env.sh. a template is provided as an example. ran all tests on 20 and tested by hand that the inclusion works. > hive starter scripts should load admin/user supplied script for > configurability > --- > > Key: HIVE-1513 > URL: https://issues.apache.org/jira/browse/HIVE-1513 > Project: Hadoop Hive > Issue Type: Improvement > Components: CLI >Reporter: Joydeep Sen Sarma > Attachments: 1513.1.patch > > > it's difficult to add environment variables to Hive starter scripts except by > modifying the scripts directly. this is undesirable (since they are source > code). Hive starter scripts should load an admin-supplied shell script for > configurability. This would be similar to what hadoop does with hadoop-env.sh -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1513) hive starter scripts should load admin/user supplied script for configurability
[ https://issues.apache.org/jira/browse/HIVE-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1513: Status: Patch Available (was: Open) Assignee: Joydeep Sen Sarma > hive starter scripts should load admin/user supplied script for > configurability > --- > > Key: HIVE-1513 > URL: https://issues.apache.org/jira/browse/HIVE-1513 > Project: Hadoop Hive > Issue Type: Improvement > Components: CLI >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma > Attachments: 1513.1.patch > > > it's difficult to add environment variables to Hive starter scripts except by > modifying the scripts directly. this is undesirable (since they are source > code). Hive starter scripts should load an admin-supplied shell script for > configurability. This would be similar to what hadoop does with hadoop-env.sh -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1513) hive starter scripts should load admin/user supplied script for configurability
[ https://issues.apache.org/jira/browse/HIVE-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895959#action_12895959 ] Joydeep Sen Sarma commented on HIVE-1513: - yes - it's possible. however a lot of variables etc. are initialized by the time we get to loading ext/*.sh. for example we allow HADOOP_HEAPSIZE to be specified via env var. but aside from doing an export before launching the hive script, there's no way to configure this externally. the ext/* trick wouldn't work because it comes too late. i think this is simple enough - we can just source a conf/hive-env.sh or something of the sort so that admins can provide the right values for all these vars based on their requirements via config files. > hive starter scripts should load admin/user supplied script for > configurability > --- > > Key: HIVE-1513 > URL: https://issues.apache.org/jira/browse/HIVE-1513 > Project: Hadoop Hive > Issue Type: Improvement > Components: CLI >Reporter: Joydeep Sen Sarma > > it's difficult to add environment variables to Hive starter scripts except by > modifying the scripts directly. this is undesirable (since they are source > code). Hive starter scripts should load an admin-supplied shell script for > configurability. This would be similar to what hadoop does with hadoop-env.sh -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1513) hive starter scripts should load admin/user supplied script for configurability
hive starter scripts should load admin/user supplied script for configurability --- Key: HIVE-1513 URL: https://issues.apache.org/jira/browse/HIVE-1513 Project: Hadoop Hive Issue Type: Improvement Components: CLI Reporter: Joydeep Sen Sarma it's difficult to add environment variables to Hive starter scripts except by modifying the scripts directly. this is undesirable (since they are source code). Hive starter scripts should load an admin-supplied shell script for configurability. This would be similar to what hadoop does with hadoop-env.sh -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1509) Monitor the working set of the number of files
[ https://issues.apache.org/jira/browse/HIVE-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1509: Affects Version/s: 0.6.0 (was: 0.7.0) > Monitor the working set of the number of files > --- > > Key: HIVE-1509 > URL: https://issues.apache.org/jira/browse/HIVE-1509 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.6.0 >Reporter: Namit Jain >Assignee: Ning Zhang > Fix For: 0.7.0 > > Attachments: HIVE-1509.2.patch, HIVE-1509.3.patch, HIVE-1509.4.patch, > HIVE-1509.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1509) Monitor the working set of the number of files
[ https://issues.apache.org/jira/browse/HIVE-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1509: Status: Resolved (was: Patch Available) Fix Version/s: 0.7.0 Resolution: Fixed committed - thanks Ning. it seems that the test problems were likely because there was a problem applying the patch. > Monitor the working set of the number of files > --- > > Key: HIVE-1509 > URL: https://issues.apache.org/jira/browse/HIVE-1509 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.7.0 >Reporter: Namit Jain >Assignee: Ning Zhang > Fix For: 0.7.0 > > Attachments: HIVE-1509.2.patch, HIVE-1509.3.patch, HIVE-1509.4.patch, > HIVE-1509.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1509) Monitor the working set of the number of files
[ https://issues.apache.org/jira/browse/HIVE-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895534#action_12895534 ] Joydeep Sen Sarma commented on HIVE-1509: - strange - let me retry. can you check the patch one last time? (perhaps it's not up to date with the contents of your tree?) > Monitor the working set of the number of files > --- > > Key: HIVE-1509 > URL: https://issues.apache.org/jira/browse/HIVE-1509 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.7.0 >Reporter: Namit Jain >Assignee: Ning Zhang > Attachments: HIVE-1509.2.patch, HIVE-1509.3.patch, HIVE-1509.4.patch, > HIVE-1509.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1509) Monitor the working set of the number of files
[ https://issues.apache.org/jira/browse/HIVE-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895472#action_12895472 ] Joydeep Sen Sarma commented on HIVE-1509: - the test result for dyn_part3.q does not match the one provided in the patch. it seems that testnegativeclidriver is not executing anything but the first query in the .q file: [junit] diff -a -I file: -I pfile: -I /tmp/ -I invalidscheme: -I lastUpdateTime -I lastAccessTime -I owner -I transient_lastDdlTime -I java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I Caused by: -I [.][.][.] [0-9]* more /data/users/jssarma/hive_trunk/build/ql/test/logs/clientnegative/dyn_part3.q.out [junit] 9a10,27 [junit] > PREHOOK: query: create table nzhang_part( key string) partitioned by (value string) [junit] > PREHOOK: type: CREATETABLE [junit] > POSTHOOK: query: create table nzhang_part( key string) partitioned by (value string) [junit] > POSTHOOK: type: CREATETABLE [junit] > POSTHOOK: Output: defa...@nzhang_part [junit] > PREHOOK: query: insert overwrite table nzhang_part partition(value) select key, value from src [junit] > PREHOOK: type: QUERY [junit] > PREHOOK: Input: defa...@src [junit] > FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask [junit] > PREHOOK: query: create table nzhang_part( key string) partitioned by (value string) [junit] > PREHOOK: type: CREATETABLE [junit] > POSTHOOK: query: create table nzhang_part( key string) partitioned by (value string) [junit] > POSTHOOK: type: CREATETABLE [junit] > POSTHOOK: Output: defa...@nzhang_part [junit] > PREHOOK: query: insert overwrite table nzhang_part partition(value) select key, value from src [junit] > PREHOOK: type: QUERY [junit] > PREHOOK: Input: defa...@src [junit] > FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask > Monitor the working set of the number of files > --- > > Key: HIVE-1509 > URL: https://issues.apache.org/jira/browse/HIVE-1509 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.7.0 >Reporter: Namit Jain >Assignee: Ning Zhang > Attachments: HIVE-1509.2.patch, HIVE-1509.3.patch, HIVE-1509.4.patch, > HIVE-1509.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1509) Monitor the working set of the number of files
[ https://issues.apache.org/jira/browse/HIVE-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895411#action_12895411 ] Joydeep Sen Sarma commented on HIVE-1509: - ok - i will run tests on 20 and commit if all clear. > Monitor the working set of the number of files > --- > > Key: HIVE-1509 > URL: https://issues.apache.org/jira/browse/HIVE-1509 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.7.0 >Reporter: Namit Jain >Assignee: Ning Zhang > Attachments: HIVE-1509.2.patch, HIVE-1509.3.patch, HIVE-1509.4.patch, > HIVE-1509.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1509) Monitor the working set of the number of files
[ https://issues.apache.org/jira/browse/HIVE-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895384#action_12895384 ] Joydeep Sen Sarma commented on HIVE-1509: - let me know once the tests pass 0.20 and i can commit. one more question: +MAXCREATEDFILES("hive.exec.max.created.files", 10), i think you may have to append an 'L' to the 10, since you later do: + long upperLimit = HiveConf.getLongVar(job, HiveConf.ConfVars.MAXCREATEDFILES); (or switch to using getIntVar). i am a little surprised that this is working, because the 10 would be interpreted as an Integer and go to the integer constructor, which should leave the long default at -1. (or i guess i have forgotten how this works) > Monitor the working set of the number of files > --- > > Key: HIVE-1509 > URL: https://issues.apache.org/jira/browse/HIVE-1509 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.7.0 >Reporter: Namit Jain >Assignee: Ning Zhang > Attachments: HIVE-1509.2.patch, HIVE-1509.3.patch, HIVE-1509.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
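The overload-resolution point is easy to reproduce in isolation. The toy enum below mirrors the ConfVars pattern (its constructors and fields are illustrative, not the actual HiveConf source): a bare 10 is an int literal, so it binds to the int constructor and the long default keeps its -1 sentinel, which is what a getLongVar-style lookup would then see.

public class ConfVarsOverloadDemo {
  enum ConfVars {
    // binds to the (String, int) constructor; the long default stays -1.
    // with 10L it would bind to (String, long) instead.
    MAXCREATEDFILES("hive.exec.max.created.files", 10);

    final String varname;
    final int intDefault;
    final long longDefault;

    ConfVars(String varname, int defaultVal) { this(varname, defaultVal, -1L); }
    ConfVars(String varname, long defaultVal) { this(varname, -1, defaultVal); }
    ConfVars(String varname, int intDefault, long longDefault) {
      this.varname = varname;
      this.intDefault = intDefault;
      this.longDefault = longDefault;
    }
  }

  public static void main(String[] args) {
    System.out.println(ConfVars.MAXCREATEDFILES.intDefault);  // 10
    System.out.println(ConfVars.MAXCREATEDFILES.longDefault); // -1
  }
}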
[jira] Commented: (HIVE-1509) Monitor the working set of the number of files
[ https://issues.apache.org/jira/browse/HIVE-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895322#action_12895322 ] Joydeep Sen Sarma commented on HIVE-1509: - can you try bucketmapjoin2.q in clientpositive? it's failing for me > Monitor the working set of the number of files > --- > > Key: HIVE-1509 > URL: https://issues.apache.org/jira/browse/HIVE-1509 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.7.0 >Reporter: Namit Jain >Assignee: Ning Zhang > Attachments: HIVE-1509.2.patch, HIVE-1509.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1509) Monitor the working set of the number of files
[ https://issues.apache.org/jira/browse/HIVE-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895312#action_12895312 ] Joydeep Sen Sarma commented on HIVE-1509: - a couple of comments: - use ProgressCounter.CREATED_FILES directly instead of valueOf("CREATED_FILES") - can we move the check for the total number of created files inside checkFatalErrors? we are duplicating some code (for example, we just fixed a problem where getCounters() can return null, and the duplicated check ignores that fix inside checkFatal). > Monitor the working set of the number of files > --- > > Key: HIVE-1509 > URL: https://issues.apache.org/jira/browse/HIVE-1509 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.7.0 >Reporter: Namit Jain >Assignee: Ning Zhang > Attachments: HIVE-1509.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
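A rough sketch of the consolidated check being suggested. Only RunningJob.getCounters() and the CREATED_FILES counter come from the discussion; the surrounding method, enum placement and message are invented for illustration:

import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.RunningJob;

// Illustrative sketch: fold the created-files limit into one fatal-error
// check so the null-counters guard lives in a single place.
public class FatalErrorCheck {
  enum ProgressCounter { CREATED_FILES }

  // returns an error message, or null if nothing fatal was detected
  static String checkFatalErrors(RunningJob rj, long maxCreatedFiles)
      throws java.io.IOException {
    Counters ctrs = rj.getCounters();
    if (ctrs == null) {
      // retired jobs may report status but no counters - not fatal by itself
      return null;
    }
    // refer to the enum constant directly instead of valueOf("CREATED_FILES")
    long created = ctrs.getCounter(ProgressCounter.CREATED_FILES);
    if (created > maxCreatedFiles) {
      return "number of created files (" + created + ") exceeds limit " + maxCreatedFiles;
    }
    return null;
  }
}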
[jira] Resolved: (HIVE-1493) incorrect explanation when local mode not chosen automatically
[ https://issues.apache.org/jira/browse/HIVE-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma resolved HIVE-1493. - Assignee: Joydeep Sen Sarma Resolution: Fixed fixed via HIVE-1422 > incorrect explanation when local mode not chosen automatically > -- > > Key: HIVE-1493 > URL: https://issues.apache.org/jira/browse/HIVE-1493 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma >Priority: Minor > > slipped past in 1408: > // check for max input size > > if (inputSummary.getLength() > maxBytes) > return "Input Size (= " + maxBytes + ") is larger than " + > HiveConf.ConfVars.LOCALMODEMAXBYTES.varname + " (= " + maxBytes + > ")"; > printing same value twice. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1422) skip counter update when RunningJob.getCounters() returns null
[ https://issues.apache.org/jira/browse/HIVE-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1422: Attachment: 1422.2.patch damn - this uncovered a bug in the tests, and i fixed an unnecessary throws declaration. > skip counter update when RunningJob.getCounters() returns null > -- > > Key: HIVE-1422 > URL: https://issues.apache.org/jira/browse/HIVE-1422 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.6.0 >Reporter: John Sichi >Assignee: Joydeep Sen Sarma > Fix For: 0.7.0 > > Attachments: 1422.2.patch, 1422.2.patch, HIVE-1422.1.patch > > > Under heavy load circumstances on some Hadoop versions, we may get a NPE from > trying to dereference a null Counters object. I don't have a unit test which > can reproduce it, but here's an example stack from a production cluster we > saw today: > 10/06/21 13:01:10 ERROR exec.ExecDriver: Ended Job = job_201005200457_701060 > with exception 'java.lang.NullPointerException(null)' > java.lang.NullPointerException > at org.apache.hadoop.hive.ql.exec.Operator.updateCounters(Operator.java:999) > at > org.apache.hadoop.hive.ql.exec.ExecDriver.updateCounters(ExecDriver.java:503) > at org.apache.hadoop.hive.ql.exec.ExecDriver.progress(ExecDriver.java:390) > at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:697) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107) > at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55) > at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1507) Supply DriverContext to Hooks
Supply DriverContext to Hooks - Key: HIVE-1507 URL: https://issues.apache.org/jira/browse/HIVE-1507 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Reporter: Joydeep Sen Sarma The DriverContext and the Context (the latter linked off the former) created during query compilation have information that's invaluable for writing hooks. In particular, the Context object has a cache of pathname to file size mappings looked up via hdfs. i would like to get access to this cache (for both reading and writing) in order to write a hook that depends on query size (for the purpose of dispatching it to the right cluster). It's unfortunate we don't have a generic context object for hooks (into which we can add more stuff as needed). This is forcing an unnecessary api enhancement (we should be able to maintain backwards compatibility using reflection though). I think going forward we should have a generic context object with Session and Query related data inside. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
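One possible shape for such a generic context - purely a sketch of the proposal above, with every name invented for illustration:

import java.util.Map;

// Hypothetical generic hook context: one object that can grow new fields
// over time without breaking the hook API signature. All names invented.
public class HookContext {
  private final Object sessionState;              // session-scoped data
  private final Object queryPlan;                 // query-scoped data
  private final Map<String, Long> pathToSize;     // cached hdfs path -> size lookups

  public HookContext(Object sessionState, Object queryPlan,
                     Map<String, Long> pathToSize) {
    this.sessionState = sessionState;
    this.queryPlan = queryPlan;
    this.pathToSize = pathToSize;
  }

  public Object getSessionState() { return sessionState; }
  public Object getQueryPlan() { return queryPlan; }

  // hooks can both read and update the cached input-size map,
  // which is the access pattern the issue asks for
  public Map<String, Long> getPathToSizeCache() { return pathToSize; }
}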
[jira] Updated: (HIVE-1422) skip counter update when RunningJob.getCounters() returns null
[ https://issues.apache.org/jira/browse/HIVE-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1422: Status: Patch Available (was: Reopened) > skip counter update when RunningJob.getCounters() returns null > -- > > Key: HIVE-1422 > URL: https://issues.apache.org/jira/browse/HIVE-1422 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.6.0 >Reporter: John Sichi >Assignee: Joydeep Sen Sarma > Fix For: 0.7.0 > > Attachments: 1422.2.patch, HIVE-1422.1.patch > > > Under heavy load circumstances on some Hadoop versions, we may get a NPE from > trying to dereference a null Counters object. I don't have a unit test which > can reproduce it, but here's an example stack from a production cluster we > saw today: > 10/06/21 13:01:10 ERROR exec.ExecDriver: Ended Job = job_201005200457_701060 > with exception 'java.lang.NullPointerException(null)' > java.lang.NullPointerException > at org.apache.hadoop.hive.ql.exec.Operator.updateCounters(Operator.java:999) > at > org.apache.hadoop.hive.ql.exec.ExecDriver.updateCounters(ExecDriver.java:503) > at org.apache.hadoop.hive.ql.exec.ExecDriver.progress(ExecDriver.java:390) > at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:697) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107) > at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55) > at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1430) serializing/deserializing the query plan is useless and expensive
[ https://issues.apache.org/jira/browse/HIVE-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894253#action_12894253 ] Joydeep Sen Sarma commented on HIVE-1430: - hey - so i am a little puzzled by this change: - Zheng had added serialize/deserialize of the plan to make sure it got tested. wasn't a bad idea. - i had added this option to the build file so that serialize would actually run during the tests (and not outside of tests) - note that this property is not set outside of the test environment. so serialize/deserialize of the plan would not have been happening during regular use of hive. so it's not clear to me why we are making this change (are we concerned about memory/cpu usage during testing)? unless i am missing something major - this has no impact on memory/cpu of the regular hive client. > serializing/deserializing the query plan is useless and expensive > - > > Key: HIVE-1430 > URL: https://issues.apache.org/jira/browse/HIVE-1430 > Project: Hadoop Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Namit Jain >Assignee: Ning Zhang > Fix For: 0.7.0 > > Attachments: HIVE-1430.patch > > > We should turn it off by default -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
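For reference, a serialize/deserialize round trip of this sort is cheap to exercise in a test. This standalone sketch uses the JDK's java.beans.XMLEncoder (assuming an XMLEncoder-style mechanism, with a plain list standing in for the actual plan object):

import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;

// Standalone illustration of a serialize/deserialize round trip used as a
// test-only correctness check; the ArrayList stands in for the query plan.
public class PlanRoundTripDemo {
  public static void main(String[] args) {
    ArrayList<String> plan = new ArrayList<String>();
    plan.add("TS"); plan.add("FIL"); plan.add("FS");

    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (XMLEncoder enc = new XMLEncoder(bos)) {
      enc.writeObject(plan);  // serialize
    }

    try (XMLDecoder dec = new XMLDecoder(new ByteArrayInputStream(bos.toByteArray()))) {
      @SuppressWarnings("unchecked")
      ArrayList<String> copy = (ArrayList<String>) dec.readObject();  // deserialize
      assert plan.equals(copy) : "round trip should be lossless";
    }
  }
}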
[jira] Updated: (HIVE-1422) skip counter update when RunningJob.getCounters() returns null
[ https://issues.apache.org/jira/browse/HIVE-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1422: Attachment: 1422.2.patch - cleaned up ExecDriver a bit - removed some dead code and some unnecessary global vars, and now throw a better exception if the JT has lost the job - fixed HIVE-1493 here as well; it's a one-line fix in a printf. running tests. > skip counter update when RunningJob.getCounters() returns null > -- > > Key: HIVE-1422 > URL: https://issues.apache.org/jira/browse/HIVE-1422 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.6.0 >Reporter: John Sichi >Assignee: Joydeep Sen Sarma > Fix For: 0.7.0 > > Attachments: 1422.2.patch, HIVE-1422.1.patch > > > Under heavy load circumstances on some Hadoop versions, we may get a NPE from > trying to dereference a null Counters object. I don't have a unit test which > can reproduce it, but here's an example stack from a production cluster we > saw today: > 10/06/21 13:01:10 ERROR exec.ExecDriver: Ended Job = job_201005200457_701060 > with exception 'java.lang.NullPointerException(null)' > java.lang.NullPointerException > at org.apache.hadoop.hive.ql.exec.Operator.updateCounters(Operator.java:999) > at > org.apache.hadoop.hive.ql.exec.ExecDriver.updateCounters(ExecDriver.java:503) > at org.apache.hadoop.hive.ql.exec.ExecDriver.progress(ExecDriver.java:390) > at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:697) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107) > at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55) > at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1422) skip counter update when RunningJob.getCounters() returns null
[ https://issues.apache.org/jira/browse/HIVE-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894126#action_12894126 ] Joydeep Sen Sarma commented on HIVE-1422: - more hadoop goriness - i think John your fix was pretty spot on: - there are three levels of job storage: a. fully in memory (can get status and counters) b. partially in memory (a la retired - can get status and not counters) c. on disk (completed jobs) so what is happening is that we are hitting case b. jobstatus is available - but not counters. we should probably anticipate the null jobstatus (which we used to get in 0.17 before b. and c. were available). what is the effect of not having final counter values available in Hive? Local mode also doesn't report counters i think. > skip counter update when RunningJob.getCounters() returns null > -- > > Key: HIVE-1422 > URL: https://issues.apache.org/jira/browse/HIVE-1422 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.6.0 >Reporter: John Sichi >Assignee: Joydeep Sen Sarma > Fix For: 0.7.0 > > Attachments: HIVE-1422.1.patch > > > Under heavy load circumstances on some Hadoop versions, we may get a NPE from > trying to dereference a null Counters object. I don't have a unit test which > can reproduce it, but here's an example stack from a production cluster we > saw today: > 10/06/21 13:01:10 ERROR exec.ExecDriver: Ended Job = job_201005200457_701060 > with exception 'java.lang.NullPointerException(null)' > java.lang.NullPointerException > at org.apache.hadoop.hive.ql.exec.Operator.updateCounters(Operator.java:999) > at > org.apache.hadoop.hive.ql.exec.ExecDriver.updateCounters(ExecDriver.java:503) > at org.apache.hadoop.hive.ql.exec.ExecDriver.progress(ExecDriver.java:390) > at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:697) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107) > at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55) > at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1422) skip counter update when RunningJob.getCounters() returns null
[ https://issues.apache.org/jira/browse/HIVE-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893852#action_12893852 ] Joydeep Sen Sarma commented on HIVE-1422: - i looked at the hadoop source for 20 a bit. looks like both getCounters() and getJob() can return null (in case the job cannot be found). on 0.20 - completed jobs are looked up from the persistent store - so i think this is pretty unlikely to happen (if it does - it seems like a hadoop bug). but for 17 (and maybe other versions in between) - we need to guard against these. > skip counter update when RunningJob.getCounters() returns null > -- > > Key: HIVE-1422 > URL: https://issues.apache.org/jira/browse/HIVE-1422 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.6.0 >Reporter: John Sichi >Assignee: Joydeep Sen Sarma > Fix For: 0.7.0 > > Attachments: HIVE-1422.1.patch > > > Under heavy load circumstances on some Hadoop versions, we may get a NPE from > trying to dereference a null Counters object. I don't have a unit test which > can reproduce it, but here's an example stack from a production cluster we > saw today: > 10/06/21 13:01:10 ERROR exec.ExecDriver: Ended Job = job_201005200457_701060 > with exception 'java.lang.NullPointerException(null)' > java.lang.NullPointerException > at org.apache.hadoop.hive.ql.exec.Operator.updateCounters(Operator.java:999) > at > org.apache.hadoop.hive.ql.exec.ExecDriver.updateCounters(ExecDriver.java:503) > at org.apache.hadoop.hive.ql.exec.ExecDriver.progress(ExecDriver.java:390) > at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:697) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107) > at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55) > at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
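A minimal guard of the kind described, using only the old-API calls named in the thread (JobClient.getJob and RunningJob.getCounters); the wrapper method itself is illustrative:

import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;

// Sketch: tolerate both a missing job and missing counters, since either
// lookup can return null on some Hadoop versions.
public class CounterGuard {
  static Counters fetchCounters(JobClient jc, JobID id) throws java.io.IOException {
    RunningJob rj = jc.getJob(id);
    if (rj == null) {
      return null;  // job no longer known to the JobTracker
    }
    Counters ctrs = rj.getCounters();
    if (ctrs == null) {
      return null;  // retired job: status still available, counters gone
    }
    return ctrs;
  }
}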
[jira] Created: (HIVE-1493) incorrect explanation when local mode not chosen automatically
incorrect explanation when local mode not chosen automatically -- Key: HIVE-1493 URL: https://issues.apache.org/jira/browse/HIVE-1493 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Reporter: Joydeep Sen Sarma Priority: Minor slipped past in 1408: // check for max input size if (inputSummary.getLength() > maxBytes) return "Input Size (= " + maxBytes + ") is larger than " + HiveConf.ConfVars.LOCALMODEMAXBYTES.varname + " (= " + maxBytes + ")"; printing same value twice. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
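The corrected fragment just prints the computed input size instead of the limit twice - something like:

// fixed message: report the actual input size, then the configured limit
if (inputSummary.getLength() > maxBytes)
  return "Input Size (= " + inputSummary.getLength() + ") is larger than " +
      HiveConf.ConfVars.LOCALMODEMAXBYTES.varname + " (= " + maxBytes + ")";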
[jira] Commented: (HIVE-1408) add option to let hive automatically run in local mode based on tunable heuristics
[ https://issues.apache.org/jira/browse/HIVE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893462#action_12893462 ] Joydeep Sen Sarma commented on HIVE-1408: - Ning - anything else you need from me? i was hoping to get it in before hive-417. otherwise i am sure i would have to regenerate/reconcile a ton of stuff > add option to let hive automatically run in local mode based on tunable > heuristics > -- > > Key: HIVE-1408 > URL: https://issues.apache.org/jira/browse/HIVE-1408 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma > Attachments: 1408.1.patch, 1408.2.patch, 1408.2.q.out.patch, > 1408.7.patch, hive-1408.6.patch > > > as a followup to HIVE-543 - we should have a simple option (enabled by > default) to let hive run in local mode if possible. > two levels of options are desirable: > 1. hive.exec.mode.local.auto=true/false // control whether local mode is > automatically chosen > 2. Options to control different heuristics, some naive examples: > hive.exec.mode.local.auto.input.size.max=1G // don't choose local mode > if data > 1G > hive.exec.mode.local.auto.script.enable=true/false // choose if local > mode is enabled for queries with user scripts > this can be implemented as a pre/post execution hook. It makes sense to > provide this as a standard hook in the hive codebase since it's likely to > improve response time for many users (especially for test queries). > the initial proposal is to choose this at a query level and not at per > hive-task (ie. hadoop job) level. per job-level requires more changes to > compilation (to not pre-commit to hdfs or local scratch directories at > compile time). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-417) Implement Indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893455#action_12893455 ] Joydeep Sen Sarma commented on HIVE-417: i am waiting for a commit on hive-1408. that's probably gonna collide. > Implement Indexing in Hive > -- > > Key: HIVE-417 > URL: https://issues.apache.org/jira/browse/HIVE-417 > Project: Hadoop Hive > Issue Type: New Feature > Components: Metastore, Query Processor >Affects Versions: 0.3.0, 0.3.1, 0.4.0, 0.6.0 >Reporter: Prasad Chakka >Assignee: He Yongqiang > Fix For: 0.7.0 > > Attachments: hive-417.proto.patch, hive-417-2009-07-18.patch, > hive-indexing-8-thrift-metastore-remodel.patch, hive-indexing.3.patch, > hive-indexing.5.thrift.patch, hive.indexing.11.patch, hive.indexing.12.patch, > idx2.png, indexing_with_ql_rewrites_trunk_953221.patch > > > Implement indexing on Hive so that lookup and range queries are efficient. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1408) add option to let hive automatically run in local mode based on tunable heuristics
[ https://issues.apache.org/jira/browse/HIVE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893314#action_12893314 ] Joydeep Sen Sarma commented on HIVE-1408: - yeah - so the solution is that mapred.local.dir needs to be set correctly in the hive/hadoop client-side xml. for our internal install - i will send a diff changing the client side to point to /tmp (instead of having server-side config). there's nothing to do on the hive open source version. mapred.local.dir is a client-only variable and needs to be set specifically on the client side by the admin. basically our internal client-side config has a bug :-) > add option to let hive automatically run in local mode based on tunable > heuristics > -- > > Key: HIVE-1408 > URL: https://issues.apache.org/jira/browse/HIVE-1408 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma > Attachments: 1408.1.patch, 1408.2.patch, 1408.2.q.out.patch, > 1408.7.patch, hive-1408.6.patch > > > as a followup to HIVE-543 - we should have a simple option (enabled by > default) to let hive run in local mode if possible. > two levels of options are desirable: > 1. hive.exec.mode.local.auto=true/false // control whether local mode is > automatically chosen > 2. Options to control different heuristics, some naive examples: > hive.exec.mode.local.auto.input.size.max=1G // don't choose local mode > if data > 1G > hive.exec.mode.local.auto.script.enable=true/false // choose if local > mode is enabled for queries with user scripts > this can be implemented as a pre/post execution hook. It makes sense to > provide this as a standard hook in the hive codebase since it's likely to > improve response time for many users (especially for test queries). > the initial proposal is to choose this at a query level and not at per > hive-task (ie. hadoop job) level. per job-level requires more changes to > compilation (to not pre-commit to hdfs or local scratch directories at > compile time). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.