[jira] Updated: (HIVE-1570) referencing an added file by its name in a transform script does not work in hive local mode

2010-10-06 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1570:


Attachment: 1570.5.patch

Also adding a trivial patch for HIVE-1473. Filed separate patches for 1473 and 
1520 as well, but folded everything in here.

> referencing an added file by its name in a transform script does not work in 
> hive local mode
> -
>
> Key: HIVE-1570
> URL: https://issues.apache.org/jira/browse/HIVE-1570
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: 1570.1.patch, 1570.2.patch, 1570.3.patch, 1570.4.patch, 
> 1570.5.patch
>
>
> Yongqiang tried this and it fails in local mode:
> add file ../data/scripts/dumpdata_script.py;
> select count(distinct subq.key) from
> (FROM src MAP src.key USING 'python dumpdata_script.py' AS key WHERE src.key 
> = 10) subq;
> this needs to be fixed because it means we cannot choose local mode 
> automatically in case of transform scripts (since different paths need to be 
> used for cluster vs. local mode execution)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1520) hive.mapred.local.mem should only be used in case of local mode job submissions

2010-10-06 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1520:


Attachment: 1520.1.patch

> hive.mapred.local.mem should only be used in case of local mode job 
> submissions
> ---
>
> Key: HIVE-1520
> URL: https://issues.apache.org/jira/browse/HIVE-1520
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Joydeep Sen Sarma
> Attachments: 1520.1.patch
>
>
> Currently, whenever we submit a map-reduce job via a child JVM process, Hive 
> sets HADOOP_HEAPSIZE to hive.mapred.local.mem (thereby limiting the max heap 
> memory of the child JVM), the assumption being that we are submitting a job 
> for local mode execution and different memory limits apply for that.
> However, one can submit jobs via a child JVM for non-local mode execution as 
> well. This is useful, for example, if Hive wants to submit jobs via different 
> Hadoop clients (for sending jobs to different Hadoop clusters). In such a 
> case, we can use 'hive.exec.submitviachild' and 'hadoop.bin.path' to dispatch 
> the job via an alternate Hadoop client install point. However, in that case 
> we don't need to set HADOOP_HEAPSIZE; all we are using the child JVM for is 
> to run the small bit of Hive code that submits the job (not for local mode 
> execution).
> In this case we shouldn't be setting the child JVM's memory limit, and should 
> leave it at the parent's value.
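The intended behavior can be sketched as follows (a Python sketch for illustration only; the actual patch is Java code in Hive, and the helper name and `local_mode` flag are hypothetical):

```python
import os

def child_jvm_env(local_mode, local_mem_mb, parent_env=None):
    """Build the environment for the child JVM that submits the job.

    Cap the heap via HADOOP_HEAPSIZE (from hive.mapred.local.mem) only
    when the child will actually execute the job in local mode;
    otherwise inherit whatever the parent process has set."""
    env = dict(parent_env if parent_env is not None else os.environ)
    if local_mode and local_mem_mb > 0:
        env["HADOOP_HEAPSIZE"] = str(local_mem_mb)
    return env
```

When the child JVM is only a job-submission shim for a remote cluster, the parent's heap setting (or the default) is passed through untouched.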




[jira] Updated: (HIVE-1473) plan file should have a high replication factor

2010-10-06 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1473:


Attachment: 1473.1.patch

> plan file should have a high replication factor
> ---
>
> Key: HIVE-1473
> URL: https://issues.apache.org/jira/browse/HIVE-1473
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Joydeep Sen Sarma
>Priority: Minor
> Attachments: 1473.1.patch
>
>
> it should be set to 10 or something like that (just like job.xml). 




[jira] Updated: (HIVE-1570) referencing an added file by its name in a transform script does not work in hive local mode

2010-10-06 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1570:


Attachment: 1570.4.patch

Added a fix for HIVE-1520: don't reset HADOOP_HEAPSIZE unless the child JVM is 
being launched for local mode execution.

It's a one-liner, so it was simpler to get it all in, in one shot.

> referencing an added file by its name in a transform script does not work in 
> hive local mode
> -
>
> Key: HIVE-1570
> URL: https://issues.apache.org/jira/browse/HIVE-1570
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: 1570.1.patch, 1570.2.patch, 1570.3.patch, 1570.4.patch
>
>
> Yongqiang tried this and it fails in local mode:
> add file ../data/scripts/dumpdata_script.py;
> select count(distinct subq.key) from
> (FROM src MAP src.key USING 'python dumpdata_script.py' AS key WHERE src.key 
> = 10) subq;
> this needs to be fixed because it means we cannot choose local mode 
> automatically in case of transform scripts (since different paths need to be 
> used for cluster vs. local mode execution)




[jira] Commented: (HIVE-1620) Patch to write directly to S3 from Hive

2010-10-06 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918611#action_12918611
 ] 

Joydeep Sen Sarma commented on HIVE-1620:
-

If we are overwriting a table ('insert overwrite table'), do we make sure that 
if the query/job fails partway, files (from the maps/reduces that did succeed) 
are not left in the table's directory?

> Patch to write directly to S3 from Hive
> ---
>
> Key: HIVE-1620
> URL: https://issues.apache.org/jira/browse/HIVE-1620
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Vaibhav Aggarwal
>Assignee: Vaibhav Aggarwal
> Attachments: HIVE-1620.patch
>
>
> We want to submit a patch to Hive which allows users to write files directly 
> to S3.
> This patch allows the user to specify an S3 location as the table output 
> location and hence eliminates the need of copying data from HDFS to S3.
> Users can run Hive queries directly over the data stored in S3.
> This patch helps integrate Hive with S3 better and quicker.




[jira] Commented: (HIVE-1675) SAXParseException on plan.xml during local mode.

2010-10-06 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918609#action_12918609
 ] 

Joydeep Sen Sarma commented on HIVE-1675:
-

What version of Hadoop is this happening against?

> SAXParseException on plan.xml during local mode.
> 
>
> Key: HIVE-1675
> URL: https://issues.apache.org/jira/browse/HIVE-1675
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Bennie Schut
> Attachments: local_10005_plan.xml, local_10006_plan.xml
>
>
> When Hive switches to local mode (hive.exec.mode.local.auto=true) I receive 
> a SAXParseException on the plan.xml.
> If I set hive.exec.mode.local.auto=false I get the correct results.




[jira] Commented: (HIVE-1695) MapJoin followed by ReduceSink should be done as single MapReduce Job

2010-10-06 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918607#action_12918607
 ] 

Joydeep Sen Sarma commented on HIVE-1695:
-

I think there's already a JIRA for this.

> MapJoin followed by ReduceSink should be done as single MapReduce Job
> -
>
> Key: HIVE-1695
> URL: https://issues.apache.org/jira/browse/HIVE-1695
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>
> Currently, MapJoin followed by ReduceSink runs as two MapReduce jobs: one 
> map-only job followed by a map-reduce job. They can be combined into a single 
> MapReduce job.




[jira] Resolved: (HIVE-1685) scriptfile1.1 in minimr failing intermittently

2010-10-05 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma resolved HIVE-1685.
-

Resolution: Duplicate

The test and its expected output are not in sync, so it should fail every time.

Fixing as part of HIVE-1570; it's a small change.

> scriptfile1.1 in minimr failing intermittently
> -
>
> Key: HIVE-1685
> URL: https://issues.apache.org/jira/browse/HIVE-1685
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Joydeep Sen Sarma
>
>  [junit] Begin query: scriptfile1.q
> [junit] diff -a -I file: -I pfile: -I hdfs: -I /tmp/ -I invalidscheme: -I 
> lastUpdateTime -I lastAccessTime -I [Oo]wner -I CreateTime -I LastAccessTime 
> -I Location -I transient_lastDdlTime -I last_modified_ -I 
> java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I 
> Caused by: -I [.][.][.] [0-9]* more 
> /data/users/njain/hive_commit1/hive_commit1/build/ql/test/logs/clientpositive/scriptfile1.q.out
>  
> /data/users/njain/hive_commit1/hive_commit1/ql/src/test/results/clientpositive/scriptfile1.q.out
> [junit] 1c1
> [junit] < PREHOOK: query: CREATE TABLE scriptfile1_dest1(key INT, value 
> STRING)
> [junit] ---
> [junit] > PREHOOK: query: CREATE TABLE dest1(key INT, value STRING)
> [junit] 3c3
> [junit] < POSTHOOK: query: CREATE TABLE scriptfile1_dest1(key INT, value 
> STRING)
> [junit] ---
> [junit] > POSTHOOK: query: CREATE TABLE dest1(key INT, value STRING)
> [junit] 5c5
> [junit] < POSTHOOK: Output: defa...@scriptfile1_dest1
> [junit] ---
> [junit] > POSTHOOK: Output: defa...@dest1
> [junit] 12c12
> [junit] < INSERT OVERWRITE TABLE scriptfile1_dest1 SELECT tmap.tkey, 
> tmap.tvalue
> [junit] ---
> [junit] junit.framework.AssertionFailedError: Client execution results 
> failed with error code = 1
> [junit] > INSERT OVERWRITE TABLE dest1 SELECT tmap.tkey, tmap.tvalue
> [junit] See build/ql/tmp/hive.log, or try "ant test ... 
> -Dtest.silent=false" to get more logs.
> [junit] 15c15
> [junit]   at junit.framework.Assert.fail(Assert.java:47)
> [junit] < PREHOOK: Output: defa...@scriptfile1_dest1
> [junit]   at 
> org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_scriptfile1(TestMinimrCliDriver.java:522)
> [junit] ---
> [junit]   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [junit] > PREHOOK: Output: defa...@dest1
> [junit]   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> [junit] 22c22
> [junit]   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> [junit] < INSERT OVERWRITE TABLE scriptfile1_dest1 SELECT tmap.tkey, 
> tmap.tvalue
> [junit]   at java.lang.reflect.Method.invoke(Method.java:597)
> [junit] ---
> [junit]   at junit.framework.TestCase.runTest(TestCase.java:154)
> [junit] > INSERT OVERWRITE TABLE dest1 SELECT tmap.tkey, tmap.tvalue
> [junit]   at junit.framework.TestCase.runBare(TestCase.java:127)
> [junit] 25,28c25,28
> [junit]   at junit.framework.TestResult$1.protect(TestResult.java:106)
> [junit] < POSTHOOK: Output: defa...@scriptfile1_dest1
> [junit]   at junit.framework.TestResult.runProtected(TestResult.java:124)
> [junit] < POSTHOOK: Lineage: scriptfile1_dest1.key SCRIPT 
> [(src)src.FieldSchema(name:key, type:string, comment:default), 
> (src)src.FieldSchema(name:value, type:string, comment:default), ]
> [junit]   at junit.framework.TestResult.run(TestResult.java:109)
> [junit]   at junit.framework.TestCase.run(TestCase.java:118)
> [junit] < POSTHOOK: Lineage: scriptfile1_dest1.value SCRIPT 
> [(src)src.FieldSchema(name:key, type:string, comment:default), 
> (src)src.FieldSchema(name:value, type:string, comment:default), ]
> [junit]   at junit.framework.TestSuite.runTest(TestSuite.java:208)
> [junit] < PREHOOK: query: SELECT scriptfile1_dest1.* FROM 
> scriptfile1_dest1
> [junit]   at junit.framework.TestSuite.run(TestSuite.java:203)
> [junit] ---
> [junit]   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
> [junit] > POSTHOOK: Output: defa...@dest1
> [junit]   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
> [junit] > POSTHOOK: Lineage: dest1.key SCRIPT 
> [(src)src.FieldSchema(name:key, type:string, comment:default), 
> (src)src.FieldSchema(name:value, type:string, comment:default), ]
> [junit]   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768)
> [junit] > POSTHOOK: Lineage: dest1.value SCRIPT 
> [(src)src.FieldSchema(name:key, type:str

[jira] Updated: (HIVE-1570) referencing an added file by its name in a transform script does not work in hive local mode

2010-10-05 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1570:


Attachment: 1570.3.patch

Added console output for local mapred jobs containing the location of the 
execution log, for debugging.

> referencing an added file by its name in a transform script does not work in 
> hive local mode
> -
>
> Key: HIVE-1570
> URL: https://issues.apache.org/jira/browse/HIVE-1570
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: 1570.1.patch, 1570.2.patch, 1570.3.patch
>
>
> Yongqiang tried this and it fails in local mode:
> add file ../data/scripts/dumpdata_script.py;
> select count(distinct subq.key) from
> (FROM src MAP src.key USING 'python dumpdata_script.py' AS key WHERE src.key 
> = 10) subq;
> this needs to be fixed because it means we cannot choose local mode 
> automatically in case of transform scripts (since different paths need to be 
> used for cluster vs. local mode execution)




[jira] Updated: (HIVE-1570) referencing an added file by its name in a transform script does not work in hive local mode

2010-10-04 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1570:


Status: Patch Available  (was: Open)

> referencing an added file by its name in a transform script does not work in 
> hive local mode
> -
>
> Key: HIVE-1570
> URL: https://issues.apache.org/jira/browse/HIVE-1570
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: 1570.1.patch, 1570.2.patch
>
>
> Yongqiang tried this and it fails in local mode:
> add file ../data/scripts/dumpdata_script.py;
> select count(distinct subq.key) from
> (FROM src MAP src.key USING 'python dumpdata_script.py' AS key WHERE src.key 
> = 10) subq;
> this needs to be fixed because it means we cannot choose local mode 
> automatically in case of transform scripts (since different paths need to be 
> used for cluster vs. local mode execution)




[jira] Updated: (HIVE-1570) referencing an added file by its name in a transform script does not work in hive local mode

2010-10-04 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1570:


Attachment: 1570.2.patch

Working patch. No need for a new test, but I had to modify some other tests to 
use 'add file'.

> referencing an added file by its name in a transform script does not work in 
> hive local mode
> -
>
> Key: HIVE-1570
> URL: https://issues.apache.org/jira/browse/HIVE-1570
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: 1570.1.patch, 1570.2.patch
>
>
> Yongqiang tried this and it fails in local mode:
> add file ../data/scripts/dumpdata_script.py;
> select count(distinct subq.key) from
> (FROM src MAP src.key USING 'python dumpdata_script.py' AS key WHERE src.key 
> = 10) subq;
> this needs to be fixed because it means we cannot choose local mode 
> automatically in case of transform scripts (since different paths need to be 
> used for cluster vs. local mode execution)




[jira] Updated: (HIVE-1570) referencing an added file by its name in a transform script does not work in hive local mode

2010-10-04 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1570:


Attachment: 1570.1.patch

Before running a map-reduce job in local mode we:
1. set a new working directory
2. symlink all added files into that working directory

This is pretty much identical to how Hadoop sets up the task execution 
environment. All references to scripts and added files by their bare names now 
resolve correctly in local mode.
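The setup above can be sketched like this (an illustrative Python sketch; the actual patch is Java code in Hive's execution path, and the helper name is hypothetical):

```python
import os
import tempfile

def setup_local_workdir(added_files):
    """Create a scratch working directory and symlink every added file
    into it by basename, mimicking how Hadoop prepares a task's working
    directory. The local-mode child process then runs with this as its
    cwd, so a reference like 'python dumpdata_script.py' resolves by
    bare name just as it does on the cluster."""
    workdir = tempfile.mkdtemp(prefix="hive-local-")
    for path in added_files:
        link = os.path.join(workdir, os.path.basename(path))
        os.symlink(os.path.abspath(path), link)
    return workdir
```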

There was some hacky code in SemanticAnalyzer.java to deal with this that 
doesn't work in all cases (when the referenced file is not the first item on 
the command line, or in automatic local mode). I have deleted it.

Duplicated one of the tests so that we get coverage against both a real 
cluster (scriptfile1.q executed against minimr) and local mode 
(scriptfile2.q).

Still running tests.

> referencing an added file by its name in a transform script does not work in 
> hive local mode
> -
>
> Key: HIVE-1570
> URL: https://issues.apache.org/jira/browse/HIVE-1570
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: 1570.1.patch
>
>
> Yongqiang tried this and it fails in local mode:
> add file ../data/scripts/dumpdata_script.py;
> select count(distinct subq.key) from
> (FROM src MAP src.key USING 'python dumpdata_script.py' AS key WHERE src.key 
> = 10) subq;
> this needs to be fixed because it means we cannot choose local mode 
> automatically in case of transform scripts (since different paths need to be 
> used for cluster vs. local mode execution)




[jira] Commented: (HIVE-1651) ScriptOperator should not forward any output to downstream operators if an exception happens

2010-09-17 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910837#action_12910837
 ] 

Joydeep Sen Sarma commented on HIVE-1651:
-

Yeah, but then the directory itself should be created as a tmp directory, and 
we should promote the directory to its final name only when closing 
successfully.
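That promote-on-success protocol can be sketched as follows (an illustrative Python sketch, not Hive's actual implementation; the helper name is hypothetical):

```python
import os
import shutil

def write_via_tmp(final_dir, task):
    """Run `task` against a temporary sibling directory and rename it
    to the final name only on success; a failed task's partial output
    is discarded instead of being committed."""
    tmp_dir = final_dir + "._tmp"
    os.makedirs(tmp_dir)
    try:
        task(tmp_dir)            # the task writes its files here
    except Exception:
        shutil.rmtree(tmp_dir)   # failure: drop the side-effect files
        raise
    os.rename(tmp_dir, final_dir)  # success: promote to the final name
```

On a real filesystem the final rename is atomic, so readers only ever see either no output or the complete output.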

> ScriptOperator should not forward any output to downstream operators if an 
> exception happens
> 
>
> Key: HIVE-1651
> URL: https://issues.apache.org/jira/browse/HIVE-1651
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1651.patch
>
>
> ScriptOperator spawns 2 threads for getting the stdout and stderr from the 
> script and then forwards the stdout output to downstream operators. In case 
> of any exception in the script (e.g., it got killed), ScriptOperator gets an 
> exception and throws it to upstream operators until MapOperator gets it and 
> calls close(abort). Before ScriptOperator.close() is called, the script 
> output stream can still forward output to downstream operators. We should 
> terminate it immediately.




[jira] Commented: (HIVE-1651) ScriptOperator should not forward any output to downstream operators if an exception happens

2010-09-17 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910786#action_12910786
 ] 

Joydeep Sen Sarma commented on HIVE-1651:
-

If a Hadoop task fails, how is it that any side-effect files created by Hive 
code running in that task are getting promoted to the final output?

I think the forwarding is a red herring. We should not commit output files 
from a failed task.

> ScriptOperator should not forward any output to downstream operators if an 
> exception happens
> 
>
> Key: HIVE-1651
> URL: https://issues.apache.org/jira/browse/HIVE-1651
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1651.patch
>
>
> ScriptOperator spawns 2 threads for getting the stdout and stderr from the 
> script and then forwards the stdout output to downstream operators. In case 
> of any exception in the script (e.g., it got killed), ScriptOperator gets an 
> exception and throws it to upstream operators until MapOperator gets it and 
> calls close(abort). Before ScriptOperator.close() is called, the script 
> output stream can still forward output to downstream operators. We should 
> terminate it immediately.




[jira] Commented: (HIVE-1570) referencing an added file by its name in a transform script does not work in hive local mode

2010-09-15 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909950#action_12909950
 ] 

Joydeep Sen Sarma commented on HIVE-1570:
-

Sure. Confused, because the tests were all passing earlier when I added the 
minimr tests.

> referencing an added file by its name in a transform script does not work in 
> hive local mode
> -
>
> Key: HIVE-1570
> URL: https://issues.apache.org/jira/browse/HIVE-1570
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
>
> Yongqiang tried this and it fails in local mode:
> add file ../data/scripts/dumpdata_script.py;
> select count(distinct subq.key) from
> (FROM src MAP src.key USING 'python dumpdata_script.py' AS key WHERE src.key 
> = 10) subq;
> this needs to be fixed because it means we cannot choose local mode 
> automatically in case of transform scripts (since different paths need to be 
> used for cluster vs. local mode execution)




[jira] Updated: (HIVE-1580) cleanup ExecDriver.progress

2010-09-02 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1580:


Attachment: hive-1580.1.patch

Clean up multiple calls to getCounters (which turns out to be a really 
expensive call in the JobTracker), and don't print non-fatal stack traces to 
the console.
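The curtailing idea can be sketched as simple time-based caching (a hypothetical Python sketch; the actual change is Java against the Hadoop JobClient API, and the class name is made up):

```python
import time

class ThrottledFetcher:
    """Cache the last result of an expensive fetch (e.g. getCounters,
    which grabs a JobTracker-global lock in 0.20) and refresh it at
    most once per `interval` seconds."""
    def __init__(self, fetch, interval=10.0, clock=time.monotonic):
        self._fetch = fetch
        self._interval = interval
        self._clock = clock
        self._value = None
        self._stamp = None

    def get(self):
        now = self._clock()
        if self._stamp is None or now - self._stamp >= self._interval:
            self._value = self._fetch()  # expensive remote call
            self._stamp = now
        return self._value
```

The injectable `clock` keeps the throttle testable without real sleeps.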

> cleanup ExecDriver.progress
> ---
>
> Key: HIVE-1580
> URL: https://issues.apache.org/jira/browse/HIVE-1580
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: hive-1580.1.patch
>
>
> a few problems:
> - if a job is retired, then counters cannot be obtained and a stack trace is 
> printed out (from the history code). this confuses users
> - too many calls to getCounters. after a job has been detected to be 
> finished, there are quite a few more calls to get the job status and the 
> counters. we need to figure out a way to curtail this - in busy clusters the 
> gap between the job finishing and the hive client noticing it is very 
> perceptible and impacts user experience.
> calls to getCounters are very expensive in 0.20 as they grab a 
> jobtracker-global lock (something we have fixed internally at FB)




[jira] Commented: (HIVE-1602) List Partitioning

2010-08-27 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903692#action_12903692
 ] 

Joydeep Sen Sarma commented on HIVE-1602:
-

Yeah, but I have been asking how you are planning to make the grouping of 
partitions transparent. To me that sounds like a very risky and big change, 
and there are no details here.

Why would we do this at the Hive layer given we have HAR already?

I really don't understand why we wouldn't start with HIVE-1467 and then add 
HAR as an optimization to reduce the number of files for small partitions. 
This doesn't address the skew case. It doesn't address the fact that we still 
have to partition by the dynamic partitioning columns, and that requires the 
same partition-only map-reduce operator that HIVE-1467 requires. At which 
point, we can just do HIVE-1467.

What am I missing?

> List Partitioning
> -
>
> Key: HIVE-1602
> URL: https://issues.apache.org/jira/browse/HIVE-1602
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.7.0
>Reporter: Ning Zhang
>
> Dynamic partition inserts create partitions based on the dynamic partition 
> column values. Currently it creates one partition for each distinct DP 
> column value. This can result in skew among the created dynamic partitions: 
> some partitions are large, but there can also be a large number of small 
> partitions. This burdens HDFS as well as the metastore. A list partitioning 
> scheme that aggregates a number of small partitions into one big one is 
> preferable for skewed partitions.




[jira] Commented: (HIVE-1602) List Partitioning

2010-08-27 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903672#action_12903672
 ] 

Joydeep Sen Sarma commented on HIVE-1602:
-

> combining small partitions into one large partitions seems to be a natural 
> way.

Sure, but I am worried that this is a fundamental change to Hive's data model 
and may not be the quickest/safest solution to what is a pretty urgent 
problem.

Also, HAR already solves packing small files into a big file, and it doesn't 
require changes to Hive's data model. So in that sense it seems like an easy 
win.

You are still left with the large-partition (skew) problem. This doesn't solve 
that either (assuming you are using reducers).

>  How can the user manually cluster event=s, event=m, event=l into one

insert overwrite table xxx partition (event_class) select a, b, c, event, 
case event when 's' then 'sml' when 'm' then 'sml' when 'l' then 'sml' else 
'g' end from ...
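The same clumping can be expressed as a plain function (a hypothetical sketch; in practice the value-to-class mapping would come from a one-time analysis of the data's size distribution):

```python
def clump_partition(event, small_values=frozenset({"s", "m", "l"})):
    """Map many small partition values into one clumped partition class
    ('sml') while leaving the remaining values in a catch-all class
    ('g'), mirroring the CASE expression in the query above."""
    return "sml" if event in small_values else "g"
```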

> List Partitioning
> -
>
> Key: HIVE-1602
> URL: https://issues.apache.org/jira/browse/HIVE-1602
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.7.0
>Reporter: Ning Zhang
>
> Dynamic partition inserts create partitions based on the dynamic partition 
> column values. Currently it creates one partition for each distinct DP 
> column value. This can result in skew among the created dynamic partitions: 
> some partitions are large, but there can also be a large number of small 
> partitions. This burdens HDFS as well as the metastore. A list partitioning 
> scheme that aggregates a number of small partitions into one big one is 
> preferable for skewed partitions.




[jira] Commented: (HIVE-1602) List Partitioning

2010-08-27 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903655#action_12903655
 ] 

Joydeep Sen Sarma commented on HIVE-1602:
-

How will this be made transparent from a queryability perspective? I think I 
still don't understand the details.

I agree that if users do it themselves they have to duplicate the column, but 
this doesn't seem like a big deal to me (we compress everything anyway, and 
partitioning columns are highly compressible since their values repeat like 
crazy).

My worry is that this change might be a very big one (representing multiple 
partitions in one storage container). It seems to me a much more fundamental 
change than just fixing dynamic partitioning.



> List Partitioning
> -
>
> Key: HIVE-1602
> URL: https://issues.apache.org/jira/browse/HIVE-1602
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.7.0
>Reporter: Ning Zhang
>
> Dynamic partition inserts create partitions based on the dynamic partition 
> column values. Currently it creates one partition for each distinct DP 
> column value. This can result in skew among the created dynamic partitions: 
> some partitions are large, but there can also be a large number of small 
> partitions. This burdens HDFS as well as the metastore. A list partitioning 
> scheme that aggregates a number of small partitions into one big one is 
> preferable for skewed partitions.




[jira] Commented: (HIVE-1602) List Partitioning

2010-08-27 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903630#action_12903630
 ] 

Joydeep Sen Sarma commented on HIVE-1602:
-

Yikes. How is this queried afterwards?

The user can do this by applying the transformation Namit listed in the select 
clause (on the partitioning column). The user can do a one-time analysis of 
the data (for the size distribution on different partitioning columns) and 
then generate the clumping logic manually.

Because this does not result in queryable data sets, it doesn't seem 
useful/reusable to me.

> List Partitioning
> -
>
> Key: HIVE-1602
> URL: https://issues.apache.org/jira/browse/HIVE-1602
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.7.0
>Reporter: Ning Zhang
>
> Dynamic partition inserts create partitions based on the dynamic partition 
> column values. Currently it creates one partition for each distinct DP 
> column value. This can result in skew among the created dynamic partitions: 
> some partitions are large, but there can also be a large number of small 
> partitions. This burdens HDFS as well as the metastore. A list partitioning 
> scheme that aggregates a number of small partitions into one big one is 
> preferable for skewed partitions.




[jira] Commented: (HIVE-1602) List Partitioning

2010-08-27 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903606#action_12903606
 ] 

Joydeep Sen Sarma commented on HIVE-1602:
-

hmmm - not sure i understand. how can we collapse partitions? we have to 
generate one directory per distinct DP column value - no?

(or are you thinking of jumping straight to har?)

> List Partitioning
> -
>
> Key: HIVE-1602
> URL: https://issues.apache.org/jira/browse/HIVE-1602
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.7.0
>Reporter: Ning Zhang
>
> Dynamic partition inserts create partitions based on the dynamic partition 
> column values. Currently one partition is created for each distinct DP column 
> value. This can skew the created dynamic partitions: some partitions are 
> large, but there can also be a large number of small partitions, which 
> burdens both HDFS and the metastore. A list partitioning scheme that 
> aggregates a number of small partitions into one big one would be preferable 
> for skewed partitions.




[jira] Commented: (HIVE-1467) dynamic partitioning should cluster by partitions

2010-08-27 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903562#action_12903562
 ] 

Joydeep Sen Sarma commented on HIVE-1467:
-

@Ning - what about skew?

> dynamic partitioning should cluster by partitions
> -
>
> Key: HIVE-1467
> URL: https://issues.apache.org/jira/browse/HIVE-1467
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Joydeep Sen Sarma
>Assignee: Namit Jain
>
> (based on internal discussion with Ning). Dynamic partitioning should offer a 
> mode where it clusters data by partition before writing out to each 
> partition. This will reduce number of files. Details:
> 1. always use reducer stage
> 2. mapper sends to reducer based on partitioning column. ie. reducer = 
> f(partition-cols)
> 3. f() can be made somewhat smart to:
>a. spread large partitions across multiple reducers - each mapper can 
> maintain row count seen per partition - and then apply (whenever it sees a 
> new row for a partition): 
>* reducer = (row count / 64k) % numReducers 
>Small partitions always go to one reducer. the larger the partition, 
> the more the reducers. this prevents one reducer becoming bottleneck writing 
> out one partition
>b. this still leaves the issue of very large number of splits. (64K rows 
> from 10K mappers is pretty large). for this one can apply one slight 
> modification:
>* reducer = (mapper-id/1024 + row-count/64k) % numReducers
>ie. - the first 1000 mappers always send the first 64K rows for one 
> partition to the same reducer. the next 1000 send it to the next one. and so 
> on.
> the constants 1024 and 64k are used just as an example. i don't know what the 
> right numbers are. it's also clear that this is a case where we need hadoop 
> to do only partitioning (and no sorting). this will be a useful feature to 
> have in hadoop. that will reduce the overhead due to reducers.
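The refined routing rule above (reducer = (mapper-id/1024 + row-count/64k) % numReducers) can be sketched as a small function. This is purely illustrative, not Hive's actual code; the function name is made up, and the constants default to the example values from the comment:

```python
def reducer_for(mapper_id, rows_seen, num_reducers,
                mapper_group=1024, rows_per_bucket=64 * 1024):
    """Pick a reducer for a dynamic-partition row.

    rows_seen is the number of rows this mapper has already emitted for the
    partition. A small partition written by one mapper group always lands on
    a single reducer; as rows_seen grows, successive 64k-row buckets rotate
    across reducers, and the mapper_id term offsets different mapper groups
    so they do not all start on the same reducer.
    """
    return (mapper_id // mapper_group + rows_seen // rows_per_bucket) % num_reducers
```

With the degenerate mapper_group term dropped (mapper_id below 1024) this reduces to the first formula, reducer = (row count / 64k) % numReducers.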




[jira] Updated: (HIVE-1601) Hadoop 0.17 ant test broken by HIVE-1523

2010-08-27 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1601:


Attachment: 1601.1.patch
ant-contrib-1.0b3.jar

- fix the jsp include
- don't run minimr in 0.17 - it doesn't work
- added ant-contrib jar (attached as a separate file). very useful for writing 
ant conditions (we simplify a bunch of other stuff with it)

> Hadoop 0.17 ant test broken by HIVE-1523
> 
>
> Key: HIVE-1601
> URL: https://issues.apache.org/jira/browse/HIVE-1601
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: Joydeep Sen Sarma
> Fix For: 0.7.0
>
> Attachments: 1601.1.patch, ant-contrib-1.0b3.jar
>
>
> compile-test:
>[javac] /data/users/jsichi/open/hive-trunk/build-common.xml:304: warning: 
> 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set 
> to false for repeatable builds
>[javac] Compiling 33 source files to 
> /data/users/jsichi/open/hive-trunk/build/ql/test/classes
> BUILD FAILED
> /data/users/jsichi/open/hive-trunk/build.xml:168: The following error 
> occurred while executing this line:
> /data/users/jsichi/open/hive-trunk/build.xml:105: The following error 
> occurred while executing this line:
> /data/users/jsichi/open/hive-trunk/build-common.xml:304: 
> /data/users/jsichi/open/hive-trunk/build/hadoopcore/hadoop-0.17.2.1/lib/jsp-2.1
>  does not exist.




[jira] Updated: (HIVE-1601) Hadoop 0.17 ant test broken by HIVE-1523

2010-08-27 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1601:


Status: Patch Available  (was: Open)

> Hadoop 0.17 ant test broken by HIVE-1523
> 
>
> Key: HIVE-1601
> URL: https://issues.apache.org/jira/browse/HIVE-1601
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: Joydeep Sen Sarma
> Fix For: 0.7.0
>
> Attachments: 1601.1.patch, ant-contrib-1.0b3.jar
>
>
> compile-test:
>[javac] /data/users/jsichi/open/hive-trunk/build-common.xml:304: warning: 
> 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set 
> to false for repeatable builds
>[javac] Compiling 33 source files to 
> /data/users/jsichi/open/hive-trunk/build/ql/test/classes
> BUILD FAILED
> /data/users/jsichi/open/hive-trunk/build.xml:168: The following error 
> occurred while executing this line:
> /data/users/jsichi/open/hive-trunk/build.xml:105: The following error 
> occurred while executing this line:
> /data/users/jsichi/open/hive-trunk/build-common.xml:304: 
> /data/users/jsichi/open/hive-trunk/build/hadoopcore/hadoop-0.17.2.1/lib/jsp-2.1
>  does not exist.




[jira] Commented: (HIVE-1487) parallelize test query runs

2010-08-26 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902977#action_12902977
 ] 

Joydeep Sen Sarma commented on HIVE-1487:
-

yeah - that would be my gut feel too (just ditch junit)

however - we are going to lose the junit style test outputs etc. long time back 
Ashish did all the velocity stuff to have junit tests. i don't remember the 
exact thinking at that time - but a majority of people wanted to use junit.

threading would actually be good though .. (we have a separate multithreaded 
test right now that we could happily obsolete)

> parallelize test query runs
> ---
>
> Key: HIVE-1487
> URL: https://issues.apache.org/jira/browse/HIVE-1487
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Joydeep Sen Sarma
>
> HIVE-1464 sped up serial runs somewhat - but it looks like it's still too 
> slow. we should use parallel junit or some similar setup to run test queries 
> in parallel. this should be really easy, as we just need to use a separate 
> warehouse/metadb and potentially a separate mapred system dir location.




[jira] Commented: (HIVE-1487) parallelize test query runs

2010-08-26 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902958#action_12902958
 ] 

Joydeep Sen Sarma commented on HIVE-1487:
-

can people with experience running java tests in parallel comment on this? so 
far these seem to be the choices:

* upgrade to junit4 and use custom runner that runs in parallel. the downside 
here is that junit does not seem to come with this parallel runner (but there's 
additional code on the web from the junit authors that does the same)

* use parallel-junit. this seems the least disruptive - but this seems like an 
old/dead project

* use TestNG - this is a replacement for junit that has inbuilt parallel 
execution support. but we would not be using junit anymore at all.

any other thoughts on better test setup welcome as well.
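Whichever framework is chosen, the core idea is the same: partition the query files across parallel workers, with each worker using its own warehouse/metadb location so runs don't collide. A minimal sketch in Python (illustrative only - the project's real tooling is Java/ant, and `run_one` stands in for a hypothetical per-qfile runner):

```python
from concurrent.futures import ThreadPoolExecutor

def run_partitioned(qfiles, num_workers, run_one):
    """Assign qfiles round-robin to num_workers workers and run them in
    parallel. run_one(qfile, worker_id) is expected to point each worker
    at a separate warehouse/metadb dir keyed by worker_id. Results come
    back in the original qfile order (ThreadPoolExecutor.map preserves
    input order)."""
    def worker(args):
        qfile, worker_id = args
        return run_one(qfile, worker_id)

    tasks = [(q, i % num_workers) for i, q in enumerate(qfiles)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(worker, tasks))
```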

> parallelize test query runs
> ---
>
> Key: HIVE-1487
> URL: https://issues.apache.org/jira/browse/HIVE-1487
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Joydeep Sen Sarma
>
> HIVE-1464 sped up serial runs somewhat - but it looks like it's still too 
> slow. we should use parallel junit or some similar setup to run test queries 
> in parallel. this should be really easy, as we just need to use a separate 
> warehouse/metadb and potentially a separate mapred system dir location.




[jira] Commented: (HIVE-1523) ql tests no longer work in miniMR mode

2010-08-25 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902560#action_12902560
 ] 

Joydeep Sen Sarma commented on HIVE-1523:
-

there's already a jira on running tests in parallel. i think i can cover it 
there itself.

> ql tests no longer work in miniMR mode
> --
>
> Key: HIVE-1523
> URL: https://issues.apache.org/jira/browse/HIVE-1523
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: hive-1523.1.patch, hive-1523.2.patch, hive-1523.3.patch, 
> hive-1523.4.patch
>
>
> as per title. here's the first exception i see:
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: 
> java.io.FileNotFoundException File file:/build/ql/test/data/warehouse/dest_j1 does not exist.
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(746)) - 
> java.io.FileNotFoundException: File file:/build/ql/test/data/warehouse/dest_j1 does not exist.
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
>   at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677)




[jira] Updated: (HIVE-1523) ql tests no longer work in miniMR mode

2010-08-25 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1523:


Attachment: hive-1523.4.patch

- added exclude tests - minimr tests are excluded from regular clientpositive 
tests
- made some subtle changes in how fs.default.name and mapred.job.tracker are 
specified to allow testing against external hadoop clusters

> ql tests no longer work in miniMR mode
> --
>
> Key: HIVE-1523
> URL: https://issues.apache.org/jira/browse/HIVE-1523
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: hive-1523.1.patch, hive-1523.2.patch, hive-1523.3.patch, 
> hive-1523.4.patch
>
>
> as per title. here's the first exception i see:
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: 
> java.io.FileNotFoundException File file:/build/ql/test/data/warehouse/dest_j1 does not exist.
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(746)) - 
> java.io.FileNotFoundException: File file:/build/ql/test/data/warehouse/dest_j1 does not exist.
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
>   at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677)




[jira] Updated: (HIVE-1523) ql tests no longer work in miniMR mode

2010-08-25 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1523:


Status: Patch Available  (was: Open)

> ql tests no longer work in miniMR mode
> --
>
> Key: HIVE-1523
> URL: https://issues.apache.org/jira/browse/HIVE-1523
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: hive-1523.1.patch, hive-1523.2.patch, hive-1523.3.patch, 
> hive-1523.4.patch
>
>
> as per title. here's the first exception i see:
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: 
> java.io.FileNotFoundException File file:/build/ql/test/data/warehouse/dest_j1 does not exist.
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(746)) - 
> java.io.FileNotFoundException: File file:/build/ql/test/data/warehouse/dest_j1 does not exist.
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
>   at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677)




[jira] Commented: (HIVE-1583) Hive should not override Hadoop specific system properties

2010-08-24 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902149#action_12902149
 ] 

Joydeep Sen Sarma commented on HIVE-1583:
-

+1 on HADOOP_CLASSPATH.

i am not sure about CLASSPATH. Hadoop itself does not allow users to supply a 
pre-existing value for CLASSPATH. here's a snippet from the 0.20 conf:

# CLASSPATH initially contains $HADOOP_CONF_DIR
CLASSPATH="${HADOOP_CONF_DIR}"



> Hive should not override Hadoop specific system properties
> --
>
> Key: HIVE-1583
> URL: https://issues.apache.org/jira/browse/HIVE-1583
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Configuration
>Reporter: Amareshwari Sriramadasu
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-1583.patch
>
>
> Currently Hive overrides Hadoop specific system properties such as 
> HADOOP_CLASSPATH.
> It does the following in bin/hive script :
> {code}
> # pass classpath to hadoop
> export HADOOP_CLASSPATH=${CLASSPATH}
> {code}
> Instead, It should honor the value of HADOOP_CLASSPATH set by client by 
> appending CLASSPATH to it.
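The append semantics the issue proposes (honor any HADOOP_CLASSPATH the client already set instead of overwriting it) can be expressed as a tiny helper. This is only an illustration of the intended behavior - the real change would live in the bin/hive shell script, and the function name is made up:

```python
def merged_hadoop_classpath(env, hive_classpath):
    """Return the HADOOP_CLASSPATH value the patch proposes: append Hive's
    classpath to any value already present in the environment, rather than
    clobbering it as `export HADOOP_CLASSPATH=${CLASSPATH}` does."""
    existing = env.get("HADOOP_CLASSPATH", "")
    return f"{existing}:{hive_classpath}" if existing else hive_classpath
```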




[jira] Commented: (HIVE-1523) ql tests no longer work in miniMR mode

2010-08-23 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901687#action_12901687
 ] 

Joydeep Sen Sarma commented on HIVE-1523:
-

can someone review/commit this? i don't think i am going to make more changes 
to this.

will work on long regression framework separately.

> ql tests no longer work in miniMR mode
> --
>
> Key: HIVE-1523
> URL: https://issues.apache.org/jira/browse/HIVE-1523
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: hive-1523.1.patch, hive-1523.2.patch, hive-1523.3.patch
>
>
> as per title. here's the first exception i see:
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: 
> java.io.FileNotFoundException File file:/build/ql/test/data/warehouse/dest_j1 does not exist.
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(746)) - 
> java.io.FileNotFoundException: File file:/build/ql/test/data/warehouse/dest_j1 does not exist.
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
>   at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677)




[jira] Created: (HIVE-1580) cleanup ExecDriver.progress

2010-08-21 Thread Joydeep Sen Sarma (JIRA)
cleanup ExecDriver.progress
---

 Key: HIVE-1580
 URL: https://issues.apache.org/jira/browse/HIVE-1580
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Joydeep Sen Sarma
Assignee: Joydeep Sen Sarma


a few problems:

- if a job is retired - then counters cannot be obtained and a stack trace is 
printed out (from history code). this confuses users
- too many calls to getCounters. after a job has been detected to be finished - 
there are quite a few more calls to get the job status and the counters. we 
need to figure out a way to curtail this - in busy clusters the gap between the 
job getting finished and the hive client noticing is very perceptible and 
impacts user experience.

calls to getCounters are very expensive in 0.20 as they grab a jobtracker 
global lock (something we have fixed internally at FB)




[jira] Commented: (HIVE-1578) Add conf. property hive.exec.show.job.failure.debug.info to enable/disable displaying link to the task with most failures

2010-08-21 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900991#action_12900991
 ] 

Joydeep Sen Sarma commented on HIVE-1578:
-

looks like the CHANGES.txt message of this commit and the merge commit got 
mixed up

> Add conf. property hive.exec.show.job.failure.debug.info to enable/disable 
> displaying link to the task with most failures
> -
>
> Key: HIVE-1578
> URL: https://issues.apache.org/jira/browse/HIVE-1578
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Fix For: 0.7.0
>
> Attachments: HIVE-1578.1.patch
>
>
> If a job fails, Hive currently displays a link to the task with the most 
> number of failures for easy access to the error logs. However, generating the 
> link may require many RPC's to get all the task completion events, adding a 
> delay of up to 30 minutes. This patch adds a configuration variable to 
> control whether the link is generated. Turning off this feature would also 
> disable automatic debugging tips generated by heuristics reading from the 
> error logs.




[jira] Created: (HIVE-1579) showJobFailDebugInfo fails job if tasktracker does not respond

2010-08-21 Thread Joydeep Sen Sarma (JIRA)
showJobFailDebugInfo fails job if tasktracker does not respond
--

 Key: HIVE-1579
 URL: https://issues.apache.org/jira/browse/HIVE-1579
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Joydeep Sen Sarma
Assignee: Paul Yang


here's the stack trace:

java.lang.RuntimeException: Error while reading from task log url
  at 
org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130)
  at 
org.apache.hadoop.hive.ql.exec.ExecDriver.showJobFailDebugInfo(ExecDriver.java:844)
  at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:624)
  at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:120)
  at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)
  at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
  at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:609)
  at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:478)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:356)
  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:140)
  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:199)
  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:316)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.io.FileNotFoundException: 
http://hadoop0062.snc3.facebook.com.:50060/tasklog?taskid=attempt_201008191557_26566_m_01_3&all=true
  at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1239)
  at java.net.URL.openStream(URL.java:1009)
  at 
org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120)
  ... 16 more
Ended Job = job_201008191557_26566 with exception 
'java.lang.RuntimeException(Error while reading from task log url)'
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.MapRedTask

this failed a multi hour script.




[jira] Created: (HIVE-1574) symlink_text_input_format.q needs fixes for minimr

2010-08-19 Thread Joydeep Sen Sarma (JIRA)
symlink_text_input_format.q needs fixes for minimr
--

 Key: HIVE-1574
 URL: https://issues.apache.org/jira/browse/HIVE-1574
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Joydeep Sen Sarma


fails in minimr. these lines are problematic:


dfs -cp ../data/files/symlink1.txt ../build/ql/test/data/warehouse/symlink_text_input_format/symlink1.txt;
dfs -cp ../data/files/symlink2.txt ../build/ql/test/data/warehouse/symlink_text_input_format/symlink2.txt;


we should just use load commands.




[jira] Created: (HIVE-1572) skewjoin.q output in minimr differs from local mode

2010-08-19 Thread Joydeep Sen Sarma (JIRA)
skewjoin.q output in minimr differs from local mode
---

 Key: HIVE-1572
 URL: https://issues.apache.org/jira/browse/HIVE-1572
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Joydeep Sen Sarma


checked in results:

POSTHOOK: query: SELECT sum(hash(src1.key)), sum(hash(src1.val)), 
sum(hash(src2.key)) FROM T1 src1 JOIN T2 src2 ON src1.key+1 = src2.key JOIN T2 
src3 ON src2.key = src3.key
370 11003 377


in minimr mode:
POSTHOOK: query: SELECT sum(hash(src1.key)), sum(hash(src1.val)), 
sum(hash(src2.key)) FROM T1 src1 JOIN T2 src2 ON src1.key+1 = src2.key JOIN T2 
src3 ON src2.key = src3.key
150 4707  153

it seems that the query is deterministic - so filing a bug.





[jira] Commented: (HIVE-1569) groupby_bigdata.q fails in minimr mode

2010-08-19 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900521#action_12900521
 ] 

Joydeep Sen Sarma commented on HIVE-1569:
-

can you look at scriptfile1.q?

it works with add file and refers to the script by name. it works for both 
local mode and minimr.

> groupby_bigdata.q fails in minimr mode
> --
>
> Key: HIVE-1569
> URL: https://issues.apache.org/jira/browse/HIVE-1569
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Reporter: Namit Jain
>Assignee: He Yongqiang
>





[jira] Commented: (HIVE-1570) referencing an added file by its name in a transform script does not work in hive local mode

2010-08-19 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900501#action_12900501
 ] 

Joydeep Sen Sarma commented on HIVE-1570:
-

hmmm - how come scriptfile1.q works then?


CREATE TABLE dest1(key INT, value STRING);

ADD FILE src/test/scripts/testgrep;

FROM (
  FROM src
  SELECT TRANSFORM(src.key, src.value)
 USING 'testgrep' AS (tkey, tvalue) 
  CLUSTER BY tkey 
) tmap
INSERT OVERWRITE TABLE dest1 SELECT tmap.tkey, tmap.tvalue;


> referencing an added file by its name in a transform script does not work in 
> hive local mode
> -
>
> Key: HIVE-1570
> URL: https://issues.apache.org/jira/browse/HIVE-1570
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>
> Yongqiang tried this and it fails in local mode:
> add file ../data/scripts/dumpdata_script.py;
> select count(distinct subq.key) from
> (FROM src MAP src.key USING 'python dumpdata_script.py' AS key WHERE src.key = 10) subq;
> this needs to be fixed because it means we cannot choose local mode 
> automatically in case of transform scripts (since different paths need to be 
> used for cluster vs. local mode execution)




[jira] Created: (HIVE-1570) referencing an added file by its name in a transform script does not work in hive local mode

2010-08-19 Thread Joydeep Sen Sarma (JIRA)
referencing an added file by its name in a transform script does not work in 
hive local mode
-

 Key: HIVE-1570
 URL: https://issues.apache.org/jira/browse/HIVE-1570
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Joydeep Sen Sarma


Yongqiang tried this and it fails in local mode:

add file ../data/scripts/dumpdata_script.py;

select count(distinct subq.key) from
(FROM src MAP src.key USING 'python dumpdata_script.py' AS key WHERE src.key = 
10) subq;


this needs to be fixed because it means we cannot choose local mode 
automatically in case of transform scripts (since different paths need to be 
used for cluster vs. local mode execution)




[jira] Commented: (HIVE-1568) null_column.q fails in minimr mode

2010-08-19 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900490#action_12900490
 ] 

Joydeep Sen Sarma commented on HIVE-1568:
-

also ppd_multi_insert.q fails with same error.


> null_column.q fails in minimr mode
> --
>
> Key: HIVE-1568
> URL: https://issues.apache.org/jira/browse/HIVE-1568
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Joydeep Sen Sarma
>
> followup from hive-1523:
> ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=null_column.q test
> [junit] Begin query: null_column.q
> [junit] Exception: Client Execution failed with error code = 9
> [junit] See build/ql/tmp/hive.log, or try "ant test ... 
> -Dtest.silent=false" to get more logs.
> [junit] junit.framework.AssertionFailedError: Client Execution failed 
> with error code = 9
> [junit] See build/ql/tmp/hive.log, or try "ant test ... 
> -Dtest.silent=false" to get more logs.
> [junit]   at junit.framework.Assert.fail(Assert.java:47)
> [junit]   at 
> org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_null_column(TestCliDriver.java:108)
> i lost the hive.log - but it's happening in the MoveTask that corresponds to 
> this statement:
> insert overwrite directory "../build/ql/test/data/warehouse/null_columns.out" 
> select null, null from temp_null;




[jira] Created: (HIVE-1568) null_column.q fails in minimr mode

2010-08-19 Thread Joydeep Sen Sarma (JIRA)
null_column.q fails in minimr mode
--

 Key: HIVE-1568
 URL: https://issues.apache.org/jira/browse/HIVE-1568
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Joydeep Sen Sarma


followup from hive-1523:

ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=null_column.q test

[junit] Begin query: null_column.q
[junit] Exception: Client Execution failed with error code = 9
[junit] See build/ql/tmp/hive.log, or try "ant test ... 
-Dtest.silent=false" to get more logs.
[junit] junit.framework.AssertionFailedError: Client Execution failed with 
error code = 9
[junit] See build/ql/tmp/hive.log, or try "ant test ... 
-Dtest.silent=false" to get more logs.
[junit]   at junit.framework.Assert.fail(Assert.java:47)
[junit]   at 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_null_column(TestCliDriver.java:108)

i lost the hive.log - but it's happening in the MoveTask that corresponds to 
this statement:
insert overwrite directory "../build/ql/test/data/warehouse/null_columns.out" 
select null, null from temp_null;






[jira] Updated: (HIVE-1562) default hadoop version should be 0.20.2 or newer

2010-08-19 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1562:


Summary: default hadoop version should be 0.20.2 or newer  (was: 
CombineHiveInputFormat issues in minimr mode)
Description: 
The following failure report in CombineHiveInputFormat can be resolved by 
revising the hadoop-20 version being used:



followup from HIVE-1523. This is probably because of CombineHiveInputFormat:

ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=sample10.q test

insert overwrite table srcpartbucket partition(ds, hr) select * from srcpart 
where ds is not null and key < 10
2010-08-18 15:13:54,378 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: query: insert overwrite table srcpartbucket partition(ds, hr) select 
* from srcpart where ds is not null and key < 10
2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: type: QUERY
2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=11
2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=12
2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=11
2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=12
2010-08-18 15:13:54,704 WARN  mapred.JobClient 
(JobClient.java:configureCommandLineOptions(539)) - Use GenericOptionsParser 
for parsing the arguments. Applications should implement Tool for the same.
2010-08-18 15:13:55,642 ERROR mapred.EagerTaskInitializationListener 
(EagerTaskInitializationListener.java:run(83)) - Job initialization failed:
java.lang.IllegalArgumentException: Network location name contains /: 
/default-rack
  at org.apache.hadoop.net.NodeBase.set(NodeBase.java:75)
  at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57)
  at 
org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2326)
  at 
org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2320)
  at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:343)
  at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:440)
  at 
org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:81)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
  at java.lang.Thread.run(Thread.java:619)


2010-08-18 15:13:56,566 ERROR exec.MapRedTask 
(SessionState.java:printError(277)) - Ended Job = job_201008181513_0001 with 
errors
2010-08-18 15:13:56,597 ERROR ql.Driver (SessionState.java:printError(277)) - 
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.MapRedTask

See also: combine2.q


  was:
followup from HIVE-1523. This is probably because of CombineHiveInputFormat:

ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=sample10.q test

insert overwrite table srcpartbucket partition(ds, hr) select * from srcpart 
where ds is not null and key < 10
2010-08-18 15:13:54,378 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: query: insert overwrite table srcpartbucket partition(ds, hr) select 
* from srcpart where ds is not null and key < 10
2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: type: QUERY
2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=11
2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=12
2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=11
2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=12
2010-08-18 15:13:54,704 WARN  mapred.JobClient 
(JobClient.java:configureCommandLineOptions(539)) - Use GenericOptionsParser 
for parsing the arguments. Applications should implement Tool for the same.
2010-08-18 15:13:55,642 ERROR mapred.EagerTaskInitializationListener 
(EagerTaskInitializationListener.java:run(83)) - Job initialization failed:
java.lang.IllegalArgumentException: Network location name contains /: 
/default-rack
  at org.apache.hadoop.net.NodeBase.set(NodeBase.java:75)
  at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57)
  at 
org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2326)
  at 
org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2320)
  at org.apache.hadoop.mapred.JobIn

[jira] Commented: (HIVE-1562) CombineHiveInputFormat issues in minimr mode

2010-08-19 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900398#action_12900398
 ] 

Joydeep Sen Sarma commented on HIVE-1562:
-

cool - thanks for the tip. looks like this is fixed in 0.20.2 - so if we can 
make that the default hadoop version dependency for hive - that would be enough.

seems like ivy/maven are not finding 0.20.2 (whereas there are hadoop jiras 
saying that these versions have been published) - so it looks like some fix is 
needed in our dependency management.

will change title.
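For reference, the exception comes from a node-name precondition: a network location *name* may not contain '/', but the older 0.20 miniMR JobTracker ended up passing the full rack path ("/default-rack") where a bare name was expected. A simplified sketch of that check (illustrative only, not the Hadoop source):

```java
public class NodeNameCheck {
    // Mirrors the NodeBase-style precondition: names are single path
    // components, so any '/' in a name is rejected outright.
    static void checkName(String name) {
        if (name != null && name.contains("/")) {
            throw new IllegalArgumentException(
                "Network location name contains /: " + name);
        }
    }

    public static void main(String[] args) {
        checkName("host27");                 // a bare host name passes
        boolean threw = false;
        try {
            checkName("/default-rack");      // a rack *path* does not
        } catch (IllegalArgumentException e) {
            threw = true;
        }
        if (!threw) throw new AssertionError("expected rejection");
        System.out.println("'/default-rack' rejected as a node name");
    }
}
```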

> CombineHiveInputFormat issues in minimr mode
> 
>
> Key: HIVE-1562
> URL: https://issues.apache.org/jira/browse/HIVE-1562
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Joydeep Sen Sarma
>
> followup from HIVE-1523. This is probably because of CombineHiveInputFormat:
> ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=sample10.q test
> insert overwrite table srcpartbucket partition(ds, hr) select * from srcpart 
> where ds is not null and key < 10
> 2010-08-18 15:13:54,378 ERROR SessionState 
> (SessionState.java:printError(277)) - PREHOOK: query: insert overwrite table 
> srcpartbucket partition(ds, hr) select * from srcpart where ds is not null and key < 10
> 2010-08-18 15:13:54,379 ERROR SessionState 
> (SessionState.java:printError(277)) - PREHOOK: type: QUERY
> 2010-08-18 15:13:54,379 ERROR SessionState 
> (SessionState.java:printError(277)) - PREHOOK: Input: 
> defa...@srcpart@ds=2008-04-08/hr=11
> 2010-08-18 15:13:54,379 ERROR SessionState 
> (SessionState.java:printError(277)) - PREHOOK: Input: 
> defa...@srcpart@ds=2008-04-08/hr=12
> 2010-08-18 15:13:54,379 ERROR SessionState 
> (SessionState.java:printError(277)) - PREHOOK: Input: 
> defa...@srcpart@ds=2008-04-09/hr=11
> 2010-08-18 15:13:54,379 ERROR SessionState 
> (SessionState.java:printError(277)) - PREHOOK: Input: 
> defa...@srcpart@ds=2008-04-09/hr=12
> 2010-08-18 15:13:54,704 WARN  mapred.JobClient 
> (JobClient.java:configureCommandLineOptions(539)) - Use GenericOptionsParser 
> for parsing the arguments. Applications should implement Tool for the same.
> 2010-08-18 15:13:55,642 ERROR mapred.EagerTaskInitializationListener 
> (EagerTaskInitializationListener.java:run(83)) - Job initialization failed:
> java.lang.IllegalArgumentException: Network location name contains /: 
> /default-rack
>   at org.apache.hadoop.net.NodeBase.set(NodeBase.java:75)
>   at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57)
>   at 
> org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2326)
>   at 
> org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2320)
>   at 
> org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:343)
>   at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:440)
>   at 
> org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:81)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>   at java.lang.Thread.run(Thread.java:619)
> 2010-08-18 15:13:56,566 ERROR exec.MapRedTask 
> (SessionState.java:printError(277)) - Ended Job = job_201008181513_0001 with 
> errors
> 2010-08-18 15:13:56,597 ERROR ql.Driver (SessionState.java:printError(277)) - 
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.MapRedTask
> See also: combine2.q




[jira] Commented: (HIVE-1564) bucketizedhiveinputformat.q fails in minimr mode

2010-08-19 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900209#action_12900209
 ] 

Joydeep Sen Sarma commented on HIVE-1564:
-

also groupby_bigdata.q shows the same stack trace.

> bucketizedhiveinputformat.q fails in minimr mode
> 
>
> Key: HIVE-1564
> URL: https://issues.apache.org/jira/browse/HIVE-1564
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Joydeep Sen Sarma
>
> followup to HIVE-1523:
> ant -Dtestcase=TestCliDriver -Dqfile=bucketizedhiveinputformat.q 
> -Dclustermode=miniMR  clean-test test 
> [junit] Begin query: bucketizedhiveinputformat.q
> [junit] Exception: null
> [junit] java.lang.AssertionError
> [junit]   at 
> org.apache.hadoop.hive.ql.exec.ExecDriver.showJobFailDebugInfo(ExecDriver.java:788)
> [junit]   at 
> org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:624)
> [junit]   at 
> org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:120)
> ExecDriver.java:788
> // These tasks should have come from the same job.
>   
> assert(ti.getJobId() == jobId);




[jira] Created: (HIVE-1566) diff_part_input_formats.q failure in minimr mode

2010-08-19 Thread Joydeep Sen Sarma (JIRA)
diff_part_input_formats.q failure in minimr mode


 Key: HIVE-1566
 URL: https://issues.apache.org/jira/browse/HIVE-1566
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Joydeep Sen Sarma


followup from HIVE-1523:

[junit] Begin query: diff_part_input_formats.q
[junit] java.lang.AssertionError
[junit]   at 
org.apache.hadoop.hive.ql.io.RCFileOutputFormat.setColumnNumber(RCFileOutputFormat.java:59)
[junit]   at 
org.apache.hadoop.hive.ql.io.RCFileOutputFormat.getHiveRecordWriter(RCFileOutputFormat.java:136)
[junit]   at 
org.apache.hadoop.hive.ql.exec.ExecDriver.addInputPath(ExecDriver.java:1165)
[junit]   at 
org.apache.hadoop.hive.ql.exec.ExecDriver.addInputPaths(ExecDriver.java:1222)
[junit]   at 
org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:590)

  public static void setColumnNumber(Configuration conf, int columnNum) {
assert columnNum > 0;
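A minimal sketch of the failing precondition (the property name below is hypothetical): if the record writer is created before the column count has been stored in the job configuration, the value reads back as its default of 0 and the assertion trips.

```java
import java.util.HashMap;
import java.util.Map;

public class ColumnNumberSketch {
    // Stand-in for Configuration.getInt(key, defaultValue).
    static int getInt(Map<String, String> conf, String key, int dflt) {
        String v = conf.get(key);
        return v == null ? dflt : Integer.parseInt(v);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Nothing ever stored the column count, so the lookup falls back
        // to the default of 0.
        int columnNum = getInt(conf, "rcfile.column.number", 0);
        // This is the condition setColumnNumber asserts on.
        if (columnNum > 0) throw new AssertionError("expected unset column count");
        System.out.println("columnNum = " + columnNum
                + " -> 'assert columnNum > 0' fails");
    }
}
```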





[jira] Updated: (HIVE-1562) CombineHiveInputFormat issues in minimr mode

2010-08-19 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1562:


Summary: CombineHiveInputFormat issues in minimr mode  (was: sample10.q 
fails in minimr mode)
Description: 
followup from HIVE-1523. This is probably because of CombineHiveInputFormat:

ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=sample10.q test

insert overwrite table srcpartbucket partition(ds, hr) select * from srcpart 
where ds is not null and key < 10
2010-08-18 15:13:54,378 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: query: insert overwrite table srcpartbucket partition(ds, hr) select 
* from srcpart where ds is not null and key < 10
2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: type: QUERY
2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=11
2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=12
2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=11
2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=12
2010-08-18 15:13:54,704 WARN  mapred.JobClient 
(JobClient.java:configureCommandLineOptions(539)) - Use GenericOptionsParser 
for parsing the arguments. Applications should implement Tool for the same.
2010-08-18 15:13:55,642 ERROR mapred.EagerTaskInitializationListener 
(EagerTaskInitializationListener.java:run(83)) - Job initialization failed:
java.lang.IllegalArgumentException: Network location name contains /: 
/default-rack
  at org.apache.hadoop.net.NodeBase.set(NodeBase.java:75)
  at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57)
  at 
org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2326)
  at 
org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2320)
  at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:343)
  at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:440)
  at 
org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:81)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
  at java.lang.Thread.run(Thread.java:619)


2010-08-18 15:13:56,566 ERROR exec.MapRedTask 
(SessionState.java:printError(277)) - Ended Job = job_201008181513_0001 with 
errors
2010-08-18 15:13:56,597 ERROR ql.Driver (SessionState.java:printError(277)) - 
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.MapRedTask

See also: combine2.q


  was:
followup from HIVE-1523. This is probably because of CombineHiveInputFormat:

ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test

insert overwrite table srcpartbucket partition(ds, hr) select * from srcpart 
where ds is not null and key < 10
2010-08-18 15:13:54,378 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: query: insert overwrite table srcpartbucket partition(ds, hr) select 
* from srcpart where ds is not null and key < 10
2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: type: QUERY
2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=11
2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=12
2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=11
2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=12
2010-08-18 15:13:54,704 WARN  mapred.JobClient 
(JobClient.java:configureCommandLineOptions(539)) - Use GenericOptionsParser 
for parsing the arguments. Applications should implement Tool for the same.
2010-08-18 15:13:55,642 ERROR mapred.EagerTaskInitializationListener 
(EagerTaskInitializationListener.java:run(83)) - Job initialization failed:
java.lang.IllegalArgumentException: Network location name contains /: 
/default-rack
  at org.apache.hadoop.net.NodeBase.set(NodeBase.java:75)
  at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57)
  at 
org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2326)
  at 
org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2320)
  at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:343)
  at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:440)
  at 
org.apac

[jira] Created: (HIVE-1565) archive.q fails in minimr mode

2010-08-19 Thread Joydeep Sen Sarma (JIRA)
archive.q fails in minimr mode
--

 Key: HIVE-1565
 URL: https://issues.apache.org/jira/browse/HIVE-1565
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Joydeep Sen Sarma


followup to hive-1523. in this case the results seem different

ant -Doffline=true -Dtestcase=TestCliDriver -Dqfile=archive.q 
-Dclustermode=miniMR  clean-test test


[junit] Begin query: archive.q
[junit] junit.framework.AssertionFailedError: Client execution results 
failed with error code = 1
[junit] See build/ql/tmp/hive.log, or try "ant test ... 
-Dtest.silent=false" to get more logs.
[junit]   at junit.framework.Assert.fail(Assert.java:47)
[junit]   at 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive(TestCliDriver.java:128)
 
[junit] diff -a -I file: -I pfile: -I hdfs: -I /tmp/ -I invalidscheme: -I lastUpdateTime -I lastAccessTime -I owner -I transient_lastDdlTime -I java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I Caused by: -I [.][.][.] [0-9]* more
/data/users/jssarma/hive_testing/build/ql/test/logs/clientpositive/archive.q.out
/data/users/jssarma/hive_testing/ql/src/test/results/clientpositive/archive.q.out
[junit] 489c489
[junit] < NULL
[junit] ---
[junit] > 48656137

here's the query with differing output:

POSTHOOK: query: SELECT SUM(hash(col)) FROM (SELECT transform(*) using 'tr "\t" "_"' AS col
FROM (SELECT * FROM old_name WHERE ds='1') subq1) subq2
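For context, the `tr "\t" "_"` transform simply joins each tab-separated input row into a single underscore-delimited column before hashing. A rough sketch of the per-row effect (illustrative only, not the Hive transform machinery):

```java
public class TransformSketch {
    public static void main(String[] args) {
        // A two-column row as Hive would feed it to the script: tab-separated.
        String row = "238\tval_238";
        // tr "\t" "_" rewrites every tab to an underscore, so the script
        // emits one combined column per input row.
        String col = row.replace('\t', '_');
        if (!col.equals("238_val_238")) throw new AssertionError(col);
        System.out.println(col);
    }
}
```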






[jira] Created: (HIVE-1564) bucketizedhiveinputformat.q fails in minimr mode

2010-08-19 Thread Joydeep Sen Sarma (JIRA)
bucketizedhiveinputformat.q fails in minimr mode


 Key: HIVE-1564
 URL: https://issues.apache.org/jira/browse/HIVE-1564
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Joydeep Sen Sarma


followup to HIVE-1523:

ant -Dtestcase=TestCliDriver -Dqfile=bucketizedhiveinputformat.q 
-Dclustermode=miniMR  clean-test test 

[junit] Begin query: bucketizedhiveinputformat.q
[junit] Exception: null
[junit] java.lang.AssertionError
[junit]   at 
org.apache.hadoop.hive.ql.exec.ExecDriver.showJobFailDebugInfo(ExecDriver.java:788)
[junit]   at 
org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:624)
[junit]   at 
org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:120)

ExecDriver.java:788
// These tasks should have come from the same job.  

assert(ti.getJobId() == jobId);
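One plausible reason this assertion can fire even when the IDs are textually identical (assuming `getJobId()` returns a `String`) is that `==` compares object references rather than contents; `.equals()` is the value comparison. A minimal illustration:

```java
public class JobIdCompare {
    public static void main(String[] args) {
        String jobId = "job_201008181513_0001";
        // A distinct String object with the same characters, e.g. one parsed
        // from task info rather than taken from the same literal.
        String fromTaskInfo = new String("job_201008181513_0001");
        // Reference comparison can be false even though the text matches...
        boolean sameRef = (jobId == fromTaskInfo);
        // ...while value comparison is true.
        boolean sameText = jobId.equals(fromTaskInfo);
        if (sameRef || !sameText) throw new AssertionError();
        System.out.println("equal by value, not by reference");
    }
}
```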





[jira] Updated: (HIVE-1523) ql tests no longer work in miniMR mode

2010-08-18 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1523:


Attachment: hive-1523.3.patch

small change - fix 0.20 version match to pick the right jetty version. 

> ql tests no longer work in miniMR mode
> --
>
> Key: HIVE-1523
> URL: https://issues.apache.org/jira/browse/HIVE-1523
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: hive-1523.1.patch, hive-1523.2.patch, hive-1523.3.patch
>
>
> as per title. here's the first exception i see:
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: 
> java.io.FileNotFoundException File file:/build/ql/test/data/warehouse/dest_j1 does not exist.
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(746)) - 
> java.io.FileNotFoundException: File file:/build/ql/test/data/warehouse/dest_j1 does not exist.
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
>   at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677)




[jira] Updated: (HIVE-1523) ql tests no longer work in miniMR mode

2010-08-18 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1523:


Attachment: hive-1523.2.patch

with a modified list of minimr tests:

i took the ones that worked from John's list. also added a couple of tests that 
had 'add jar' and 'add file' commands (since their interaction with a real 
cluster is quite different).




> ql tests no longer work in miniMR mode
> --
>
> Key: HIVE-1523
> URL: https://issues.apache.org/jira/browse/HIVE-1523
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: hive-1523.1.patch, hive-1523.2.patch
>
>
> as per title. here's the first exception i see:
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: 
> java.io.FileNotFoundException File file:/build/ql/test/data/warehouse/dest_j1 does not exist.
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(746)) - 
> java.io.FileNotFoundException: File file:/build/ql/test/data/warehouse/dest_j1 does not exist.
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
>   at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677)




[jira] Created: (HIVE-1562) sample10.q fails in minimr mode

2010-08-18 Thread Joydeep Sen Sarma (JIRA)
sample10.q fails in minimr mode
---

 Key: HIVE-1562
 URL: https://issues.apache.org/jira/browse/HIVE-1562
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Joydeep Sen Sarma


followup from HIVE-1523. This is probably because of CombineHiveInputFormat:

ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test

insert overwrite table srcpartbucket partition(ds, hr) select * from srcpart 
where ds is not null and key < 10
2010-08-18 15:13:54,378 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: query: insert overwrite table srcpartbucket partition(ds, hr) select 
* from srcpart where ds is not null and key < 10
2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: type: QUERY
2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=11
2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=12
2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=11
2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=12
2010-08-18 15:13:54,704 WARN  mapred.JobClient 
(JobClient.java:configureCommandLineOptions(539)) - Use GenericOptionsParser 
for parsing the arguments. Applications should implement Tool for the same.
2010-08-18 15:13:55,642 ERROR mapred.EagerTaskInitializationListener 
(EagerTaskInitializationListener.java:run(83)) - Job initialization failed:
java.lang.IllegalArgumentException: Network location name contains /: 
/default-rack
  at org.apache.hadoop.net.NodeBase.set(NodeBase.java:75)
  at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57)
  at 
org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2326)
  at 
org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2320)
  at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:343)
  at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:440)
  at 
org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:81)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
  at java.lang.Thread.run(Thread.java:619)


2010-08-18 15:13:56,566 ERROR exec.MapRedTask 
(SessionState.java:printError(277)) - Ended Job = job_201008181513_0001 with 
errors
2010-08-18 15:13:56,597 ERROR ql.Driver (SessionState.java:printError(277)) - 
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.MapRedTask





[jira] Created: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode

2010-08-18 Thread Joydeep Sen Sarma (JIRA)
smb_mapjoin_8.q returns different results in miniMr mode


 Key: HIVE-1561
 URL: https://issues.apache.org/jira/browse/HIVE-1561
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Joydeep Sen Sarma


follow on to HIVE-1523:

ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test

POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer join 
smb_bucket4_2 b on a.key = b.key

official results:
4 val_356 NULL  NULL
NULL  NULL  484 val_169
2000  val_169 NULL  NULL
NULL  NULL  3000  val_169
4000  val_125 NULL  NULL


in minimr mode:
2000  val_169 NULL  NULL
4 val_356 NULL  NULL
2000  val_169 NULL  NULL
4000  val_125 NULL  NULL
NULL  NULL  5000  val_125





[jira] Commented: (HIVE-1523) ql tests no longer work in miniMR mode

2010-08-18 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900048#action_12900048
 ] 

Joydeep Sen Sarma commented on HIVE-1523:
-

i am running through the above qfiles to see what executes successfully on 
minimr (because many don't).

one concern is the length of the tests. i think we need to divide our tests 
into a short and a long regression; otherwise the development cycle is severely 
impacted if everything has to be tested on every iteration.

> ql tests no longer work in miniMR mode
> --
>
> Key: HIVE-1523
> URL: https://issues.apache.org/jira/browse/HIVE-1523
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: hive-1523.1.patch
>
>
> as per title. here's the first exception i see:
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: 
> java.io.FileNotFoundException File file:/build/ql/test/data/warehouse/dest_j1 does not exist.
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(746)) - 
> java.io.FileNotFoundException: File file:/build/ql/test/data/warehouse/dest_j1 does not exist.
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
>   at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677)




[jira] Created: (HIVE-1560) binaryoutputformat.q failure in minimr mode

2010-08-18 Thread Joydeep Sen Sarma (JIRA)
binaryoutputformat.q failure in minimr mode
---

 Key: HIVE-1560
 URL: https://issues.apache.org/jira/browse/HIVE-1560
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Reporter: Joydeep Sen Sarma


this is a followup to HIVE-1523.

ant -Dclustermode=miniMR -Dtestcase=TestCliDriver 
-Dqfile=binary_output_format.q test

fails in a significant manner. all the rows are flattened out into one row:

[junit] diff -a -I file: -I pfile: -I hdfs: -I /tmp/ -I invalidscheme: -I lastUpdateTime -I lastAccessTime -I owner -I transient_lastDdlTime -I java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I Caused by: -I [.][.][.] [0-9]* more 
/data/users/jssarma/hive_testing/build/ql/test/logs/clientpositive/binary_output_format.q.out 
/data/users/jssarma/hive_testing/ql/src/test/results/clientpositive/binary_output_format.q.out
[junit] 313c313,812
[junit] < 238 val_23886 val_86311 val_31127 val_27165 val_165409
...
[junit] ---
[junit] > 238 val_238
[junit] > 86  val_86
[junit] > 311 val_311
[junit] > 27  val_27
[junit] > 165 val_165
 ...




[jira] Updated: (HIVE-1523) ql tests no longer work in miniMR mode

2010-08-17 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1523:


  Status: Patch Available  (was: Open)
Assignee: Joydeep Sen Sarma

> ql tests no longer work in miniMR mode
> --
>
> Key: HIVE-1523
> URL: https://issues.apache.org/jira/browse/HIVE-1523
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: hive-1523.1.patch
>
>
> as per title. here's the first exception i see:
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: 
> java.io.FileNotFoundException File file:/build/ql/test/data/warehouse/dest_j1 does not exist.
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(746)) - 
> java.io.FileNotFoundException: File file:/build/ql/test/data/warehouse/dest_j1 does not exist.
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
>   at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677)




[jira] Updated: (HIVE-1523) ql tests no longer work in miniMR mode

2010-08-17 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1523:


Attachment: hive-1523.1.patch

fixed minimr test mode.

enabled a couple of queries to always run (additionally) in minimr mode (like 
hbase-handler tests) when running standard tests. we should probably expand 
this to a larger number of queries (especially those requiring multiple 
reducers). i don't have good insight into this part - if people have ideas - we 
can expand the list easily.

> ql tests no longer work in miniMR mode
> --
>
> Key: HIVE-1523
> URL: https://issues.apache.org/jira/browse/HIVE-1523
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
> Attachments: hive-1523.1.patch
>
>
> as per title. here's the first exception i see:
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: 
> java.io.FileNotFoundException File file:/build/ql/test/data/warehouse/dest_j1 does not exist.
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(746)) - 
> java.io.FileNotFoundException: File file:/build/ql/test/data/warehouse/dest_j1 does not exist.
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
>   at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677)




[jira] Commented: (HIVE-1293) Concurrency Model for Hive

2010-08-17 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899647#action_12899647
 ] 

Joydeep Sen Sarma commented on HIVE-1293:
-

can you check the getLockObjects() routine? it seemed to me that even if you 
called it with a partition in X mode - it would add the table to the list of 
objects to be locked as well (in the same X mode).

i think we should, at least as a follow-on, make the optimization to not lock 
write entities for the duration of the query.

> Concurrency Model for Hive
> -
>
> Key: HIVE-1293
> URL: https://issues.apache.org/jira/browse/HIVE-1293
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, 
> hive.1293.4.patch, hive.1293.5.patch, hive_leases.txt
>
>
> Concurrency model for Hive:
> Currently, hive does not provide a good concurrency model. The only 
> guarantee provided in the case of concurrent readers and writers is that a 
> reader will not see partial data from the old version (before the write) and 
> partial data from the new version (after the write).
> This has come across as a big problem, especially for background processes 
> performing maintenance operations.
> The following possible solutions come to mind.
> 1. Locks: Acquire read/write locks - they can be acquired at the beginning of 
> the query or the write locks can be delayed till move
> task (when the directory is actually moved). Care needs to be taken for 
> deadlocks.
> 2. Versioning: The writer can create a new version if the current version is 
> being read. Note that, it is not equivalent to snapshots,
> the old version can only be accessed by the current readers, and will be 
> deleted when all of them have finished.
> Comments.




[jira] Commented: (HIVE-1293) Concurrency Model for Hive

2010-08-17 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899347#action_12899347
 ] 

Joydeep Sen Sarma commented on HIVE-1293:
-

also - i am missing something here:

+  for (WriteEntity output : plan.getOutputs()) {
+    lockObjects.addAll(getLockObjects(output.getTable(),
+        output.getPartition(), HiveLockMode.EXCLUSIVE));
+  }

getLockObjects():

+    if (p != null) {
...
+      locks.add(new LockObject(new HiveLockObject(p.getTable()), mode));
+    }

doesn't this end up locking the table in exclusive mode if a partition is being 
written to? (whereas the design talks about locking the table in shared mode 
only?)
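For reference, the behavior the design describes could be sketched as follows. This is a hypothetical toy model, not the patch's actual API: an exclusive request on a partition should yield an exclusive lock on the partition plus only a shared lock on its parent table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the intended lock derivation (not the patch's code):
// an X request on a partition should imply only a SHARED lock on its table.
public class LockSketch {
    enum Mode { SHARED, EXCLUSIVE }

    static List<String> getLockObjects(String table, String partition, Mode mode) {
        List<String> locks = new ArrayList<>();
        if (partition != null) {
            locks.add(partition + ":" + mode);
            // Parent table gets SHARED, not the caller's mode - this is the
            // point being questioned in the comment above.
            locks.add(table + ":" + Mode.SHARED);
        } else {
            locks.add(table + ":" + mode);
        }
        return locks;
    }
}
```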

> Concurrency Model for Hive
> -
>
> Key: HIVE-1293
> URL: https://issues.apache.org/jira/browse/HIVE-1293
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, 
> hive.1293.4.patch, hive.1293.5.patch, hive_leases.txt
>
>
> Concurrency model for Hive:
> Currently, hive does not provide a good concurrency model. The only 
> guarantee provided in the case of concurrent readers and writers is that a 
> reader will not see partial data from the old version (before the write) and 
> partial data from the new version (after the write).
> This has come across as a big problem, especially for background processes 
> performing maintenance operations.
> The following possible solutions come to mind.
> 1. Locks: Acquire read/write locks - they can be acquired at the beginning of 
> the query or the write locks can be delayed till move
> task (when the directory is actually moved). Care needs to be taken for 
> deadlocks.
> 2. Versioning: The writer can create a new version if the current version is 
> being read. Note that, it is not equivalent to snapshots,
> the old version can only be accessed by the current readers, and will be 
> deleted when all of them have finished.
> Comments.




[jira] Commented: (HIVE-1293) Concurrency Model for Hive

2010-08-17 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899320#action_12899320
 ] 

Joydeep Sen Sarma commented on HIVE-1293:
-

a little bummed that locks need to be held for the entire query execution. that 
could mean a writer blocking readers for hours.

hive's query plans seem to have two distinct stages:
1. read a bunch of stuff, compute intermediate/final data
2. move final data into output locations

i.e. - a single query never reads what it writes (into a final output location). 
even if #1 and #2 are mingled today - they can easily be put in order.

in that sense - we only need to get shared locks for all read entities involved 
in #1 to begin with. once phase #1 is done, we can drop all the read locks and 
get the exclusive locks for all the write entities in #2, perform #2 and quit. 
that way exclusive locks are held for a very short duration. i think this 
scheme is similarly deadlock free (now there are two independent lock 
acquire/release phases - and each of them can lock stuff in lex. order).
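The two-phase scheme above can be sketched as a toy model (names here are hypothetical; real Hive lock managers work differently):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model of the proposed two-phase scheme: shared locks for reads are taken
// in lexicographic order, released after phase #1, then exclusive locks for
// writes are taken (again in lexicographic order) just for the final move.
public class TwoPhasePlan {
    static List<String> plan(List<String> readEntities, List<String> writeEntities) {
        List<String> steps = new ArrayList<>();
        for (String r : new TreeSet<>(readEntities))
            steps.add("acquire S " + r);               // sorted => deadlock-free
        steps.add("run phase #1 (compute)");
        steps.add("release all read locks");
        for (String w : new TreeSet<>(writeEntities))
            steps.add("acquire X " + w);               // sorted => deadlock-free
        steps.add("run phase #2 (move outputs)");
        steps.add("release all write locks");
        return steps;
    }
}
```

The key property is that exclusive locks exist only around the short output-move step, and each acquisition phase orders its locks lexicographically.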

> Concurrency Model for Hive
> -
>
> Key: HIVE-1293
> URL: https://issues.apache.org/jira/browse/HIVE-1293
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, 
> hive.1293.4.patch, hive.1293.5.patch, hive_leases.txt
>
>
> Concurrency model for Hive:
> Currently, hive does not provide a good concurrency model. The only 
> guarantee provided in the case of concurrent readers and writers is that a 
> reader will not see partial data from the old version (before the write) and 
> partial data from the new version (after the write).
> This has come across as a big problem, especially for background processes 
> performing maintenance operations.
> The following possible solutions come to mind.
> 1. Locks: Acquire read/write locks - they can be acquired at the beginning of 
> the query or the write locks can be delayed till move
> task (when the directory is actually moved). Care needs to be taken for 
> deadlocks.
> 2. Versioning: The writer can create a new version if the current version is 
> being read. Note that, it is not equivalent to snapshots,
> the old version can only be accessed by the current readers, and will be 
> deleted when all of them have finished.
> Comments.




[jira] Commented: (HIVE-1540) Read-only, columnar data file for nested data structures

2010-08-16 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899199#action_12899199
 ] 

Joydeep Sen Sarma commented on HIVE-1540:
-

are there a lot of use cases for nested data structures? Google's approach is 
motivated by widespread use of Protocol Buffers. At Facebook - thrift 
serialized data sets (which motivated the initial support for nested data 
types) haven't taken off.

I think what's much more common is json serialized data (or, more 
restrictively, map types). it would be much more worthwhile, to begin with, to 
have optimized codecs and deserializers for map types.

> Read-only, columnar data file for nested data structures
> 
>
> Key: HIVE-1540
> URL: https://issues.apache.org/jira/browse/HIVE-1540
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Jeff Hammerbacher
>
> RCFile is a great start on an optimized layout for working with structured 
> data with Hive. Given that Hive's data model supports nested lists and maps, 
> and taking inspiration from the recent work by Google on Dremel, it may be 
> useful for the Hive community to think about how to improve the RCFile format 
> for nested data structures.




[jira] Commented: (HIVE-1530) Include hive-default.xml and hive-log4j.properties in hive-common JAR

2010-08-13 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898118#action_12898118
 ] 

Joydeep Sen Sarma commented on HIVE-1530:
-

don't disallow hive.* options that are not specified in HiveConf. the reason is 
that hive is extensible at various points via custom code; that code has access 
to the config object, and installs may want to set variables specific to their 
plugins etc. (we shouldn't be in the business of restricting what they can name 
them)

> Include hive-default.xml and hive-log4j.properties in hive-common JAR
> -
>
> Key: HIVE-1530
> URL: https://issues.apache.org/jira/browse/HIVE-1530
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Carl Steinbach
>
> hive-common-*.jar should include hive-default.xml and hive-log4j.properties,
> and similarly hive-exec-*.jar should include hive-exec-log4j.properties. The
> hive-default.xml file that currently sits in the conf/ directory should be 
> removed.
> Motivations for this change:
> * We explicitly tell users that they should never modify hive-default.xml yet 
> give them the opportunity to do so by placing the file in the conf dir.
> * Many users are familiar with the Hadoop configuration mechanism that does 
> not require *-default.xml files to be present in the HADOOP_CONF_DIR, and 
> assume that the same is true for HIVE_CONF_DIR.




[jira] Commented: (HIVE-1530) Include hive-default.xml and hive-log4j.properties in hive-common JAR

2010-08-12 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898080#action_12898080
 ] 

Joydeep Sen Sarma commented on HIVE-1530:
-

ok - that makes sense. leave a hive-site.xml.sample and 
hive-log4j.properties.example in conf/. i agree with Ed's point about how 
difficult it is to figure out hadoop config variables now, and hadoop is worse 
off for it. commands are nice - but having a template is better. it's easy to 
clone an example file and append/modify the default description to add 
site-specific notes. and one can grep.

we could autogenerate the hive-site.xml.sample from config variable metadata in 
the source code. that would keep us in sync with code.
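The autogeneration idea could look something like this (a sketch only; the variable shown and the metadata source are made up for illustration, the real generator would walk HiveConf's variable metadata):

```java
// Sketch of generating a Hadoop-style hive-site.xml.sample from per-variable
// metadata. The variable and description below are illustrative only.
public class SiteTemplateGen {
    static String property(String name, String defaultValue, String description) {
        return "  <property>\n"
             + "    <name>" + name + "</name>\n"
             + "    <value>" + defaultValue + "</value>\n"
             + "    <description>" + description + "</description>\n"
             + "  </property>\n";
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder("<configuration>\n");
        // A real generator would loop over HiveConf's config-variable metadata.
        out.append(property("hive.exec.scratchdir", "/tmp/hive",
                            "Scratch space for Hive jobs"));
        out.append("</configuration>\n");
        System.out.print(out);
    }
}
```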

> Include hive-default.xml and hive-log4j.properties in hive-common JAR
> -
>
> Key: HIVE-1530
> URL: https://issues.apache.org/jira/browse/HIVE-1530
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Carl Steinbach
>
> hive-common-*.jar should include hive-default.xml and hive-log4j.properties,
> and similarly hive-exec-*.jar should include hive-exec-log4j.properties. The
> hive-default.xml file that currently sits in the conf/ directory should be 
> removed.
> Motivations for this change:
> * We explicitly tell users that they should never modify hive-default.xml yet 
> give them the opportunity to do so by placing the file in the conf dir.
> * Many users are familiar with the Hadoop configuration mechanism that does 
> not require *-default.xml files to be present in the HADOOP_CONF_DIR, and 
> assume that the same is true for HIVE_CONF_DIR.




[jira] Commented: (HIVE-1530) Include hive-default.xml and hive-log4j.properties in hive-common JAR

2010-08-12 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897993#action_12897993
 ] 

Joydeep Sen Sarma commented on HIVE-1530:
-

removing the .xml files makes sense.

but users may want to modify the log4j.properties files. how would they do 
that in the new arrangement?

> Include hive-default.xml and hive-log4j.properties in hive-common JAR
> -
>
> Key: HIVE-1530
> URL: https://issues.apache.org/jira/browse/HIVE-1530
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Carl Steinbach
>
> hive-common-*.jar should include hive-default.xml and hive-log4j.properties,
> and similarly hive-exec-*.jar should include hive-exec-log4j.properties. The
> hive-default.xml file that currently sits in the conf/ directory should be 
> removed.
> Motivations for this change:
> * We explicitly tell users that they should never modify hive-default.xml yet 
> give them the opportunity to do so by placing the file in the conf dir.
> * Many users are familiar with the Hadoop configuration mechanism that does 
> not require *-default.xml files to be present in the HADOOP_CONF_DIR, and 
> assume that the same is true for HIVE_CONF_DIR.




[jira] Updated: (HIVE-1524) parallel execution failed if mapred.job.name is set

2010-08-10 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1524:


Status: Resolved  (was: Patch Available)
Resolution: Fixed

committed - thanks Ning.

> parallel execution failed if mapred.job.name is set
> ---
>
> Key: HIVE-1524
> URL: https://issues.apache.org/jira/browse/HIVE-1524
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1524.2.patch, HIVE-1524.patch
>
>
> The plan file name was generated based on mapred.job.name. If the user 
> specifies mapred.job.name before the query, two parallel queries will have 
> conflicting plan file names. 
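One way to avoid such a collision (a sketch of the general fix direction, not the attached patch) is to key the plan file on a fresh per-query id rather than on mapred.job.name:

```java
import java.util.UUID;

// Sketch: derive the plan file name from a fresh per-query random id so that
// two parallel queries sharing one mapred.job.name cannot collide.
public class PlanFileName {
    static String planFile(String scratchDir) {
        return scratchDir + "/plan-" + UUID.randomUUID() + ".xml";
    }
}
```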




[jira] Updated: (HIVE-1524) parallel execution failed if mapred.job.name is set

2010-08-10 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1524:


Affects Version/s: 0.5.0
   (was: 0.7.0)

> parallel execution failed if mapred.job.name is set
> ---
>
> Key: HIVE-1524
> URL: https://issues.apache.org/jira/browse/HIVE-1524
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1524.2.patch, HIVE-1524.patch
>
>
> The plan file name was generated based on mapred.job.name. If the user 
> specifies mapred.job.name before the query, two parallel queries will have 
> conflicting plan file names. 




[jira] Commented: (HIVE-1524) parallel execution failed if mapred.job.name is set

2010-08-10 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897037#action_12897037
 ] 

Joydeep Sen Sarma commented on HIVE-1524:
-

will commit once tests pass.

> parallel execution failed if mapred.job.name is set
> ---
>
> Key: HIVE-1524
> URL: https://issues.apache.org/jira/browse/HIVE-1524
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1524.2.patch, HIVE-1524.patch
>
>
> The plan file name was generated based on mapred.job.name. If the user 
> specifies mapred.job.name before the query, two parallel queries will have 
> conflicting plan file names. 




[jira] Commented: (HIVE-1524) parallel execution failed if mapred.job.name is set

2010-08-10 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896992#action_12896992
 ] 

Joydeep Sen Sarma commented on HIVE-1524:
-

looks good to me.

one comment: getJobID is a very confusing name (sounds like we are getting the 
hadoop jobid or something like that). it would be nice to make it more explicit 
(getHiveJobID perhaps?).

> parallel execution failed if mapred.job.name is set
> ---
>
> Key: HIVE-1524
> URL: https://issues.apache.org/jira/browse/HIVE-1524
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1524.patch
>
>
> The plan file name was generated based on mapred.job.name. If the user 
> specifies mapred.job.name before the query, two parallel queries will have 
> conflicting plan file names. 




[jira] Created: (HIVE-1523) ql tests no longer work in miniMR mode

2010-08-09 Thread Joydeep Sen Sarma (JIRA)
ql tests no longer work in miniMR mode
--

 Key: HIVE-1523
 URL: https://issues.apache.org/jira/browse/HIVE-1523
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Joydeep Sen Sarma


as per title. here's the first exception i see:


2010-08-09 18:05:11,259 ERROR hive.log 
(MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: 
java.io.FileNotFoundException File file:/build/ql/test/data/warehouse/dest_j1 does not exist.
2010-08-09 18:05:11,259 ERROR hive.log 
(MetaStoreUtils.java:logAndThrowMetaException(746)) - 
java.io.FileNotFoundException: File file:/build/ql/test/data/warehouse/dest_j1 does not exist.
  at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
  at 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
  at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136)
  at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677)





[jira] Updated: (HIVE-1523) ql tests no longer work in miniMR mode

2010-08-09 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1523:


Component/s: Query Processor
 (was: Build Infrastructure)

> ql tests no longer work in miniMR mode
> --
>
> Key: HIVE-1523
> URL: https://issues.apache.org/jira/browse/HIVE-1523
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>
> as per title. here's the first exception i see:
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: 
> java.io.FileNotFoundException File file:/build/ql/test/data/warehouse/dest_j1 does not exist.
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(746)) - 
> java.io.FileNotFoundException: File file:/build/ql/test/data/warehouse/dest_j1 does not exist.
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
>   at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677)




[jira] Created: (HIVE-1521) compiling/testing against custom hadoop tree is broken

2010-08-09 Thread Joydeep Sen Sarma (JIRA)
compiling/testing against custom hadoop tree is broken
--

 Key: HIVE-1521
 URL: https://issues.apache.org/jira/browse/HIVE-1521
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Joydeep Sen Sarma


see:

http://wiki.apache.org/hadoop/Hive/DeveloperGuide#Advanced_Mode

compiling with a specific value of hadoop.root no longer works because of the 
shims stuff. we should deprecate/fix this. it is still _very_ desirable to be 
able to test against a custom hadoop build (to test hive/hadoop integration).




[jira] Created: (HIVE-1520) hive.mapred.local.mem should only be used in case of local mode job submissions

2010-08-09 Thread Joydeep Sen Sarma (JIRA)
hive.mapred.local.mem should only be used in case of local mode job submissions
---

 Key: HIVE-1520
 URL: https://issues.apache.org/jira/browse/HIVE-1520
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Joydeep Sen Sarma


Currently - whenever we submit a map-reduce job via a child jvm process, hive 
sets HADOOP_HEAPSIZE to hive.mapred.local.mem (thereby limiting the max heap 
memory of the child jvm) - the assumption being that we are submitting a job 
for local mode execution and that different memory limits apply in that case.

however - one can submit jobs via a child jvm for non-local mode execution as 
well. This is useful, for example, if hive wants to submit jobs via different 
hadoop clients (for sending jobs to different hadoop clusters). in such a case, 
we can use 'hive.exec.submitviachild' and 'hadoop.bin.path' to dispatch the job 
via an alternate hadoop client install point. however, in such a case we don't 
need to set HADOOP_HEAPSIZE - all we are using the child jvm for is to run the 
small bit of hive code that submits the job (and not for local mode execution).

in this case - we shouldn't be setting the child jvm's memory limit and should 
leave it at the parent's value.
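The proposed rule amounts to something like the following (a hypothetical helper, not Hive's actual code): only a true local-mode run gets the heap cap.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposed rule: export HADOOP_HEAPSIZE into the child hadoop
// client's environment only when the job will actually execute locally.
public class ChildJvmEnv {
    static Map<String, String> childEnv(boolean localModeExecution, int localMemMb) {
        Map<String, String> env = new HashMap<>();
        if (localModeExecution) {
            // The child jvm runs the tasks itself, so the heap cap is meaningful.
            env.put("HADOOP_HEAPSIZE", String.valueOf(localMemMb));
        }
        // Otherwise the child only submits the job; inherit the parent's limit.
        return env;
    }
}
```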




[jira] Commented: (HIVE-1432) Create a test case for case sensitive comparison done during field comparison

2010-08-08 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896423#action_12896423
 ] 

Joydeep Sen Sarma commented on HIVE-1432:
-

a few comments:

- don't need a custom script to convert \t to ^B. use 'tr' (see archive.q for 
an example)
- instead of directly referring to the script path - please use an 'add file 
...' command and refer to the script directly by its name in the transform 
clause - otherwise i think this test will run only in local mode (and may not 
be able to pass tests against real/minimr clusters)

  (I think this is a problem with some other tests as well - but we have to 
start somewhere)
- no need for drop tables at the beginning and end of the test anymore. the 
test harness now takes care of this (cleaning up non-src tables before and 
after test queries)

> Create a test case for case sensitive comparison done during field comparison
> -
>
> Key: HIVE-1432
> URL: https://issues.apache.org/jira/browse/HIVE-1432
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Query Processor
>Reporter: Arvind Prabhakar
>Assignee: Arvind Prabhakar
> Fix For: 0.7.0
>
> Attachments: HIVE-1432.patch
>
>
> See HIVE-1271. This jira tracks the creation of a test case to test this fix 
> specifically.




[jira] Updated: (HIVE-1513) hive starter scripts should load admin/user supplied script for configurability

2010-08-07 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1513:


Attachment: hive-1513.3.patch

Ning - you are right - i didn't notice the HEAP setting. in the internal tree - 
the heap setting is done differently (and i thought that the internal tree does 
not override these scripts).

so i have incorporated both the suggestions (don't set opts/heap). also - the 
build script needed a slight change to make the .template file part of the 
distribution.

> hive starter scripts should load admin/user supplied script for 
> configurability
> ---
>
> Key: HIVE-1513
> URL: https://issues.apache.org/jira/browse/HIVE-1513
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: CLI
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: 1513.1.patch, 1513.2.patch, hive-1513.3.patch
>
>
> it's difficult to add environment variables to Hive starter scripts except by 
> modifying the scripts directly. this is undesirable (since they are source 
> code). Hive starter scripts should load a admin supplied shell script for 
> configurability. This would be similar to what hadoop does with hadoop-env.sh




[jira] Commented: (HIVE-1513) hive starter scripts should load admin/user supplied script for configurability

2010-08-07 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896231#action_12896231
 ] 

Joydeep Sen Sarma commented on HIVE-1513:
-

HADOOP_HEAPSIZE: hive-config.sh only supplies a default. if the admin has 
specified a value in hive-env - it will be used instead.
HADOOP_OPTS: seems like it's appending a specific JVM flag. i agree this 
doesn't make sense (the admin should choose whether they want that flag or 
not). i will post another patch after taking it out.

not sure whether we should rename the template to .sh. hadoop-20 seems to have 
template files only. 

> hive starter scripts should load admin/user supplied script for 
> configurability
> ---
>
> Key: HIVE-1513
> URL: https://issues.apache.org/jira/browse/HIVE-1513
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: CLI
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: 1513.1.patch, 1513.2.patch
>
>
> it's difficult to add environment variables to Hive starter scripts except by 
> modifying the scripts directly. this is undesirable (since they are source 
> code). Hive starter scripts should load a admin supplied shell script for 
> configurability. This would be similar to what hadoop does with hadoop-env.sh




[jira] Updated: (HIVE-1513) hive starter scripts should load admin/user supplied script for configurability

2010-08-06 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1513:


Attachment: 1513.2.patch

forgot to add one file

> hive starter scripts should load admin/user supplied script for 
> configurability
> ---
>
> Key: HIVE-1513
> URL: https://issues.apache.org/jira/browse/HIVE-1513
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: CLI
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: 1513.1.patch, 1513.2.patch
>
>
> it's difficult to add environment variables to Hive starter scripts except by 
> modifying the scripts directly. this is undesirable (since they are source 
> code). Hive starter scripts should load a admin supplied shell script for 
> configurability. This would be similar to what hadoop does with hadoop-env.sh




[jira] Created: (HIVE-1516) optimize split sizes automatically taking into account the nature of map tasks

2010-08-06 Thread Joydeep Sen Sarma (JIRA)
optimize split sizes automatically taking into account the nature of map 
tasks


 Key: HIVE-1516
 URL: https://issues.apache.org/jira/browse/HIVE-1516
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Joydeep Sen Sarma


two immediate cases come to mind:
- pure filter job (ie. no map-side sort required)
- full aggregate computations only (like count(1)).

in these cases - the amount of data to be sorted is zero or negligible. so 
mapper parallelism (and split size) should be dictated by the size of the 
cluster. there's no point running a huge number of mappers on a 500-node 
cluster for a pure filter job.
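A back-of-envelope version of this sizing rule (all names and constants here are illustrative, not a proposed implementation):

```java
// Sketch: for a pure filter or full-aggregate job, derive the split size from
// the cluster's map capacity (targeting ~2 waves of mappers) instead of from a
// fixed per-split byte count. The 2-wave target and the floor are illustrative.
public class SplitSizing {
    static long splitSize(long totalInputBytes, int clusterMapSlots, long minSplitBytes) {
        int targetMappers = Math.max(1, clusterMapSlots * 2); // ~2 waves of mappers
        long size = totalInputBytes / targetMappers;
        return Math.max(size, minSplitBytes);                 // never below the floor
    }
}
```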




[jira] Updated: (HIVE-1513) hive starter scripts should load admin/user supplied script for configurability

2010-08-06 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1513:


Attachment: 1513.1.patch

simple change to let the hive starter script include conf/hive-env.sh. a 
template is provided as an example. ran all tests on 20 and tested by hand that 
the inclusion works.

> hive starter scripts should load admin/user supplied script for 
> configurability
> ---
>
> Key: HIVE-1513
> URL: https://issues.apache.org/jira/browse/HIVE-1513
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: CLI
>Reporter: Joydeep Sen Sarma
> Attachments: 1513.1.patch
>
>
> it's difficult to add environment variables to Hive starter scripts except by 
> modifying the scripts directly. this is undesirable (since they are source 
> code). Hive starter scripts should load a admin supplied shell script for 
> configurability. This would be similar to what hadoop does with hadoop-env.sh




[jira] Updated: (HIVE-1513) hive starter scripts should load admin/user supplied script for configurability

2010-08-06 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1513:


  Status: Patch Available  (was: Open)
Assignee: Joydeep Sen Sarma

> hive starter scripts should load admin/user supplied script for 
> configurability
> ---
>
> Key: HIVE-1513
> URL: https://issues.apache.org/jira/browse/HIVE-1513
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: CLI
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: 1513.1.patch
>
>
> it's difficult to add environment variables to Hive starter scripts except by 
> modifying the scripts directly. this is undesirable (since they are source 
> code). Hive starter scripts should load a admin supplied shell script for 
> configurability. This would be similar to what hadoop does with hadoop-env.sh

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1513) hive starter scripts should load admin/user supplied script for configurability

2010-08-05 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895959#action_12895959
 ] 

Joydeep Sen Sarma commented on HIVE-1513:
-

yes - it's possible. however, a lot of variables are initialized by the time we 
get to loading ext/*.sh. for example, we allow HADOOP_HEAPSIZE to be specified 
via an env var, but aside from doing an export before launching the hive 
script, there's no way to configure this externally. the ext/* trick wouldn't 
work because it comes too late.

i think this is simple enough - we can just source a conf/hive-env.sh or 
something of the sort, so that admins can provide the right values for all 
these vars via config files, based on their requirements.

> hive starter scripts should load admin/user supplied script for 
> configurability
> ---
>
> Key: HIVE-1513
> URL: https://issues.apache.org/jira/browse/HIVE-1513
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: CLI
>Reporter: Joydeep Sen Sarma
>
> it's difficult to add environment variables to Hive starter scripts except by 
> modifying the scripts directly. this is undesirable (since they are source 
> code). Hive starter scripts should load an admin-supplied shell script for 
> configurability. This would be similar to what hadoop does with hadoop-env.sh

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1513) hive starter scripts should load admin/user supplied script for configurability

2010-08-05 Thread Joydeep Sen Sarma (JIRA)
hive starter scripts should load admin/user supplied script for configurability
---

 Key: HIVE-1513
 URL: https://issues.apache.org/jira/browse/HIVE-1513
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: CLI
Reporter: Joydeep Sen Sarma


it's difficult to add environment variables to Hive starter scripts except by 
modifying the scripts directly. this is undesirable (since they are source 
code). Hive starter scripts should load an admin-supplied shell script for 
configurability. This would be similar to what hadoop does with hadoop-env.sh

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1509) Monitor the working set of the number of files

2010-08-05 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1509:


Affects Version/s: 0.6.0
   (was: 0.7.0)

> Monitor the working set of the number of files 
> ---
>
> Key: HIVE-1509
> URL: https://issues.apache.org/jira/browse/HIVE-1509
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Namit Jain
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1509.2.patch, HIVE-1509.3.patch, HIVE-1509.4.patch, 
> HIVE-1509.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1509) Monitor the working set of the number of files

2010-08-05 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1509:


   Status: Resolved  (was: Patch Available)
Fix Version/s: 0.7.0
   Resolution: Fixed

committed - thanks Ning.

it seems that the test problems were likely due to a problem applying the 
patch.

> Monitor the working set of the number of files 
> ---
>
> Key: HIVE-1509
> URL: https://issues.apache.org/jira/browse/HIVE-1509
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Namit Jain
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1509.2.patch, HIVE-1509.3.patch, HIVE-1509.4.patch, 
> HIVE-1509.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1509) Monitor the working set of the number of files

2010-08-04 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895534#action_12895534
 ] 

Joydeep Sen Sarma commented on HIVE-1509:
-

strange - let me retry. can you check the patch one last time? (perhaps it's 
not up to date with the contents of your tree?)

> Monitor the working set of the number of files 
> ---
>
> Key: HIVE-1509
> URL: https://issues.apache.org/jira/browse/HIVE-1509
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Namit Jain
>Assignee: Ning Zhang
> Attachments: HIVE-1509.2.patch, HIVE-1509.3.patch, HIVE-1509.4.patch, 
> HIVE-1509.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1509) Monitor the working set of the number of files

2010-08-04 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895472#action_12895472
 ] 

Joydeep Sen Sarma commented on HIVE-1509:
-

the test result for dyn_part3.q does not match the one provided in the patch. 
it seems that TestNegativeCliDriver is executing nothing but the first query in 
the .q file:


[junit] diff -a -I file: -I pfile: -I /tmp/ -I invalidscheme: -I 
lastUpdateTime -I lastAccessTime -I \
owner -I transient_lastDdlTime -I java.lang.RuntimeException -I at org -I at 
sun -I at java -I at junit -\
I Caused by: -I [.][.][.] [0-9]* more 
/data/users/jssarma/hive_trunk/build/ql/test/logs/clientnegative/dy\
n_part3.q.out
[junit] 9a10,27
[junit] > PREHOOK: query: create table nzhang_part( key string) partitioned 
by (value string)
[junit] > PREHOOK: type: CREATETABLE
[junit] > POSTHOOK: query: create table nzhang_part( key string) 
partitioned by (value string)
[junit] > POSTHOOK: type: CREATETABLE
[junit] > POSTHOOK: Output: defa...@nzhang_part
[junit] > PREHOOK: query: insert overwrite table nzhang_part 
partition(value) select key, value from \
src
[junit] > PREHOOK: type: QUERY
[junit] > PREHOOK: Input: defa...@src
[junit] > FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.MapRedTask
[junit] > PREHOOK: query: create table nzhang_part( key string) partitioned 
by (value string)
[junit] > PREHOOK: type: CREATETABLE
[junit] > POSTHOOK: query: create table nzhang_part( key string) 
partitioned by (value string)
[junit] > POSTHOOK: type: CREATETABLE
[junit] > POSTHOOK: Output: defa...@nzhang_part
[junit] > PREHOOK: query: insert overwrite table nzhang_part 
partition(value) select key, value from \
src
[junit] > PREHOOK: type: QUERY
[junit] > PREHOOK: Input: defa...@src
[junit] > FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.MapRedTask


> Monitor the working set of the number of files 
> ---
>
> Key: HIVE-1509
> URL: https://issues.apache.org/jira/browse/HIVE-1509
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Namit Jain
>Assignee: Ning Zhang
> Attachments: HIVE-1509.2.patch, HIVE-1509.3.patch, HIVE-1509.4.patch, 
> HIVE-1509.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1509) Monitor the working set of the number of files

2010-08-04 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895411#action_12895411
 ] 

Joydeep Sen Sarma commented on HIVE-1509:
-

ok - i will run tests on 20 and commit if all clear.

> Monitor the working set of the number of files 
> ---
>
> Key: HIVE-1509
> URL: https://issues.apache.org/jira/browse/HIVE-1509
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Namit Jain
>Assignee: Ning Zhang
> Attachments: HIVE-1509.2.patch, HIVE-1509.3.patch, HIVE-1509.4.patch, 
> HIVE-1509.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1509) Monitor the working set of the number of files

2010-08-04 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895384#action_12895384
 ] 

Joydeep Sen Sarma commented on HIVE-1509:
-

let me know once the tests pass on 0.20 and i can commit.

one more question:
+MAXCREATEDFILES("hive.exec.max.created.files", 10),

i think you may have to append an 'L' to 10, since later on you are trying to 
do:

+  long upperLimit =  HiveConf.getLongVar(job, 
HiveConf.ConfVars.MAXCREATEDFILES);

(or switch to using getIntVar). i am a little surprised this is working, 
because the 10 would be interpreted as an Integer and go to the integer 
constructor, which should leave the long default at -1. (or i guess i have 
forgotten how this works)
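The overload concern above can be sketched with a toy enum mirroring the 
HiveConf.ConfVars pattern. Everything here is an illustrative stand-in, not 
Hive's actual code: the enum, its fields, and the 100000L default are 
hypothetical; the point is only that the literal's type decides which 
constructor runs.

```java
public class LiteralDemo {
    // Hypothetical mirror of the ConfVars pattern: one constructor per
    // numeric type, so the literal's type selects which default gets set.
    enum ConfVar {
        // the 'L' suffix makes this a long literal, selecting the long ctor;
        // the value is illustrative, not Hive's real default
        MAX_CREATED_FILES("hive.exec.max.created.files", 100000L);

        final String varname;
        final int intDefault;
        final long longDefault;

        ConfVar(String varname, int v) {   // chosen for a plain int literal
            this.varname = varname;
            this.intDefault = v;
            this.longDefault = -1L;        // long default stays at the sentinel
        }

        ConfVar(String varname, long v) {  // chosen only for a long literal
            this.varname = varname;
            this.intDefault = -1;
            this.longDefault = v;
        }
    }

    public static void main(String[] args) {
        // reading the value as a long (as getLongVar would) sees the real
        // default only because the long constructor ran
        System.out.println(ConfVar.MAX_CREATED_FILES.longDefault); // 100000
        System.out.println(ConfVar.MAX_CREATED_FILES.intDefault);  // -1
    }
}
```

With a bare int literal instead, the int constructor would run and the long 
default would remain -1, which is exactly the mismatch the comment suspects.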


> Monitor the working set of the number of files 
> ---
>
> Key: HIVE-1509
> URL: https://issues.apache.org/jira/browse/HIVE-1509
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Namit Jain
>Assignee: Ning Zhang
> Attachments: HIVE-1509.2.patch, HIVE-1509.3.patch, HIVE-1509.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1509) Monitor the working set of the number of files

2010-08-04 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895322#action_12895322
 ] 

Joydeep Sen Sarma commented on HIVE-1509:
-

can you try bucketmapjoin2.q in clientpositive? it's failing for me

> Monitor the working set of the number of files 
> ---
>
> Key: HIVE-1509
> URL: https://issues.apache.org/jira/browse/HIVE-1509
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Namit Jain
>Assignee: Ning Zhang
> Attachments: HIVE-1509.2.patch, HIVE-1509.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1509) Monitor the working set of the number of files

2010-08-04 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895312#action_12895312
 ] 

Joydeep Sen Sarma commented on HIVE-1509:
-

couple of comments:

- use ProgressCounter.CREATED_FILES directly instead of using 
valueOf("CREATED_FILES")
- can we move the check for the total number of created files inside 
checkFatalErrors? we are duplicating some code (for example, we just fixed a 
problem where getCounters() can return null, by ignoring that case inside 
checkFatal).


> Monitor the working set of the number of files 
> ---
>
> Key: HIVE-1509
> URL: https://issues.apache.org/jira/browse/HIVE-1509
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Namit Jain
>Assignee: Ning Zhang
> Attachments: HIVE-1509.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1493) incorrect explanation when local mode not chosen automatically

2010-08-03 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma resolved HIVE-1493.
-

  Assignee: Joydeep Sen Sarma
Resolution: Fixed

fixed via HIVE-1422

> incorrect explanation when local mode not chosen automatically
> --
>
> Key: HIVE-1493
> URL: https://issues.apache.org/jira/browse/HIVE-1493
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
>Priority: Minor
>
> slipped past in 1408:
> // check for max input size   
>   
> if (inputSummary.getLength() > maxBytes)
> return "Input Size (= " + maxBytes + ") is larger than " +
> HiveConf.ConfVars.LOCALMODEMAXBYTES.varname + " (= " + maxBytes + 
> ")";
> printing same value twice.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1422) skip counter update when RunningJob.getCounters() returns null

2010-08-02 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1422:


Attachment: 1422.2.patch

damn - this uncovered a bug in the tests and fixed an unnecessary throws 
declaration.

> skip counter update when RunningJob.getCounters() returns null
> --
>
> Key: HIVE-1422
> URL: https://issues.apache.org/jira/browse/HIVE-1422
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: Joydeep Sen Sarma
> Fix For: 0.7.0
>
> Attachments: 1422.2.patch, 1422.2.patch, HIVE-1422.1.patch
>
>
> Under heavy load circumstances on some Hadoop versions, we may get a NPE from 
> trying to dereference a null Counters object.  I don't have a unit test which 
> can reproduce it, but here's an example stack from a production cluster we 
> saw today:
> 10/06/21 13:01:10 ERROR exec.ExecDriver: Ended Job = job_201005200457_701060 
> with exception 'java.lang.NullPointerException(null)'
> java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.exec.Operator.updateCounters(Operator.java:999)
> at 
> org.apache.hadoop.hive.ql.exec.ExecDriver.updateCounters(ExecDriver.java:503)
> at org.apache.hadoop.hive.ql.exec.ExecDriver.progress(ExecDriver.java:390)
> at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:697)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1507) Supply DriverContext to Hooks

2010-08-02 Thread Joydeep Sen Sarma (JIRA)
Supply DriverContext to Hooks
-

 Key: HIVE-1507
 URL: https://issues.apache.org/jira/browse/HIVE-1507
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Joydeep Sen Sarma


The DriverContext and the Context (linked off the latter) created during query 
compilation have information that's invaluable to writing hooks. In particular, 
the Context object has a cache of pathname to file size mappings looked up via 
hdfs. i would like to get access to this cache (for both reading and writing) 
in order to write a hook that depends on query size (for the purpose of 
dispatching it to the right cluster).

It's unfortunate we don't have a generic context object for hooks (into which 
we can add more stuff as needed). This is forcing an unnecessary api 
enhancement (we should be able to maintain backwards compatibility using 
reflection though). I think going forward we should have a generic context 
object with Session and Query related data inside.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1422) skip counter update when RunningJob.getCounters() returns null

2010-08-02 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1422:


Status: Patch Available  (was: Reopened)

> skip counter update when RunningJob.getCounters() returns null
> --
>
> Key: HIVE-1422
> URL: https://issues.apache.org/jira/browse/HIVE-1422
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: Joydeep Sen Sarma
> Fix For: 0.7.0
>
> Attachments: 1422.2.patch, HIVE-1422.1.patch
>
>
> Under heavy load circumstances on some Hadoop versions, we may get a NPE from 
> trying to dereference a null Counters object.  I don't have a unit test which 
> can reproduce it, but here's an example stack from a production cluster we 
> saw today:
> 10/06/21 13:01:10 ERROR exec.ExecDriver: Ended Job = job_201005200457_701060 
> with exception 'java.lang.NullPointerException(null)'
> java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.exec.Operator.updateCounters(Operator.java:999)
> at 
> org.apache.hadoop.hive.ql.exec.ExecDriver.updateCounters(ExecDriver.java:503)
> at org.apache.hadoop.hive.ql.exec.ExecDriver.progress(ExecDriver.java:390)
> at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:697)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1430) serializing/deserializing the query plan is useless and expensive

2010-07-30 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894253#action_12894253
 ] 

Joydeep Sen Sarma commented on HIVE-1430:
-

hey - so i am a little puzzled by this change:

- Zheng had added serialize/deserialize of the plan to make sure it got tested. 
that wasn't a bad idea.
- i had added this option to the build file so that serialization would 
actually run during the tests (and not outside of tests)
- note that this property is not set outside the test environment, so 
serialization/deserialization of the plan would not have been happening during 
regular use of hive

so it's not clear to me why we are making this change (are we concerned about 
memory/cpu usage during testing?). unless i am missing something major, this 
has no impact on the memory/cpu of a regular hive client.

> serializing/deserializing the query plan is useless and expensive
> -
>
> Key: HIVE-1430
> URL: https://issues.apache.org/jira/browse/HIVE-1430
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1430.patch
>
>
> We should turn it off by default

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1422) skip counter update when RunningJob.getCounters() returns null

2010-07-30 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1422:


Attachment: 1422.2.patch

- cleaned up ExecDriver a bit - removed some dead code and some unnecessary 
global vars, and throw a better exception if the JT has lost the job
- fixed HIVE-1493 here as well. it's a one-line fix in a printf

running tests.

> skip counter update when RunningJob.getCounters() returns null
> --
>
> Key: HIVE-1422
> URL: https://issues.apache.org/jira/browse/HIVE-1422
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: Joydeep Sen Sarma
> Fix For: 0.7.0
>
> Attachments: 1422.2.patch, HIVE-1422.1.patch
>
>
> Under heavy load circumstances on some Hadoop versions, we may get a NPE from 
> trying to dereference a null Counters object.  I don't have a unit test which 
> can reproduce it, but here's an example stack from a production cluster we 
> saw today:
> 10/06/21 13:01:10 ERROR exec.ExecDriver: Ended Job = job_201005200457_701060 
> with exception 'java.lang.NullPointerException(null)'
> java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.exec.Operator.updateCounters(Operator.java:999)
> at 
> org.apache.hadoop.hive.ql.exec.ExecDriver.updateCounters(ExecDriver.java:503)
> at org.apache.hadoop.hive.ql.exec.ExecDriver.progress(ExecDriver.java:390)
> at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:697)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1422) skip counter update when RunningJob.getCounters() returns null

2010-07-30 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894126#action_12894126
 ] 

Joydeep Sen Sarma commented on HIVE-1422:
-

more hadoop goriness - i think, John, your fix was pretty spot on:

- there are three levels of job storage: 
   a. fully in memory (can get status and counters)
   b. partially in memory (a la retired - can get status but not counters)
   c. on disk (completed jobs)

so what is happening is that we are hitting case b: jobstatus is available, but 
counters are not. we should probably also anticipate a null jobstatus (which we 
used to get in 0.17, before b and c were available).

what is the effect of not having final counter values available in Hive? local 
mode also doesn't report counters, i think.
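The guard being discussed can be sketched as below. RunningJob and Counters 
here are simplified stand-ins for the Hadoop classes (the real 
org.apache.hadoop.mapred.RunningJob.getCounters() also declares IOException, 
elided for brevity), and updateCounters is a hypothetical reduction of the 
ExecDriver logic, not the actual patch.

```java
public class CounterUpdate {
    // Minimal stand-ins for the Hadoop types discussed above.
    interface RunningJob {
        Counters getCounters();
    }
    static class Counters { }

    // Defensive pattern from the comment: for a partially retired job
    // (case b), job status is still available but counters are not, so we
    // skip the counter update instead of dereferencing null.
    static boolean updateCounters(RunningJob rj) {
        Counters ctrs = rj.getCounters();
        if (ctrs == null) {
            return false; // counters gone; keep the previously seen values
        }
        // ... fold ctrs into the per-operator counters here ...
        return true;
    }

    public static void main(String[] args) {
        System.out.println(updateCounters(() -> null));           // prints false
        System.out.println(updateCounters(() -> new Counters())); // prints true
    }
}
```

The same null check would apply to a missing jobstatus, covering the older 
0.17-style behavior mentioned above.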

> skip counter update when RunningJob.getCounters() returns null
> --
>
> Key: HIVE-1422
> URL: https://issues.apache.org/jira/browse/HIVE-1422
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: Joydeep Sen Sarma
> Fix For: 0.7.0
>
> Attachments: HIVE-1422.1.patch
>
>
> Under heavy load circumstances on some Hadoop versions, we may get a NPE from 
> trying to dereference a null Counters object.  I don't have a unit test which 
> can reproduce it, but here's an example stack from a production cluster we 
> saw today:
> 10/06/21 13:01:10 ERROR exec.ExecDriver: Ended Job = job_201005200457_701060 
> with exception 'java.lang.NullPointerException(null)'
> java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.exec.Operator.updateCounters(Operator.java:999)
> at 
> org.apache.hadoop.hive.ql.exec.ExecDriver.updateCounters(ExecDriver.java:503)
> at org.apache.hadoop.hive.ql.exec.ExecDriver.progress(ExecDriver.java:390)
> at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:697)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1422) skip counter update when RunningJob.getCounters() returns null

2010-07-29 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893852#action_12893852
 ] 

Joydeep Sen Sarma commented on HIVE-1422:
-

i looked at the hadoop source for 20 a bit. it looks like both getCounters() 
and getJob() can return null (in case the job cannot be found). on 0.20, 
completed jobs are looked up from the persistent store, so i think this is 
pretty unlikely to happen (and if it does, it seems like a hadoop bug). but for 
17 (and maybe other versions in between) we need to guard against these.

> skip counter update when RunningJob.getCounters() returns null
> --
>
> Key: HIVE-1422
> URL: https://issues.apache.org/jira/browse/HIVE-1422
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: Joydeep Sen Sarma
> Fix For: 0.7.0
>
> Attachments: HIVE-1422.1.patch
>
>
> Under heavy load circumstances on some Hadoop versions, we may get a NPE from 
> trying to dereference a null Counters object.  I don't have a unit test which 
> can reproduce it, but here's an example stack from a production cluster we 
> saw today:
> 10/06/21 13:01:10 ERROR exec.ExecDriver: Ended Job = job_201005200457_701060 
> with exception 'java.lang.NullPointerException(null)'
> java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.exec.Operator.updateCounters(Operator.java:999)
> at 
> org.apache.hadoop.hive.ql.exec.ExecDriver.updateCounters(ExecDriver.java:503)
> at org.apache.hadoop.hive.ql.exec.ExecDriver.progress(ExecDriver.java:390)
> at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:697)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1493) incorrect explanation when local mode not chosen automatically

2010-07-29 Thread Joydeep Sen Sarma (JIRA)
incorrect explanation when local mode not chosen automatically
--

 Key: HIVE-1493
 URL: https://issues.apache.org/jira/browse/HIVE-1493
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Joydeep Sen Sarma
Priority: Minor


slipped past in 1408:

// check for max input size 

if (inputSummary.getLength() > maxBytes)
return "Input Size (= " + maxBytes + ") is larger than " +
HiveConf.ConfVars.LOCALMODEMAXBYTES.varname + " (= " + maxBytes + 
")";


printing same value twice.
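A hedged sketch of what the corrected message might look like: the first value 
interpolated should be the computed input size rather than maxBytes again. 
whyNotLocal, its parameters, and the property-name literal passed in main are 
illustrative stand-ins for the real ExecDriver/HiveConf fields, not the actual 
fix.

```java
public class LocalModeCheck {
    // Returns an explanation when local mode cannot be chosen, or null when
    // the input-size check passes. The bug in the quoted snippet was printing
    // maxBytes in both slots; here the actual input length is printed first.
    static String whyNotLocal(long inputLength, long maxBytes, String varname) {
        if (inputLength > maxBytes) {
            return "Input Size (= " + inputLength + ") is larger than "
                + varname + " (= " + maxBytes + ")";
        }
        return null; // no objection: local mode may still be chosen
    }

    public static void main(String[] args) {
        // hypothetical property name for illustration
        System.out.println(whyNotLocal(2048L, 1024L,
            "hive.exec.mode.local.auto.inputbytes.max"));
    }
}
```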

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1408) add option to let hive automatically run in local mode based on tunable heuristics

2010-07-28 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893462#action_12893462
 ] 

Joydeep Sen Sarma commented on HIVE-1408:
-

Ning - anything else you need from me? i was hoping to get this in before 
HIVE-417; otherwise i am sure i would have to regenerate/reconcile a ton of 
stuff

> add option to let hive automatically run in local mode based on tunable 
> heuristics
> --
>
> Key: HIVE-1408
> URL: https://issues.apache.org/jira/browse/HIVE-1408
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: 1408.1.patch, 1408.2.patch, 1408.2.q.out.patch, 
> 1408.7.patch, hive-1408.6.patch
>
>
> as a followup to HIVE-543 - we should have a simple option (enabled by 
> default) to let hive run in local mode if possible.
> two levels of options are desirable:
> 1. hive.exec.mode.local.auto=true/false // control whether local mode is 
> automatically chosen
> 2. Options to control different heuristics, some naive examples:
>  hive.exec.mode.local.auto.input.size.max=1G // don't choose local mode 
> if data > 1G
>  hive.exec.mode.local.auto.script.enable=true/false // choose if local 
> mode is enabled for queries with user scripts
> this can be implemented as a pre/post execution hook. It makes sense to 
> provide this as a standard hook in the hive codebase since it's likely to 
> improve response time for many users (especially for test queries).
> the initial proposal is to choose this at a query level and not at per 
> hive-task (ie. hadoop job) level. per job-level requires more changes to 
> compilation (to not pre-commit to hdfs or local scratch directories at 
> compile time).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-417) Implement Indexing in Hive

2010-07-28 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893455#action_12893455
 ] 

Joydeep Sen Sarma commented on HIVE-417:


i am waiting for a commit on hive-1408. that's probably gonna collide.

> Implement Indexing in Hive
> --
>
> Key: HIVE-417
> URL: https://issues.apache.org/jira/browse/HIVE-417
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Affects Versions: 0.3.0, 0.3.1, 0.4.0, 0.6.0
>Reporter: Prasad Chakka
>Assignee: He Yongqiang
> Fix For: 0.7.0
>
> Attachments: hive-417.proto.patch, hive-417-2009-07-18.patch, 
> hive-indexing-8-thrift-metastore-remodel.patch, hive-indexing.3.patch, 
> hive-indexing.5.thrift.patch, hive.indexing.11.patch, hive.indexing.12.patch, 
> idx2.png, indexing_with_ql_rewrites_trunk_953221.patch
>
>
> Implement indexing on Hive so that lookup and range queries are efficient.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1408) add option to let hive automatically run in local mode based on tunable heuristics

2010-07-28 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893314#action_12893314
 ] 

Joydeep Sen Sarma commented on HIVE-1408:
-

yeah - so the solution is that mapred.local.dir needs to be set correctly in 
the hive/hadoop client-side xml. for our internal install, i will send a diff 
changing the client side to point to /tmp (instead of relying on the server 
side config).

there's nothing to do on the hive open source version. mapred.local.dir is a 
client-only variable and needs to be set on the client side by the admin. 
basically, our internal client-side config has a bug :-)

> add option to let hive automatically run in local mode based on tunable 
> heuristics
> --
>
> Key: HIVE-1408
> URL: https://issues.apache.org/jira/browse/HIVE-1408
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: 1408.1.patch, 1408.2.patch, 1408.2.q.out.patch, 
> 1408.7.patch, hive-1408.6.patch
>
>
> as a followup to HIVE-543 - we should have a simple option (enabled by 
> default) to let hive run in local mode if possible.
> two levels of options are desirable:
> 1. hive.exec.mode.local.auto=true/false // control whether local mode is 
> automatically chosen
> 2. Options to control different heuristics, some naive examples:
>  hive.exec.mode.local.auto.input.size.max=1G // don't choose local mode 
> if data > 1G
>  hive.exec.mode.local.auto.script.enable=true/false // choose if local 
> mode is enabled for queries with user scripts
> this can be implemented as a pre/post execution hook. It makes sense to 
> provide this as a standard hook in the hive codebase since it's likely to 
> improve response time for many users (especially for test queries).
> the initial proposal is to choose this at a query level and not at per 
> hive-task (ie. hadoop job) level. per job-level requires more changes to 
> compilation (to not pre-commit to hdfs or local scratch directories at 
> compile time).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


