[jira] Updated: (HIVE-1307) More generic and efficient merge method

2010-08-18 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1307:
-

Attachment: HIVE-1307.3.patch
HIVE-1307.3_java.patch

Uploading HIVE-1307.3.patch and HIVE-1307.3_java.patch (Java changes only). 
This patch fixes a bug in dynamic partition insert (adding the partition column 
property in GenMRFileSink1.java). Also added one unit test case, merge4.q, for 
this case.

> More generic and efficient merge method
> ---
>
> Key: HIVE-1307
> URL: https://issues.apache.org/jira/browse/HIVE-1307
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.3.patch, 
> HIVE-1307.3_java.patch, HIVE-1307.patch, HIVE-1307_java_only.patch
>
>
> Currently if hive.merge.mapfiles/mapredfiles=true, a new MapReduce job is 
> created to read the input files and output to one reducer for merging. This MR 
> job is created at compile time, with one MR job per partition. In the dynamic 
> partition case, multiple partitions could be created at execution time, and 
> generating the merging MR jobs at compile time is impossible. 
> We should generalize the merge framework to allow multiple partitions; most 
> of the time a map-only job should be sufficient if we use 
> CombineHiveInputFormat. 
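The map-only merge idea can be illustrated with a small standalone sketch (the class and file handling here are mine, not Hive's actual merge code): a CombineHiveInputFormat-style grouping hands one mapper many small files, and the mapper simply concatenates them into a single output file, so no reduce phase is needed.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class MapOnlyMergeDemo {
    // Concatenate many small files into one merged file, as a single map
    // task could do when its combined split covers all of them.
    static Path mergeSmallFiles(List<Path> smallFiles, Path merged) throws IOException {
        try (OutputStream out = Files.newOutputStream(merged)) {
            for (Path f : smallFiles) {
                Files.copy(f, out); // append this file's bytes to the output
            }
        }
        return merged;
    }
}
```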

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1505) Support non-UTF8 data

2010-08-18 Thread Ted Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Xu updated HIVE-1505:
-

Attachment: trunk-encoding.patch

We implemented an encoding configuration feature for tables.
Set the table encoding through a serde parameter, for example:
{code}
alter table src set serdeproperties ('serialization.encoding'='GBK');
{code}
This makes table src use GBK encoding (a Chinese character encoding). 
Furthermore, when using the command-line interface, the parameter 
'hive.cli.encoding' must also be set. It must be set before the Hive prompt 
starts, so set 'hive.cli.encoding' in hive-site.xml or pass -hiveconf 
hive.cli.encoding=GBK on the command line, rather than running 'set 
hive.cli.encoding=GBK' in Hive QL.
Because of this, I can't find a way to add a unit test.
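The mojibake this feature avoids can be shown with a small standalone Java sketch (independent of Hive's serde code): bytes written in GBK and decoded as UTF-8 come back as U+FFFD replacement characters, while decoding with the declared charset recovers the text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Decode raw bytes using the named charset.
    static String decode(byte[] data, String charsetName) {
        return new String(data, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587" ("Chinese") encoded with GBK
        byte[] gbkBytes = "\u4e2d\u6587".getBytes(Charset.forName("GBK"));
        // Misreading GBK bytes as UTF-8 yields replacement characters
        String misread = decode(gbkBytes, StandardCharsets.UTF_8.name());
        // Decoding with the declared encoding restores the original text
        String restored = decode(gbkBytes, "GBK");
        System.out.println(misread.contains("\uFFFD")); // prints true
        System.out.println(restored.equals("\u4e2d\u6587")); // prints true
    }
}
```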




> Support non-UTF8 data
> -
>
> Key: HIVE-1505
> URL: https://issues.apache.org/jira/browse/HIVE-1505
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Serializers/Deserializers
>Affects Versions: 0.5.0
>Reporter: bc Wong
> Attachments: trunk-encoding.patch
>
>
> I'd like to work with non-UTF8 data easily.
> Suppose I have data in latin1. Currently, doing a "select *" returns the 
> upper-ASCII characters as '\xef\xbf\xbd', which is the replacement character 
> '\ufffd' encoded in UTF-8. It would be nice for Hive to understand different 
> encodings, or to have a concept of a byte string.




[jira] Commented: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode

2010-08-18 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900178#action_12900178
 ] 

Amareshwari Sriramadasu commented on HIVE-1561:
---

When I tried the SMB join on my local machine (pseudo-distributed mode), I saw 
wrong results for the join. I think that when there is more than one mapper, 
the join logic does not work correctly.
Here is my run:
{noformat}
hive> describe extended smb_input;
OK
key int
value   int

Detailed Table Information  Table(tableName:smb_input, dbName:default, 
owner:amarsri, createTime:1282026968, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:key, type:int, comment:null), 
FieldSchema(name:value, type:int, comment:null)], 
location:hdfs://localhost:19000/user/hive/warehouse/smb_input, 
inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
compressed:false, numBuckets:1, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
parameters:{serialization.format=1}), bucketCols:[key], 
sortCols:[Order(col:key, order:1)], parameters:{}), partitionKeys:[], 
parameters:{SORTBUCKETCOLSPREFIX=TRUE, transient_lastDdlTime=1282027032}, 
viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 0.05 seconds

hive> select * from smb_input;
OK
12  35
48  40
100 100
Time taken: 0.343 seconds

hive> set hive.optimize.bucketmapjoin = true;
hive> set hive.optimize.bucketmapjoin.sortedmerge = true;

hive> select /*+ MAPJOIN(a) */ * from smb_input a join smb_input b on 
a.key=b.key;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201008031340_0170, Tracking URL = 
http://localhost:50030/jobdetails.jsp?jobid=job_201008031340_0170
Kill Command = /home/amarsri/workspace/Yahoo20/bin/../bin/hadoop job  
-Dmapred.job.tracker=localhost:19101 -kill job_201008031340_0170
2010-08-19 11:04:00,040 Stage-1 map = 0%,  reduce = 0%
2010-08-19 11:04:10,253 Stage-1 map = 50%,  reduce = 0%
2010-08-19 11:04:13,271 Stage-1 map = 100%,  reduce = 0%
2010-08-19 11:05:13,636 Stage-1 map = 100%,  reduce = 0%
2010-08-19 11:05:19,664 Stage-1 map = 50%,  reduce = 0%
2010-08-19 11:05:25,733 Stage-1 map = 100%,  reduce = 0%
2010-08-19 11:05:28,762 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201008031340_0170
OK
12  35  12  35
48  40  48  40
Time taken: 100.056 seconds

Expected output:
12  35  12  35
48  40  48  40
100 100 100 100
{noformat} 

The MapReduce job launched for the join has 2 maps. The second map's first 
attempt (attempt_201008031340_0170_m_01_0) fails with the following exception:
{noformat}
2010-08-19 11:04:07,195 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
replace taskId from execContext 
2010-08-19 11:04:07,195 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
new taskId: FS 00_0
2010-08-19 11:04:07,195 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
Final Path: FS 
hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-19_11-03-49_024_6433980309871155022/_tmp.-ext-10001/00_0
2010-08-19 11:04:07,195 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
Writing to temp file: FS 
hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-19_11-03-49_024_6433980309871155022/_tmp.-ext-10001/_tmp.00_0
2010-08-19 11:04:07,196 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
New Final Path: FS 
hdfs://localhost:19000/tmp/hive-amarsri/hive_2010-08-19_11-03-49_024_6433980309871155022/_tmp.-ext-10001/00_0
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 
5 finished. closing... 
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 
5 forwarded 5 rows
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 
5 Close done
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 2 
finished. closing... 
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 2 
forwarded 1 rows
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 
finished. closing... 
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 
forwarded 1 rows
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 
finished. closing... 
2010-08-19 11:05:08,290 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 
forwarded 0 rows
2010-08-19 11:05:08,656 ERROR ExecMapper: Hit error while closing operators - 
failing tree
2010-08-19 11:05:08,658 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:253)
at org.apache.hado

[jira] Commented: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode

2010-08-18 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900176#action_12900176
 ] 

Namit Jain commented on HIVE-1561:
--

My bad, I did not see the entire results. So, based on what Joy is saying, it 
does not work in miniMR mode.

> smb_mapjoin_8.q returns different results in miniMr mode
> 
>
> Key: HIVE-1561
> URL: https://issues.apache.org/jira/browse/HIVE-1561
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: He Yongqiang
>
> follow on to HIVE-1523:
> ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test
> POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer 
> join smb_bucket4_2 b on a.key = b.key
> official results:
> 4 val_356 NULL  NULL
> NULL  NULL  484 val_169
> 2000  val_169 NULL  NULL
> NULL  NULL  3000  val_169
> 4000  val_125 NULL  NULL
> in minimr mode:
> 2000  val_169 NULL  NULL
> 4 val_356 NULL  NULL
> 2000  val_169 NULL  NULL
> 4000  val_125 NULL  NULL
> NULL  NULL  5000  val_125




[jira] Updated: (HIVE-741) NULL is not handled correctly in join

2010-08-18 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-741:
-

Attachment: patch-741-2.txt

The patch also fixes SMBMapJoinOperator. I modified compareKeys(ArrayList 
k1, ArrayList k2) to do the following:
{code}
if (hasNullElements(k1) && hasNullElements(k2)) {
  return -1; // just return k1 is smaller than k2
} else if (hasNullElements(k1)) {
  return (0 - k2.size());
} else if (hasNullElements(k2)) {
  return k1.size();
}
   ... //the existing code.
{code}

Does the above make sense?
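As a standalone sketch of that logic (the hasNullElements helper and the fallback element-wise comparison below are my assumptions, not the patch's exact code), the key point is that a key containing NULL never compares equal to any other key, matching SQL join semantics:

```java
import java.util.List;

public class SmbKeyCompareSketch {
    // True if any component of the composite join key is NULL.
    static boolean hasNullElements(List<Object> key) {
        for (Object o : key) {
            if (o == null) return true;
        }
        return false;
    }

    // Keys with NULL components never return 0 (equal), so NULL join keys
    // never match -- not even against another NULL key.
    static int compareKeys(List<Object> k1, List<Object> k2) {
        if (hasNullElements(k1) && hasNullElements(k2)) {
            return -1; // just say k1 is smaller than k2
        } else if (hasNullElements(k1)) {
            return -k2.size();
        } else if (hasNullElements(k2)) {
            return k1.size();
        }
        // fallback: compare string forms (stand-in for the existing code)
        return k1.toString().compareTo(k2.toString());
    }
}
```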

Updated the test case with SMB join queries. 

When I run the SMB join on my local machine (pseudo-distributed mode), I get 
different results. I think that is mostly because of HIVE-1561. Will 
update that issue with my findings.

> NULL is not handled correctly in join
> -
>
> Key: HIVE-741
> URL: https://issues.apache.org/jira/browse/HIVE-741
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-741-1.txt, patch-741-2.txt, patch-741.txt, 
> smbjoin_nulls.q.txt
>
>
> With the following data in table input4_cb:
> Key    Value
> -----  -----
> NULL   325
> 18     NULL
> The following query:
> {code}
> select * from input4_cb a join input4_cb b on a.key = b.value;
> {code}
> returns the following result:
> NULL  325  18  NULL
> The correct result should be the empty set.
> When 'null' is replaced by '' it works.
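The expected behavior can be checked with a tiny standalone sketch (plain Java, not Hive code): under SQL semantics a NULL key equals nothing, so the join of the rows above must be empty.

```java
import java.util.ArrayList;
import java.util.List;

public class NullJoinDemo {
    // Inner join on a.key = b.value with SQL NULL semantics: a NULL on
    // either side never satisfies the equality predicate.
    static List<String> join(List<Integer> aKeys, List<Integer> bValues) {
        List<String> out = new ArrayList<>();
        for (Integer k : aKeys) {
            for (Integer v : bValues) {
                if (k != null && v != null && k.equals(v)) {
                    out.add(k + "=" + v);
                }
            }
        }
        return out;
    }
}
```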




[jira] Commented: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode

2010-08-18 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900161#action_12900161
 ] 

He Yongqiang commented on HIVE-1561:


This is the complete result from Hive's smb_mapjoin_8.q.out; it is correct:
{noformat}
POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer join 
smb_bucket4_2 b on a.key = b.key
POSTHOOK: type: QUERY
POSTHOOK: Input: defa...@smb_bucket4_2
POSTHOOK: Input: defa...@smb_bucket4_1
POSTHOOK: Output: 
file:/tmp/jssarma/hive_2010-07-21_12-02-34_137_8141051139723931378/1
POSTHOOK: Lineage: smb_bucket4_1.key SIMPLE 
[(smb_bucket_input)smb_bucket_input.FieldSchema(name:key, type:int, 
comment:from deserializer), ]
POSTHOOK: Lineage: smb_bucket4_1.value SIMPLE 
[(smb_bucket_input)smb_bucket_input.FieldSchema(name:value, type:string, 
comment:from deserializer), ]
POSTHOOK: Lineage: smb_bucket4_2.key SIMPLE 
[(smb_bucket_input)smb_bucket_input.FieldSchema(name:key, type:int, 
comment:from deserializer), ]
POSTHOOK: Lineage: smb_bucket4_2.value SIMPLE 
[(smb_bucket_input)smb_bucket_input.FieldSchema(name:value, type:string, 
comment:from deserializer), ]
4     val_356   NULL   NULL
NULL  NULL      484    val_169
2000  val_169   NULL   NULL
NULL  NULL      3000   val_169
4000  val_125   NULL   NULL
NULL  NULL      5000   val_125
{noformat}




> smb_mapjoin_8.q returns different results in miniMr mode
> 
>
> Key: HIVE-1561
> URL: https://issues.apache.org/jira/browse/HIVE-1561
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: He Yongqiang
>
> follow on to HIVE-1523:
> ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test
> POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer 
> join smb_bucket4_2 b on a.key = b.key
> official results:
> 4 val_356 NULL  NULL
> NULL  NULL  484 val_169
> 2000  val_169 NULL  NULL
> NULL  NULL  3000  val_169
> 4000  val_125 NULL  NULL
> in minimr mode:
> 2000  val_169 NULL  NULL
> 4 val_356 NULL  NULL
> 2000  val_169 NULL  NULL
> 4000  val_125 NULL  NULL
> NULL  NULL  5000  val_125




[jira] Updated: (HIVE-1523) ql tests no longer work in miniMR mode

2010-08-18 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1523:


Attachment: hive-1523.3.patch

Small change: fix the 0.20 version match to pick the right Jetty version. 

> ql tests no longer work in miniMR mode
> --
>
> Key: HIVE-1523
> URL: https://issues.apache.org/jira/browse/HIVE-1523
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: hive-1523.1.patch, hive-1523.2.patch, hive-1523.3.patch
>
>
> as per title. here's the first exception i see:
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: 
> java.io.FileNotFoun\
> dException File file:/build/ql/test/data/warehouse/dest_j1 does not exist.
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(746)) - 
> java.io.FileNotFoundException: Fil\
> e file:/build/ql/test/data/warehouse/dest_j1 does not exist.
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
>   at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677)




[jira] Commented: (HIVE-1523) ql tests no longer work in miniMR mode

2010-08-18 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900141#action_12900141
 ] 

John Sichi commented on HIVE-1523:
--

Yeah, shortreg/longreg split would be good.  The challenge is to keep longreg 
healthy since breakages don't get caught with every checkin, so we'll need

(a) automation to run it constantly and report failures
(b) people to actually fix failures in a timely fashion


> ql tests no longer work in miniMR mode
> --
>
> Key: HIVE-1523
> URL: https://issues.apache.org/jira/browse/HIVE-1523
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: hive-1523.1.patch, hive-1523.2.patch
>
>
> as per title. here's the first exception i see:
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: 
> java.io.FileNotFoun\
> dException File file:/build/ql/test/data/warehouse/dest_j1 does not exist.
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(746)) - 
> java.io.FileNotFoundException: Fil\
> e file:/build/ql/test/data/warehouse/dest_j1 does not exist.
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
>   at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677)




[jira] Updated: (HIVE-1563) HBase tests broken

2010-08-18 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1563:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed. Thanks, John. Running all the tests now to see if we need more log 
file updates.

> HBase tests broken
> --
>
> Key: HIVE-1563
> URL: https://issues.apache.org/jira/browse/HIVE-1563
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1563.1.patch
>
>
> Broken by HIVE-1548, which did not update all log files.




[jira] Assigned: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar

2010-08-18 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi reassigned HIVE-1546:


Assignee: Ashutosh Chauhan

> Ability to plug custom Semantic Analyzers for Hive Grammar
> --
>
> Key: HIVE-1546
> URL: https://issues.apache.org/jira/browse/HIVE-1546
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: hive-1546.patch
>
>
> It will be useful if the semantic analysis phase is made pluggable so that 
> other projects can do custom analysis of Hive queries before performing 
> metastore operations on them. 




[jira] Updated: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar

2010-08-18 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1546:
-

Fix Version/s: 0.7.0
Affects Version/s: 0.7.0

> Ability to plug custom Semantic Analyzers for Hive Grammar
> --
>
> Key: HIVE-1546
> URL: https://issues.apache.org/jira/browse/HIVE-1546
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.7.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
>
> Attachments: hive-1546.patch
>
>
> It will be useful if the semantic analysis phase is made pluggable so that 
> other projects can do custom analysis of Hive queries before performing 
> metastore operations on them. 




[jira] Commented: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode

2010-08-18 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900138#action_12900138
 ] 

Namit Jain commented on HIVE-1561:
--

Looked at the data in detail.

The tables should be:

smb_bucket4_1

4     v356
2000  v169
4000  v125

smb_bucket4_2

484   v169
3000  v169
5000  v125


So the above query should return 6 rows; both sets of results are wrong.
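That expectation is easy to check with a standalone sketch (plain Java, keys assumed unique): since no key occurs in both tables, a full outer join must emit one half-NULL row per input row, six in total.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class FullOuterJoinDemo {
    // Full outer join of two key->value tables on key (keys assumed unique).
    static List<String> fullOuterJoin(Map<Integer, String> a, Map<Integer, String> b) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<Integer, String> e : a.entrySet()) {
            String right = b.containsKey(e.getKey())
                ? e.getKey() + "\t" + b.get(e.getKey()) : "NULL\tNULL";
            rows.add(e.getKey() + "\t" + e.getValue() + "\t" + right);
        }
        for (Map.Entry<Integer, String> e : b.entrySet()) {
            if (!a.containsKey(e.getKey())) { // unmatched right-side rows
                rows.add("NULL\tNULL\t" + e.getKey() + "\t" + e.getValue());
            }
        }
        return rows;
    }
}
```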

> smb_mapjoin_8.q returns different results in miniMr mode
> 
>
> Key: HIVE-1561
> URL: https://issues.apache.org/jira/browse/HIVE-1561
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: He Yongqiang
>
> follow on to HIVE-1523:
> ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test
> POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer 
> join smb_bucket4_2 b on a.key = b.key
> official results:
> 4 val_356 NULL  NULL
> NULL  NULL  484 val_169
> 2000  val_169 NULL  NULL
> NULL  NULL  3000  val_169
> 4000  val_125 NULL  NULL
> in minimr mode:
> 2000  val_169 NULL  NULL
> 4 val_356 NULL  NULL
> 2000  val_169 NULL  NULL
> 4000  val_125 NULL  NULL
> NULL  NULL  5000  val_125




[jira] Commented: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar

2010-08-18 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900137#action_12900137
 ] 

Ashutosh Chauhan commented on HIVE-1546:


By the way, can someone assign this JIRA to me and add me to the list of 
contributors, so that in the future I can do that myself?

> Ability to plug custom Semantic Analyzers for Hive Grammar
> --
>
> Key: HIVE-1546
> URL: https://issues.apache.org/jira/browse/HIVE-1546
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Ashutosh Chauhan
> Attachments: hive-1546.patch
>
>
> It will be useful if the semantic analysis phase is made pluggable so that 
> other projects can do custom analysis of Hive queries before performing 
> metastore operations on them. 




[jira] Updated: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar

2010-08-18 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-1546:
---

Attachment: hive-1546.patch

The attached patch adds the capability for custom semantic analysis of a query 
before it is handed over to Hive, plus a few other miscellaneous refactorings 
around it. Changes include:
* Addition of SemanticAnalyzerFactoryInterface. If the conf has a particular 
variable specified, a custom analyzer will be loaded and used; otherwise the 
existing Hive semantic analyzer will be used, so the default behavior is 
preserved.
* Changed the visibility of a few methods in DDLSemanticAnalyzer and 
SemanticAnalyzer from private to protected, as I wanted to override them in my 
custom analyzer.
* Changed the file format specification in the grammar so that it can 
optionally take two more parameters (InputDriver and OutputDriver) in addition 
to InputFormat and OutputFormat. These are optional, so this preserves the 
default behavior.
* In the file format specification, currently SequenceFile, TextFile and 
RCFile are supported through keywords. Expanded that production to also accept 
an identifier, so that it is possible to support more file formats without 
changing the Hive grammar every time. Currently that token results in an 
exception, since there are no other formats yet, but that can change when we 
add support for them. This preserves the current behavior.

Note that there are no new test cases, since this is mostly code restructuring 
and doesn't add or modify current behavior; passing the existing tests should 
suffice. 
 
I should point out most of these changes are driven by Howl and would like to 
thank John for suggesting the initial approach for these changes.
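The conf-driven hook can be sketched roughly like this (all names here are hypothetical stand-ins, not the patch's actual classes): if a factory class name is configured, load it reflectively; otherwise fall back to the built-in analyzer, preserving the default behavior.

```java
public class AnalyzerFactorySketch {
    interface SemanticAnalyzerFactory {
        String analyze(String query);
    }

    // Built-in behavior used when no custom factory is configured.
    static class DefaultFactory implements SemanticAnalyzerFactory {
        public String analyze(String query) { return "default:" + query; }
    }

    // Load the configured factory class reflectively, or fall back to the
    // default so existing behavior is preserved.
    static SemanticAnalyzerFactory forConf(String factoryClassName) {
        if (factoryClassName == null || factoryClassName.isEmpty()) {
            return new DefaultFactory();
        }
        try {
            return (SemanticAnalyzerFactory)
                Class.forName(factoryClassName).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("cannot load analyzer factory: " + factoryClassName, e);
        }
    }
}
```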
 

> Ability to plug custom Semantic Analyzers for Hive Grammar
> --
>
> Key: HIVE-1546
> URL: https://issues.apache.org/jira/browse/HIVE-1546
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Ashutosh Chauhan
> Attachments: hive-1546.patch
>
>
> It will be useful if Semantic Analysis phase is made pluggable such that 
> other projects can do custom analysis of hive queries before doing metastore 
> operations on them. 




[jira] Commented: (HIVE-1563) HBase tests broken

2010-08-18 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900134#action_12900134
 ] 

Namit Jain commented on HIVE-1563:
--

running tests 

> HBase tests broken
> --
>
> Key: HIVE-1563
> URL: https://issues.apache.org/jira/browse/HIVE-1563
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1563.1.patch
>
>
> Broken by HIVE-1548, which did not update all log files.




[jira] Updated: (HIVE-1563) HBase tests broken

2010-08-18 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1563:
-

Attachment: HIVE-1563.1.patch

> HBase tests broken
> --
>
> Key: HIVE-1563
> URL: https://issues.apache.org/jira/browse/HIVE-1563
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1563.1.patch
>
>
> Broken by HIVE-1548, which did not update all log files.




[jira] Updated: (HIVE-1563) HBase tests broken

2010-08-18 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1563:
-

Status: Patch Available  (was: Open)

> HBase tests broken
> --
>
> Key: HIVE-1563
> URL: https://issues.apache.org/jira/browse/HIVE-1563
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1563.1.patch
>
>
> Broken by HIVE-1548, which did not update all log files.




[jira] Created: (HIVE-1563) HBase tests broken

2010-08-18 Thread John Sichi (JIRA)
HBase tests broken
--

 Key: HIVE-1563
 URL: https://issues.apache.org/jira/browse/HIVE-1563
 Project: Hadoop Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 0.7.0
Reporter: John Sichi
Assignee: John Sichi
 Fix For: 0.7.0


Broken by HIVE-1548, which did not update all log files.





[jira] Updated: (HIVE-1563) HBase tests broken

2010-08-18 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1563:
-

Hadoop Flags:   (was: [Reviewed])

> HBase tests broken
> --
>
> Key: HIVE-1563
> URL: https://issues.apache.org/jira/browse/HIVE-1563
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.7.0
>
>
> Broken by HIVE-1548, which did not update all log files.




RE: alarming hive test failures in minimr mode

2010-08-18 Thread Namit Jain
Joy,

Can you link the JIRAs from 1523? It will be easier to track that way.


Thanks,
-namit


From: Joydeep Sen Sarma [jssa...@facebook.com]
Sent: Wednesday, August 18, 2010 2:59 PM
To: hive-dev@hadoop.apache.org
Subject: alarming hive test failures in minimr mode

Hey Devs,

Since fixing 1523 - I have been trying to run hive queries in minimr mode.

I am alarmed by what I am seeing:
-  assertions firing deep inside hive in minimr mode
-  test results outright different from local mode results (and not 
because of limit or because of ordering).

I am going to file JIRAs as I can - please do assign them to yourself (or 
whoever you think the right person is). I think we should try to get these 
resolved ASAP; they seem to indicate significant bugs in features advertised 
by Hive (that do not get enough coverage on real clusters). IMHO, this seems 
way more important than new feature development.

Thanks,

Joydeep


[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-08-18 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900128#action_12900128
 ] 

Namit Jain commented on HIVE-1293:
--

Fixed a lot of bugs, added a lot of comments, and tested it with a ZooKeeper 
cluster of 3 nodes.
select * currently performs a dirty read; we can add a new parameter to change 
that behavior if need be.
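The read/write locking discussed in this issue can be sketched with a JVM-local lock (illustration only; the actual patch has to coordinate locks across clients, e.g. through ZooKeeper as tested above): readers share the lock, a writer takes it exclusively, and a dirty read simply skips the read lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class TableLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Readers acquire the shared lock; many can read concurrently.
    String lockedRead() {
        lock.readLock().lock();
        try {
            return "consistent-read";
        } finally {
            lock.readLock().unlock();
        }
    }

    // A writer acquires the exclusive lock, blocking until readers finish.
    String lockedWrite() {
        lock.writeLock().lock();
        try {
            return "exclusive-write";
        } finally {
            lock.writeLock().unlock();
        }
    }

    // A dirty read takes no lock at all, as 'select *' currently does.
    String dirtyRead() {
        return "dirty-read";
    }
}
```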



> Concurreny Model for Hive
> -
>
> Key: HIVE-1293
> URL: https://issues.apache.org/jira/browse/HIVE-1293
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, 
> hive.1293.4.patch, hive.1293.5.patch, hive.1293.6.patch, hive_leases.txt
>
>
> Concurrency model for Hive:
> Currently, Hive does not provide a good concurrency model. The only 
> guarantee provided in the case of concurrent readers and writers is that the 
> reader will not see partial data from the old version (before the write) and 
> partial data from the new version (after the write).
> This has come across as a big problem, especially for background processes 
> performing maintenance operations.
> The following possible solutions come to mind:
> 1. Locks: Acquire read/write locks - they can be acquired at the beginning of 
> the query, or the write locks can be delayed till the move
> task (when the directory is actually moved). Care needs to be taken to avoid 
> deadlocks.
> 2. Versioning: The writer can create a new version if the current version is 
> being read. Note that it is not equivalent to snapshots;
> the old version can only be accessed by the current readers, and will be 
> deleted when all of them have finished.
> Comments?




[jira] Updated: (HIVE-1293) Concurreny Model for Hive

2010-08-18 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1293:
-

Status: Patch Available  (was: Open)

> Concurreny Model for Hive
> -
>
> Key: HIVE-1293
> URL: https://issues.apache.org/jira/browse/HIVE-1293
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, 
> hive.1293.4.patch, hive.1293.5.patch, hive.1293.6.patch, hive_leases.txt
>
>
> Concurrency model for Hive:
> Currently, hive does not provide a good concurrency model. The only 
> guanrantee provided in case of concurrent readers and writers is that
> reader will not see partial data from the old version (before the write) and 
> partial data from the new version (after the write).
> This has come across as a big problem, specially for background processes 
> performing maintenance operations.
> The following possible solutions come to mind.
> 1. Locks: Acquire read/write locks - they can be acquired at the beginning of 
> the query or the write locks can be delayed till move
> task (when the directory is actually moved). Care needs to be taken for 
> deadlocks.
> 2. Versioning: The writer can create a new version if the current version is 
> being read. Note that, it is not equivalent to snapshots,
> the old version can only be accessed by the current readers, and will be 
> deleted when all of them have finished.
> Comments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1293) Concurreny Model for Hive

2010-08-18 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1293:
-

Attachment: hive.1293.6.patch

> Concurreny Model for Hive
> -
>
> Key: HIVE-1293
> URL: https://issues.apache.org/jira/browse/HIVE-1293
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, 
> hive.1293.4.patch, hive.1293.5.patch, hive.1293.6.patch, hive_leases.txt
>
>
> Concurrency model for Hive:
> Currently, hive does not provide a good concurrency model. The only 
> guanrantee provided in case of concurrent readers and writers is that
> reader will not see partial data from the old version (before the write) and 
> partial data from the new version (after the write).
> This has come across as a big problem, specially for background processes 
> performing maintenance operations.
> The following possible solutions come to mind.
> 1. Locks: Acquire read/write locks - they can be acquired at the beginning of 
> the query or the write locks can be delayed till move
> task (when the directory is actually moved). Care needs to be taken for 
> deadlocks.
> 2. Versioning: The writer can create a new version if the current version is 
> being read. Note that, it is not equivalent to snapshots,
> the old version can only be accessed by the current readers, and will be 
> deleted when all of them have finished.
> Comments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path

2010-08-18 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900113#action_12900113
 ] 

Ning Zhang commented on HIVE-1510:
--

Other than the clean architecture concerns (Driver should be generic and should 
not assume tasks contain MR jobs), it seems also doesn't work if parallel 
execution is enabled: IOPrepareCache is thread local and parallel MR jobs are 
launched in different threads.



> HiveCombineInputFormat should not use prefix matching to find the 
> partitionDesc for a given path
> 
>
> Key: HIVE-1510
> URL: https://issues.apache.org/jira/browse/HIVE-1510
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1510.1.patch, hive-1510.3.patch
>
>
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> drop table combine_3_srcpart_seq_rc;
> create table combine_3_srcpart_seq_rc (key int , value string) partitioned by 
> (ds string, hr string) stored as sequencefile;
> insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", 
> hr="00") select * from src;
> alter table combine_3_srcpart_seq_rc set fileformat rcfile;
> insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", 
> hr="001") select * from src;
> desc extended combine_3_srcpart_seq_rc partition(ds="2010-08-03", hr="00");
> desc extended combine_3_srcpart_seq_rc partition(ds="2010-08-03", hr="001");
> select * from combine_3_srcpart_seq_rc where ds="2010-08-03" order by key;
> drop table combine_3_srcpart_seq_rc;
> will fail.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1556) tests broken

2010-08-18 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1556:
-

Affects Version/s: 0.7.0

> tests broken
> 
>
> Key: HIVE-1556
> URL: https://issues.apache.org/jira/browse/HIVE-1556
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 0.7.0
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1556.1.patch
>
>
> Due to https://issues.apache.org/jira/browse/HIVE-1548, TestContribCliDriver 
> is broken. Some test results need to be updated

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1556) tests broken

2010-08-18 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900102#action_12900102
 ] 

John Sichi commented on HIVE-1556:
--

I'll regenerate the test output and post a separate JIRA.

> tests broken
> 
>
> Key: HIVE-1556
> URL: https://issues.apache.org/jira/browse/HIVE-1556
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 0.7.0
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1556.1.patch
>
>
> Due to https://issues.apache.org/jira/browse/HIVE-1548, TestContribCliDriver 
> is broken. Some test results need to be updated

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

2010-08-18 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900098#action_12900098
 ] 

John Sichi commented on HIVE-1549:
--

Will commit when tests pass.


> Add ANSI SQL correlation aggregate function CORR(X,Y).
> --
>
> Key: HIVE-1549
> URL: https://issues.apache.org/jira/browse/HIVE-1549
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Pierre Huyn
>Assignee: Pierre Huyn
> Fix For: 0.7.0
>
> Attachments: HIVE-1549.1.patch, HIVE-1549.2.patch
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> Aggregate function that computes the Pearson's coefficient of correlation 
> between a set of number pairs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1556) tests broken

2010-08-18 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900097#action_12900097
 ] 

John Sichi commented on HIVE-1556:
--

HBaseHandler does run with ant test on Hadoop 0.20, but not with 0.17, so it's 
important to run tests against both configurations.


> tests broken
> 
>
> Key: HIVE-1556
> URL: https://issues.apache.org/jira/browse/HIVE-1556
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1556.1.patch
>
>
> Due to https://issues.apache.org/jira/browse/HIVE-1548, TestContribCliDriver 
> is broken. Some test results need to be updated

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1523) ql tests no longer work in miniMR mode

2010-08-18 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1523:


Attachment: hive-1523.2.patch

with modified list of minimr tests:

+  

   

i took the ones that worked from John's list. also added a couple of tests that 
had 'add jar' and 'add file' commands (since their interaction with real 
cluster is quite different).




> ql tests no longer work in miniMR mode
> --
>
> Key: HIVE-1523
> URL: https://issues.apache.org/jira/browse/HIVE-1523
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: hive-1523.1.patch, hive-1523.2.patch
>
>
> as per title. here's the first exception i see:
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: 
> java.io.FileNotFoun\
> dException File file:/build/ql/test/data/warehouse/dest_j1 does not exist.
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(746)) - 
> java.io.FileNotFoundException: Fil\
> e file:/build/ql/test/data/warehouse/dest_j1 does not exist.
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
>   at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1556) tests broken

2010-08-18 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900094#action_12900094
 ] 

Ning Zhang commented on HIVE-1556:
--

Could also because of HIVE-1548. Is HBaseHandler also excluded from ant test?

> tests broken
> 
>
> Key: HIVE-1556
> URL: https://issues.apache.org/jira/browse/HIVE-1556
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1556.1.patch
>
>
> Due to https://issues.apache.org/jira/browse/HIVE-1548, TestContribCliDriver 
> is broken. Some test results need to be updated

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1555) JDBC Storage Handler

2010-08-18 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1555:
-

Affects Version/s: (was: 0.5.0)
  Component/s: Drivers

> JDBC Storage Handler
> 
>
> Key: HIVE-1555
> URL: https://issues.apache.org/jira/browse/HIVE-1555
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Drivers
>Reporter: Bob Robertson
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> With the Cassandra and HBase Storage Handlers I thought it would make sense 
> to include a generic JDBC RDBMS Storage Handler so that you could import a 
> standard DB table into Hive. Many people must want to perform HiveQL joins, 
> etc against tables in other systems etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1558) introducing the "dual" table

2010-08-18 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1558:
-

Component/s: Query Processor

> introducing the "dual" table
> 
>
> Key: HIVE-1558
> URL: https://issues.apache.org/jira/browse/HIVE-1558
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Ning Zhang
>
> The "dual" table in MySQL and Oracle is very convenient in testing UDFs or 
> constructing rows without reading any other tables. 
> If dual is the only data source we could leverage the local mode execution. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1518) context_ngrams() UDAF for estimating top-k contextual n-grams

2010-08-18 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1518:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed.  Thanks Mayank!


> context_ngrams() UDAF for estimating top-k contextual n-grams
> -
>
> Key: HIVE-1518
> URL: https://issues.apache.org/jira/browse/HIVE-1518
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
>Assignee: Mayank Lahiri
> Fix For: 0.7.0
>
> Attachments: HIVE-1518.1.patch, HIVE-1518.2.patch, HIVE-1518.3.patch, 
> HIVE-1518.4.patch, HIVE-1518.5.patch
>
>
> Create a new context_ngrams() function that generalizes the ngrams() UDAF to 
> allow the user to specify context around n-grams. The analogy is 
> "fill-in-the-blanks", and is best illustrated with an example:
> SELECT context_ngrams(sentences(tweets), array("i", "love", null), 300) FROM 
> twitter;
> will estimate the top-300 words that follow the phrase "i love" in a database 
> of tweets. The position of the null(s) specifies where to generate the n-gram 
> from, and can be placed anywhere. For example:
> SELECT context_ngrams(sentences(tweets), array("i", "love", null, "but", 
> "hate", null), 300) FROM twitter;
> will estimate the top-300 word-pairs that fill in the blanks specified by 
> null.
> POSSIBLE USES:
> 1. Pre-computing search lookaheads
> 2. Sentiment analysis for products or entities -- e.g., querying with context 
> = array("twitter", "is", null)
> 3. Navigation path analysis in URL databases

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1556) tests broken

2010-08-18 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900086#action_12900086
 ] 

John Sichi commented on HIVE-1556:
--

I got a failure running ant test just now for the HBase portion.

[junit] diff -a -I file: -I pfile: -I /tmp/ -I invalidscheme: -I lastUpdate\
Time -I lastAccessTime -I owner -I transient_lastDdlTime -I java.lang.RuntimeEx\
ception -I at org -I at sun -I at java -I at junit -I Caused by: -I [.][.][.] [\
0-9]* more /data/users/jsichi/open/commit-trunk/build/hbase-handler/test/logs/h\
base-handler/hbase_bulk.m.out /data/users/jsichi/open/commit-trunk/hbase-handle\
r/src/test/results/hbase_bulk.m.out 
[junit] 109,110d108 
[junit] < PREHOOK: Input: defa...@hbsort
[junit] < PREHOOK: Output: defa...@hbsort   
[junit] 118d115 
[junit] < POSTHOOK: Input: defa...@hbsort   
[junit] 126,127d122 
[junit] < PREHOOK: Input: defa...@hbpartition   
[junit] < PREHOOK: Output: defa...@hbpartition  
[junit] 130d124 
[junit] < POSTHOOK: Input: defa...@hbpartition  
[junit] Exception: Client execution results failed with error code = 1  
[junit] junit.framework.AssertionFailedError: Client execution results fail\
ed with error code = 1  
[junit] at junit.framework.Assert.fail(Assert.java:47)  
[junit] at org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver.testCliD\
river_hbase_bulk(TestHBaseMinimrCliDriver.java:102) 
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)  
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce\
ssorImpl.java:39)   
[junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe\
thodAccessorImpl.java:25)   
[junit] at java.lang.reflect.Method.invoke(Method.java:597) 
[junit] at junit.framework.TestCase.runTest(TestCase.java:154)  
[junit] at junit.framework.TestCase.runBare(TestCase.java:127)  
 

> tests broken
> 
>
> Key: HIVE-1556
> URL: https://issues.apache.org/jira/browse/HIVE-1556
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1556.1.patch
>
>
> Due to https://issues.apache.org/jira/browse/HIVE-1548, TestContribCliDriver 
> is broken. Some test results need to be updated

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

2010-08-18 Thread Mayank Lahiri (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900075#action_12900075
 ] 

Mayank Lahiri commented on HIVE-1549:
-

+1 looks good to me.

> Add ANSI SQL correlation aggregate function CORR(X,Y).
> --
>
> Key: HIVE-1549
> URL: https://issues.apache.org/jira/browse/HIVE-1549
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Pierre Huyn
>Assignee: Pierre Huyn
> Fix For: 0.7.0
>
> Attachments: HIVE-1549.1.patch, HIVE-1549.2.patch
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> Aggregate function that computes the Pearson's coefficient of correlation 
> between a set of number pairs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path

2010-08-18 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900074#action_12900074
 ] 

He Yongqiang commented on HIVE-1510:


About the additional hashmap added, it is used to match path to partitionDesc 
by discarding partitionDesc's schema information. 

In the long run, we should normalize all input path to let them contain full 
schema and authorization information. This is a must to let hive work with 
multiple hdfs clusters.

> HiveCombineInputFormat should not use prefix matching to find the 
> partitionDesc for a given path
> 
>
> Key: HIVE-1510
> URL: https://issues.apache.org/jira/browse/HIVE-1510
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1510.1.patch, hive-1510.3.patch
>
>
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> drop table combine_3_srcpart_seq_rc;
> create table combine_3_srcpart_seq_rc (key int , value string) partitioned by 
> (ds string, hr string) stored as sequencefile;
> insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", 
> hr="00") select * from src;
> alter table combine_3_srcpart_seq_rc set fileformat rcfile;
> insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", 
> hr="001") select * from src;
> desc extended combine_3_srcpart_seq_rc partition(ds="2010-08-03", hr="00");
> desc extended combine_3_srcpart_seq_rc partition(ds="2010-08-03", hr="001");
> select * from combine_3_srcpart_seq_rc where ds="2010-08-03" order by key;
> drop table combine_3_srcpart_seq_rc;
> will fail.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1562) sample10.q fails in minimr mode

2010-08-18 Thread Joydeep Sen Sarma (JIRA)
sample10.q fails in minimr mode
---

 Key: HIVE-1562
 URL: https://issues.apache.org/jira/browse/HIVE-1562
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Joydeep Sen Sarma


followup from HIVE-1523. This is probably because of CombineHiveInputFormat:

ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test

insert overwrite table srcpartbucket partition(ds, hr) select * from srcpart 
where ds is not null and key < 10
2010-08-18 15:13:54,378 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: query: insert overwrite table srcpartbucket partition(ds, hr) select 
*\
 from srcpart where ds is not null and key < 10
2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: type: QUERY
2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=11
2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: Input: defa...@srcpart@ds=2008-04-08/hr=12
2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=11
2010-08-18 15:13:54,379 ERROR SessionState (SessionState.java:printError(277)) 
- PREHOOK: Input: defa...@srcpart@ds=2008-04-09/hr=12
2010-08-18 15:13:54,704 WARN  mapred.JobClient 
(JobClient.java:configureCommandLineOptions(539)) - Use GenericOptionsParser 
for parsing the arguments. Applicati\
ons should implement Tool for the same.
2010-08-18 15:13:55,642 ERROR mapred.EagerTaskInitializationListener 
(EagerTaskInitializationListener.java:run(83)) - Job initialization failed:
java.lang.IllegalArgumentException: Network location name contains /: 
/default-rack
  at org.apache.hadoop.net.NodeBase.set(NodeBase.java:75)
  at org.apache.hadoop.net.NodeBase.(NodeBase.java:57)
  at 
org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2326)
  at 
org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2320)
  at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:343)
  at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:440)
  at 
org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:81)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
  at java.lang.Thread.run(Thread.java:619)


2010-08-18 15:13:56,566 ERROR exec.MapRedTask 
(SessionState.java:printError(277)) - Ended Job = job_201008181513_0001 with 
errors
2010-08-18 15:13:56,597 ERROR ql.Driver (SessionState.java:printError(277)) - 
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.MapRedT\
ask


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-192) Add TIMESTAMP column type

2010-08-18 Thread Shyam Sundar Sarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shyam Sundar Sarkar updated HIVE-192:
-

Attachment: Hive-192.patch.txt

This is just the changes for thrift layer to see only string types being passed 
back and forth for Timestamp type.

> Add TIMESTAMP column type
> -
>
> Key: HIVE-192
> URL: https://issues.apache.org/jira/browse/HIVE-192
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Johan Oskarsson
>Assignee: Shyam Sundar Sarkar
> Attachments: create_2.q.txt, Hive-192.patch.txt, 
> TIMESTAMP_specification.txt
>
>
> create table something2 (test timestamp);
> ERROR: DDL specifying type timestamp which has not been defined
> java.lang.RuntimeException: specifying type timestamp which has not been 
> defined
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.FieldType(thrift_grammar.java:1879)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Field(thrift_grammar.java:1545)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.FieldList(thrift_grammar.java:1501)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Struct(thrift_grammar.java:1171)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.TypeDefinition(thrift_grammar.java:497)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Definition(thrift_grammar.java:439)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Start(thrift_grammar.java:101)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe.initialize(DynamicSerDe.java:97)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:180)
>   at org.apache.hadoop.hive.ql.metadata.Table.initSerDe(Table.java:141)
>   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:202)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:641)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:98)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:215)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:174)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:207)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:305)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-192) Add TIMESTAMP column type

2010-08-18 Thread Shyam Sundar Sarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shyam Sundar Sarkar updated HIVE-192:
-

Status: Patch Available  (was: In Progress)

This is just the initial changes for others to look at and suggest.  I need 
suggestions about string to Timestamp conversion within Dynamic SerDe layer.

> Add TIMESTAMP column type
> -
>
> Key: HIVE-192
> URL: https://issues.apache.org/jira/browse/HIVE-192
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Johan Oskarsson
>Assignee: Shyam Sundar Sarkar
> Attachments: create_2.q.txt, TIMESTAMP_specification.txt
>
>
> create table something2 (test timestamp);
> ERROR: DDL specifying type timestamp which has not been defined
> java.lang.RuntimeException: specifying type timestamp which has not been 
> defined
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.FieldType(thrift_grammar.java:1879)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Field(thrift_grammar.java:1545)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.FieldList(thrift_grammar.java:1501)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Struct(thrift_grammar.java:1171)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.TypeDefinition(thrift_grammar.java:497)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Definition(thrift_grammar.java:439)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Start(thrift_grammar.java:101)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe.initialize(DynamicSerDe.java:97)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:180)
>   at org.apache.hadoop.hive.ql.metadata.Table.initSerDe(Table.java:141)
>   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:202)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:641)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:98)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:215)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:174)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:207)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:305)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode

2010-08-18 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang reassigned HIVE-1561:
--

Assignee: He Yongqiang

> smb_mapjoin_8.q returns different results in miniMr mode
> 
>
> Key: HIVE-1561
> URL: https://issues.apache.org/jira/browse/HIVE-1561
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: He Yongqiang
>
> follow on to HIVE-1523:
> ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test
> POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer 
> join smb_bucket4_2 b on a.key = b.key
> official results:
> 4 val_356 NULL  NULL
> NULL  NULL  484 val_169
> 2000  val_169 NULL  NULL
> NULL  NULL  3000  val_169
> 4000  val_125 NULL  NULL
> in minimr mode:
> 2000  val_169 NULL  NULL
> 4 val_356 NULL  NULL
> 2000  val_169 NULL  NULL
> 4000  val_125 NULL  NULL
> NULL  NULL  5000  val_125

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path

2010-08-18 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900063#action_12900063
 ] 

He Yongqiang commented on HIVE-1510:


>>the IOPrepareCache is cleared in Driver, which should only contain generic 
>>code irrespect to task types. Can you do it in ExecDriver.execute()? This 
>>will new cache is only used in ExecDriver anyways.

ExecDriver is per map-reduce task. Driver is per query. We should do this for 
query granularity. I think the pathToPartitionDesc is also per query map?


>>some comments on why you need a new hash map keyed with the paths only will 
>>be helpful.
will do it in a next patch.

> HiveCombineInputFormat should not use prefix matching to find the 
> partitionDesc for a given path
> 
>
> Key: HIVE-1510
> URL: https://issues.apache.org/jira/browse/HIVE-1510
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1510.1.patch, hive-1510.3.patch
>
>
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> drop table combine_3_srcpart_seq_rc;
> create table combine_3_srcpart_seq_rc (key int , value string) partitioned by 
> (ds string, hr string) stored as sequencefile;
> insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", 
> hr="00") select * from src;
> alter table combine_3_srcpart_seq_rc set fileformat rcfile;
> insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", 
> hr="001") select * from src;
> desc extended combine_3_srcpart_seq_rc partition(ds="2010-08-03", hr="00");
> desc extended combine_3_srcpart_seq_rc partition(ds="2010-08-03", hr="001");
> select * from combine_3_srcpart_seq_rc where ds="2010-08-03" order by key;
> drop table combine_3_srcpart_seq_rc;
> will fail.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1561) smb_mapjoin_8.q returns different results in miniMr mode

2010-08-18 Thread Joydeep Sen Sarma (JIRA)
smb_mapjoin_8.q returns different results in miniMr mode


 Key: HIVE-1561
 URL: https://issues.apache.org/jira/browse/HIVE-1561
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Joydeep Sen Sarma


follow on to HIVE-1523:

ant -Dclustermode=miniMR -Dtestcase=TestCliDriver -Dqfile=smb_mapjoin_8.q test

POSTHOOK: query: select /*+mapjoin(a)*/ * from smb_bucket4_1 a full outer join 
smb_bucket4_2 b on a.key = b.key

official results:
4 val_356 NULL  NULL
NULL  NULL  484 val_169
2000  val_169 NULL  NULL
NULL  NULL  3000  val_169
4000  val_125 NULL  NULL


in minimr mode:
2000  val_169 NULL  NULL
4 val_356 NULL  NULL
2000  val_169 NULL  NULL
4000  val_125 NULL  NULL
NULL  NULL  5000  val_125





[jira] Commented: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

2010-08-18 Thread Pierre Huyn (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900062#action_12900062
 ] 

Pierre Huyn commented on HIVE-1549:
---

Thanks for your comments. The items have been taken care of in patch #2.




> Add ANSI SQL correlation aggregate function CORR(X,Y).
> --
>
> Key: HIVE-1549
> URL: https://issues.apache.org/jira/browse/HIVE-1549
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Pierre Huyn
>Assignee: Pierre Huyn
> Fix For: 0.7.0
>
> Attachments: HIVE-1549.1.patch, HIVE-1549.2.patch
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> Aggregate function that computes the Pearson's coefficient of correlation 
> between a set of number pairs.




alarming hive test failures in minimr mode

2010-08-18 Thread Joydeep Sen Sarma
Hey Devs,

Since fixing 1523 - I have been trying to run hive queries in minimr mode.

I am alarmed by what I am seeing:
-  assertions firing deep inside hive in minimr mode
-  test results outright different from local mode results (and not 
because of limit or because of ordering).

I am going to file jiras as I can - please do assign them to yourself (or 
whoever you think the right person is). I think we should try to get these 
resolved asap - they seem to indicate significant bugs in features advertised 
by Hive (that do not get enough coverage on real clusters). Imho - this seems 
way more important than new feature dev.

Thanks,

Joydeep


[jira] Commented: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path

2010-08-18 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900057#action_12900057
 ] 

Ning Zhang commented on HIVE-1510:
--

As discussed offline with Yongqiang, we should clean up pathToPartitionInfo 
to contain only canonical representations for each partition path. This could 
result in much cleaner code: if we do that, IOPrepareCache is not needed at all 
and getPartitionDescFromPath becomes a simple hash lookup. We can make this a 
follow-up JIRA, along with cleaning up the unnecessary info in 
pathToPartitionInfo.

Here are some comments on the current patch:

 - the IOPrepareCache is cleared in Driver, which should only contain generic 
code irrespective of task types. Can you do it in ExecDriver.execute()? This 
way the new cache is only used in ExecDriver anyway.
 - some comments on why you need a new hash map keyed with the paths only would 
be helpful.
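The proposal above can be sketched roughly as follows. This is a hypothetical illustration, not Hive's actual code: the class and method names (CanonicalPathLookup, canonicalize) are made up, and real path canonicalization in Hive involves Hadoop Path/FileSystem handling rather than plain java.net.URI.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: once pathToPartitionInfo holds only canonical path
// strings, getPartitionDescFromPath reduces to a plain hash-map get,
// with no prefix matching over all registered paths.
public class CanonicalPathLookup {
    private final Map<String, String> pathToPartitionInfo = new HashMap<>();

    // Normalize a path before using it as a map key, so that equivalent
    // spellings of the same path collapse to one canonical form.
    static String canonicalize(String path) {
        return URI.create(path).normalize().toString();
    }

    public void put(String path, String partitionDesc) {
        pathToPartitionInfo.put(canonicalize(path), partitionDesc);
    }

    // O(1) lookup instead of scanning every key for a matching prefix.
    public String getPartitionDescFromPath(String path) {
        return pathToPartitionInfo.get(canonicalize(path));
    }

    public static void main(String[] args) {
        CanonicalPathLookup cache = new CanonicalPathLookup();
        cache.put("hdfs://nn/warehouse/t/ds=2010-08-03/hr=00", "part-desc-00");
        // A non-canonical spelling of the same path still resolves.
        System.out.println(cache.getPartitionDescFromPath(
                "hdfs://nn/warehouse/t/./ds=2010-08-03/hr=00"));
    }
}
```

The point of canonicalizing on both insert and lookup is that two different spellings of the same partition directory hit the same key, which is what makes the prefix scan unnecessary.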

> HiveCombineInputFormat should not use prefix matching to find the 
> partitionDesc for a given path
> 
>
> Key: HIVE-1510
> URL: https://issues.apache.org/jira/browse/HIVE-1510
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1510.1.patch, hive-1510.3.patch
>
>
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> drop table combine_3_srcpart_seq_rc;
> create table combine_3_srcpart_seq_rc (key int , value string) partitioned by 
> (ds string, hr string) stored as sequencefile;
> insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", 
> hr="00") select * from src;
> alter table combine_3_srcpart_seq_rc set fileformat rcfile;
> insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", 
> hr="001") select * from src;
> desc extended combine_3_srcpart_seq_rc partition(ds="2010-08-03", hr="00");
> desc extended combine_3_srcpart_seq_rc partition(ds="2010-08-03", hr="001");
> select * from combine_3_srcpart_seq_rc where ds="2010-08-03" order by key;
> drop table combine_3_srcpart_seq_rc;
> will fail.




[jira] Updated: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

2010-08-18 Thread Pierre Huyn (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Huyn updated HIVE-1549:
--

Attachment: HIVE-1549.2.patch

Fixed the 2 issues from Mayank's review.

> Add ANSI SQL correlation aggregate function CORR(X,Y).
> --
>
> Key: HIVE-1549
> URL: https://issues.apache.org/jira/browse/HIVE-1549
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Pierre Huyn
>Assignee: Pierre Huyn
> Fix For: 0.7.0
>
> Attachments: HIVE-1549.1.patch, HIVE-1549.2.patch
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> Aggregate function that computes the Pearson's coefficient of correlation 
> between a set of number pairs.




[jira] Commented: (HIVE-1523) ql tests no longer work in miniMR mode

2010-08-18 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900048#action_12900048
 ] 

Joydeep Sen Sarma commented on HIVE-1523:
-

I am running through the above qfiles to see what executes successfully on 
minimr (because many don't).

One concern is the length of the tests. I think we need to divide our tests 
into a short and a long regression; otherwise the development cycle is severely 
impacted if everything has to be tested on every iteration.

> ql tests no longer work in miniMR mode
> --
>
> Key: HIVE-1523
> URL: https://issues.apache.org/jira/browse/HIVE-1523
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: hive-1523.1.patch
>
>
> as per title. here's the first exception i see:
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: 
> java.io.FileNotFoun\
> dException File file:/build/ql/test/data/warehouse/dest_j1 does not exist.
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(746)) - 
> java.io.FileNotFoundException: Fil\
> e file:/build/ql/test/data/warehouse/dest_j1 does not exist.
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
>   at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677)




[jira] Created: (HIVE-1560) binaryoutputformat.q failure in minimr mode

2010-08-18 Thread Joydeep Sen Sarma (JIRA)
binaryoutputformat.q failure in minimr mode
---

 Key: HIVE-1560
 URL: https://issues.apache.org/jira/browse/HIVE-1560
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Reporter: Joydeep Sen Sarma


this is a followup to HIVE-1523.

ant -Dclustermode=miniMR -Dtestcase=TestCliDriver 
-Dqfile=binary_output_format.q test

fails in a significant manner: all the rows are flattened into one row:

ntimeException -I at org -I at sun -I at java -I at junit -I Caused by: -I 
[.][.][.] [0-9]* more 
/data/users/jssarma/hive_testing/build/ql/test/logs/clientposit\
ive/binary_output_format.q.out 
/data/users/jssarma/hive_testing/ql/src/test/results/clientpositive/binary_output_format.q.out
[junit] 313c313,812
[junit] < 238 val_23886 val_86311 val_31127 val_27165 val_165409
...
[junit] ---
[junit] > 238 val_238
[junit] > 86  val_86
[junit] > 311 val_311
[junit] > 27  val_27
[junit] > 165 val_165
 ...




[jira] Commented: (HIVE-1555) JDBC Storage Handler

2010-08-18 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900039#action_12900039
 ] 

Edward Capriolo commented on HIVE-1555:
---

I wonder if this could end up being a very effective way to query shared data 
stores. 

I think I saw something like this in Futurama... Don't worry about querying 
blank, let me worry about querying blank.
 
http://www.google.com/url?sa=t&source=web&cd=2&ved=0CBcQFjAB&url=http%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DB5cAwTEEGNE&ei=Qk9sTLAThIqXB__DzDw&usg=AFQjCNH_TOUS1cl6t0gZXefRURw0a_feZg

> JDBC Storage Handler
> 
>
> Key: HIVE-1555
> URL: https://issues.apache.org/jira/browse/HIVE-1555
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.5.0
>Reporter: Bob Robertson
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> With the Cassandra and HBase Storage Handlers I thought it would make sense 
> to include a generic JDBC RDBMS Storage Handler so that you could import a 
> standard DB table into Hive. Many people must want to perform HiveQL joins, 
> etc against tables in other systems etc.




[jira] Commented: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

2010-08-18 Thread Mayank Lahiri (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900029#action_12900029
 ] 

Mayank Lahiri commented on HIVE-1549:
-

Nice job Pierre! Just a couple of very trivial points:

-- UDAF file, line #116 and line #123: could you amend the error message to 
indicate that only numeric types are accepted (string is also included as of 
now)?

-- I don't think you need the private boolean "warned" at line #273.

Otherwise, it looks good and the numbers work out.
 

Incidentally, for the future, if your UDAF only stores a small number of values 
as a partial aggregation, you might just want to consider serializing the 
values as a list of doubles instead of a struct in terminatePartial() and 
merge(). It'll probably save you some time and reduce the amount of code in 
those parts. 
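The suggestion above can be sketched as follows. This is a hypothetical, simplified illustration, not the patch's actual code: the class name CorrBuffer is made up, and a real Hive GenericUDAF would wrap this state in evaluator and ObjectInspector plumbing.

```java
import java.util.List;

// Hypothetical sketch: a correlation UDAF whose partial state is six
// scalars can ship them between map and reduce as a flat list of doubles,
// keeping terminatePartial() and merge() very short.
public class CorrBuffer {
    double n, sumX, sumY, sumXX, sumYY, sumXY;

    // iterate(): fold one (x, y) pair into the running sums.
    public void iterate(double x, double y) {
        n++; sumX += x; sumY += y;
        sumXX += x * x; sumYY += y * y; sumXY += x * y;
    }

    // terminatePartial(): flatten the state into a list of doubles.
    public List<Double> terminatePartial() {
        return List.of(n, sumX, sumY, sumXX, sumYY, sumXY);
    }

    // merge(): add another buffer's serialized values element-wise.
    public void merge(List<Double> other) {
        n += other.get(0); sumX += other.get(1); sumY += other.get(2);
        sumXX += other.get(3); sumYY += other.get(4); sumXY += other.get(5);
    }

    // terminate(): Pearson correlation from the accumulated sums.
    public double terminate() {
        double cov = sumXY - sumX * sumY / n;
        double varX = sumXX - sumX * sumX / n;
        double varY = sumYY - sumY * sumY / n;
        return cov / Math.sqrt(varX * varY);
    }
}
```

For perfectly linear pairs such as (1, 2), (2, 4), (3, 6), merging two partial buffers and calling terminate() yields a correlation of 1.0.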

> Add ANSI SQL correlation aggregate function CORR(X,Y).
> --
>
> Key: HIVE-1549
> URL: https://issues.apache.org/jira/browse/HIVE-1549
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Pierre Huyn
>Assignee: Pierre Huyn
> Fix For: 0.7.0
>
> Attachments: HIVE-1549.1.patch
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> Aggregate function that computes the Pearson's coefficient of correlation 
> between a set of number pairs.




[jira] Issue Comment Edited: (HIVE-1523) ql tests no longer work in miniMR mode

2010-08-18 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900018#action_12900018
 ] 

John Sichi edited comment on HIVE-1523 at 8/18/10 4:30 PM:
---

+1.  Last time I talked to Ning about this, my take was that we should be able 
to re-run any subset of tests in either mode (without needing test codegen for 
it), but for now we can just get things working again this way.

Some candidates for existing tests to add in to the minimr suite:

jsichi-mac:clientpositive jsichi$ grep reducer *.q
bucket1.q:set hive.exec.reducers.max = 200;
bucket2.q:set hive.exec.reducers.max = 1;
bucket3.q:set hive.exec.reducers.max = 1;
bucket4.q:set hive.exec.reducers.max = 1;
bucketmapjoin6.q:set hive.exec.reducers.max=1;
disable_merge_for_bucketing.q:set hive.exec.reducers.max = 1;
reduce_deduplicate.q:set hive.exec.reducers.max = 1;
sample10.q:set hive.exec.reducers.max=4;
smb_mapjoin_6.q:set hive.exec.reducers.max = 1;
smb_mapjoin_7.q:set hive.exec.reducers.max = 1;
smb_mapjoin_8.q:set hive.exec.reducers.max = 1;
smb_mapjoin_8.q:set hive.exec.reducers.max = 1;
udaf_percentile_approx.q:set hive.exec.reducers.max=4


  was (Author: jvs):
+1.  Last time I talked to Ning about this, my take was that we should be 
able to re-run any subset of tests in either mode (without needing test codegen 
for it), but for now we can just get things working again this way.

Some candidates for existing tests to adding to the minimr suite:

jsichi-mac:clientpositive jsichi$ grep reducer *.q
bucket1.q:set hive.exec.reducers.max = 200;
bucket2.q:set hive.exec.reducers.max = 1;
bucket3.q:set hive.exec.reducers.max = 1;
bucket4.q:set hive.exec.reducers.max = 1;
bucketmapjoin6.q:set hive.exec.reducers.max=1;
disable_merge_for_bucketing.q:set hive.exec.reducers.max = 1;
reduce_deduplicate.q:set hive.exec.reducers.max = 1;
sample10.q:set hive.exec.reducers.max=4;
smb_mapjoin_6.q:set hive.exec.reducers.max = 1;
smb_mapjoin_7.q:set hive.exec.reducers.max = 1;
smb_mapjoin_8.q:set hive.exec.reducers.max = 1;
smb_mapjoin_8.q:set hive.exec.reducers.max = 1;
udaf_percentile_approx.q:set hive.exec.reducers.max=4

  
> ql tests no longer work in miniMR mode
> --
>
> Key: HIVE-1523
> URL: https://issues.apache.org/jira/browse/HIVE-1523
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: hive-1523.1.patch
>
>
> as per title. here's the first exception i see:
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: 
> java.io.FileNotFoun\
> dException File file:/build/ql/test/data/warehouse/dest_j1 does not exist.
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(746)) - 
> java.io.FileNotFoundException: Fil\
> e file:/build/ql/test/data/warehouse/dest_j1 does not exist.
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
>   at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677)




[jira] Commented: (HIVE-1523) ql tests no longer work in miniMR mode

2010-08-18 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900018#action_12900018
 ] 

John Sichi commented on HIVE-1523:
--

+1.  Last time I talked to Ning about this, my take was that we should be able 
to re-run any subset of tests in either mode (without needing test codegen for 
it), but for now we can just get things working again this way.

Some candidates for existing tests to add to the minimr suite:

jsichi-mac:clientpositive jsichi$ grep reducer *.q
bucket1.q:set hive.exec.reducers.max = 200;
bucket2.q:set hive.exec.reducers.max = 1;
bucket3.q:set hive.exec.reducers.max = 1;
bucket4.q:set hive.exec.reducers.max = 1;
bucketmapjoin6.q:set hive.exec.reducers.max=1;
disable_merge_for_bucketing.q:set hive.exec.reducers.max = 1;
reduce_deduplicate.q:set hive.exec.reducers.max = 1;
sample10.q:set hive.exec.reducers.max=4;
smb_mapjoin_6.q:set hive.exec.reducers.max = 1;
smb_mapjoin_7.q:set hive.exec.reducers.max = 1;
smb_mapjoin_8.q:set hive.exec.reducers.max = 1;
smb_mapjoin_8.q:set hive.exec.reducers.max = 1;
udaf_percentile_approx.q:set hive.exec.reducers.max=4


> ql tests no longer work in miniMR mode
> --
>
> Key: HIVE-1523
> URL: https://issues.apache.org/jira/browse/HIVE-1523
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: hive-1523.1.patch
>
>
> as per title. here's the first exception i see:
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(743)) - Got exception: 
> java.io.FileNotFoun\
> dException File file:/build/ql/test/data/warehouse/dest_j1 does not exist.
> 2010-08-09 18:05:11,259 ERROR hive.log 
> (MetaStoreUtils.java:logAndThrowMetaException(746)) - 
> java.io.FileNotFoundException: Fil\
> e file:/build/ql/test/data/warehouse/dest_j1 does not exist.
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
>   at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:136)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:677)




[jira] Commented: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

2010-08-18 Thread Mayank Lahiri (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900011#action_12900011
 ] 

Mayank Lahiri commented on HIVE-1549:
-

No problem, reviewing it now...

> Add ANSI SQL correlation aggregate function CORR(X,Y).
> --
>
> Key: HIVE-1549
> URL: https://issues.apache.org/jira/browse/HIVE-1549
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Pierre Huyn
>Assignee: Pierre Huyn
> Fix For: 0.7.0
>
> Attachments: HIVE-1549.1.patch
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> Aggregate function that computes the Pearson's coefficient of correlation 
> between a set of number pairs.




[jira] Commented: (HIVE-1518) context_ngrams() UDAF for estimating top-k contextual n-grams

2010-08-18 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900012#action_12900012
 ] 

John Sichi commented on HIVE-1518:
--

Running through tests now.


> context_ngrams() UDAF for estimating top-k contextual n-grams
> -
>
> Key: HIVE-1518
> URL: https://issues.apache.org/jira/browse/HIVE-1518
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
>Assignee: Mayank Lahiri
> Fix For: 0.7.0
>
> Attachments: HIVE-1518.1.patch, HIVE-1518.2.patch, HIVE-1518.3.patch, 
> HIVE-1518.4.patch, HIVE-1518.5.patch
>
>
> Create a new context_ngrams() function that generalizes the ngrams() UDAF to 
> allow the user to specify context around n-grams. The analogy is 
> "fill-in-the-blanks", and is best illustrated with an example:
> SELECT context_ngrams(sentences(tweets), array("i", "love", null), 300) FROM 
> twitter;
> will estimate the top-300 words that follow the phrase "i love" in a database 
> of tweets. The position of the null(s) specifies where to generate the n-gram 
> from, and can be placed anywhere. For example:
> SELECT context_ngrams(sentences(tweets), array("i", "love", null, "but", 
> "hate", null), 300) FROM twitter;
> will estimate the top-300 word-pairs that fill in the blanks specified by 
> null.
> POSSIBLE USES:
> 1. Pre-computing search lookaheads
> 2. Sentiment analysis for products or entities -- e.g., querying with context 
> = array("twitter", "is", null)
> 3. Navigation path analysis in URL databases




[jira] Commented: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

2010-08-18 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1299#action_1299
 ] 

John Sichi commented on HIVE-1549:
--

Mayank, if you get time, here's another one to take a look at.


> Add ANSI SQL correlation aggregate function CORR(X,Y).
> --
>
> Key: HIVE-1549
> URL: https://issues.apache.org/jira/browse/HIVE-1549
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Pierre Huyn
>Assignee: Pierre Huyn
> Fix For: 0.7.0
>
> Attachments: HIVE-1549.1.patch
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> Aggregate function that computes the Pearson's coefficient of correlation 
> between a set of number pairs.




[jira] Commented: (HIVE-1555) JDBC Storage Handler

2010-08-18 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1291#action_1291
 ] 

John Sichi commented on HIVE-1555:
--

For an implementation possibility, see

http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/db/DBInputFormat.html


> JDBC Storage Handler
> 
>
> Key: HIVE-1555
> URL: https://issues.apache.org/jira/browse/HIVE-1555
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.5.0
>Reporter: Bob Robertson
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> With the Cassandra and HBase Storage Handlers I thought it would make sense 
> to include a generic JDBC RDBMS Storage Handler so that you could import a 
> standard DB table into Hive. Many people must want to perform HiveQL joins, 
> etc against tables in other systems etc.




[jira] Commented: (HIVE-1010) Implement INFORMATION_SCHEMA in Hive

2010-08-18 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1288#action_1288
 ] 

John Sichi commented on HIVE-1010:
--

See HIVE-1555 for a JDBC storage handler.


> Implement INFORMATION_SCHEMA in Hive
> 
>
> Key: HIVE-1010
> URL: https://issues.apache.org/jira/browse/HIVE-1010
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor, Server Infrastructure
>Reporter: Jeff Hammerbacher
>
> INFORMATION_SCHEMA is part of the SQL92 standard and would be useful to 
> implement using our metastore.




[jira] Commented: (HIVE-1558) introducing the "dual" table

2010-08-18 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1281#action_1281
 ] 

John Sichi commented on HIVE-1558:
--

The SQL standard has a VALUES clause, so you can do:

INSERT INTO t VALUES(3, 'hi');  -- inserts one row

INSERT INTO t VALUES (3, 'hi'), (4, 'bye');  -- inserts two rows

and

SELECT * FROM (VALUES(3, 'hi'), (4, 'bye'))  -- inline table results

If we add dual, it would also be nice to support at least the standard INSERT 
syntax since that has been around forever.


> introducing the "dual" table
> 
>
> Key: HIVE-1558
> URL: https://issues.apache.org/jira/browse/HIVE-1558
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Ning Zhang
>
> The "dual" table in MySQL and Oracle is very convenient in testing UDFs or 
> constructing rows without reading any other tables. 
> If dual is the only data source we could leverage the local mode execution. 




[jira] Updated: (HIVE-1510) HiveCombineInputFormat should not use prefix matching to find the partitionDesc for a given path

2010-08-18 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1510:
---

Attachment: hive-1510.3.patch

> HiveCombineInputFormat should not use prefix matching to find the 
> partitionDesc for a given path
> 
>
> Key: HIVE-1510
> URL: https://issues.apache.org/jira/browse/HIVE-1510
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: hive-1510.1.patch, hive-1510.3.patch
>
>
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> drop table combine_3_srcpart_seq_rc;
> create table combine_3_srcpart_seq_rc (key int , value string) partitioned by 
> (ds string, hr string) stored as sequencefile;
> insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", 
> hr="00") select * from src;
> alter table combine_3_srcpart_seq_rc set fileformat rcfile;
> insert overwrite table combine_3_srcpart_seq_rc partition (ds="2010-08-03", 
> hr="001") select * from src;
> desc extended combine_3_srcpart_seq_rc partition(ds="2010-08-03", hr="00");
> desc extended combine_3_srcpart_seq_rc partition(ds="2010-08-03", hr="001");
> select * from combine_3_srcpart_seq_rc where ds="2010-08-03" order by key;
> drop table combine_3_srcpart_seq_rc;
> will fail.




[jira] Commented: (HIVE-1555) JDBC Storage Handler

2010-08-18 Thread Tim Perkins (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899975#action_12899975
 ] 

Tim Perkins commented on HIVE-1555:
---

This sounds great. We would love to be able to easily integrate our existing 
RDBMS reporting data directly into Hive.  Getting everything from one frontend 
connected to Hive would make things much simpler.

> JDBC Storage Handler
> 
>
> Key: HIVE-1555
> URL: https://issues.apache.org/jira/browse/HIVE-1555
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.5.0
>Reporter: Bob Robertson
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> With the Cassandra and HBase Storage Handlers I thought it would make sense 
> to include a generic JDBC RDBMS Storage Handler so that you could import a 
> standard DB table into Hive. Many people must want to perform HiveQL joins, 
> etc against tables in other systems etc.




[jira] Updated: (HIVE-1307) More generic and efficient merge method

2010-08-18 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1307:
-

Status: Patch Available  (was: In Progress)

> More generic and efficient merge method
> ---
>
> Key: HIVE-1307
> URL: https://issues.apache.org/jira/browse/HIVE-1307
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.patch, 
> HIVE-1307_java_only.patch
>
>
> Currently if hive.merge.mapfiles/mapredfiles=true, a new mapreduce job is 
> create to read the input files and output to one reducer for merging. This MR 
> job is created at compile time and one MR job for one partition. In the case 
> of dynamic partition case, multiple partitions could be created at execution 
> time and generating merging MR job at compile time is impossible. 
> We should generalize the merge framework to allow multiple partitions and 
> most of the time a map-only job should be sufficient if we use 
> CombineHiveInputFormat. 




[jira] Updated: (HIVE-1307) More generic and efficient merge method

2010-08-18 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1307:
-

Attachment: HIVE-1307.2.patch

Uploading a new full patch HIVE-1307.2.patch, containing the following 
additional changes:
 - more log file changes due to svn up to the latest revision (mostly due to 
conflict with another patch on lineage hooks).
 - minor change in FileUtils.java to include '{' and ']' as special characters 
to escape when they are used as partition column values.

> More generic and efficient merge method
> ---
>
> Key: HIVE-1307
> URL: https://issues.apache.org/jira/browse/HIVE-1307
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.patch, 
> HIVE-1307_java_only.patch
>
>
> Currently if hive.merge.mapfiles/mapredfiles=true, a new mapreduce job is 
> create to read the input files and output to one reducer for merging. This MR 
> job is created at compile time and one MR job for one partition. In the case 
> of dynamic partition case, multiple partitions could be created at execution 
> time and generating merging MR job at compile time is impossible. 
> We should generalize the merge framework to allow multiple partitions and 
> most of the time a map-only job should be sufficient if we use 
> CombineHiveInputFormat. 




[jira] Commented: (HIVE-1559) Contrib tests not run as part of 'ant test'

2010-08-18 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899961#action_12899961
 ] 

Ning Zhang commented on HIVE-1559:
--

BTW, TestContribParse[Negative], TestContribNegativeCliDriver should also be 
included in 'ant test'.

> Contrib tests not run as part of 'ant test'
> ---
>
> Key: HIVE-1559
> URL: https://issues.apache.org/jira/browse/HIVE-1559
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Reporter: Namit Jain
>
> Copying from https://issues.apache.org/jira/browse/HIVE-1556
> >> BTW, if I run 'ant test' in hive's root directory, it seems the 
> >> TestContrib* were not tested. Is it expected?
> TestContribCliDriver should be run as part of 'ant test'




[jira] Created: (HIVE-1559) Contrib tests not run as part of 'ant test'

2010-08-18 Thread Namit Jain (JIRA)
Contrib tests not run as part of 'ant test'
---

 Key: HIVE-1559
 URL: https://issues.apache.org/jira/browse/HIVE-1559
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Reporter: Namit Jain


Copying from https://issues.apache.org/jira/browse/HIVE-1556

>> BTW, if I run 'ant test' in hive's root directory, it seems the TestContrib* 
>> were not tested. Is it expected?


TestContribCliDriver should be run as part of 'ant test'




[jira] Commented: (HIVE-1556) tests broken

2010-08-18 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899953#action_12899953
 ] 

Namit Jain commented on HIVE-1556:
--

>> BTW, if I run 'ant test' in hive's root directory, it seems the TestContrib* 
>> were not tested. Is it expected?


No, this is not expected. Because of this, we missed it in the first place.
I will file a follow-up on this.



> tests broken
> 
>
> Key: HIVE-1556
> URL: https://issues.apache.org/jira/browse/HIVE-1556
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1556.1.patch
>
>
> Due to https://issues.apache.org/jira/browse/HIVE-1548, TestContribCliDriver 
> is broken. Some test results need to be updated




[jira] Updated: (HIVE-1556) tests broken

2010-08-18 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1556:
-

   Status: Resolved  (was: Patch Available)
Fix Version/s: 0.7.0
   Resolution: Fixed

Committed. Thanks Namit!

> tests broken
> 
>
> Key: HIVE-1556
> URL: https://issues.apache.org/jira/browse/HIVE-1556
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1556.1.patch
>
>
> Due to https://issues.apache.org/jira/browse/HIVE-1548, TestContribCliDriver 
> is broken. Some test results need to be updated




[jira] Commented: (HIVE-1556) tests broken

2010-08-18 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899943#action_12899943
 ] 

Ning Zhang commented on HIVE-1556:
--

All the TestContrib* test cases passed in Hive root directory.

> tests broken
> 
>
> Key: HIVE-1556
> URL: https://issues.apache.org/jira/browse/HIVE-1556
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.1556.1.patch
>
>
> Due to https://issues.apache.org/jira/browse/HIVE-1548, TestContribCliDriver 
> is broken. Some test results need to be updated




[jira] Commented: (HIVE-1556) tests broken

2010-08-18 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899941#action_12899941
 ] 

Ning Zhang commented on HIVE-1556:
--

I cleaned up and ran it again under contrib/, and there are compilation errors 
(QTestUtil was not found). So I'm not sure whether running it inside contrib/ 
is the right way; I'm running it in hive's root directory instead. 

BTW, if I run 'ant test' in hive's root directory, it seems the TestContrib* 
were not tested. Is it expected?

> tests broken
> 
>
> Key: HIVE-1556
> URL: https://issues.apache.org/jira/browse/HIVE-1556
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.1556.1.patch
>
>
> Due to https://issues.apache.org/jira/browse/HIVE-1548, TestContribCliDriver 
> is broken. Some test results need to be updated




[jira] Updated: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

2010-08-18 Thread Pierre Huyn (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Huyn updated HIVE-1549:
--

  Status: Patch Available  (was: In Progress)
Release Note: This CORR UDAF is implemented using a stable one-pass 
algorithm, similar to the one used in the COVAR_POP UDAF.

This release is ready for code review.

> Add ANSI SQL correlation aggregate function CORR(X,Y).
> --
>
> Key: HIVE-1549
> URL: https://issues.apache.org/jira/browse/HIVE-1549
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Pierre Huyn
>Assignee: Pierre Huyn
> Fix For: 0.7.0
>
> Attachments: HIVE-1549.1.patch
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> Aggregate function that computes the Pearson's coefficient of correlation 
> between a set of number pairs.




[jira] Updated: (HIVE-1549) Add ANSI SQL correlation aggregate function CORR(X,Y).

2010-08-18 Thread Pierre Huyn (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Huyn updated HIVE-1549:
--

Attachment: HIVE-1549.1.patch

This CORR UDAF is implemented using a one-pass stable algorithm, very similar 
to the implementation of the COVAR_POP UDAF. This code release is now ready for 
review.

> Add ANSI SQL correlation aggregate function CORR(X,Y).
> --
>
> Key: HIVE-1549
> URL: https://issues.apache.org/jira/browse/HIVE-1549
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Pierre Huyn
>Assignee: Pierre Huyn
> Fix For: 0.7.0
>
> Attachments: HIVE-1549.1.patch
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> Aggregate function that computes the Pearson's coefficient of correlation 
> between a set of number pairs.




[jira] Created: (HIVE-1558) introducing the "dual" table

2010-08-18 Thread Ning Zhang (JIRA)
introducing the "dual" table


 Key: HIVE-1558
 URL: https://issues.apache.org/jira/browse/HIVE-1558
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Ning Zhang


The "dual" table in MySQL and Oracle is very convenient in testing UDFs or 
constructing rows without reading any other tables. 

If dual is the only data source, we could leverage local-mode execution. 




[jira] Commented: (HIVE-1556) tests broken

2010-08-18 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899928#action_12899928
 ] 

Namit Jain commented on HIVE-1556:
--

Both of them should work.

Can you post some of the diffs?

> tests broken
> 
>
> Key: HIVE-1556
> URL: https://issues.apache.org/jira/browse/HIVE-1556
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.1556.1.patch
>
>
> Due to https://issues.apache.org/jira/browse/HIVE-1548, TestContribCliDriver 
> is broken. Some test results need to be updated




[jira] Updated: (HIVE-1518) context_ngrams() UDAF for estimating top-k contextual n-grams

2010-08-18 Thread Mayank Lahiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Lahiri updated HIVE-1518:


Attachment: HIVE-1518.5.patch

> context_ngrams() UDAF for estimating top-k contextual n-grams
> -
>
> Key: HIVE-1518
> URL: https://issues.apache.org/jira/browse/HIVE-1518
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
>Assignee: Mayank Lahiri
> Fix For: 0.7.0
>
> Attachments: HIVE-1518.1.patch, HIVE-1518.2.patch, HIVE-1518.3.patch, 
> HIVE-1518.4.patch, HIVE-1518.5.patch
>
>
> Create a new context_ngrams() function that generalizes the ngrams() UDAF to 
> allow the user to specify context around n-grams. The analogy is 
> "fill-in-the-blanks", and is best illustrated with an example:
> SELECT context_ngrams(sentences(tweets), array("i", "love", null), 300) FROM 
> twitter;
> will estimate the top-300 words that follow the phrase "i love" in a database 
> of tweets. The position of the null(s) specifies where to generate the n-gram 
> from, and can be placed anywhere. For example:
> SELECT context_ngrams(sentences(tweets), array("i", "love", null, "but", 
> "hate", null), 300) FROM twitter;
> will estimate the top-300 word-pairs that fill in the blanks specified by 
> null.
> POSSIBLE USES:
> 1. Pre-computing search lookaheads
> 2. Sentiment analysis for products or entities -- e.g., querying with context 
> = array("twitter", "is", null)
> 3. Navigation path analysis in URL databases
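The fill-in-the-blanks matching described above can be sketched in a few lines (plain Python, purely illustrative; this is not the Hive UDAF, and the exact counting here ignores the estimation and top-k heap aspects of the real implementation):

```python
from collections import Counter

# Toy model of context_ngrams(): `context` is a list of words in which None
# marks a blank; return the k most frequent fillers for those blanks.
def context_ngrams(sentences, context, k):
    counts = Counter()
    n = len(context)
    for words in sentences:
        for i in range(len(words) - n + 1):
            window = words[i:i + n]
            # The window matches when every non-None context word lines up.
            if all(c is None or c == w for c, w in zip(context, window)):
                filler = tuple(w for c, w in zip(context, window) if c is None)
                counts[filler] += 1
    return [list(f) for f, _ in counts.most_common(k)]
```

For example, `context_ngrams(tokenized_tweets, ["i", "love", None], 300)` would return the most frequent words following "i love", mirroring the first query above.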




[jira] Updated: (HIVE-1518) context_ngrams() UDAF for estimating top-k contextual n-grams

2010-08-18 Thread Mayank Lahiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Lahiri updated HIVE-1518:


Status: Patch Available  (was: Open)

It was the new hook format. This should fix it.

> context_ngrams() UDAF for estimating top-k contextual n-grams
> -
>
> Key: HIVE-1518
> URL: https://issues.apache.org/jira/browse/HIVE-1518
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
>Assignee: Mayank Lahiri
> Fix For: 0.7.0
>
> Attachments: HIVE-1518.1.patch, HIVE-1518.2.patch, HIVE-1518.3.patch, 
> HIVE-1518.4.patch, HIVE-1518.5.patch
>
>
> Create a new context_ngrams() function that generalizes the ngrams() UDAF to 
> allow the user to specify context around n-grams. The analogy is 
> "fill-in-the-blanks", and is best illustrated with an example:
> SELECT context_ngrams(sentences(tweets), array("i", "love", null), 300) FROM 
> twitter;
> will estimate the top-300 words that follow the phrase "i love" in a database 
> of tweets. The position of the null(s) specifies where to generate the n-gram 
> from, and can be placed anywhere. For example:
> SELECT context_ngrams(sentences(tweets), array("i", "love", null, "but", 
> "hate", null), 300) FROM twitter;
> will estimate the top-300 word-pairs that fill in the blanks specified by 
> null.
> POSSIBLE USES:
> 1. Pre-computing search lookaheads
> 2. Sentiment analysis for products or entities -- e.g., querying with context 
> = array("twitter", "is", null)
> 3. Navigation path analysis in URL databases




[jira] Commented: (HIVE-1556) tests broken

2010-08-18 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899921#action_12899921
 ] 

Ning Zhang commented on HIVE-1556:
--

Namit, I ran ant test -Dtestcase=TestContribCliDriver -Dhadoop.version=0.20.0 
in hive's root directory and it succeeded, but if I run ant test in the contrib 
subdirectory, there are diffs (related to the hooks). Do you know which is the 
correct way?

> tests broken
> 
>
> Key: HIVE-1556
> URL: https://issues.apache.org/jira/browse/HIVE-1556
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.1556.1.patch
>
>
> Due to https://issues.apache.org/jira/browse/HIVE-1548, TestContribCliDriver 
> is broken. Some test results need to be updated




[jira] Issue Comment Edited: (HIVE-1293) Concurreny Model for Hive

2010-08-18 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899918#action_12899918
 ] 

Namit Jain edited comment on HIVE-1293 at 8/18/10 1:17 PM:
---

Agreed on the bug in getLockObjects() - will have a new patch.


Filed a new patch for the followup: 
https://issues.apache.org/jira/browse/HIVE-1557

  was (Author: namit):
Agreed on the bug in getLockObjects() - will have a new patch.


Filed a new patch for the followup: 
https://issues.apache.org/jira/browse/HIVE-1293
  
> Concurreny Model for Hive
> -
>
> Key: HIVE-1293
> URL: https://issues.apache.org/jira/browse/HIVE-1293
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, 
> hive.1293.4.patch, hive.1293.5.patch, hive_leases.txt
>
>
> Concurrency model for Hive:
> Currently, Hive does not provide a good concurrency model. The only 
> guarantee provided in the case of concurrent readers and writers is that a 
> reader will not see partial data from the old version (before the write) and 
> partial data from the new version (after the write).
> This has come across as a big problem, especially for background processes 
> performing maintenance operations.
> The following possible solutions come to mind.
> 1. Locks: Acquire read/write locks - they can be acquired at the beginning of 
> the query or the write locks can be delayed till move
> task (when the directory is actually moved). Care needs to be taken for 
> deadlocks.
> 2. Versioning: The writer can create a new version if the current version is 
> being read. Note that, it is not equivalent to snapshots,
> the old version can only be accessed by the current readers, and will be 
> deleted when all of them have finished.
> Comments.




[jira] Commented: (HIVE-1293) Concurreny Model for Hive

2010-08-18 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899918#action_12899918
 ] 

Namit Jain commented on HIVE-1293:
--

Agreed on the bug in getLockObjects() - will have a new patch.


Filed a new patch for the followup: 
https://issues.apache.org/jira/browse/HIVE-1293

> Concurreny Model for Hive
> -
>
> Key: HIVE-1293
> URL: https://issues.apache.org/jira/browse/HIVE-1293
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1293.1.patch, hive.1293.2.patch, hive.1293.3.patch, 
> hive.1293.4.patch, hive.1293.5.patch, hive_leases.txt
>
>
> Concurrency model for Hive:
> Currently, Hive does not provide a good concurrency model. The only 
> guarantee provided in the case of concurrent readers and writers is that a 
> reader will not see partial data from the old version (before the write) and 
> partial data from the new version (after the write).
> This has come across as a big problem, especially for background processes 
> performing maintenance operations.
> The following possible solutions come to mind.
> 1. Locks: Acquire read/write locks - they can be acquired at the beginning of 
> the query or the write locks can be delayed till move
> task (when the directory is actually moved). Care needs to be taken for 
> deadlocks.
> 2. Versioning: The writer can create a new version if the current version is 
> being read. Note that, it is not equivalent to snapshots,
> the old version can only be accessed by the current readers, and will be 
> deleted when all of them have finished.
> Comments.




[jira] Created: (HIVE-1557) increase concurrency

2010-08-18 Thread Namit Jain (JIRA)
increase concurrency


 Key: HIVE-1557
 URL: https://issues.apache.org/jira/browse/HIVE-1557
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain


Copying Joy's comment from https://issues.apache.org/jira/browse/HIVE-1293

a little bummed that locks need to be held for entire query execution. that 
could mean a writer blocking readers for hours.
hive's query plans seem to be of two distinct stages:
1. read a bunch of stuff, compute intermediate/final data
2. move final data into output locations

ie. - a single query never reads what it writes (into a final output location). 
even if #1 and #2 are mingled today - they can easily be put in order.

in that sense - we only need to get shared locks for all read entities involved 
in #1 to begin with. once phase #1 is done, we can drop all the read locks and 
get the exclusive locks for all the write entities in #2, perform #2 and quit. 
that way exclusive locks are held for a very short duration. i think this 
scheme is similarly deadlock free (now there are two independent lock 
acquire/release phases - and each of them can lock stuff in lex. order).
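The two-phase scheme proposed in this comment can be sketched as follows (a toy sketch, not Hive's actual lock manager; the `acquire`/`release` callbacks and entity names are assumptions). Each phase acquires its locks in lexicographic order, which rules out wait cycles within the phase, and all shared locks are released before any exclusive lock is taken, so exclusive locks are held only for the short final move:

```python
# Phase 1: shared locks, compute; Phase 2: exclusive locks, move outputs.
def run_query(read_entities, write_entities, acquire, release):
    # Shared locks on all read entities, acquired in lexicographic order.
    held = [acquire(e, "SHARED") for e in sorted(read_entities)]
    # ... read inputs, compute intermediate/final data ...
    for lock in held:
        release(lock)
    # Exclusive locks on all write entities, again in lexicographic order,
    # held only for the short move into the output locations.
    held = [acquire(e, "EXCLUSIVE") for e in sorted(write_entities)]
    # ... move final data into the output locations ...
    for lock in held:
        release(lock)
```

Because both phases order their acquisitions the same way and no lock spans the long compute stage, two concurrent queries cannot wait on each other in a cycle within a phase.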




[jira] Commented: (HIVE-1556) tests broken

2010-08-18 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899914#action_12899914
 ] 

Ning Zhang commented on HIVE-1556:
--

+1 will commit when tests pass. 

> tests broken
> 
>
> Key: HIVE-1556
> URL: https://issues.apache.org/jira/browse/HIVE-1556
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.1556.1.patch
>
>
> Due to https://issues.apache.org/jira/browse/HIVE-1548, TestContribCliDriver 
> is broken. Some test results need to be updated




[jira] Updated: (HIVE-1556) tests broken

2010-08-18 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1556:
-

Attachment: hive.1556.1.patch

> tests broken
> 
>
> Key: HIVE-1556
> URL: https://issues.apache.org/jira/browse/HIVE-1556
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.1556.1.patch
>
>
> Due to https://issues.apache.org/jira/browse/HIVE-1548, TestContribCliDriver 
> is broken. Some test results need to be updated




[jira] Updated: (HIVE-1556) tests broken

2010-08-18 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1556:
-

Status: Patch Available  (was: Open)

> tests broken
> 
>
> Key: HIVE-1556
> URL: https://issues.apache.org/jira/browse/HIVE-1556
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.1556.1.patch
>
>
> Due to https://issues.apache.org/jira/browse/HIVE-1548, TestContribCliDriver 
> is broken. Some test results need to be updated




[jira] Created: (HIVE-1556) tests broken

2010-08-18 Thread Namit Jain (JIRA)
tests broken


 Key: HIVE-1556
 URL: https://issues.apache.org/jira/browse/HIVE-1556
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Reporter: Namit Jain
Assignee: Namit Jain


Due to https://issues.apache.org/jira/browse/HIVE-1548, TestContribCliDriver is 
broken. Some test results need to be updated





[jira] Updated: (HIVE-1554) Hive Unable to start due to metastore exception

2010-08-18 Thread Soundararajan Velu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Soundararajan Velu updated HIVE-1554:
-

Description: 
When I try to restart Hive, it sometimes fails with a weird exception around 
the metastore.

Following is the error message that it spits out.

2010-08-17 09:17:03,916 ERROR metastore.HiveMetaStore 
(HiveMetaStore.java:(107)) - Unable to initialize the metastore 
:Exception thrown performing schema operation : Add classes to Catalog "", 
Schema "APP"

We are using Derby10.5.3 in server mode  we have connected to derby through 
this URL 
jdbc:derby://{IP}:{PORT}/metastore_db;create=true. 

If I remove metastore_db from derby/bin, it starts and creates a new 
metastore_db in derby/bin. I suspect metastore_db gets corrupted for some 
reason, but I am able to open the same metastore_db through other clients like 
SQuirreL, DBExplorer, etc.

My question is whether metastore_db is corrupted or whether something else is 
wrong, and if it is corrupted, how to recover the db.

  was:
Hive is running after restared some time it is not starting because of 
metastore excepiton . The exception is like this

2010-08-17 09:17:03,916 ERROR metastore.HiveMetaStore 
(HiveMetaStore.java:(107)) - Unable to initialize the metastore 
:Exception thrown performing schema operation : Add classes to Catalog "", 
Schema "APP"

We are using Derby10.5.3 in server mode  we have connected to derby through 
this URL 
jdbc:derby://{IP}:{PORT}/metastore_db;create=true. 

If i remove   metastore_db   from the derby/bin location it is starting and 
creating new metastore_db in derby/bin.  I suspect metastore_db is corrupted 
but i am able to open the same metastore_db through the squirrel client.

Here my doubt is metastore_db corrupted or some other problem. If it is 
corrupted how to recover the db.


> Hive Unable to start due to metastore exception
> ---
>
> Key: HIVE-1554
> URL: https://issues.apache.org/jira/browse/HIVE-1554
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: CLI, Metastore, Server Infrastructure
>Affects Versions: 0.5.0
> Environment: SUSE Linux v11, Hadoop v0.20.1, Derby 10.5.3
>Reporter: Soundararajan Velu
>
> When I try to restart Hive, it sometimes fails with a weird exception around 
> the metastore.
> Following is the error message that it spits out.
> 2010-08-17 09:17:03,916 ERROR metastore.HiveMetaStore 
> (HiveMetaStore.java:(107)) - Unable to initialize the metastore 
> :Exception thrown performing schema operation : Add classes to Catalog "", 
> Schema "APP"
> We are using Derby10.5.3 in server mode  we have connected to derby through 
> this URL 
> jdbc:derby://{IP}:{PORT}/metastore_db;create=true. 
> If I remove metastore_db from derby/bin, it starts and creates a new 
> metastore_db in derby/bin. I suspect metastore_db gets corrupted for some 
> reason, but I am able to open the same metastore_db through other clients 
> like SQuirreL, DBExplorer, etc.
> My question is whether metastore_db is corrupted or whether something else is 
> wrong, and if it is corrupted, how to recover the db.




[jira] Created: (HIVE-1555) JDBC Storage Handler

2010-08-18 Thread Bob Robertson (JIRA)
JDBC Storage Handler


 Key: HIVE-1555
 URL: https://issues.apache.org/jira/browse/HIVE-1555
 Project: Hadoop Hive
  Issue Type: New Feature
Affects Versions: 0.5.0
Reporter: Bob Robertson


With the Cassandra and HBase Storage Handlers, I thought it would make sense to 
include a generic JDBC RDBMS Storage Handler so that you could import a 
standard DB table into Hive. Many people must want to perform HiveQL joins and 
similar operations against tables in other systems.




[jira] Updated: (HIVE-1307) More generic and efficient merge method

2010-08-18 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1307:
-

Status: In Progress  (was: Patch Available)

There are additional log changes and a minor code change after the hadoop 0.20 
tests. I'll upload a new patch once 0.17 finishes. 

> More generic and efficient merge method
> ---
>
> Key: HIVE-1307
> URL: https://issues.apache.org/jira/browse/HIVE-1307
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1307.0.patch, HIVE-1307.patch, 
> HIVE-1307_java_only.patch
>
>
> Currently if hive.merge.mapfiles/mapredfiles=true, a new mapreduce job is 
> created to read the input files and output to one reducer for merging. This MR 
> job is created at compile time, one MR job per partition. In the dynamic 
> partition case, multiple partitions could be created at execution time, and 
> generating the merging MR job at compile time is impossible. 
> We should generalize the merge framework to allow multiple partitions and 
> most of the time a map-only job should be sufficient if we use 
> CombineHiveInputFormat. 




[jira] Created: (HIVE-1554) Hive Unable to start due to metastore exception

2010-08-18 Thread Soundararajan Velu (JIRA)
Hive Unable to start due to metastore exception
---

 Key: HIVE-1554
 URL: https://issues.apache.org/jira/browse/HIVE-1554
 Project: Hadoop Hive
  Issue Type: Bug
  Components: CLI, Metastore, Server Infrastructure
Affects Versions: 0.5.0
 Environment: SUSE Linux v11, Hadoop v0.20.1, Derby 10.5.3
Reporter: Soundararajan Velu


Hive is running, but after a restart it sometimes does not start because of a 
metastore exception. The exception is like this:

2010-08-17 09:17:03,916 ERROR metastore.HiveMetaStore 
(HiveMetaStore.java:(107)) - Unable to initialize the metastore 
:Exception thrown performing schema operation : Add classes to Catalog "", 
Schema "APP"

We are using Derby10.5.3 in server mode  we have connected to derby through 
this URL 
jdbc:derby://{IP}:{PORT}/metastore_db;create=true. 

If I remove metastore_db from the derby/bin location, it starts and creates a 
new metastore_db in derby/bin. I suspect metastore_db is corrupted, but I am 
able to open the same metastore_db through the SQuirreL client.

My question is whether metastore_db is corrupted or whether something else is 
wrong, and if it is corrupted, how to recover the db.




[jira] Commented: (HIVE-1553) NPE when using complex string UDF

2010-08-18 Thread Wojciech Langiewicz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899855#action_12899855
 ] 

Wojciech Langiewicz commented on HIVE-1553:
---

It also happens on columns that are not allowed to be NULL (and are not), so 
https://issues.apache.org/jira/browse/HIVE-1011 probably won't fix this.

> NPE when using complex string UDF
> -
>
> Key: HIVE-1553
> URL: https://issues.apache.org/jira/browse/HIVE-1553
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 0.5.0
> Environment: CDH3B2 version on debian
>Reporter: Wojciech Langiewicz
>
> When executing this query: {code}select explode(split(city, "")) as char from 
> users;{code} I get NPE: {code}java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDTFExplode.process(GenericUDTFExplode.java:70)
>   at 
> org.apache.hadoop.hive.ql.exec.UDTFOperator.processOp(UDTFOperator.java:98)
>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:81)
>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:43)
>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:347)
>   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:171)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>   at org.apache.hadoop.mapred.Child.main(Child.java:170){code}
> But for this query: {code}select explode(split(city, "")) as char from 
> users where id = 234234;{code} the NPE does not occur, while for this query: 
> {code}select explode(split(city, "")) as char from users where id > 0;{code} 
> some mappers succeed but most of them fail, so the whole task fails.
> city is a string column and the maximum users.id is about 30M.
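One plausible failure mode behind the stack trace above is a null array reaching explode(). The contrast can be sketched in plain Python (purely illustrative; this is not the Hive UDTF, and the function names are made up): explode() emits one row per array element, so an unguarded version fails on a null input the way GenericUDTFExplode.process does, while a null check simply emits no rows.

```python
def explode_unsafe(arr):
    # Iterating a null array fails, analogous to the NPE in the stack trace.
    return [(x,) for x in arr]

def explode_null_safe(arr):
    if arr is None:
        return []  # skip null inputs instead of failing the whole task
    return [(x,) for x in arr]
```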




[jira] Updated: (HIVE-1553) NPE when using complex string UDF

2010-08-18 Thread Wojciech Langiewicz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wojciech Langiewicz updated HIVE-1553:
--

Description: 
When executing this query: {code}select explode(split(city, "")) as char from 
users;{code} I get NPE: {code}java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDTFExplode.process(GenericUDTFExplode.java:70)
at 
org.apache.hadoop.hive.ql.exec.UDTFOperator.processOp(UDTFOperator.java:98)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:81)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:43)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:347)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:171)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170){code}
But for this query: {code}select explode(split(city, "")) as char from users 
where id = 234234;{code} the NPE does not occur, while for this query: 
{code}select explode(split(city, "")) as char from users where id > 0;{code} 
some mappers succeed but most of them fail, so the whole task fails.
city is a string column and the maximum users.id is about 30M.

  was:
When executing this query: {code}select explode(split(city, "")) as char from 
users;{code} I get NPE: {code}java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDTFExplode.process(GenericUDTFExplode.java:70)
at 
org.apache.hadoop.hive.ql.exec.UDTFOperator.processOp(UDTFOperator.java:98)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:81)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:43)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:347)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:171)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170){code}
But in case of this query:{code}select explode(split(city, "")) as char from 
users where id = 234234;{code} NPE does not occur.


> NPE when using complex string UDF
> -
>
> Key: HIVE-1553
> URL: https://issues.apache.org/jira/browse/HIVE-1553
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 0.5.0
> Environment: CDH3B2 version on debian
>Reporter: Wojciech Langiewicz
>
> When executing this query: {code}select explode(split(city, "")) as char from 
> users;{code} I get NPE: {code}java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDTFExplode.process(GenericUDTFExplode.java:70)
>   at 
> org.apache.hadoop.hive.ql.exec.UDTFOperator.processOp(UDTFOperator.java:98)
>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:81)
>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:43)
>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
>   at 
> org

[jira] Created: (HIVE-1553) NPE when using complex string UDF

2010-08-18 Thread Wojciech Langiewicz (JIRA)
NPE when using complex string UDF
-

 Key: HIVE-1553
 URL: https://issues.apache.org/jira/browse/HIVE-1553
 Project: Hadoop Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 0.5.0
 Environment: CDH3B2 version on debian
Reporter: Wojciech Langiewicz


When executing this query: {code}select explode(split(city, "")) as char from 
users;{code} I get NPE: {code}java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDTFExplode.process(GenericUDTFExplode.java:70)
at 
org.apache.hadoop.hive.ql.exec.UDTFOperator.processOp(UDTFOperator.java:98)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:81)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:43)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:347)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:171)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170){code}
But in case of this query:{code}select explode(split(city, "")) as char from 
users where id = 234234;{code} NPE does not occur.
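The NPE at GenericUDTFExplode.process suggests the array argument (the result of split on a NULL city) is null for some rows. A minimal, hypothetical sketch of a null guard (illustration only, not the actual Hive fix; class and method names are invented):

```java
import java.util.ArrayList;
import java.util.List;

public class ExplodeNullGuard {
    // Mimics the shape of a UDTF process(): forward one output row per
    // array element. A null array (e.g. split() applied to a NULL column)
    // is treated as producing no rows instead of throwing NPE.
    public static List<Object> process(Object[] array) {
        List<Object> forwarded = new ArrayList<>();
        if (array == null) {
            return forwarded; // guard: NULL input yields no output rows
        }
        for (Object element : array) {
            forwarded.add(element);
        }
        return forwarded;
    }

    public static void main(String[] args) {
        System.out.println(process(null).size());                 // 0
        System.out.println(process(new Object[]{"a", "b"}).size()); // 2
    }
}
```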




[jira] Commented: (HIVE-1552) Nulls are not handled in Sort Merge MapJoin

2010-08-18 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899811#action_12899811
 ] 

Amareshwari Sriramadasu commented on HIVE-1552:
---

Are NULL values allowed for a sorted column?
I think the answer is yes, because insert/load does not complain about NULL 
values.

> Nulls are not handled in Sort Merge MapJoin
> ---
>
> Key: HIVE-1552
> URL: https://issues.apache.org/jira/browse/HIVE-1552
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
>
> If SMBMAPJoinOperator finds null keys in Join it fails with 
> NullPointerException :
> {noformat}
> Caused by: java.lang.NullPointerException
>   at org.apache.hadoop.io.IntWritable.compareTo(IntWritable.java:60)
>   at 
> org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:115)
>   at 
> org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.compareKeys(SMBMapJoinOperator.java:389)
>   at 
> org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.processKey(SMBMapJoinOperator.java:438)
>   at 
> org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.processOp(SMBMapJoinOperator.java:205)
>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:458)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:698)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:45)
>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:458)
>   at 
> org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.fetchOneRow(SMBMapJoinOperator.java:479)
>   ... 17 more
> {noformat}




[jira] Commented: (HIVE-1544) Filtering out NULL-keyed rows in ReduceSinkOperator when no outer join involved

2010-08-18 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899805#action_12899805
 ] 

Amareshwari Sriramadasu commented on HIVE-1544:
---

Also,see Namit's 
[comment|https://issues.apache.org/jira/browse/HIVE-741?focusedCommentId=12899177&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12899177]
 on HIVE-741

> Filtering out NULL-keyed rows in ReduceSinkOperator when no outer join 
> involved
> ---
>
> Key: HIVE-1544
> URL: https://issues.apache.org/jira/browse/HIVE-1544
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>
> As discussed in HIVE-741, if a plan indicates that a non-outer join is the 
> first operator in the reducer, the ReduceSinkOperator should filter out (not 
> send) rows with NULL keys, since they will not generate any results 
> anyway. This should save both bandwidth and processing power.
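As a rough illustration of the proposed idea (hypothetical names, not Hive's actual operator API), dropping NULL-keyed rows before the shuffle for non-outer joins could look like:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

public class ReduceSinkNullKeyFilter {
    // Drop rows whose join key is NULL when the downstream join is
    // non-outer: such rows can never match, so shipping them to the
    // reducers only wastes bandwidth and CPU.
    public static <K> List<K> filterKeys(List<K> keys, boolean outerJoin) {
        if (outerJoin) {
            return keys; // outer joins must preserve NULL-keyed rows
        }
        return keys.stream().filter(Objects::nonNull).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> keys = Arrays.asList(1, null, 3, null);
        System.out.println(filterKeys(keys, false)); // [1, 3]
        System.out.println(filterKeys(keys, true));  // [1, null, 3, null]
    }
}
```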




[jira] Created: (HIVE-1552) Nulls are not handled in Sort Merge MapJoin

2010-08-18 Thread Amareshwari Sriramadasu (JIRA)
Nulls are not handled in Sort Merge MapJoin
---

 Key: HIVE-1552
 URL: https://issues.apache.org/jira/browse/HIVE-1552
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu


If SMBMAPJoinOperator finds null keys in Join it fails with 
NullPointerException :
{noformat}
Caused by: java.lang.NullPointerException
at org.apache.hadoop.io.IntWritable.compareTo(IntWritable.java:60)
at 
org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:115)
at 
org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.compareKeys(SMBMapJoinOperator.java:389)
at 
org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.processKey(SMBMapJoinOperator.java:438)
at 
org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.processOp(SMBMapJoinOperator.java:205)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:458)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:698)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:45)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:458)
at 
org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.fetchOneRow(SMBMapJoinOperator.java:479)
... 17 more
{noformat}
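The NPE comes from delegating to IntWritable.compareTo with a null key. A hedged sketch (illustration only, not the actual fix) of a null-safe comparison where NULL keys order first and deliberately never compare equal for join purposes:

```java
public class NullSafeKeyCompare {
    // Null-safe comparator for join keys: nulls sort before non-nulls,
    // and even two nulls do NOT compare equal (return -1), so a
    // sort-merge join never matches NULL against NULL.
    public static int compareKeys(Integer a, Integer b) {
        if (a == null) {
            return -1; // null on the left: order first, never join-equal
        }
        if (b == null) {
            return 1;
        }
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        System.out.println(compareKeys(null, null)); // -1: NULLs never match
        System.out.println(compareKeys(null, 5));    // -1
        System.out.println(compareKeys(5, null));    // 1
        System.out.println(compareKeys(3, 3));       // 0
    }
}
```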




[jira] Updated: (HIVE-741) NULL is not handled correctly in join

2010-08-18 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-741:
-

Status: Patch Available  (was: Open)

Submitting patch-741-1.txt for review. 

> NULL is not handled correctly in join
> -
>
> Key: HIVE-741
> URL: https://issues.apache.org/jira/browse/HIVE-741
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-741-1.txt, patch-741.txt, smbjoin_nulls.q.txt
>
>
> With the following data in table input4_cb:
> Key    Value
> ----   -----
> NULL   325
> 18     NULL
> The following query:
> {code}
> select * from input4_cb a join input4_cb b on a.key = b.value;
> {code}
> returns the following result:
> NULL   325    18     NULL
> The correct result should be the empty set.
> When 'null' is replaced by '' it works.
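The expected behavior follows SQL three-valued logic: a comparison involving NULL is never TRUE, so the join must not match the NULL-keyed rows. A small illustrative check (hypothetical helper, not Hive code):

```java
public class SqlNullEquality {
    // SQL semantics: a comparison involving NULL evaluates to unknown,
    // which a join predicate treats as not matching.
    public static boolean joinEquals(Object a, Object b) {
        if (a == null || b == null) {
            return false; // NULL = anything (including NULL) is not TRUE
        }
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(joinEquals(null, 325));  // false
        System.out.println(joinEquals(null, null)); // false: empty join result
        System.out.println(joinEquals(18, 18));     // true
    }
}
```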




[jira] Updated: (HIVE-741) NULL is not handled correctly in join

2010-08-18 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-741:
-

Attachment: patch-741-1.txt
smbjoin_nulls.q.txt

Attaching a patch that fixes the bugs Ning found in the earlier patch. It also 
adds more test cases.

bq. Can you also add one or few tests for sort merge join? 
The attached file smbjoin_nulls.q.txt has tests for sort merge join, but it 
fails with an NPE as mentioned earlier. I tried to fix the NPE but could not 
come up with a fix. Shall I do it in a follow-up JIRA?

bq. For inner, left and right outer joins, a simpler fix would be to add a 
filter on top.
I think this can be done as part of HIVE-1544 as an improvement.

bq. @Amareshwari, sorry the syntax was wrong for the 3 table joins. 
Ning, Hive was not complaining about the syntax, so I included this in the 
test case as well. The results are fine with the latest patch.


> NULL is not handled correctly in join
> -
>
> Key: HIVE-741
> URL: https://issues.apache.org/jira/browse/HIVE-741
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-741-1.txt, patch-741.txt, smbjoin_nulls.q.txt
>
>
> With the following data in table input4_cb:
> Key    Value
> ----   -----
> NULL   325
> 18     NULL
> The following query:
> {code}
> select * from input4_cb a join input4_cb b on a.key = b.value;
> {code}
> returns the following result:
> NULL   325    18     NULL
> The correct result should be the empty set.
> When 'null' is replaced by '' it works.
