[jira] [Created] (HIVE-13315) Option to reuse existing restored HBase snapshots

2016-03-18 Thread Liyin Tang (JIRA)
Liyin Tang created HIVE-13315:
-

 Summary: Option to reuse existing restored HBase snapshots
 Key: HIVE-13315
 URL: https://issues.apache.org/jira/browse/HIVE-13315
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Liyin Tang
Assignee: Sushanth Sowmyan


HiveHBaseTableSnapshotInputFormat needs to restore the HBase snapshot for each 
query. It would be great to have an option in the table properties to specify 
an existing restored snapshot. If such a property is set, the job can skip 
the restore stage to reduce query time.
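
A minimal Python sketch of the proposed behaviour, not the actual Hive change: if a table property names an already restored snapshot, use it and skip the restore. Both property names and the `restore` callback are hypothetical and exist only for this illustration.

```python
from typing import Callable, Dict

# Hypothetical property names, used only for this illustration.
RESTORED_DIR_PROP = "example.hbase.snapshot.restored.dir"
SNAPSHOT_NAME_PROP = "example.hbase.snapshot.name"


def resolve_snapshot_dir(table_props: Dict[str, str],
                         restore: Callable[[str], str]) -> str:
    """Return the directory of a restored snapshot, restoring only if needed."""
    existing = table_props.get(RESTORED_DIR_PROP)
    if existing:
        # Reuse the already restored snapshot and skip the restore stage.
        return existing
    # No reusable snapshot configured: restore one for this query.
    return restore(table_props[SNAPSHOT_NAME_PROP])
```

With the property set, the `restore` callback is never invoked, which is where the query-time saving would come from.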



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-2095) auto convert map join bug

2011-04-07 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017208#comment-13017208
 ] 

Liyin Tang commented on HIVE-2095:
--

It looks good to me. Thanks, Yongqiang.

> auto convert map join bug
> -
>
> Key: HIVE-2095
> URL: https://issues.apache.org/jira/browse/HIVE-2095
> Project: Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: HIVE-2095.1.patch
>
>
> 1) When choosing a table as the big table candidate for a map join, if Hive 
> can determine at compile time that the total known size of all the other 
> tables (excluding the candidate) is bigger than a configured value, the 
> candidate is a bad one and should not be put into the plan. Otherwise, 
> filtering it out at runtime may take more time.
> 2) Added a null check for backup tasks. Otherwise a NullPointerException is 
> thrown.
> 3) CommonJoinResolver needs to know the full mapping of pathToAliases. 
> Otherwise it will make the wrong decision.
> 4) Changes made to ConditionalResolverCommonJoin: added pathToAliases, 
> aliasToSize (an alias's input size that is known at compile time, from 
> inputSummary), and the intermediate dir path.
> So the logic is: go over all the pathToAliases and, for each path, if it is 
> from the intermediate dir path, add this path's size to all aliases. Finally, 
> choose the big table based on the size information and others like 
> aliasToTask.
> 5) The conditional task's children contain wrong options, which may cause the 
> join to fail or produce incorrect results. Basically, when getting all 
> possible children for the conditional task, a whitelist of big tables should 
> be used. Only tables in this whitelist can be considered as a big table.
> Here is the logic:
> +   * Get a list of big table candidates. Only the tables in the returned set can
> +   * be used as big table in the join operation.
> +   * 
> +   * The logic here is to scan the join condition array from left to right. If
> +   * see a inner join and the bigTableCandidates is empty, add both side of this
> +   * inner join to big table candidates. If see a left outer join, and the
> +   * bigTableCandidates is empty, add the left side to it, and if the
> +   * bigTableCandidates is not empty, do nothing (which means the
> +   * bigTableCandidates is from left side). If see a right outer join, clear the
> +   * bigTableCandidates, and add right side to the bigTableCandidates, it means
> +   * the right side of a right outer join always win. If see a full outer join,
> +   * return null immediately (no one can be the big table, can not do a
> +   * mapjoin).
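
The candidate-selection rule quoted above can be sketched in a few lines of Python. This is an illustration of the comment as written, not the code in HIVE-2095.1.patch: the join-type constants and the `(join_type, left, right)` tuple layout are assumptions, and because the comment does not say what happens on an inner join when the set is already non-empty, the sketch leaves the set unchanged in that case.

```python
from typing import List, Optional, Set, Tuple

# Illustrative join-type names; not Hive's own constants.
INNER = "inner"
LEFT_OUTER = "left_outer"
RIGHT_OUTER = "right_outer"
FULL_OUTER = "full_outer"


def big_table_candidates(
        conds: List[Tuple[str, int, int]]) -> Optional[Set[int]]:
    """Scan (join_type, left, right) conditions from left to right.

    Returns the set of table positions allowed to be the big table,
    or None when a full outer join rules out a map join entirely.
    """
    candidates: Set[int] = set()
    for join_type, left, right in conds:
        if join_type == FULL_OUTER:
            return None
        if join_type == INNER:
            if not candidates:
                candidates.update((left, right))
            # Non-empty case is unspecified in the comment: keep as is.
        elif join_type == LEFT_OUTER:
            if not candidates:
                candidates.add(left)
        elif join_type == RIGHT_OUTER:
            # The right side of a right outer join always wins.
            candidates.clear()
            candidates.add(right)
    return candidates
```

For example, an inner join between tables 0 and 1 followed by a right outer join with table 2 leaves only table 2 as a candidate.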

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2095) auto convert map join bug

2011-04-07 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016871#comment-13016871
 ] 

Liyin Tang commented on HIVE-2095:
--

I will take a look

> auto convert map join bug
> -
>
> Key: HIVE-2095
> URL: https://issues.apache.org/jira/browse/HIVE-2095
> Project: Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Attachments: HIVE-2095.1.patch
>
>



[jira] [Commented] (HIVE-1966) mapjoin operator should not load hashtable for each new inputfile if the hashtable to be loaded is already there.

2011-03-23 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010510#comment-13010510
 ] 

Liyin Tang commented on HIVE-1966:
--

+1

> mapjoin operator should not load hashtable for each new inputfile if the 
> hashtable to be loaded is already there.
> -
>
> Key: HIVE-1966
> URL: https://issues.apache.org/jira/browse/HIVE-1966
> Project: Hive
>  Issue Type: Improvement
>Reporter: He Yongqiang
>Assignee: Liyin Tang
> Attachments: HIVE-1966.1.patch
>
>




[jira] [Commented] (HIVE-1965) Auto convert mapjoin should not throw exception if the top operator is union operator.

2011-03-23 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010508#comment-13010508
 ] 

Liyin Tang commented on HIVE-1965:
--

+1

> Auto convert mapjoin should not throw exception if the top operator is union 
> operator.
> --
>
> Key: HIVE-1965
> URL: https://issues.apache.org/jira/browse/HIVE-1965
> Project: Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: Liyin Tang
> Attachments: HIVE-1965.1.patch
>
>




[jira] Updated: (HIVE-1845) Some attributes in the Eclipse template file is deprecated

2010-12-09 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1845:
-

Status: Patch Available  (was: Open)

> Some attributes in the Eclipse template file is deprecated  
> 
>
> Key: HIVE-1845
> URL: https://issues.apache.org/jira/browse/HIVE-1845
> Project: Hive
>  Issue Type: Bug
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Attachments: hive-1845-1.patch
>
>
> The Eclipse template file references this jar file, which is deprecated:
> /@PROJECT@/build/metastore/hive-mod...@hive_version@.jar
> The correct one should be:
> /@PROJECT@/build/metastore/hive-metasto...@hive_version@.jar
> Just update all the Eclipse template files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1845) Some attributes in the Eclipse template file is deprecated

2010-12-09 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1845:
-

Attachment: hive-1845-1.patch

Update all the eclipse template files.

> Some attributes in the Eclipse template file is deprecated  
> 
>
> Key: HIVE-1845
> URL: https://issues.apache.org/jira/browse/HIVE-1845
> Project: Hive
>  Issue Type: Bug
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Attachments: hive-1845-1.patch
>
>




[jira] Created: (HIVE-1845) Some attributes in the Eclipse template file is deprecated

2010-12-09 Thread Liyin Tang (JIRA)
Some attributes in the Eclipse template file is deprecated  


 Key: HIVE-1845
 URL: https://issues.apache.org/jira/browse/HIVE-1845
 Project: Hive
  Issue Type: Bug
Reporter: Liyin Tang
Assignee: Liyin Tang


The Eclipse template file references this jar file, which is deprecated:
/@PROJECT@/build/metastore/hive-mod...@hive_version@.jar

The correct one should be:
/@PROJECT@/build/metastore/hive-metasto...@hive_version@.jar

Just update all the Eclipse template files.




[jira] Updated: (HIVE-1842) Add the local flag to all the map red tasks, if the query is running locally.

2010-12-08 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1842:
-

Attachment: hive-1842-1.patch

Add the local flag to all the map red tasks, if the query is running locally. 

> Add the local flag to all the map red tasks, if the query is running locally.
> -
>
> Key: HIVE-1842
> URL: https://issues.apache.org/jira/browse/HIVE-1842
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Affects Versions: 0.4.1
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Attachments: hive-1842-1.patch
>
>





[jira] Updated: (HIVE-1842) Add the local flag to all the map red tasks, if the query is running locally.

2010-12-08 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1842:
-

Status: Patch Available  (was: Open)

Add the local flag to all the map red tasks, if the query is running locally. 

> Add the local flag to all the map red tasks, if the query is running locally.
> -
>
> Key: HIVE-1842
> URL: https://issues.apache.org/jira/browse/HIVE-1842
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Affects Versions: 0.4.1
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Attachments: hive-1842-1.patch
>
>





[jira] Created: (HIVE-1842) Add the local flag to all the map red tasks, if the query is running locally.

2010-12-08 Thread Liyin Tang (JIRA)
Add the local flag to all the map red tasks, if the query is running locally.
-

 Key: HIVE-1842
 URL: https://issues.apache.org/jira/browse/HIVE-1842
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Affects Versions: 0.4.1
Reporter: Liyin Tang
Assignee: Liyin Tang







[jira] Updated: (HIVE-1830) mappers in group followed by joins may die OOM

2010-12-08 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1830:
-

Attachment: hive-1830-5.patch

1) Remove the debug statements
2) Add the memory threshold to group by desc.

> mappers in group followed by joins may die OOM
> --
>
> Key: HIVE-1830
> URL: https://issues.apache.org/jira/browse/HIVE-1830
> Project: Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Attachments: hive-1830-1.patch, hive-1830-2.patch, hive-1830-3.patch, 
> hive-1830-4.patch, hive-1830-5.patch
>
>





[jira] Updated: (HIVE-1830) mappers in group followed by joins may die OOM

2010-12-07 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1830:
-

Attachment: hive-1830-4.patch

1) Add more descriptions in the config file
2) Set the memory usage of the hashtable sink op and the group by op in their 
descs. The memory usage is deterministic after the compile stage.


> mappers in group followed by joins may die OOM
> --
>
> Key: HIVE-1830
> URL: https://issues.apache.org/jira/browse/HIVE-1830
> Project: Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Attachments: hive-1830-1.patch, hive-1830-2.patch, hive-1830-3.patch, 
> hive-1830-4.patch
>
>





[jira] Updated: (HIVE-1830) mappers in group followed by joins may die OOM

2010-12-06 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1830:
-

Attachment: hive-1830-3.patch

Carefully measure the memory usage of the map-side group by.
Flush more frequently if the remaining memory is less than a threshold.
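
The flush-on-low-memory idea can be sketched in Python as follows. This is not the patch itself: the fixed per-entry cost is a stand-in assumption for whatever measurement the patch uses, and all parameter names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Iterator, Tuple


def partial_counts(keys: Iterable[Hashable],
                   memory_budget: int,
                   entry_cost: int,
                   min_free: int) -> Iterator[Tuple[Hashable, int]]:
    """Map-side COUNT(*) GROUP BY key that emits partial aggregates.

    The hash table is flushed whenever the estimated free memory
    (memory_budget - used) drops below min_free.
    """
    table: Dict[Hashable, int] = {}
    used = 0
    for key in keys:
        if key not in table:
            table[key] = 0
            used += entry_cost  # assumed fixed cost per hash table entry
        table[key] += 1
        if memory_budget - used < min_free:
            flushed = list(table.items())
            table.clear()
            used = 0
            yield from flushed
    # Emit whatever is still buffered at the end of the input.
    yield from table.items()
```

Because a flush can emit the same key more than once, the partial counts have to be merged again downstream, which is what the reduce side of a group by does anyway.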

> mappers in group followed by joins may die OOM
> --
>
> Key: HIVE-1830
> URL: https://issues.apache.org/jira/browse/HIVE-1830
> Project: Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Attachments: hive-1830-1.patch, hive-1830-2.patch, hive-1830-3.patch
>
>





[jira] Updated: (HIVE-1830) mappers in group followed by joins may die OOM

2010-12-06 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1830:
-

Attachment: hive-1830-2.patch

Add a new test: auto_join26.q 

> mappers in group followed by joins may die OOM
> --
>
> Key: HIVE-1830
> URL: https://issues.apache.org/jira/browse/HIVE-1830
> Project: Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Attachments: hive-1830-1.patch, hive-1830-2.patch
>
>





[jira] Updated: (HIVE-1830) mappers in group followed by joins may die OOM

2010-12-06 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1830:
-

Attachment: hive-1830-1.patch

> mappers in group followed by joins may die OOM
> --
>
> Key: HIVE-1830
> URL: https://issues.apache.org/jira/browse/HIVE-1830
> Project: Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Attachments: hive-1830-1.patch
>
>





[jira] Updated: (HIVE-1827) Audit how many queries will be run in the local mode

2010-12-06 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1827:
-

Attachment: hive-1827-1.patch

Add a new attribute isLocalMode in Task.


> Audit how many queries will be run in the local mode
> 
>
> Key: HIVE-1827
> URL: https://issues.apache.org/jira/browse/HIVE-1827
> Project: Hive
>  Issue Type: New Feature
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Attachments: hive-1827-1.patch
>
>
> Hive can run queries in local mode. It would be nice to track and audit how 
> many queries are run in local mode.




[jira] Commented: (HIVE-1832) Dynamically allocate and measure memory usage when a map join op followed by a group by op

2010-12-06 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967268#action_12967268
 ] 

Liyin Tang commented on HIVE-1832:
--

Duplicate of Hive-1830

> Dynamically allocate and measure memory usage when a map join op followed by 
> a group by op
> --
>
> Key: HIVE-1832
> URL: https://issues.apache.org/jira/browse/HIVE-1832
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Liyin Tang
>Assignee: Liyin Tang
>
> Right now, if a map join operator is followed by a map-side group by, the 
> map-reduce task is memory intensive.
> Memory usage should be carefully measured and bounded so that the task does 
> not run out of memory.




[jira] Created: (HIVE-1832) Dynamically allocate and measure memory usage when a map join op followed by a group by op

2010-12-06 Thread Liyin Tang (JIRA)
Dynamically allocate and measure memory usage when a map join op followed by a 
group by op
--

 Key: HIVE-1832
 URL: https://issues.apache.org/jira/browse/HIVE-1832
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Liyin Tang
Assignee: Liyin Tang


Right now, if a map join operator is followed by a map-side group by, the 
map-reduce task is memory intensive.
Memory usage should be carefully measured and bounded so that the task does 
not run out of memory.




[jira] Created: (HIVE-1827) Audit how many queries will be run in the local mode

2010-12-03 Thread Liyin Tang (JIRA)
Audit how many queries will be run in the local mode


 Key: HIVE-1827
 URL: https://issues.apache.org/jira/browse/HIVE-1827
 Project: Hive
  Issue Type: New Feature
Reporter: Liyin Tang
Assignee: Liyin Tang


Hive can run queries in local mode. It would be nice to track and audit how many 
queries are run in local mode.




[jira] Assigned: (HIVE-1700) Optimiza JDBM to make mapjoin faster

2010-12-01 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang reassigned HIVE-1700:


Assignee: Liyin Tang

> Optimiza JDBM to make mapjoin faster
> 
>
> Key: HIVE-1700
> URL: https://issues.apache.org/jira/browse/HIVE-1700
> Project: Hive
>  Issue Type: Improvement
>Reporter: He Yongqiang
>Assignee: Liyin Tang
>
> copied from email:
> From: Joydeep Sen Sarma
> Sent: Tuesday, October 12, 2010 11:11 AM
> To: Yongqiang He; Liyin Tang; Namit Jain
> Subject: RE: Optimize jdbm
> seems like we should move all deserialization to hive land. jdbm should just 
> work on byte arrays for both keys and values. (since the output of the 
> serializer used by hive is byte comparable - that seems to suffice)
> 
> From: Yongqiang He
> Sent: Tuesday, October 12, 2010 10:22 AM
> To: Liyin Tang; Namit Jain
> Cc: Joydeep Sen Sarma
> Subject: Optimize jdbm
>   1.  HTree.get() costs 70% of the total time. It could help a lot to have a 
> Bloom filter here to avoid unneeded get() calls when we know for sure the 
> given key is not in JDBM. (We can generate the Bloom filter when doing the 
> JDBM sink, and read it into memory when doing the read.)
>   2.  HTree.get() deserializes both the key and the value until it finds a 
> matching key. We could deserialize only the key, and defer deserializing the 
> value until the key matches.
> Any others?
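
Both suggestions in the quoted email can be illustrated with a small Python sketch. It is not JDBM or Hive code: `GuardedStore` is a hypothetical stand-in for the HTree, the Bloom filter sizes are arbitrary, and SHA-256 is used only because it is in the standard library.

```python
import hashlib
from typing import Callable, Dict, Iterator, Optional


class BloomFilter:
    """Tiny Bloom filter over byte-string keys."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes) -> Iterator[int]:
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class GuardedStore:
    """Hypothetical stand-in for the HTree: serialized key -> serialized value."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}
        self._filter = BloomFilter()

    def put(self, key: bytes, value: bytes) -> None:
        # The filter is built while the table is written (the "sink" side).
        self._data[key] = value
        self._filter.add(key)

    def get(self, key: bytes,
            decode: Callable[[bytes], object]) -> Optional[object]:
        # Point 1: skip the lookup when the key is definitely absent.
        if not self._filter.might_contain(key):
            return None
        raw = self._data.get(key)
        # Point 2: deserialize the value only after the key has matched.
        return None if raw is None else decode(raw)
```

A Bloom filter can report false positives but never false negatives, so the guard only ever skips lookups that would have missed anyway.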




[jira] Resolved: (HIVE-1700) Optimiza JDBM to make mapjoin faster

2010-12-01 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang resolved HIVE-1700.
--

  Resolution: Won't Fix
Release Note: 
The JDBM component has been removed from Hive.
No need to optimize this any more.

The JDBM component has been removed from Hive.
No need to optimize this any more.

> Optimiza JDBM to make mapjoin faster
> 
>
> Key: HIVE-1700
> URL: https://issues.apache.org/jira/browse/HIVE-1700
> Project: Hive
>  Issue Type: Improvement
>Reporter: He Yongqiang
>Assignee: Liyin Tang
>




[jira] Updated: (HIVE-1811) Show the time the local task takes

2010-11-24 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1811:
-

Attachment: hive-1811-1.patch

The original showTime code has a potential bug if the local task takes more than 
60 seconds.
This patch fixes the bug.
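
The patch is not quoted in this message, so the exact defect is unknown. As an illustration only, here is a Python sketch of elapsed-time formatting that stays correct once a task runs past 60 seconds; the function name and output format are assumptions, not Hive's showTime.

```python
def format_elapsed(start_ms: int, end_ms: int) -> str:
    """Format an elapsed time given two millisecond timestamps."""
    total_seconds = (end_ms - start_ms) // 1000
    # Carry whole minutes out of the seconds field instead of dropping them.
    minutes, seconds = divmod(total_seconds, 60)
    if minutes:
        return f"{minutes} min {seconds} sec"
    return f"{seconds} sec"
```

Working from the total elapsed duration, rather than from the seconds field of a wall-clock time, is what keeps the result right for long-running tasks.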

> Show the time the local task takes
> --
>
> Key: HIVE-1811
> URL: https://issues.apache.org/jira/browse/HIVE-1811
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: hive-1811-1.patch
>
>
> After the local tasks finish, show how much time they took




[jira] Updated: (HIVE-1804) Mapjoin will fail if there are no files associating with the join tables

2010-11-24 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1804:
-

Attachment: hive-1804-3.patch

Since some other patches were committed recently, I regenerated the patch 
after an svn update.
Please review.

> Mapjoin will fail if there are no files associating with the join tables
> 
>
> Key: HIVE-1804
> URL: https://issues.apache.org/jira/browse/HIVE-1804
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: hive-1804-1.patch, hive-1804-2.patch, hive-1804-3.patch
>
>
> If there are some empty tables without any file associated, the map join will 
> fail.




[jira] Created: (HIVE-1811) Show the time the local task takes

2010-11-24 Thread Liyin Tang (JIRA)
Show the time the local task takes
--

 Key: HIVE-1811
 URL: https://issues.apache.org/jira/browse/HIVE-1811
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.7.0
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.7.0


After the local tasks finish, show how much time they took




[jira] Updated: (HIVE-1792) track the joins which are being converted to map-join automatically

2010-11-24 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1792:
-

Attachment: hive-1792-4.patch

1) Remove unrelated change from this patch
2) Set the backup tag in the common join resolver

by the way, I generated the diff camp.

> track the joins which are being converted to map-join automatically
> ---
>
> Key: HIVE-1792
> URL: https://issues.apache.org/jira/browse/HIVE-1792
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: hive-1792-1.patch, hive-1792-2.patch, hive-1792-3.patch, 
> hive-1792-4.patch
>
>
> We should be able to track how many queries (join) got converted to
> map-join




[jira] Commented: (HIVE-1792) track the joins which are being converted to map-join automatically

2010-11-23 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935234#action_12935234
 ] 

Liyin Tang commented on HIVE-1792:
--

There are 2 cases in which the common join runs.
One is when the resolver of the conditional task returns the common join.
The other is when the map join local task fails.

If the tag is not reset while getting the backup task,
how can these 2 cases be distinguished?
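
A short Python sketch of the distinction being argued for. The tag names and the function are hypothetical and only illustrate why the backup path needs its own tag; they are not taken from the patch.

```python
from enum import Enum


class JoinTag(Enum):
    # Hypothetical tag names for this illustration.
    COMMON_JOIN = "common join chosen by the resolver"
    CONVERTED_MAPJOIN = "join converted to a map join"
    BACKUP_COMMON_JOIN = "common join run as backup after a local task failure"


def tag_for_run(resolver_chose_map_join: bool,
                local_task_succeeded: bool) -> JoinTag:
    """Pick the tag recorded for the task that actually runs."""
    if not resolver_chose_map_join:
        # Case 1: the conditional task's resolver returned the common join.
        return JoinTag.COMMON_JOIN
    if local_task_succeeded:
        return JoinTag.CONVERTED_MAPJOIN
    # Case 2: the map join local task failed, so the backup task runs.
    # Resetting the tag here is what keeps the two cases apart.
    return JoinTag.BACKUP_COMMON_JOIN
```

Without the reset, both paths would report the same tag and the tracking counts would merge them.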



> track the joins which are being converted to map-join automatically
> ---
>
> Key: HIVE-1792
> URL: https://issues.apache.org/jira/browse/HIVE-1792
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: hive-1792-1.patch, hive-1792-2.patch, hive-1792-3.patch
>
>
> We should be able to track how many queries (join) got converted to
> map-join




[jira] Updated: (HIVE-1792) track the joins which are being converted to map-join automatically

2010-11-23 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1792:
-

Attachment: hive-1792-3.patch

Since HIVE-1785 has been committed, I generated the diff again,
so this diff does not include any code from HIVE-1785.
Please review.

> track the joins which are being converted to map-join automatically
> ---
>
> Key: HIVE-1792
> URL: https://issues.apache.org/jira/browse/HIVE-1792
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: hive-1792-1.patch, hive-1792-2.patch, hive-1792-3.patch
>
>
> We should be able to track how many queries (join) got converted to
> map-join




[jira] Commented: (HIVE-1785) change Pre/Post Query Hooks to take in 1 parameter: HookContext

2010-11-23 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935186#action_12935186
 ] 

Liyin Tang commented on HIVE-1785:
--

Thanks for the review, John. I have created a sub-task (HIVE-1810) to change the 
xml description.
Please take a look.



> change Pre/Post Query Hooks to take in 1 parameter: HookContext
> ---
>
> Key: HIVE-1785
> URL: https://issues.apache.org/jira/browse/HIVE-1785
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: hive-1785_3.patch, hive-1785_4.patch, hive-1785_6.patch, 
> hive_1785_1.patch, hive_1785_2.patch
>
>
> This way, it would be possible to add new parameters to the hooks without 
> changing the existing hooks.
> This will be an incompatible change, and all the hooks need to change to the 
> new API
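
The benefit of a single context parameter can be shown with a small Python sketch. The field names and the hook are hypothetical and do not reproduce Hive's HookContext; the point is only that a field added later does not change any hook's signature.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class HookContext:
    """Bundle of everything a hook may need (fields are illustrative)."""
    query: str
    inputs: Set[str] = field(default_factory=set)
    outputs: Set[str] = field(default_factory=set)
    # A field added later: existing hooks keep working unchanged.
    user: str = "unknown"


def audit_hook(ctx: HookContext) -> str:
    """Example hook that reads only the fields it cares about."""
    return f"{ctx.user} ran: {ctx.query}"


def run_hooks(hooks: List[Callable[[HookContext], str]],
              ctx: HookContext) -> List[str]:
    """Call every hook with the same single context argument."""
    return [hook(ctx) for hook in hooks]
```

Hooks written against separate positional parameters would each need a new signature whenever a parameter is added, which is the incompatibility the issue accepts once in exchange for stability afterwards.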




[jira] Updated: (HIVE-1810) a followup patch for changing the description of hive.exec.pre/post.hooks in conf/hive-default.xml

2010-11-23 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1810:
-

Status: Patch Available  (was: Open)

Patch is available

> a followup patch for changing the description of hive.exec.pre/post.hooks in 
> conf/hive-default.xml
> --
>
> Key: HIVE-1810
> URL: https://issues.apache.org/jira/browse/HIVE-1810
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Attachments: hive-1810-1.patch
>
>
> a followup patch for changing the description of hive.exec.pre/post.hooks in 
> conf/hive-default.xml




[jira] Updated: (HIVE-1810) a followup patch for changing the description of hive.exec.pre/post.hooks in conf/hive-default.xml

2010-11-23 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1810:
-

Attachment: hive-1810-1.patch

Changed hive-default.xml.
New pre/post hooks should implement the ExecuteWithHookContext interface.

> a followup patch for changing the description of hive.exec.pre/post.hooks in 
> conf/hive-default.xml
> --
>
> Key: HIVE-1810
> URL: https://issues.apache.org/jira/browse/HIVE-1810
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Attachments: hive-1810-1.patch
>
>
> a followup patch for changing the description of hive.exec.pre/post.hooks in 
> conf/hive-default.xml




[jira] Created: (HIVE-1810) a followup patch for changing the description of hive.exec.pre/post.hooks in conf/hive-default.xml

2010-11-23 Thread Liyin Tang (JIRA)
a followup patch for changing the description of hive.exec.pre/post.hooks in 
conf/hive-default.xml
--

 Key: HIVE-1810
 URL: https://issues.apache.org/jira/browse/HIVE-1810
 Project: Hive
  Issue Type: Sub-task
Reporter: Liyin Tang
Assignee: Liyin Tang


a followup patch for changing the description of hive.exec.pre/post.hooks in 
conf/hive-default.xml




[jira] Commented: (HIVE-1792) track the joins which are being converted to map-join automatically

2010-11-23 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935171#action_12935171
 ] 

Liyin Tang commented on HIVE-1792:
--

Still need this change to tag on all the join tasks

> track the joins which are being converted to map-join automatically
> ---
>
> Key: HIVE-1792
> URL: https://issues.apache.org/jira/browse/HIVE-1792
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: hive-1792-1.patch, hive-1792-2.patch
>
>
> We should be able to track how many queries (join) got converted to
> map-join




[jira] Updated: (HIVE-1792) track the joins which are being converted to map-join automatically

2010-11-23 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1792:
-

Attachment: hive-1792-2.patch

> track the joins which are being converted to map-join automatically
> ---
>
> Key: HIVE-1792
> URL: https://issues.apache.org/jira/browse/HIVE-1792
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: hive-1792-1.patch, hive-1792-2.patch
>
>
> We should be able to track how many queries (join) got converted to
> map-join




[jira] Assigned: (HIVE-1808) but in auto_join25.q

2010-11-23 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang reassigned HIVE-1808:


Assignee: Liyin Tang

> but in auto_join25.q
> 
>
> Key: HIVE-1808
> URL: https://issues.apache.org/jira/browse/HIVE-1808
> Project: Hive
>  Issue Type: Bug
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Attachments: hive-1808-1.patch
>
>
> In this test case, there are 2 SET statements:
> set hive.mapjoin.localtask.max.memory.usage = 0.0001;
> set hive.mapjoin.check.memory.rows = 2;
> But in HiveConf, the names of these 2 conf variable do not match with each 
> other.




[jira] Updated: (HIVE-1808) bug in auto_join25.q

2010-11-23 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1808:
-

Summary: bug in auto_join25.q  (was: but in auto_join25.q)

> bug in auto_join25.q
> 
>
> Key: HIVE-1808
> URL: https://issues.apache.org/jira/browse/HIVE-1808
> Project: Hive
>  Issue Type: Bug
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Attachments: hive-1808-1.patch
>
>
> In this test case, there are 2 SET statements:
> set hive.mapjoin.localtask.max.memory.usage = 0.0001;
> set hive.mapjoin.check.memory.rows = 2;
> But in HiveConf, the names of these 2 conf variable do not match with each 
> other.




[jira] Updated: (HIVE-1808) but in auto_join25.q

2010-11-23 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1808:
-

Attachment: hive-1808-1.patch

The bug is fixed in this patch.

> but in auto_join25.q
> 
>
> Key: HIVE-1808
> URL: https://issues.apache.org/jira/browse/HIVE-1808
> Project: Hive
>  Issue Type: Bug
>Reporter: Liyin Tang
> Attachments: hive-1808-1.patch
>
>
> In this test case, there are 2 SET statements:
> set hive.mapjoin.localtask.max.memory.usage = 0.0001;
> set hive.mapjoin.check.memory.rows = 2;
> But in HiveConf, the names of these 2 conf variable do not match with each 
> other.




[jira] Updated: (HIVE-1792) track the joins which are being converted to map-join automatically

2010-11-23 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1792:
-

Attachment: (was: hive-1792-2.patch)

> track the joins which are being converted to map-join automatically
> ---
>
> Key: HIVE-1792
> URL: https://issues.apache.org/jira/browse/HIVE-1792
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: hive-1792-1.patch
>
>
> We should be able to track how many queries (join) got converted to
> map-join





[jira] Updated: (HIVE-1792) track the joins which are being converted to map-join automatically

2010-11-23 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1792:
-

Attachment: hive-1792-2.patch

The previous patch included the fix for another JIRA (HIVE-1808).
I have now split the previous patch into 2 patches.

This patch includes only the diff related to the map join measurement.

> track the joins which are being converted to map-join automatically
> ---
>
> Key: HIVE-1792
> URL: https://issues.apache.org/jira/browse/HIVE-1792
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: hive-1792-1.patch, hive-1792-2.patch
>
>
> We should be able to track how many queries (join) got converted to
> map-join




[jira] Created: (HIVE-1808) but in auto_join25.q

2010-11-23 Thread Liyin Tang (JIRA)
but in auto_join25.q


 Key: HIVE-1808
 URL: https://issues.apache.org/jira/browse/HIVE-1808
 Project: Hive
  Issue Type: Bug
Reporter: Liyin Tang


In this test case, there are 2 SET statements:
set hive.mapjoin.localtask.max.memory.usage = 0.0001;
set hive.mapjoin.check.memory.rows = 2;

But in HiveConf, the names of these 2 conf variables do not match.






[jira] Updated: (HIVE-1792) track the joins which are being converted to map-join automatically

2010-11-23 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1792:
-

Attachment: hive-1792-1.patch

Add a new hook, MapJoinCounterHook, which measures how many joins are converted 
into common joins and how many map joins revert back to common joins.

This new post hook implements the new hook interface with HookContext.

Please review.
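
As a rough illustration of the kind of counting such a hook might do, here is a minimal sketch. It assumes every task carries a tag describing its join type. The `JoinTag` enum, its values, and the `JoinCounter` class are hypothetical names made up for this example; they are not taken from the patch or from Hive's source.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical task tags; the real patch may use different names or plain ints.
enum JoinTag { COMMON_JOIN, CONVERTED_MAPJOIN, BACKUP_COMMON_JOIN, NONE }

class JoinCounter {
    // Counts how many tasks carry each join tag, ignoring untagged tasks.
    static Map<JoinTag, Integer> count(List<JoinTag> taskTags) {
        Map<JoinTag, Integer> counts = new EnumMap<>(JoinTag.class);
        for (JoinTag tag : taskTags) {
            if (tag != JoinTag.NONE) {
                counts.merge(tag, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

In a real post hook, the list of tags would presumably come from the tasks reachable through the HookContext rather than being passed in directly.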

> track the joins which are being converted to map-join automatically
> ---
>
> Key: HIVE-1792
> URL: https://issues.apache.org/jira/browse/HIVE-1792
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: hive-1792-1.patch
>
>
> We should be able to track how many queries (join) got converted to
> map-join




[jira] Updated: (HIVE-1804) Mapjoin will fail if there are no files associating with the join tables

2010-11-23 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1804:
-

Attachment: hive-1804-2.patch

Removed all the debug print statements.
Please review.

> Mapjoin will fail if there are no files associating with the join tables
> 
>
> Key: HIVE-1804
> URL: https://issues.apache.org/jira/browse/HIVE-1804
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: hive-1804-1.patch, hive-1804-2.patch
>
>
> If there are some empty tables without any file associated, the map join will 
> fail.




[jira] Updated: (HIVE-1785) change Pre/Post Query Hooks to take in 1 parameter: HookContext

2010-11-22 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1785:
-

Attachment: hive-1785_6.patch

Thanks for the careful review, and sorry for submitting the wrong patch earlier.
This patch makes all the changes agreed in the earlier discussion and removes 
irrelevant code.
Please review.

> change Pre/Post Query Hooks to take in 1 parameter: HookContext
> ---
>
> Key: HIVE-1785
> URL: https://issues.apache.org/jira/browse/HIVE-1785
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: hive-1785_3.patch, hive-1785_4.patch, hive-1785_6.patch, 
> hive_1758_5.patch, hive_1785_1.patch, hive_1785_2.patch
>
>
> This way, it would be possible to add new parameters to the hooks without 
> changing the existing hooks.
> This will be a incompatible change, and all the hooks need to change to the 
> new API




[jira] Updated: (HIVE-1804) Mapjoin will fail if there are no files associating with the join tables

2010-11-22 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1804:
-

Attachment: hive-1804-1.patch

If the partition desc is empty, then create an empty hashtable file for it.
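
One way to read this fix, as a minimal sketch: when a table has no input files, still write a dump file containing an empty table, so that whatever loads the hashtable later finds a file to read. `EmptyTableDump`, `dumpIfEmpty`, `load`, and the use of Java serialization with a `HashMap<String, String>` are assumptions made for illustration; the actual patch and Hive's hashtable file format may differ.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;

class EmptyTableDump {
    // Writes an empty serialized map to dumpFile when there are no input files.
    // Returns true if the empty dump was written, false if inputs exist.
    static boolean dumpIfEmpty(List<Path> inputFiles, Path dumpFile) throws IOException {
        if (!inputFiles.isEmpty()) {
            return false;
        }
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(dumpFile))) {
            out.writeObject(new HashMap<String, String>());
        }
        return true;
    }

    // Reads the dump back; an empty table simply yields an empty map.
    static HashMap<String, String> load(Path dumpFile) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(dumpFile))) {
            @SuppressWarnings("unchecked")
            HashMap<String, String> table = (HashMap<String, String>) in.readObject();
            return table;
        }
    }
}
```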

> Mapjoin will fail if there are no files associating with the join tables
> 
>
> Key: HIVE-1804
> URL: https://issues.apache.org/jira/browse/HIVE-1804
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: hive-1804-1.patch
>
>
> If there are some empty tables without any file associated, the map join will 
> fail.




[jira] Updated: (HIVE-1804) Mapjoin will fail if there are no files associating with the join tables

2010-11-22 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1804:
-

Status: Patch Available  (was: Open)

If the partition desc is empty, just create an empty hashtable file.

> Mapjoin will fail if there are no files associating with the join tables
> 
>
> Key: HIVE-1804
> URL: https://issues.apache.org/jira/browse/HIVE-1804
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
>
> If there are some empty tables without any file associated, the map join will 
> fail.




[jira] Updated: (HIVE-1785) change Pre/Post Query Hooks to take in 1 parameter: HookContext

2010-11-22 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1785:
-

Status: Patch Available  (was: Open)

> change Pre/Post Query Hooks to take in 1 parameter: HookContext
> ---
>
> Key: HIVE-1785
> URL: https://issues.apache.org/jira/browse/HIVE-1785
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Attachments: hive-1785_3.patch, hive-1785_4.patch, hive_1758_5.patch, 
> hive_1785_1.patch, hive_1785_2.patch
>
>
> This way, it would be possible to add new parameters to the hooks without 
> changing the existing hooks.
> This will be a incompatible change, and all the hooks need to change to the 
> new API




[jira] Updated: (HIVE-1797) Compressed the hashtable dump file before put into distributed cache

2010-11-22 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1797:
-

Status: Patch Available  (was: Open)

> Compressed the hashtable dump file before put into distributed cache
> 
>
> Key: HIVE-1797
> URL: https://issues.apache.org/jira/browse/HIVE-1797
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Attachments: hive-1797.patch, hive-1797_3.patch
>
>
> Clearly, the size of small table is the performance bottleneck for map join.
> Because the size of the small table will affect the memory usage and dumped 
> hashtable file.
> That means there are 2 boundaries of the map join performance.
> 1)The memory usage for local task and mapred task
> 2)The dumped hashtable file size for distributed cache
> The reason that test case in last email spends most of the execution time on 
> initializing is because it hits the second boundary.
> Since we have already bound the memory usage, one thing we can do is to let 
> the performance never hits the secondary bound before it hits the first 
> boundary.
> Assuming the heap size is 1.6 G and the small table file size is 15M 
> compressed (75M uncompressed),
> local  task can roughly hold that 1.5M unique rows in memory. 
> Roughly the dumped file size will be 150M, which is too large to put into the 
> distributed cache.
>  
> From experiments, we can basically conclude when the dumped file size is 
> smaller than 30M. 
> The distributed cache works well and all the mappers will  be initialized in 
> a short time (less than 30 secs).
> One easy implementation is to compress the hashtable file. 
> I use the gzip to compress the hashtable file and the file size is compressed 
> from 100M to 13M.
> After several tests, all the mappers will be initialized in less than 23 secs.
> But this solution adds some decompression overhead to each mapper.
> Mappers on the same machine will do the duplicated decompression work.
> Maybe in the future, we can let the distributed cache to support this.




[jira] Created: (HIVE-1804) Mapjoin will fail if there are no files associating with the join tables

2010-11-22 Thread Liyin Tang (JIRA)
Mapjoin will fail if there are no files associating with the join tables


 Key: HIVE-1804
 URL: https://issues.apache.org/jira/browse/HIVE-1804
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.7.0
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.7.0


If there are some empty tables without any file associated, the map join will 
fail.




[jira] Updated: (HIVE-1785) change Pre/Post Query Hooks to take in 1 parameter: HookContext

2010-11-22 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1785:
-

Attachment: hive_1758_5.patch

1) Deprecate the old interface.
2) Let the existing Prehook and Posthook implement the new interface.
3) Add the task tag for each task.

> change Pre/Post Query Hooks to take in 1 parameter: HookContext
> ---
>
> Key: HIVE-1785
> URL: https://issues.apache.org/jira/browse/HIVE-1785
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Attachments: hive-1785_3.patch, hive-1785_4.patch, hive_1758_5.patch, 
> hive_1785_1.patch, hive_1785_2.patch
>
>
> This way, it would be possible to add new parameters to the hooks without 
> changing the existing hooks.
> This will be a incompatible change, and all the hooks need to change to the 
> new API




[jira] Updated: (HIVE-1797) Compressed the hashtable dump file before put into distributed cache

2010-11-20 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1797:
-

Attachment: hive-1797_3.patch

In this patch, all the dumped hashtable files are compressed and packaged 
as a tar.gz file.
This tar file is then put into the distributed cache. The distributed cache will 
decompress the file for the mapper. If multiple mappers are on the same 
machine, the distributed cache will decompress it only once.

> Compressed the hashtable dump file before put into distributed cache
> 
>
> Key: HIVE-1797
> URL: https://issues.apache.org/jira/browse/HIVE-1797
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Attachments: hive-1797.patch, hive-1797_3.patch
>
>
> Clearly, the size of small table is the performance bottleneck for map join.
> Because the size of the small table will affect the memory usage and dumped 
> hashtable file.
> That means there are 2 boundaries of the map join performance.
> 1)The memory usage for local task and mapred task
> 2)The dumped hashtable file size for distributed cache
> The reason that test case in last email spends most of the execution time on 
> initializing is because it hits the second boundary.
> Since we have already bound the memory usage, one thing we can do is to let 
> the performance never hits the secondary bound before it hits the first 
> boundary.
> Assuming the heap size is 1.6 G and the small table file size is 15M 
> compressed (75M uncompressed),
> local  task can roughly hold that 1.5M unique rows in memory. 
> Roughly the dumped file size will be 150M, which is too large to put into the 
> distributed cache.
>  
> From experiments, we can basically conclude when the dumped file size is 
> smaller than 30M. 
> The distributed cache works well and all the mappers will  be initialized in 
> a short time (less than 30 secs).
> One easy implementation is to compress the hashtable file. 
> I use the gzip to compress the hashtable file and the file size is compressed 
> from 100M to 13M.
> After several tests, all the mappers will be initialized in less than 23 secs.
> But this solution adds some decompression overhead to each mapper.
> Mappers on the same machine will do the duplicated decompression work.
> Maybe in the future, we can let the distributed cache to support this.




[jira] Updated: (HIVE-1797) Compressed the hashtable dump file before put into distributed cache

2010-11-20 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1797:
-

Attachment: hive-1797_2.patch

In this patch, all the dumped hashtable files are compressed and packaged 
as a tar.gz file.
This tar file is then put into the distributed cache. The distributed cache will 
decompress the file for the mapper. If multiple mappers are on the same 
machine, the distributed cache will decompress it only once.

Please review.

> Compressed the hashtable dump file before put into distributed cache
> 
>
> Key: HIVE-1797
> URL: https://issues.apache.org/jira/browse/HIVE-1797
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Attachments: hive-1797.patch
>
>
> Clearly, the size of small table is the performance bottleneck for map join.
> Because the size of the small table will affect the memory usage and dumped 
> hashtable file.
> That means there are 2 boundaries of the map join performance.
> 1)The memory usage for local task and mapred task
> 2)The dumped hashtable file size for distributed cache
> The reason that test case in last email spends most of the execution time on 
> initializing is because it hits the second boundary.
> Since we have already bound the memory usage, one thing we can do is to let 
> the performance never hits the secondary bound before it hits the first 
> boundary.
> Assuming the heap size is 1.6 G and the small table file size is 15M 
> compressed (75M uncompressed),
> local  task can roughly hold that 1.5M unique rows in memory. 
> Roughly the dumped file size will be 150M, which is too large to put into the 
> distributed cache.
>  
> From experiments, we can basically conclude when the dumped file size is 
> smaller than 30M. 
> The distributed cache works well and all the mappers will  be initialized in 
> a short time (less than 30 secs).
> One easy implementation is to compress the hashtable file. 
> I use the gzip to compress the hashtable file and the file size is compressed 
> from 100M to 13M.
> After several tests, all the mappers will be initialized in less than 23 secs.
> But this solution adds some decompression overhead to each mapper.
> Mappers on the same machine will do the duplicated decompression work.
> Maybe in the future, we can let the distributed cache to support this.




[jira] Updated: (HIVE-1797) Compressed the hashtable dump file before put into distributed cache

2010-11-20 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1797:
-

Attachment: (was: hive-1797_2.patch)

> Compressed the hashtable dump file before put into distributed cache
> 
>
> Key: HIVE-1797
> URL: https://issues.apache.org/jira/browse/HIVE-1797
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Attachments: hive-1797.patch
>
>
> Clearly, the size of small table is the performance bottleneck for map join.
> Because the size of the small table will affect the memory usage and dumped 
> hashtable file.
> That means there are 2 boundaries of the map join performance.
> 1)The memory usage for local task and mapred task
> 2)The dumped hashtable file size for distributed cache
> The reason that test case in last email spends most of the execution time on 
> initializing is because it hits the second boundary.
> Since we have already bound the memory usage, one thing we can do is to let 
> the performance never hits the secondary bound before it hits the first 
> boundary.
> Assuming the heap size is 1.6 G and the small table file size is 15M 
> compressed (75M uncompressed),
> local  task can roughly hold that 1.5M unique rows in memory. 
> Roughly the dumped file size will be 150M, which is too large to put into the 
> distributed cache.
>  
> From experiments, we can basically conclude when the dumped file size is 
> smaller than 30M. 
> The distributed cache works well and all the mappers will  be initialized in 
> a short time (less than 30 secs).
> One easy implementation is to compress the hashtable file. 
> I use the gzip to compress the hashtable file and the file size is compressed 
> from 100M to 13M.
> After several tests, all the mappers will be initialized in less than 23 secs.
> But this solution adds some decompression overhead to each mapper.
> Mappers on the same machine will do the duplicated decompression work.
> Maybe in the future, we can let the distributed cache to support this.




[jira] Created: (HIVE-1798) Clear empty files in Hive

2010-11-17 Thread Liyin Tang (JIRA)
Clear empty files in Hive
--

 Key: HIVE-1798
 URL: https://issues.apache.org/jira/browse/HIVE-1798
 Project: Hive
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang


There are 4 empty files in Hive right now. 
We should delete them from trunk.

D  ql/src/java/org/apache/hadoop/hive/ql/exec/JDBMDummyOperator.java
D  ql/src/java/org/apache/hadoop/hive/ql/exec/JDBMSinkOperator.java
D  ql/src/java/org/apache/hadoop/hive/ql/plan/JDBMSinkDesc.java
D  ql/src/java/org/apache/hadoop/hive/ql/plan/JDBMDummyDesc.java
D  ql/src/java/org/apache/hadoop/hive/ql/util/JoinUtil.java





[jira] Updated: (HIVE-1797) Compressed the hashtable dump file before put into distributed cache

2010-11-17 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1797:
-

Attachment: hive-1797.patch

Compress the dumped hashtable file with gzip before adding it to the distributed cache.



> Compressed the hashtable dump file before put into distributed cache
> 
>
> Key: HIVE-1797
> URL: https://issues.apache.org/jira/browse/HIVE-1797
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Attachments: hive-1797.patch
>
>
> Clearly, the size of small table is the performance bottleneck for map join.
> Because the size of the small table will affect the memory usage and dumped 
> hashtable file.
> That means there are 2 boundaries of the map join performance.
> 1)The memory usage for local task and mapred task
> 2)The dumped hashtable file size for distributed cache
> The reason that test case in last email spends most of the execution time on 
> initializing is because it hits the second boundary.
> Since we have already bound the memory usage, one thing we can do is to let 
> the performance never hits the secondary bound before it hits the first 
> boundary.
> Assuming the heap size is 1.6 G and the small table file size is 15M 
> compressed (75M uncompressed),
> local  task can roughly hold that 1.5M unique rows in memory. 
> Roughly the dumped file size will be 150M, which is too large to put into the 
> distributed cache.
>  
> From experiments, we can basically conclude when the dumped file size is 
> smaller than 30M. 
> The distributed cache works well and all the mappers will  be initialized in 
> a short time (less than 30 secs).
> One easy implementation is to compress the hashtable file. 
> I use the gzip to compress the hashtable file and the file size is compressed 
> from 100M to 13M.
> After several tests, all the mappers will be initialized in less than 23 secs.
> But this solution adds some decompression overhead to each mapper.
> Mappers on the same machine will do the duplicated decompression work.
> Maybe in the future, we can let the distributed cache to support this.




[jira] Created: (HIVE-1797) Compressed the hashtable dump file before put into distributed cache

2010-11-17 Thread Liyin Tang (JIRA)
Compressed the hashtable dump file before put into distributed cache


 Key: HIVE-1797
 URL: https://issues.apache.org/jira/browse/HIVE-1797
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.7.0
Reporter: Liyin Tang
Assignee: Liyin Tang


Clearly, the size of the small table is the performance bottleneck for map join,
because the size of the small table affects both the memory usage and the size 
of the dumped hashtable file.
That means there are 2 boundaries on map join performance:
1)  The memory usage for the local task and the mapred task
2)  The dumped hashtable file size for the distributed cache

The test case in the last email spends most of its execution time on 
initialization because it hits the second boundary.
Since we have already bounded the memory usage, one thing we can do is to make 
sure the performance never hits the second boundary before it hits the first.

Assuming the heap size is 1.6G and the small table file size is 15M compressed 
(75M uncompressed),
the local task can hold roughly 1.5M unique rows in memory.
The dumped file size will be roughly 150M, which is too large to put into the 
distributed cache.

From experiments, we can basically conclude that when the dumped file size is 
smaller than 30M, the distributed cache works well and all the mappers will be 
initialized in a short time (less than 30 secs).

One easy implementation is to compress the hashtable file.
I use gzip to compress the hashtable file, and the file size is reduced 
from 100M to 13M.
After several tests, all the mappers are initialized in less than 23 secs.

But this solution adds some decompression overhead to each mapper.
Mappers on the same machine will do duplicated decompression work.
Maybe in the future, we can let the distributed cache support this.
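
The gzip step described above could look roughly like the following sketch, which uses only the standard `java.util.zip` classes. `HashTableFileCompressor` and its method names are hypothetical and not taken from the patch. The compression ratio depends entirely on the data, so the 100M to 13M figure above should not be expected in general.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class HashTableFileCompressor {
    // Writes a gzip-compressed copy of source to target; returns the compressed size in bytes.
    static long compress(Path source, Path target) throws IOException {
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
            copy(in, out);
        }
        return Files.size(target);
    }

    // Restores the original bytes; this is the per-mapper overhead mentioned above.
    static void decompress(Path source, Path target) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(source));
             OutputStream out = Files.newOutputStream(target)) {
            copy(in, out);
        }
    }

    // Streams bytes from in to out through a fixed-size buffer.
    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }
}
```

Streaming through a small buffer keeps memory use flat regardless of the dump size, which matters here because the local task's heap is already the first boundary.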





[jira] Updated: (HIVE-1785) change Pre/Post Query Hooks to take in 1 parameter: HookContext

2010-11-17 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1785:
-

Attachment: hive-1785_4.patch

In this patch, I add the Hook interface on top of the Pre/PostExecute and
ExecuteWithHookContext interfaces.

In the future, users only need to implement ExecuteWithHookContext instead of
Pre/PostExecute.

The patch is also compatible with old hooks.

> change Pre/Post Query Hooks to take in 1 parameter: HookContext
> ---
>
> Key: HIVE-1785
> URL: https://issues.apache.org/jira/browse/HIVE-1785
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Attachments: hive-1785_3.patch, hive-1785_4.patch, hive_1785_1.patch, 
> hive_1785_2.patch
>
>
> This way, it would be possible to add new parameters to the hooks without 
> changing the existing hooks.
> This will be an incompatible change, and all the hooks need to change to the 
> new API

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1785) change Pre/Post Query Hooks to take in 1 parameter: HookContext

2010-11-17 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933096#action_12933096
 ] 

Liyin Tang commented on HIVE-1785:
--

How about adding one more layer over the Pre/PostExecute interfaces and
calling it Hook?

Then both ExecuteWithHookContext and Pre/PostExecute extend this Hook
interface.

At run time, we use reflection to see whether the hook is an
ExecuteWithHookContext or a Pre/PostExecute.
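
A minimal sketch of this layering, assuming simplified signatures. The
interface names Hook, PostExecute and ExecuteWithHookContext come from the
discussion; the HookContext stand-in, the single String argument of the old
style hook and the HookRunner class are hypothetical and only serve the
example.

```java
import java.util.List;

// Hypothetical stand-in for the real HookContext.
class HookContext {
  final String queryId;

  HookContext(String queryId) {
    this.queryId = queryId;
  }
}

// Common parent of both hook styles.
interface Hook {
}

// Old style hook; the real interface takes several arguments (assumption: one here).
interface PostExecute extends Hook {
  void run(String sessionInfo) throws Exception;
}

// New style hook that only takes the context.
interface ExecuteWithHookContext extends Hook {
  void run(HookContext hookContext) throws Exception;
}

class HookRunner {
  // Check the runtime type of each hook and call the matching interface.
  static void runAll(List<Hook> hooks, HookContext ctx, String sessionInfo)
      throws Exception {
    for (Hook hook : hooks) {
      if (hook instanceof ExecuteWithHookContext) {
        ((ExecuteWithHookContext) hook).run(ctx);
      } else if (hook instanceof PostExecute) {
        ((PostExecute) hook).run(sessionInfo);
      }
    }
  }
}
```

With this shape, old hooks keep working unchanged, and new hooks only depend
on the context object.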




> change Pre/Post Query Hooks to take in 1 parameter: HookContext
> ---
>
> Key: HIVE-1785
> URL: https://issues.apache.org/jira/browse/HIVE-1785
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Attachments: hive-1785_3.patch, hive_1785_1.patch, hive_1785_2.patch
>
>
> This way, it would be possible to add new parameters to the hooks without 
> changing the existing hooks.
> This will be an incompatible change, and all the hooks need to change to the 
> new API

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1785) change Pre/Post Query Hooks to take in 1 parameter: HookContext

2010-11-17 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1785:
-

Attachment: hive-1785_3.patch

To stay compatible, we check whether the hook implements the interface that
runs with the hook context.
If not, we just call the original interface.

> change Pre/Post Query Hooks to take in 1 parameter: HookContext
> ---
>
> Key: HIVE-1785
> URL: https://issues.apache.org/jira/browse/HIVE-1785
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Attachments: hive-1785_3.patch, hive_1785_1.patch, hive_1785_2.patch
>
>
> This way, it would be possible to add new parameters to the hooks without 
> changing the existing hooks.
> This will be an incompatible change, and all the hooks need to change to the 
> new API

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1642) Convert join queries to map-join based on size of table/row

2010-11-16 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1642:
-

Attachment: hive-1642_11.patch

When the local task runs out of memory, do NOT print anything and just return
from the process, because calling l4j to print will make it worse.

Sorry for so many minor changes this afternoon.



> Convert join queries to map-join based on size of table/row
> ---
>
> Key: HIVE-1642
> URL: https://issues.apache.org/jira/browse/HIVE-1642
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: hive-1642_10.patch, hive-1642_11.patch, 
> hive-1642_5.patch, hive-1642_6.patch, hive-1642_7.patch, hive-1642_9.patch, 
> hive_1642_1.patch, hive_1642_2.patch, hive_1642_4.patch
>
>
> Based on the number of rows and size of each table, Hive should automatically 
> be able to convert a join into map-join.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1642) Convert join queries to map-join based on size of table/row

2010-11-16 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1642:
-

Attachment: hive-1642_10.patch

After discussion, we think the function replaceWithConditionalTask is not
general enough to be put in the Task class.
So we moved this function back to the CommonJoinResolver class.



> Convert join queries to map-join based on size of table/row
> ---
>
> Key: HIVE-1642
> URL: https://issues.apache.org/jira/browse/HIVE-1642
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: hive-1642_10.patch, hive-1642_5.patch, 
> hive-1642_6.patch, hive-1642_7.patch, hive-1642_9.patch, hive_1642_1.patch, 
> hive_1642_2.patch, hive_1642_4.patch
>
>
> Based on the number of rows and size of each table, Hive should automatically 
> be able to convert a join into map-join.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1642) Convert join queries to map-join based on size of table/row

2010-11-16 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1642:
-

Attachment: hive-1642_9.patch

some minor changes in ConditionalResolverCommonJoin.java


> Convert join queries to map-join based on size of table/row
> ---
>
> Key: HIVE-1642
> URL: https://issues.apache.org/jira/browse/HIVE-1642
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: hive-1642_5.patch, hive-1642_6.patch, hive-1642_7.patch, 
> hive-1642_9.patch, hive_1642_1.patch, hive_1642_2.patch, hive_1642_4.patch
>
>
> Based on the number of rows and size of each table, Hive should automatically 
> be able to convert a join into map-join.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1642) Convert join queries to map-join based on size of table/row

2010-11-16 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1642:
-

Attachment: hive-1642_7.patch

In Task.java

public void replaceWithConditionalTask(ConditionalTask cndTsk,
    PhysicalContext physicalContext) {
  // take care of parent tasks

  ...
  // take care of children tasks
  List<Task<? extends Serializable>> oldChildTasks = this.getChildTasks();
  if (oldChildTasks != null) {
    for (Task<? extends Serializable> tsk : cndTsk.getListTasks()) {
      if (tsk.equals(this)) {
        // avoid redundantly adding this task again
        continue;
      }
      // every other task in the conditional task inherits the old children
      for (Task<? extends Serializable> oldChild : oldChildTasks) {
        tsk.addDependentTask(oldChild);
      }
    }
  }
}


> Convert join queries to map-join based on size of table/row
> ---
>
> Key: HIVE-1642
> URL: https://issues.apache.org/jira/browse/HIVE-1642
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: hive-1642_5.patch, hive-1642_6.patch, hive-1642_7.patch, 
> hive_1642_1.patch, hive_1642_2.patch, hive_1642_4.patch
>
>
> Based on the number of rows and size of each table, Hive should automatically 
> be able to convert a join into map-join.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1642) Convert join queries to map-join based on size of table/row

2010-11-16 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1642:
-

Attachment: hive-1642_6.patch

Remove the getBackupTask interface from all the Conditional Resolvers.


> Convert join queries to map-join based on size of table/row
> ---
>
> Key: HIVE-1642
> URL: https://issues.apache.org/jira/browse/HIVE-1642
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: hive-1642_5.patch, hive-1642_6.patch, hive_1642_1.patch, 
> hive_1642_2.patch, hive_1642_4.patch
>
>
> Based on the number of rows and size of each table, Hive should automatically 
> be able to convert a join into map-join.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1642) Convert join queries to map-join based on size of table/row

2010-11-16 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1642:
-

Attachment: (was: hive-1642_5.patch)

> Convert join queries to map-join based on size of table/row
> ---
>
> Key: HIVE-1642
> URL: https://issues.apache.org/jira/browse/HIVE-1642
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: hive-1642_5.patch, hive_1642_1.patch, hive_1642_2.patch, 
> hive_1642_4.patch
>
>
> Based on the number of rows and size of each table, Hive should automatically 
> be able to convert a join into map-join.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1642) Convert join queries to map-join based on size of table/row

2010-11-16 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1642:
-

Attachment: hive-1642_5.patch

Add a more detailed description to the configuration xml file.
Revert DriverContext.java, since there should be no change to this file.


> Convert join queries to map-join based on size of table/row
> ---
>
> Key: HIVE-1642
> URL: https://issues.apache.org/jira/browse/HIVE-1642
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: hive-1642_5.patch, hive_1642_1.patch, hive_1642_2.patch, 
> hive_1642_4.patch
>
>
> Based on the number of rows and size of each table, Hive should automatically 
> be able to convert a join into map-join.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1642) Convert join queries to map-join based on size of table/row

2010-11-16 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1642:
-

Attachment: hive-1642_5.patch

Add more descriptions to the configuration files.
Revert the DriverContext.

> Convert join queries to map-join based on size of table/row
> ---
>
> Key: HIVE-1642
> URL: https://issues.apache.org/jira/browse/HIVE-1642
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: hive-1642_5.patch, hive_1642_1.patch, hive_1642_2.patch, 
> hive_1642_4.patch
>
>
> Based on the number of rows and size of each table, Hive should automatically 
> be able to convert a join into map-join.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1785) change Pre/Post Query Hooks to take in 1 parameter: HookContext

2010-11-16 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932623#action_12932623
 ] 

Liyin Tang commented on HIVE-1785:
--

I generated the diff based on Hive-1642.
Please ignore the irrelevant code and output files.
Sorry for the inconvenience.

> change Pre/Post Query Hooks to take in 1 parameter: HookContext
> ---
>
> Key: HIVE-1785
> URL: https://issues.apache.org/jira/browse/HIVE-1785
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Attachments: hive_1785_1.patch, hive_1785_2.patch
>
>
> This way, it would be possible to add new parameters to the hooks without 
> changing the existing hooks.
> This will be an incompatible change, and all the hooks need to change to the 
> new API

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1785) change Pre/Post Query Hooks to take in 1 parameter: HookContext

2010-11-16 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1785:
-

Attachment: hive_1785_2.patch

Thanks for John's comments.

Now in Driver.java:

for (PostExecute peh : getPostExecHooks()) {
  if (peh instanceof ExecuteWithHookContext) {
    ((ExecuteWithHookContext) peh).run(hookContext);
  } else {
    peh.run(SessionState.get(), plan.getInputs(), plan.getOutputs(),
        (SessionState.get() != null
            ? SessionState.get().getLineageState().getLineageInfo()
            : null),
        ShimLoader.getHadoopShims().getUGIForConf(conf));
  }
}

Let's discuss the interface of HookContext: how much information should we
keep in the HookContext?
Right now, I keep the query plan, job conf and completed tasks.


> change Pre/Post Query Hooks to take in 1 parameter: HookContext
> ---
>
> Key: HIVE-1785
> URL: https://issues.apache.org/jira/browse/HIVE-1785
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Attachments: hive_1785_1.patch, hive_1785_2.patch
>
>
> This way, it would be possible to add new parameters to the hooks without 
> changing the existing hooks.
> This will be an incompatible change, and all the hooks need to change to the 
> new API

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1642) Convert join queries to map-join based on size of table/row

2010-11-16 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1642:
-

Attachment: hive_1642_4.patch

This patch formats the output of local task.

> Convert join queries to map-join based on size of table/row
> ---
>
> Key: HIVE-1642
> URL: https://issues.apache.org/jira/browse/HIVE-1642
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: hive_1642_1.patch, hive_1642_2.patch, hive_1642_4.patch
>
>
> Based on the number of rows and size of each table, Hive should automatically 
> be able to convert a join into map-join.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1642) Convert join queries to map-join based on size of table/row

2010-11-16 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1642:
-

Attachment: hive_1642_2.patch

Thanks for the comments. 
I have updated the patch according to the review comments.



> Convert join queries to map-join based on size of table/row
> ---
>
> Key: HIVE-1642
> URL: https://issues.apache.org/jira/browse/HIVE-1642
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: hive_1642_1.patch, hive_1642_2.patch
>
>
> Based on the number of rows and size of each table, Hive should automatically 
> be able to convert a join into map-join.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1642) Convert join queries to map-join based on size of table/row

2010-11-15 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932132#action_12932132
 ] 

Liyin Tang commented on HIVE-1642:
--

There are 2 kinds of backup: 1) task level and 2) branch level. I think the way
you mentioned above is the branch level: the conditional task maintains a tree,
and if one branch fails, it tries another branch.
I think both of them are fine right now. But the branch level is more
complicated to implement, because the backup task may not be a single task but
a tree of tasks. The design goal is to replace one branch of tasks with another
branch.

I think the problem right now is that there are 2 tasks involved in MapJoin.
Imagine that, 3 months ago, there was no map join local task. It would have
been very easy to implement this: once the mapjoin task fails, we replace it
with the backup task. That is the task level backup.

The problem is that we split the map join task into 2 tasks. But we can still
logically argue that the local task is PART of the map reduce task. Actually,
they do come from the same task. That's why, if it is the local task, we look
ahead one more task.

In the future, we may have more situations of this kind, where one task is
split into multiple tasks. Then we may need a loop here: if this task is split
from other tasks, keep looking ahead.
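
The look-ahead loop could look roughly like the sketch below. Everything in it
is hypothetical: SketchTask, its splitFromNext flag and BackupLookup are
illustration names only and do not exist in the patch.

```java
// Hypothetical, minimal task model for illustrating the look-ahead loop.
class SketchTask {
  final String name;
  // true when this task was split off from the task that follows it
  final boolean splitFromNext;
  SketchTask next;
  SketchTask backup;

  SketchTask(String name, boolean splitFromNext) {
    this.name = name;
    this.splitFromNext = splitFromNext;
  }
}

class BackupLookup {
  // Starting from the failed task, keep looking ahead while the current task
  // is only a piece of a later task and carries no backup of its own.
  static SketchTask findBackup(SketchTask failed) {
    SketchTask current = failed;
    while (current.backup == null && current.splitFromNext
        && current.next != null) {
      current = current.next;
    }
    return current.backup;
  }
}
```

In today's case the loop runs at most once: the local task looks ahead to the
map reduce task it was split from and uses that task's backup.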

Any other thoughts?

> Convert join queries to map-join based on size of table/row
> ---
>
> Key: HIVE-1642
> URL: https://issues.apache.org/jira/browse/HIVE-1642
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: hive_1642_1.patch
>
>
> Based on the number of rows and size of each table, Hive should automatically 
> be able to convert a join into map-join.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1642) Convert join queries to map-join based on size of table/row

2010-11-14 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931995#action_12931995
 ] 

Liyin Tang commented on HIVE-1642:
--

Thanks for reviewing.

1. I will add these parameters to the config xml file.

2. By default hive.auto.convert.join = false right now, so none of the existing
test cases will be affected.

3. I am also thinking about putting the backup task into the task directly,
which is the simplest way to implement this. My only concern is that it will
take more time to de/serialize the task.

4. I will remove the print statement.
5. The same as point 3.
6. I will fix it; it was an svn synchronization problem.

7. Right now the backup task is generated at execution time. That's why it is
not easy to make it work with the explain task. But if we put the backup task
into the task directly, we can solve this problem. Also, we should set the
backup task at compile time instead of execution time. The only cost is the
task serialization time.

8. We need to reuse the code of MapJoinProcessor, which uses the join tree and
row resolver to generate the new map join operator. So each time we generate a
new map join operator, we need a deep copy of the join tree and op context.
Several classes need to be Serializable.

9. I generated the output of these test cases by first setting
hive.auto.convert.join = false and then resetting the flag to true, so I can
compare whether the result is correct or not.
Since the join result is correct right now, I can add explain to the test
cases.

10. I will fix the conditional task to make it more generic.
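
To illustrate why point 8 requires Serializable classes, here is a minimal
sketch of a deep copy done by serializing an object and reading it back. It
assumes plain Java object serialization, which may not be the mechanism the
patch uses; DeepCopy is a hypothetical helper name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper: deep-copies any Serializable object graph.
class DeepCopy {
  @SuppressWarnings("unchecked")
  static <T extends Serializable> T copy(T original)
      throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      // Fails with NotSerializableException if any reachable class
      // does not implement Serializable.
      out.writeObject(original);
    }
    try (ObjectInputStream in = new ObjectInputStream(
        new ByteArrayInputStream(bytes.toByteArray()))) {
      return (T) in.readObject();
    }
  }
}
```

Every class reachable from the copied object has to be Serializable, which is
why several classes need to change.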


> Convert join queries to map-join based on size of table/row
> ---
>
> Key: HIVE-1642
> URL: https://issues.apache.org/jira/browse/HIVE-1642
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: hive_1642_1.patch
>
>
> Based on the number of rows and size of each table, Hive should automatically 
> be able to convert a join into map-join.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1792) track the joins which are being converted to map-join automatically

2010-11-14 Thread Liyin Tang (JIRA)
track the joins which are being converted to map-join automatically
---

 Key: HIVE-1792
 URL: https://issues.apache.org/jira/browse/HIVE-1792
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.7.0
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.7.0


We should be able to track how many queries (join) got converted to
map-join

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1785) change Pre/Post Query Hooks to take in 1 parameter: HookContext

2010-11-14 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1785:
-

Attachment: hive_1785_1.patch

In this patch I have changed the interface of the pre hook and post hook,
so all the hooks will take the HookContext as a parameter.

The HookContext has the QueryPlan, HiveConf and a list of completed tasks.
It will be easier to extend HookContext in the future if the hooks need more
information.
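
As a rough illustration of why a single context parameter is easier to extend,
consider the sketch below. The types are placeholders: a String stands in for
QueryPlan, a Map for HiveConf and task names for the completed tasks, and the
names SketchHookContext and SketchHook are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the HookContext described above.
class SketchHookContext {
  private final String queryPlan;          // placeholder for QueryPlan
  private final Map<String, String> conf;  // placeholder for HiveConf
  private final List<String> completedTasks = new ArrayList<>();

  SketchHookContext(String queryPlan, Map<String, String> conf) {
    this.queryPlan = queryPlan;
    this.conf = conf;
  }

  String getQueryPlan() {
    return queryPlan;
  }

  Map<String, String> getConf() {
    return conf;
  }

  List<String> getCompletedTasks() {
    return Collections.unmodifiableList(completedTasks);
  }

  void addCompletedTask(String taskName) {
    completedTasks.add(taskName);
  }
}

// A hook only sees the context, so adding a field to the context later
// does not change this signature.
interface SketchHook {
  void run(SketchHookContext context) throws Exception;
}
```

A new piece of information becomes one more getter on the context, and the
existing hooks compile without changes.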

By the way, I generated the diff based on Hive-1642 (converting join into map
join), assuming this patch will be committed after Hive-1642.
Please review :)


> change Pre/Post Query Hooks to take in 1 parameter: HookContext
> ---
>
> Key: HIVE-1785
> URL: https://issues.apache.org/jira/browse/HIVE-1785
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Attachments: hive_1785_1.patch
>
>
> This way, it would be possible to add new parameters to the hooks without 
> changing the existing hooks.
> This will be an incompatible change, and all the hooks need to change to the 
> new API

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1688) In the MapJoinOperator, the code uses tag as alias, which is not always true

2010-11-13 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang resolved HIVE-1688.
--

   Resolution: Fixed
Fix Version/s: 0.7.0

> In the MapJoinOperator, the code uses tag as alias, which is not always true
> 
>
> Key: HIVE-1688
> URL: https://issues.apache.org/jira/browse/HIVE-1688
> Project: Hive
>  Issue Type: Bug
>  Components: Drivers
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In the MapJoinOperator and SMBMapJoinOperator, the code uses tag as alias, 
> which is not always true.
> Actually, alias = order[tag]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1688) In the MapJoinOperator, the code uses tag as alias, which is not always true

2010-11-13 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931710#action_12931710
 ] 

Liyin Tang commented on HIVE-1688:
--

This bug has been fixed in Hive-1641 earlier and that patch has been committed.

Thanks

> In the MapJoinOperator, the code uses tag as alias, which is not always true
> 
>
> Key: HIVE-1688
> URL: https://issues.apache.org/jira/browse/HIVE-1688
> Project: Hive
>  Issue Type: Bug
>  Components: Drivers
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In the MapJoinOperator and SMBMapJoinOperator, the code uses tag as alias, 
> which is not always true.
> Actually, alias = order[tag]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1642) Convert join queries to map-join based on size of table/row

2010-11-13 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931705#action_12931705
 ] 

Liyin Tang commented on HIVE-1642:
--

In the case: A left outer join B right outer join C, A must be a small table.

I have a test case, auto_join25.q, to test the backup task. There are several
queries in this test case.

The idea is to set hive.hashtable.max.memory.usage = 0.1, which means that if
the local task uses more than 10% of its heap, it will abort. Obviously, all
local tasks will always fail in this test case, so the backup task will run
after the local task fails.
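
A minimal sketch of such a memory check, assuming the threshold is a fraction
of the maximum heap size. LocalTaskMemoryGuard and its method names are
hypothetical; the MemoryMXBean calls are standard JDK APIs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical guard for the local task's heap usage.
class LocalTaskMemoryGuard {

  // Pure check so the rule can be exercised without a real heap.
  static boolean exceeds(long usedBytes, long maxBytes, double maxUsage) {
    // maxBytes is -1 when the JVM does not define a maximum heap size.
    return maxBytes > 0 && (double) usedBytes / maxBytes > maxUsage;
  }

  // Reads the current heap usage of this JVM and applies the rule.
  static boolean heapExceeds(double maxUsage) {
    MemoryUsage heap =
        ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    return exceeds(heap.getUsed(), heap.getMax(), maxUsage);
  }
}
```

A local task could call heapExceeds with the configured threshold while it
loads rows and abort as soon as it returns true.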




> Convert join queries to map-join based on size of table/row
> ---
>
> Key: HIVE-1642
> URL: https://issues.apache.org/jira/browse/HIVE-1642
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: hive_1642_1.patch
>
>
> Based on the number of rows and size of each table, Hive should automatically 
> be able to convert a join into map-join.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1642) Convert join queries to map-join based on size of table/row

2010-11-12 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1642:
-

Attachment: hive_1642_1.patch

> Convert join queries to map-join based on size of table/row
> ---
>
> Key: HIVE-1642
> URL: https://issues.apache.org/jira/browse/HIVE-1642
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: hive_1642_1.patch
>
>
> Based on the number of rows and size of each table, Hive should automatically 
> be able to convert a join into map-join.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1642) Convert join queries to map-join based on size of table/row

2010-11-12 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931599#action_12931599
 ] 

Liyin Tang commented on HIVE-1642:
--

I just finished converting common join into map join based on the file size.
There are 3 flags to control this optimization.
1)  set hive.auto.convert.join = true; This enables the optimization. By
default this flag is disabled right now, in order not to break any existing
test cases. I also added 25 additional test cases, auto_join0.q -
auto_join25.q, which cover this optimization code.
2)  set hive.hashtable.max.memory.usage = 0.9; This means that if the memory
usage of the local task is more than 90% of its heap size, the local task
aborts by itself. The Driver then knows that the local work failed and won't
submit the MapJoinTask (a map-only MapRedTask) to Hadoop; instead, it submits
the original CommonJoinTask to Hadoop.
3)  set hive.smalltable.filesize = 25000000L; This means that if the sum of
the small table file sizes is less than 25M, the map join task runs. If not,
the original common join task runs.

The following is the basic flow of how it works. For each common join, create
a conditional task.
1)  For each join table, generate a map join task by assuming this table is
the big table.
a.  The left side of a right outer join must be a small table.
b.  The right side of a left outer join must be a small table.
c.  No full outer join can be optimized.
d.  Eg. A left outer join B right outer join C. Only C can be the big table.
e.  Eg. A right outer join B left outer join C. Only B can be the big table.
f.  Eg. A left outer join B left outer join C. Only A can be the big table.
g.  Eg. A right outer join B right outer join C. Both B and C can be the big
table.
2)  Put all these generated map join tasks into the conditional task and set
the mapping between each big table's alias and the corresponding map join task.
3)  At execution time, the resolver reads the input file sizes. If the input
file size of the small tables is less than a threshold, then run the converted
map join task.
4)  Set a backup task for each map join task. The backup task is the original
common join task.
This mapping relationship is set at execution time.
5)  If the map join task returns abnormally, launch the backup task.
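
Step 3 of the flow can be sketched as below. This is a simplified
illustration, not the resolver in the patch: SketchJoinResolver and
chooseBigTable are hypothetical names, and picking the candidate with the
smallest total of small tables is an assumption of the example.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical resolver: picks the big table alias at execution time.
class SketchJoinResolver {

  // aliasToSize: input file size in bytes for every join table.
  // bigTableCandidates: aliases that got a generated map join task.
  // Returns the chosen big table alias, or null to run the common join.
  static String chooseBigTable(Map<String, Long> aliasToSize,
      Set<String> bigTableCandidates, long smallTableThreshold) {
    long total = 0;
    for (long size : aliasToSize.values()) {
      total += size;
    }
    String best = null;
    long bestSmallSum = Long.MAX_VALUE;
    for (String alias : bigTableCandidates) {
      Long size = aliasToSize.get(alias);
      if (size == null) {
        continue; // size unknown, cannot decide for this candidate
      }
      // Everything except the candidate has to fit under the threshold.
      long smallSum = total - size;
      if (smallSum < smallTableThreshold && smallSum < bestSmallSum) {
        best = alias;
        bestSmallSum = smallSum;
      }
    }
    return best;
  }
}
```

When the method returns null, no candidate keeps the small tables under the
threshold, and the original common join task is the one to run.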



> Convert join queries to map-join based on size of table/row
> ---
>
> Key: HIVE-1642
> URL: https://issues.apache.org/jira/browse/HIVE-1642
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
>
> Based on the number of rows and size of each table, Hive should automatically 
> be able to convert a join into map-join.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1754) Remove JDBM component from Map Join

2010-11-11 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1754:
-

Attachment: hive-1754_9.patch

> Remove JDBM component from Map Join
> ---
>
> Key: HIVE-1754
> URL: https://issues.apache.org/jira/browse/HIVE-1754
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: Hive-1754.patch, Hive-1754_2.patch, Hive-1754_3.patch, 
> hive-1754_4.patch, hive-1754_5.patch, hive-1754_7.patch, hive-1754_9.patch
>
>
> Right now, JDBM is the major performance bottleneck of map join.
> With the growth of the small table, the PUT and GET operations will take most 
> of the execution time.
> Map Join is designed to load the data of small table into memory. 
> If the data is too large to hold in memory, then there is no need to use the 
> map join strategy.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1754) Remove JDBM component from Map Join

2010-11-10 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1754:
-

Attachment: hive-1754_7.patch

Change the code style according to the reviewer comments

> Remove JDBM component from Map Join
> ---
>
> Key: HIVE-1754
> URL: https://issues.apache.org/jira/browse/HIVE-1754
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: Hive-1754.patch, Hive-1754_2.patch, Hive-1754_3.patch, 
> hive-1754_4.patch, hive-1754_5.patch, hive-1754_7.patch
>
>
> Right now, JDBM is the major performance bottleneck.
> As the small table grows, the PUT and GET operations take most of the 
> execution time.
> Map Join is designed to load the data of the small table into memory. 
> If the data is too large to hold in memory, then there is no need to use the 
> map join strategy.




[jira] Resolved: (HIVE-1647) Incorrect initialization of thread local variable inside IOContext ( implementation is not threadsafe )

2010-11-09 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang resolved HIVE-1647.
--

  Resolution: Fixed
Release Note: This problem is fixed in Hive-1754

> Incorrect initialization of thread local variable inside IOContext ( 
> implementation is not threadsafe ) 
> 
>
> Key: HIVE-1647
> URL: https://issues.apache.org/jira/browse/HIVE-1647
> Project: Hive
>  Issue Type: Bug
>  Components: Server Infrastructure
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Raman Grover
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: HIVE-1647.patch
>
>   Original Estimate: 0.17h
>  Remaining Estimate: 0.17h
>
> Bug in org.apache.hadoop.hive.ql.io.IOContext
> in relation to initialization of thread local variable.
>  
> public class IOContext {
>  
>   private static ThreadLocal<IOContext> threadLocal =
>       new ThreadLocal<IOContext>(){ };
>  
>   static {
>     if (threadLocal.get() == null) {
>       threadLocal.set(new IOContext());
>     }
>   }
>  
> In a multi-threaded environment, the thread that loads the class first 
> for the JVM (assuming threads share the classloader) 
> initializes its own thread-local value correctly by executing the code in 
> the static block. Once the class is loaded, 
> any subsequent thread sees its thread-local variable as 
> null. Since IOContext
> is set during initialization of HiveRecordReader, in a scenario where 
> multiple threads acquire
> an instance of HiveRecordReader, this results in a NullPointerException for 
> all but the first thread that loads the class in the VM.
>  
> Is the above scenario of multiple threads initializing HiveRecordReader a 
> typical one? Or could we just provide the following fix?
>  
>   private static ThreadLocal<IOContext> threadLocal =
>       new ThreadLocal<IOContext>(){
>     protected synchronized IOContext initialValue() {
>       return new IOContext();
>     }
>   };
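
The difference between the two initialization styles quoted above can be observed from a second thread. The following is a small sketch rather than the Hive code: IOContextSketch is a hypothetical stand-in for IOContext, and ThreadLocal.withInitial (Java 8 and later) is used in place of the anonymous subclass overriding initialValue().

```java
// Minimal stand-in for the real class; only the ThreadLocal wiring matters here.
class IOContextSketch {

    // Problematic variant: the static block runs once, on the class-loading thread only.
    static final ThreadLocal<IOContextSketch> EAGER = new ThreadLocal<>();
    static {
        EAGER.set(new IOContextSketch());
    }

    // Proposed variant: the initial value is created lazily, once per thread.
    static final ThreadLocal<IOContextSketch> LAZY =
        ThreadLocal.withInitial(IOContextSketch::new);

    /** Returns {EAGER is null, LAZY is null} as observed from a freshly started thread. */
    static boolean[] observeFromNewThread() {
        boolean[] seen = new boolean[2];
        Thread worker = new Thread(() -> {
            seen[0] = EAGER.get() == null;
            seen[1] = LAZY.get() == null;
        });
        worker.start();
        try {
            worker.join(); // join() also makes the worker's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        boolean[] seen = observeFromNewThread();
        System.out.println("eager value missing on new thread: " + seen[0]);
        System.out.println("lazy value missing on new thread: " + seen[1]);
    }
}
```

With the static-block variant, any code that dereferences the value on the second thread would hit the NullPointerException described in the report; the lazy variant gives every thread its own instance on first access.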




[jira] Resolved: (HIVE-1775) Assertation on inputObjInspectors.length in Groupy operator

2010-11-09 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang resolved HIVE-1775.
--

  Resolution: Fixed
Release Note: This bug is fixed in Hive-1754

> Assertation on inputObjInspectors.length in Groupy operator
> ---
>
> Key: HIVE-1775
> URL: https://issues.apache.org/jira/browse/HIVE-1775
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
>
> In the Groupby Operator:
> Line 188: assert (inputObjInspectors.length == 1); 
> But this assertion may not necessarily be true.




[jira] Updated: (HIVE-1754) Remove JDBM component from Map Join

2010-11-09 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1754:
-

Attachment: hive-1754_5.patch

This patch fully resolves the conflicts in the test output files.

> Remove JDBM component from Map Join
> ---
>
> Key: HIVE-1754
> URL: https://issues.apache.org/jira/browse/HIVE-1754
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: Hive-1754.patch, Hive-1754_2.patch, Hive-1754_3.patch, 
> hive-1754_4.patch, hive-1754_5.patch
>
>
> Right now, JDBM is the major performance bottleneck.
> As the small table grows, the PUT and GET operations take most of the 
> execution time.
> Map Join is designed to load the data of the small table into memory. 
> If the data is too large to hold in memory, then there is no need to use the 
> map join strategy.




[jira] Updated: (HIVE-1754) Remove JDBM component from Map Join

2010-11-09 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1754:
-

Attachment: hive-1754_4.patch

Resolved all the output conflicts in this patch

> Remove JDBM component from Map Join
> ---
>
> Key: HIVE-1754
> URL: https://issues.apache.org/jira/browse/HIVE-1754
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: Hive-1754.patch, Hive-1754_2.patch, Hive-1754_3.patch, 
> hive-1754_4.patch
>
>
> Right now, JDBM is the major performance bottleneck.
> As the small table grows, the PUT and GET operations take most of the 
> execution time.
> Map Join is designed to load the data of the small table into memory. 
> If the data is too large to hold in memory, then there is no need to use the 
> map join strategy.




[jira] Updated: (HIVE-1754) Remove JDBM component from Map Join

2010-11-08 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1754:
-

Attachment: Hive-1754_3.patch

Fixes a bug when the join value is null.
Also fixes HIVE-1775.

> Remove JDBM component from Map Join
> ---
>
> Key: HIVE-1754
> URL: https://issues.apache.org/jira/browse/HIVE-1754
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: Hive-1754.patch, Hive-1754_2.patch, Hive-1754_3.patch
>
>
> Right now, JDBM is the major performance bottleneck.
> As the small table grows, the PUT and GET operations take most of the 
> execution time.
> Map Join is designed to load the data of the small table into memory. 
> If the data is too large to hold in memory, then there is no need to use the 
> map join strategy.




[jira] Commented: (HIVE-1775) Assertation on inputObjInspectors.length in Groupy operator

2010-11-08 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929869#action_12929869
 ] 

Liyin Tang commented on HIVE-1775:
--

resolved in patch hive-1754

> Assertation on inputObjInspectors.length in Groupy operator
> ---
>
> Key: HIVE-1775
> URL: https://issues.apache.org/jira/browse/HIVE-1775
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
>
> In the Groupby Operator:
> Line 188: assert (inputObjInspectors.length == 1); 
> But this assertion may not necessarily be true.




[jira] Commented: (HIVE-1775) Assertation on inputObjInspectors.length in Groupy operator

2010-11-07 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929480#action_12929480
 ] 

Liyin Tang commented on HIVE-1775:
--

Yes. I can just comment out this assertion.

> Assertation on inputObjInspectors.length in Groupy operator
> ---
>
> Key: HIVE-1775
> URL: https://issues.apache.org/jira/browse/HIVE-1775
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
>
> In the Groupby Operator:
> Line 188: assert (inputObjInspectors.length == 1); 
> But this assertion may not necessarily be true.




[jira] Created: (HIVE-1775) Assertation on inputObjInspectors.length in Groupy operator

2010-11-07 Thread Liyin Tang (JIRA)
Assertation on inputObjInspectors.length in Groupy operator
---

 Key: HIVE-1775
 URL: https://issues.apache.org/jira/browse/HIVE-1775
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0, 0.7.0
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.7.0


In the Groupby Operator:
Line 188: assert (inputObjInspectors.length == 1); 
But this assertion may not necessarily be true.




[jira] Updated: (HIVE-1754) Remove JDBM component from Map Join

2010-11-04 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1754:
-

Attachment: Hive-1754_2.patch

Remove JDBM from Hive completely 

> Remove JDBM component from Map Join
> ---
>
> Key: HIVE-1754
> URL: https://issues.apache.org/jira/browse/HIVE-1754
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: Hive-1754.patch, Hive-1754_2.patch
>
>
> Right now, JDBM is the major performance bottleneck.
> As the small table grows, the PUT and GET operations take most of the 
> execution time.
> Map Join is designed to load the data of the small table into memory. 
> If the data is too large to hold in memory, then there is no need to use the 
> map join strategy.




[jira] Commented: (HIVE-1754) Remove JDBM component from Map Join

2010-11-04 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928262#action_12928262
 ] 

Liyin Tang commented on HIVE-1754:
--

This patch has some potential bugs. I will fix them today and upload a new one.

> Remove JDBM component from Map Join
> ---
>
> Key: HIVE-1754
> URL: https://issues.apache.org/jira/browse/HIVE-1754
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: Hive-1754.patch
>
>
> Right now, JDBM is the major performance bottleneck.
> As the small table grows, the PUT and GET operations take most of the 
> execution time.
> Map Join is designed to load the data of the small table into memory. 
> If the data is too large to hold in memory, then there is no need to use the 
> map join strategy.




[jira] Updated: (HIVE-1754) Remove JDBM component from Map Join

2010-11-01 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1754:
-

Attachment: Hive-1754.patch

> Remove JDBM component from Map Join
> ---
>
> Key: HIVE-1754
> URL: https://issues.apache.org/jira/browse/HIVE-1754
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: Hive-1754.patch
>
>
> Right now, JDBM is the major performance bottleneck.
> As the small table grows, the PUT and GET operations take most of the 
> execution time.
> Map Join is designed to load the data of the small table into memory. 
> If the data is too large to hold in memory, then there is no need to use the 
> map join strategy.




[jira] Updated: (HIVE-1754) Remove JDBM component from Map Join

2010-11-01 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1754:
-

Status: Patch Available  (was: Open)

This patch makes the following changes:
1) Remove JDBM from Hive.
2) Store all the data of the small table in an in-memory hashtable.
3) Create a lightweight RowContainer: MapJoinRowContainer.
4) Optimize MapJoinObjectKey. If there are only one or two join keys, 
MapJoinSingleKey or MapJoinDoubleKeys is used instead of MapJoinObjectKey.
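
Items 2 and 4 above can be illustrated with a small Java sketch. The classes below (SingleKey, DoubleKeys, GenericKey) and the helpers keyFor and put are hypothetical illustrations of the idea of specializing the hash key by the number of join columns; they are not the classes from the patch and make no claim about how the patch handles null join values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch; the patch's real key classes may differ in detail.
class MapJoinKeySketch {

    /** Key for exactly one join column: no array allocation, no loop in hashCode. */
    static final class SingleKey {
        final Object k;
        SingleKey(Object k) { this.k = k; }
        @Override public int hashCode() { return Objects.hashCode(k); }
        @Override public boolean equals(Object o) {
            return o instanceof SingleKey && Objects.equals(k, ((SingleKey) o).k);
        }
    }

    /** Key for exactly two join columns. */
    static final class DoubleKeys {
        final Object k1;
        final Object k2;
        DoubleKeys(Object k1, Object k2) { this.k1 = k1; this.k2 = k2; }
        @Override public int hashCode() { return 31 * Objects.hashCode(k1) + Objects.hashCode(k2); }
        @Override public boolean equals(Object o) {
            return o instanceof DoubleKeys
                && Objects.equals(k1, ((DoubleKeys) o).k1)
                && Objects.equals(k2, ((DoubleKeys) o).k2);
        }
    }

    /** General fallback for three or more join columns. */
    static final class GenericKey {
        final Object[] keys;
        GenericKey(Object[] keys) { this.keys = keys; }
        @Override public int hashCode() { return Arrays.hashCode(keys); }
        @Override public boolean equals(Object o) {
            return o instanceof GenericKey && Arrays.equals(keys, ((GenericKey) o).keys);
        }
    }

    /** Picks the cheapest key representation for the given number of join columns. */
    static Object keyFor(Object... joinKeys) {
        switch (joinKeys.length) {
            case 1: return new SingleKey(joinKeys[0]);
            case 2: return new DoubleKeys(joinKeys[0], joinKeys[1]);
            default: return new GenericKey(joinKeys.clone());
        }
    }

    /** Adds one small-table row to the in-memory hashtable under its join key. */
    static void put(Map<Object, List<String>> table, String row, Object... joinKeys) {
        table.computeIfAbsent(keyFor(joinKeys), k -> new ArrayList<>()).add(row);
    }
}
```

A probe from the big-table side would then be table.get(keyFor(...)) with the same join columns, which returns the list of matching small-table rows or null.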

> Remove JDBM component from Map Join
> ---
>
> Key: HIVE-1754
> URL: https://issues.apache.org/jira/browse/HIVE-1754
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: Hive-1754.patch
>
>
> Right now, JDBM is the major performance bottleneck.
> As the small table grows, the PUT and GET operations take most of the 
> execution time.
> Map Join is designed to load the data of the small table into memory. 
> If the data is too large to hold in memory, then there is no need to use the 
> map join strategy.




[jira] Resolved: (HIVE-1702) optimize JDBM to make mapjoin faster

2010-11-01 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang resolved HIVE-1702.
--

  Resolution: Won't Fix
Release Note: JDBM will be removed from Hive

> optimize JDBM to make mapjoin faster
> 
>
> Key: HIVE-1702
> URL: https://issues.apache.org/jira/browse/HIVE-1702
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Liyin Tang
>
> Htree.get() costs 70% of the total time. A bloom filter could help a lot 
> here by avoiding unneeded get() calls when we know for sure that the given 
> key is not in JDBM. (We can generate the bloom filter when doing the JDBM 
> sink, and read it into memory when doing the read.)
> Copied from https://issues.apache.org/jira/browse/HIVE-1700
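
The bloom filter idea quoted above can be sketched in a few lines of Java. This is an illustration only: KeyBloomFilterSketch, its double-hashing scheme, and the lookup helper are assumptions made for the example, and a plain Map stands in for the JDBM store.

```java
import java.util.BitSet;
import java.util.Map;

// Hypothetical sketch of a bloom filter placed in front of an expensive get().
class KeyBloomFilterSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    KeyBloomFilterSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Derives the i-th bit position from two hashes of the key (double hashing).
    private int index(Object key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) | 1; // forced odd so the step is never zero
        return Math.floorMod(h1 + i * h2, numBits);
    }

    /** Called while the table is being written (the "sink" side). */
    void add(Object key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(key, i));
        }
    }

    /** False means the key is definitely absent; true means it may be present. */
    boolean mightContain(Object key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    /** Skips the expensive store lookup when the filter rules the key out. */
    static <K, V> V lookup(KeyBloomFilterSketch filter, Map<K, V> store, K key) {
        return filter.mightContain(key) ? store.get(key) : null;
    }
}
```

The filter never reports a false negative, so skipping get() on a negative answer does not change join results; false positives only cost an extra lookup.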




[jira] Resolved: (HIVE-1733) Make the bucket size of JDBM configurable

2010-11-01 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang resolved HIVE-1733.
--

  Resolution: Not A Problem
Release Note: The JDBM will be removed from Hive

> Make the bucket size of JDBM configurable 
> --
>
> Key: HIVE-1733
> URL: https://issues.apache.org/jira/browse/HIVE-1733
> Project: Hive
>  Issue Type: Task
>  Components: Query Processor
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Liyin Tang
>Assignee: Liyin Tang
>
> Right now the bucket size of JDBM is hard-coded as 256.
> To better configure and improve the performance of the JDBM component,
> it is necessary to make the bucket size configurable.




[jira] Updated: (HIVE-1756) failures in fatal.q in TestNegativeCliDriver

2010-10-28 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1756:
-

Attachment: Hive-1756.patch

remove fatal.q

> failures in fatal.q in TestNegativeCliDriver
> 
>
> Key: HIVE-1756
> URL: https://issues.apache.org/jira/browse/HIVE-1756
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Attachments: Hive-1756.patch
>
>
> This is probably caused by HIVE-1641




[jira] Updated: (HIVE-1757) test cleanup for Hive-1641

2010-10-27 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1757:
-

Attachment: Hive-1757.patch

Remove some unnecessary print statements

> test cleanup for Hive-1641
> --
>
> Key: HIVE-1757
> URL: https://issues.apache.org/jira/browse/HIVE-1757
> Project: Hive
>  Issue Type: Improvement
>Reporter: He Yongqiang
>Assignee: Liyin Tang
> Attachments: Hive-1757.patch
>
>





[jira] Updated: (HIVE-1641) add map joined table to distributed cache

2010-10-27 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1641:
-

Attachment: Hive-1641(6).patch

New diff that removes some print and log statements

> add map joined table to distributed cache
> -
>
> Key: HIVE-1641
> URL: https://issues.apache.org/jira/browse/HIVE-1641
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.7.0
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: Hive-1641(3).txt, Hive-1641(4).patch, 
> Hive-1641(5).patch, Hive-1641(6).patch, Hive-1641.patch
>
>
> Currently, the mappers directly read the map-joined table from HDFS, which 
> makes it difficult to scale.
> We end up getting lots of timeouts once the number of mappers is beyond a 
> few thousand, due to 
> concurrent mappers.
> It would be a good idea to put the mapped file into the distributed cache 
> and read from there instead.




[jira] Commented: (HIVE-1756) failures in fatal.q in TestNegativeCliDriver

2010-10-27 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925628#action_12925628
 ] 

Liyin Tang commented on HIVE-1756:
--

The fatal.q test is:
set hive.mapjoin.maxsize=1;
set hive.task.progress=true;
select /*+ mapjoin(b) */ * from src a join src b on (a.key=b.key);

But right now there is no max size for map join, so the MapRed task returns 
normally (0), and JUnit fails this test query.
Should I support the max size parameter, or just skip this test case?


> failures in fatal.q in TestNegativeCliDriver
> 
>
> Key: HIVE-1756
> URL: https://issues.apache.org/jira/browse/HIVE-1756
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Liyin Tang
>
> This is probably caused by HIVE-1641




[jira] Resolved: (HIVE-1723) The result of left semi join is not correct

2010-10-26 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang resolved HIVE-1723.
--

  Resolution: Fixed
Release Note: This bug is resolved in Hive-1641

This bug is resolved in Hive-1641

> The result of left semi join is not correct
> ---
>
> Key: HIVE-1723
> URL: https://issues.apache.org/jira/browse/HIVE-1723
> Project: Hive
>  Issue Type: Bug
>Reporter: Liyin Tang
>Assignee: Liyin Tang
>
> In the test case semijoin.q, there is a query:
> select /*+ mapjoin(b) */ a.key from t3 a left semi join t1 b on a.key = b.key 
> sort by a.key;
> I think this query will return a wrong result if table t1 has more than 
> 25000 different keys.
> To keep it simple, I tried a very similar query:
> select /*+ mapjoin(b) */ a.key from test_semijoin a left semi join 
> test_semijoin b on a.key = b.key sort by a.key;
> The table test_semijoin looks like:
> 0 0
> 1 1
> 2 2
> 3 3
> 4 4
> 5 5
> ......
> ...  
> 25000   25000
> 25001   25001
> ...  
> ...  
> 25999   25999
> 26000   26000
> So we can easily see that the correct result of this query should be the same 
> keys as in table test_semijoin itself.
> Actually, the result is only part of that: only from 0 to 24544.
> 0
> 1
> 2
> ..
> ..
> 24543
> 24544
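
For reference, the expected result in the report above follows directly from left semi join semantics. The sketch below is a plain in-memory illustration with hypothetical names (LeftSemiJoinSketch, leftSemiJoin, keys); it is not Hive's map join code and says nothing about where the truncation at 24544 came from.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of left semi join semantics on integer keys.
class LeftSemiJoinSketch {

    /** Emits each left key once per left row if at least one right row matches. */
    static List<Integer> leftSemiJoin(List<Integer> left, List<Integer> right) {
        Set<Integer> rightKeys = new HashSet<>(right);
        List<Integer> out = new ArrayList<>();
        for (Integer key : left) {
            if (rightKeys.contains(key)) {
                out.add(key);
            }
        }
        return out;
    }

    /** Builds the key column 0..maxInclusive, matching the table shown in the report. */
    static List<Integer> keys(int maxInclusive) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i <= maxInclusive; i++) {
            result.add(i);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> table = keys(26000);
        System.out.println(leftSemiJoin(table, table).size());
    }
}
```

Joining the table with itself must return every key from 0 to 26000, so any shorter result indicates lost rows on the hashed side.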



